arxiv:2407.19600
Boris Orekhov
nevmenandr
AI & ML interests
Natural Language Processing, Poetry Generation, Linguistics, Low-resource languages
Recent Activity
posted an update about 24 hours ago
New Russian Stylometry Dataset!
Russian Stylometric Dataset (RSD): 322 texts from the 19th to early 20th centuries (16 million words), prepared for analysis in stylo (R) and machine learning (Python).
What's inside?
- Fiction, journalism, scientific texts, drama, poetry
- Grouped by author, gender, age, genre, literary movements (Romanticism/Realism)
- Character speech (Tolstoy, Gogol, Ostrovsky)
- Generated texts (LSTM, GPT)
Use cases: authorship attribution, clustering, classification, benchmarking methods.
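For authorship attribution, a classic distance is Burrows's Delta, which stylo also offers: take the most frequent words of the reference corpus, z-score each word's relative frequency, and average the absolute differences between a disputed text and each candidate author. The sketch below is a minimal pure-Python version, not the stylo implementation. It assumes each author is represented by a single plain-text string, and the function names and the two-line toy corpus are hypothetical (real attribution needs far longer samples).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; in Python 3, \w also matches Cyrillic letters.
    return re.findall(r"\w+", text.lower())


def relative_freqs(tokens, vocab):
    # Frequency of each vocabulary word relative to the text length.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(corpus, query, n_mfw=100):
    """Rank candidate authors for `query` by Burrows's Classic Delta.

    corpus: dict mapping author name -> reference text (hypothetical layout).
    Returns (author, delta) pairs, smallest (most similar) first.
    """
    tokenized = {a: tokenize(t) for a, t in corpus.items()}
    # The n_mfw most frequent words of the pooled reference corpus.
    pooled = Counter(w for toks in tokenized.values() for w in toks)
    vocab = [w for w, _ in pooled.most_common(n_mfw)]
    profiles = {a: relative_freqs(toks, vocab) for a, toks in tokenized.items()}

    # Mean and population standard deviation of each word across authors;
    # a zero deviation (word equally frequent everywhere) is replaced by 1.0.
    n = len(profiles)
    means = [sum(p[i] for p in profiles.values()) / n for i in range(len(vocab))]
    stds = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in profiles.values()) / n) or 1.0
        for i in range(len(vocab))
    ]

    def zscores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, stds)]

    zq = zscores(relative_freqs(tokenize(query), vocab))
    deltas = {
        a: sum(abs(x - y) for x, y in zip(zscores(p), zq)) / len(vocab)
        for a, p in profiles.items()
    }
    return sorted(deltas.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Toy reference corpus: one line each from Pushkin and Lermontov.
    corpus = {
        "Pushkin": "Мой дядя самых честных правил, когда не в шутку занемог",
        "Lermontov": "Белеет парус одинокой в тумане моря голубом",
    }
    print(burrows_delta(corpus, "Мой дядя самых честных правил"))
```

Averaging absolute z-score differences keeps very frequent function words from dominating the distance, which is the main reason Delta works on most-frequent-word lists.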
Public domain + GPL-3.0 license.
Learn more: https://github.com/nevmenandr/RSD
DOI: 10.5281/zenodo.20701309
posted an update 3 days ago
https://huggingface.co/nevmenandr/char-based-lstm-russian-poetry-
https://huggingface.co/nevmenandr/char-based-lstm-russian-poetry-mandelshtam
https://huggingface.co/nevmenandr/char-based-lstm-russian-poetry-hexameter
https://huggingface.co/papers/2306.02771
RNN vs. Transformers: How an Old Architecture Better Perceives Poetic Style
In the era of Transformer dominance, we often forget that old RNNs (especially character-level LSTMs) remain irreplaceable for tasks where *individual style*, rhythm, and micro-patterns matter. These three models are clear proof of that.
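A character-level LSTM outputs, at each step, one score per character of its alphabet; text is generated by turning those scores into probabilities and sampling one character at a time. The post does not say how these three models are sampled, so the sketch below assumes the common softmax-with-temperature scheme and leaves the network itself out: `logits` stands in for whatever the model returns, and all names and numbers are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw next-character scores into probabilities.

    temperature < 1 sharpens the distribution (safer, more repetitive text);
    temperature > 1 flattens it (more surprising, more error-prone text).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_char(logits, alphabet, temperature=1.0, rng=random):
    # Draw one character according to the temperature-adjusted probabilities.
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(alphabet, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical scores a char-level model might assign after "сол".
    alphabet = ["н", "о", "ь", " "]
    logits = [3.0, 1.0, 0.5, 0.1]
    for t in (0.5, 1.0, 1.5):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
    print(sample_next_char(logits, alphabet, temperature=0.8, rng=random.Random(42)))
```

Raising the temperature is one common way to obtain the "almost correct" lines the post describes: the model still follows its learned character patterns but more often picks less likely continuations.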
Why does this matter today?
- **Stylistic analysis**: RNNs better capture meter, repetitions, and unexpected tonal shifts.
- **Teaching poetics**: generating lines that are "almost correct" but hallucinated helps explore the boundaries of a style.
- **Nostalgia and replication**: a reminder that not everything is measured by BLEU and perplexity.
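One quick, stress-blind check on generated Russian verse is the syllable count per line: in Russian orthography every syllable contains exactly one vowel letter, so counting vowels gives the number of syllables. The helper below is a hypothetical sketch, not part of the models. It measures line length only; identifying the meter itself would also need stress positions, which plain text does not mark.

```python
from collections import Counter

# The ten Russian vowel letters; each one marks exactly one syllable.
RUSSIAN_VOWELS = frozenset("аеёиоуыэюя")


def count_syllables(line):
    """Count syllables in a Russian line by counting vowel letters."""
    return sum(1 for ch in line.lower() if ch in RUSSIAN_VOWELS)


def syllable_histogram(lines):
    # How many lines have each syllable count; blank lines are skipped.
    return Counter(count_syllables(line) for line in lines if line.strip())


if __name__ == "__main__":
    stanza = [
        "Мой дядя самых честных правил,",
        "Когда не в шутку занемог,",
    ]
    print(syllable_histogram(stanza))
```

The two opening lines of *Eugene Onegin* give 9 and 8 syllables, as expected for iambic tetrameter with a feminine and then a masculine ending; comparing such histograms for generated and original lines is a cheap first signal of whether a model has picked up line length.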
Visualization
Attached is an infographic comparing the three models (architecture, style, generation sample).
> RNNs aren't dead. They're just writing poetry in silence.
updated a dataset 8 months ago
nevmenandr/russian-20th-century-bigrams