New Russian Stylometry Dataset!
Russian Stylometric Dataset (RSD): 322 texts from the 19th to early 20th centuries (16 million words), prepared for analysis in stylo (R) and machine learning (Python).
What's inside?
Fiction, journalism, scientific texts, drama, poetry
Grouped by author, gender, age, genre, and literary movement (Romanticism/Realism)
Character speech (Tolstoy, Gogol, Ostrovsky)
Generated texts (LSTM, GPT)
Use cases: authorship attribution, clustering, classification, and benchmarking of methods.
Public domain + GPL-3.0 license.
Learn more: https://github.com/nevmenandr/RSD
DOI: 10.5281/zenodo.20701309