Completed
research
GenreNeuro
Neural classifier for Russian children's books — 89.16% accuracy across 6 genres from text descriptions
By the numbers
89.16%
Test accuracy
+58.9pp
Jump over baseline
9,409
Training entries
6
Genres
The Problem
What I was solving
Russian children's book catalogs hold thousands of titles with inconsistent genre tags. Librarians and publishers end up sorting manually or trusting whatever genre the first cataloger guessed. A frequency baseline that always predicts the most common genre reaches only about 30% accuracy, too low to be useful.
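To make the baseline concrete, here is a minimal sketch of a majority-class predictor. The function name and the toy labels are made up for illustration; they are not taken from the project's data or code.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Always predict the most frequent training genre and score it on the test labels."""
    most_common_genre, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_genre)
    return most_common_genre, hits / len(test_labels)


# Toy labels, invented for this example only.
toy_train = ["Fairy Tales"] * 3 + ["Prose"] * 2 + ["Poetry"]
toy_test = ["Fairy Tales", "Prose", "Fairy Tales", "Poetry"]
baseline_genre, baseline_accuracy = majority_baseline_accuracy(toy_train, toy_test)
```

Any classifier worth shipping has to beat this number by a wide margin, which is why the result is reported as a jump over the baseline rather than as raw accuracy alone.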
My Approach
How I built it
Keras/TensorFlow classifier on text descriptions. Architecture deliberately simple: Embedding → GlobalAveragePooling → two Dense layers with Dropout. Six genres: Prose, Fairy Tales, Adventure, Educational, Poetry, Young Adult. Trained on 9,409 entries from the companion scraper project. Class weighting was needed because "Fairy Tales" had 10x more samples than "Young Adult". Russian morphology is handled without a separate stemmer: the embedding learns word forms on its own.
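The architecture described above could look roughly like the sketch below. The vocabulary size, embedding width, hidden layer width, optimizer and loss are all assumptions chosen for illustration; only the layer order, the six output classes and the 0.5 dropout rate come from the write-up.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20_000  # assumed vocabulary size, not stated in the write-up
EMBED_DIM = 64       # assumed embedding width
HIDDEN_UNITS = 64    # assumed width of the first Dense layer
NUM_GENRES = 6       # Prose, Fairy Tales, Adventure, Educational, Poetry, Young Adult


def build_model(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM, dropout=0.5):
    """Embedding -> GlobalAveragePooling1D -> Dense -> Dropout -> Dense(softmax)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None,), dtype="int32"),  # variable-length token id sequences
        layers.Embedding(vocab_size, embed_dim),
        layers.GlobalAveragePooling1D(),               # average token vectors over the sequence
        layers.Dense(HIDDEN_UNITS, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(NUM_GENRES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",                        # assumed optimizer
        loss="sparse_categorical_crossentropy",  # assumes integer-encoded genre labels
        metrics=["accuracy"],
    )
    return model
```

Averaging the embeddings discards word order, which keeps the parameter count low; that is consistent with the stated goal of avoiding overfitting on roughly 9k samples.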
Tech choices
- Keras/TensorFlow: Simple architectures for small datasets. GlobalAveragePooling worked better than LSTM here, with less overfitting on 9k samples.
- Class weighting: Without it, the model just predicts "Fairy Tales" for everything and gets 40% accuracy. With it, minority genres stop being invisible.
- Dropout 0.5: Aggressive regularization on a small dataset. 0.3 started overfitting by epoch 20; 0.5 plateaued gracefully.
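One common way to derive class weights is inverse-frequency ("balanced") weighting, sketched below. This is an assumption about how the weights might be computed, not a description of the project's actual code; the function name and toy counts are illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {genre: n_samples / (n_classes * count) for genre, count in counts.items()}


# Toy 10:1 imbalance, mirroring the ratio mentioned for Fairy Tales vs Young Adult.
toy_labels = ["Fairy Tales"] * 10 + ["Young Adult"]
weights = balanced_class_weights(toy_labels)
```

With a 10:1 imbalance the rare class gets ten times the weight of the common one, so each of its errors costs proportionally more during training. Keras accepts such a mapping through the `class_weight` argument of `Model.fit`, keyed by integer class index rather than by genre name.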
Outcome
What came out of it
89.16% test accuracy, 58.9 percentage points over the frequency baseline. The confusion matrix shows which genres the model mixes up: Adventure vs Prose is the main pair, which is an understandable error, since those two genres really do overlap. Ships with a small inference script that takes a book description and returns a genre with confidence scores.
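The last step of such an inference script, turning the model's output probabilities into a ranked list of genres, might look like this sketch. The label order in `GENRES` and the function name are assumptions; the real order depends on how labels were encoded during training.

```python
# Assumed label order; it must match the encoding used at training time.
GENRES = ["Prose", "Fairy Tales", "Adventure", "Educational", "Poetry", "Young Adult"]


def rank_genres(probabilities, genres=GENRES):
    """Pair each genre with its predicted probability, highest confidence first."""
    if len(probabilities) != len(genres):
        raise ValueError("expected one probability per genre")
    ranked = sorted(zip(genres, probabilities), key=lambda pair: pair[1], reverse=True)
    return [(genre, round(float(prob), 4)) for genre, prob in ranked]


# Made-up probabilities standing in for one row of model output.
example = rank_genres([0.1, 0.05, 0.6, 0.05, 0.15, 0.05])
```

Returning the full ranking rather than only the top label makes borderline cases visible, for example a description that scores almost equally on Adventure and Prose.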