GenreNeuro

Neural classifier for Russian children's books — 89.16% accuracy across 6 genres from text descriptions


By the numbers

89.16%

Test accuracy

+58.9pp

Jump over baseline

9,409

Training entries

6

Genres

The Problem

What I was solving

Russian children's book catalogs hold thousands of titles with inconsistent genre tags. Librarians and publishers end up sorting manually or trusting whatever genre the first cataloger guessed. A frequency-based baseline that always predicts the most common genre reaches only about 30% accuracy, which is useless in practice.
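To make the baseline concrete, here is a minimal sketch of a "most common genre" predictor. The function name and the toy labels are invented for illustration; they are not the project's data or code.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, hits / len(test_labels)

# Toy labels, invented for illustration (not the real genre distribution).
train = ["Fairy Tales"] * 3 + ["Prose"] * 2 + ["Poetry"]
test = ["Fairy Tales", "Prose", "Fairy Tales", "Adventure", "Poetry"]

label, acc = majority_baseline_accuracy(train, test)
print(label, acc)  # Fairy Tales 0.4
```

Any classifier worth shipping has to beat this number, since it needs no model at all.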

My Approach

How I built it

A Keras/TensorFlow classifier trained on text descriptions. The architecture is deliberately simple: Embedding → GlobalAveragePooling → two Dense layers with Dropout. It covers six genres: Prose, Fairy Tales, Adventure, Educational, Poetry, Young Adult. The training data is 9,409 entries from the companion scraper project. Class weighting is used because "Fairy Tales" had 10x more samples than "Young Adult". No separate stemmer handles Russian morphology; the embedding learns the inflected forms on its own.
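The Embedding → GlobalAveragePooling step can be illustrated without any framework. The sketch below is a dependency-free approximation of what those two layers compute, not the project's Keras code; the vocabulary size, vector width, and token ids are assumptions chosen for the example, and padding is ignored.

```python
import random

def build_embedding_table(vocab_size, dim, seed=0):
    """Random lookup table standing in for a trained Embedding layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]

def embed_and_average(token_ids, table):
    """Look up one vector per token, then average over the sequence."""
    if not token_ids:
        raise ValueError("need at least one token")
    dim = len(table[0])
    vectors = [table[i] for i in token_ids]
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Hypothetical sizes: 100-word vocabulary, 8-dimensional vectors.
table = build_embedding_table(vocab_size=100, dim=8)
short = embed_and_average([5, 17, 42], table)
long_ = embed_and_average([5, 17, 42, 8, 8, 63, 99, 1], table)
print(len(short), len(long_))  # 8 8
```

Descriptions of any length collapse to one fixed-size vector, which is what the Dense layers consume. The trade-off is that averaging discards word order, so the model relies on which words appear rather than how they are arranged.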

Tech choices

Keras/TensorFlow: Simple architectures suit small datasets. GlobalAveragePooling worked better than an LSTM here, with less overfitting on 9k samples.
Class weighting: Without it, the model just predicts "Fairy Tales" for everything and gets 40% accuracy. With it, minority genres stop being invisible.
Dropout 0.5: Aggressive regularization for a small dataset. With 0.3 the model started overfitting by epoch 20; with 0.5 it plateaued gracefully.
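One common way to derive class weights is inverse frequency, sketched below. The page does not say which weighting scheme the project uses, so the formula here (total samples divided by number of classes times class count) is an assumption, and the label counts are invented to mirror the 10x imbalance mentioned above.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Invented counts with a 10x gap between the largest and smallest class.
labels = ["Fairy Tales"] * 100 + ["Young Adult"] * 10 + ["Prose"] * 40
weights = balanced_class_weights(labels)
print(weights)  # {'Fairy Tales': 0.5, 'Young Adult': 5.0, 'Prose': 1.25}
```

With these weights a misclassified "Young Adult" sample costs ten times as much as a misclassified "Fairy Tales" sample. Keras accepts such a mapping through the `class_weight` argument of `Model.fit`, keyed by integer class index rather than by name.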

Outcome

What came out of it

89.16% test accuracy, 58.9 percentage points over the frequency baseline. The confusion matrix shows which genres the model mixes up: Adventure vs Prose is the main pair, which is understandable because the two genres really do overlap. The project ships with a small inference script that takes a book description and returns a genre with confidence scores.
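The last step of such an inference script, turning raw model outputs into a genre plus confidence scores, might look like the sketch below. The actual script's interface is not shown here, so `format_prediction`, the genre ordering, and the example logits are all hypothetical.

```python
import math

# Assumed label order; the real model's class indices may differ.
GENRES = ["Prose", "Fairy Tales", "Adventure",
          "Educational", "Poetry", "Young Adult"]

def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def format_prediction(logits, genres=GENRES):
    """Return the top genre and all confidences, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(genres, probs), key=lambda pair: pair[1], reverse=True)
    return {"genre": ranked[0][0], "confidence": dict(ranked)}

# Invented logits standing in for the model's output on one description.
result = format_prediction([1.2, 0.3, 2.5, -0.7, 0.0, 0.4])
print(result["genre"])  # Adventure
```

Returning the full ranked distribution rather than only the top label makes close calls visible, for example when Adventure and Prose score nearly the same.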