# Model Card: Difficulty Model

## Model Details

- **Model Name:** difficulty_model
- **Model Version:** difficulty_model_v2_baseline_001
- **Algorithm:** RandomForestRegressor
- **Framework:** scikit-learn
- **Trained At:** 2026-05-21T05:59:09.943332+00:00
- **Seed:** 42

## Intended Use

Estimate question difficulty as a continuous score in [0, 1] based on question features (bloom_score, grade, subject, question_type). Used in the difficulty estimation endpoint to predict how hard a question is for a given grade level.

## Training Data

- **Source:** training_lo_tagging.csv + questions.csv (for question_type)
- **Split Counts:** train=3912, validation=1033, test=875
- **Features:** bloom_score (numeric), grade (numeric), subject (OrdinalEncoded), question_type (OrdinalEncoded)
- **Target:** difficulty_score (continuous [0, 1])

## Metrics

### Validation Set

- MAE: 0.3475
- R-squared: 0.5003
- Per-bucket MAE: `{'easy': 0.3058, 'medium': 0.2934, 'hard': 0.6563}`

### Test Set

- MAE: 0.3519
- R-squared: 0.4685
- Per-bucket MAE: `{'easy': 0.325, 'medium': 0.2885, 'hard': 0.6797}`

## Known Limitations

- Trained on synthetic data only; performance on real questions is unknown.
- difficulty_score distribution may not reflect real-world difficulty.
- OrdinalEncoder assumes an ordering that may not be meaningful for subject/question_type.
- Per-bucket MAE depends on the quality of the difficulty string labels.
- Limited feature set (4 features); text-based features could improve performance.

## Fallback Behavior

When the model is not loaded or confidence is below threshold, the system falls back to a rule-based difficulty estimation using bloom_score and grade-level heuristics.
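
The fallback dispatch described above might look roughly like the following sketch. Everything specific in it is an assumption made for illustration and is not taken from this card: the function names `rule_based_difficulty` and `estimate_difficulty`, the threshold value, the 1 to 6 range for bloom_score, the 1 to 12 range for grade, the 0.8/0.2 weighting, and the idea that a confidence value is supplied by the caller. The card does not state how confidence is computed or what the real heuristics are.

```python
from typing import Callable, Mapping, Optional, Tuple

# All constants are illustrative assumptions, not values from the model card.
CONFIDENCE_THRESHOLD = 0.5  # hypothetical cutoff for trusting the model
MAX_BLOOM = 6.0             # assumes bloom_score lies in [1, 6]
MAX_GRADE = 12.0            # assumes grade lies in [1, 12]


def clamp01(value: float) -> float:
    """Clamp a score into the [0, 1] range used for difficulty_score."""
    return min(1.0, max(0.0, value))


def rule_based_difficulty(bloom_score: float, grade: float) -> float:
    """Hypothetical heuristic: mostly Bloom level, nudged by grade level."""
    bloom_part = (bloom_score - 1.0) / (MAX_BLOOM - 1.0)
    grade_part = (grade - 1.0) / (MAX_GRADE - 1.0)
    return clamp01(0.8 * bloom_part + 0.2 * grade_part)


def estimate_difficulty(
    features: Mapping[str, object],
    predict: Optional[Callable[[Mapping[str, object]], float]] = None,
    confidence: Optional[float] = None,
) -> Tuple[float, str]:
    """Return (score, source), where source is 'model' or 'fallback'.

    `predict` stands in for the loaded model (None means not loaded);
    `confidence` is assumed to be provided by the caller.
    """
    model_usable = (
        predict is not None
        and confidence is not None
        and confidence >= CONFIDENCE_THRESHOLD
    )
    if not model_usable:
        score = rule_based_difficulty(
            float(features["bloom_score"]), float(features["grade"])
        )
        return score, "fallback"
    return clamp01(float(predict(features))), "model"
```

Returning the source alongside the score lets the endpoint log how often the rule-based path is taken, which is useful given that the model was trained on synthetic data only.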