Model Card: Difficulty Model
Model Details
- Model Name: difficulty_model
- Model Version: difficulty_model_v2_baseline_001
- Algorithm: RandomForestRegressor
- Framework: scikit-learn
- Trained At: 2026-05-21T05:59:09.943332+00:00
- Seed: 42
Intended Use
Estimate question difficulty as a continuous score in [0, 1] from four question features (bloom_score, grade, subject, question_type). The difficulty estimation endpoint uses the model to predict how hard a question is for a given grade level.
Training Data
- Source: training_lo_tagging.csv + questions.csv (for question_type)
- Split Counts: train=3912, validation=1033, test=875
- Features: bloom_score (numeric), grade (numeric), subject (OrdinalEncoded), question_type (OrdinalEncoded)
- Target: difficulty_score (continuous [0, 1])
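The feature handling listed above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the project's training script: the toy rows, the `build_pipeline` helper, the `handle_unknown` setting, and the default forest hyperparameters are all assumptions, since the card states only the algorithm, the encoders, and the seed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

NUMERIC = ["bloom_score", "grade"]
CATEGORICAL = ["subject", "question_type"]


def build_pipeline(seed: int = 42) -> Pipeline:
    """Hypothetical helper: ordinal-encode categoricals, pass numerics through."""
    # Mapping unseen categories to -1 is an assumption, not stated in the card.
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    preprocess = ColumnTransformer(
        [("cat", encoder, CATEGORICAL), ("num", "passthrough", NUMERIC)]
    )
    return Pipeline(
        [
            ("preprocess", preprocess),
            # Default hyperparameters; only the seed comes from the card.
            ("model", RandomForestRegressor(random_state=seed)),
        ]
    )


# Invented toy rows standing in for the real CSV data.
train = pd.DataFrame(
    {
        "bloom_score": [1, 2, 3, 4, 5, 6],
        "grade": [3, 3, 5, 5, 8, 8],
        "subject": ["math", "science", "math", "english", "science", "math"],
        "question_type": ["mcq", "mcq", "short_answer", "essay", "mcq", "essay"],
        "difficulty_score": [0.1, 0.2, 0.4, 0.6, 0.7, 0.9],
    }
)

pipeline = build_pipeline()
pipeline.fit(train[NUMERIC + CATEGORICAL], train["difficulty_score"])
preds = pipeline.predict(train[NUMERIC + CATEGORICAL])
```

Because a random forest averages training targets, its predictions stay inside the range of the training scores, so no extra clipping to [0, 1] is needed in this sketch.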
Metrics
Validation Set
- MAE: 0.3475
- R-squared: 0.5003
- Per-bucket MAE: {'easy': 0.3058, 'medium': 0.2934, 'hard': 0.6563}
Test Set
- MAE: 0.3519
- R-squared: 0.4685
- Per-bucket MAE: {'easy': 0.325, 'medium': 0.2885, 'hard': 0.6797}
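For reference, the three reported metrics can be computed as in the sketch below. It uses plain Python and invented toy values; the function names are hypothetical, and the assumption that buckets come from a per-question difficulty label (easy, medium, hard) is inferred from the card rather than confirmed by the evaluation code.

```python
from collections import defaultdict


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def per_bucket_mae(y_true, y_pred, buckets):
    """MAE computed separately for each difficulty label."""
    errors = defaultdict(list)
    for t, p, b in zip(y_true, y_pred, buckets):
        errors[b].append(abs(t - p))
    return {b: sum(e) / len(e) for b, e in errors.items()}


# Invented toy values, not taken from the real validation or test sets.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.3, 0.4, 0.5, 0.6]
buckets = ["easy", "medium", "medium", "hard"]

overall_mae = mae(y_true, y_pred)
overall_r2 = r_squared(y_true, y_pred)
bucket_mae = per_bucket_mae(y_true, y_pred, buckets)
```

Reporting MAE per bucket alongside the overall figure shows where the error concentrates, which an aggregate MAE alone would hide.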
Known Limitations
- Trained on synthetic data only; performance on real questions is unknown.
- difficulty_score distribution may not reflect real-world difficulty.
- OrdinalEncoder assumes an ordering that may not be meaningful for subject/question_type.
- Per-bucket MAE depends on the quality of the difficulty string labels.
- Limited feature set (4 features); text-based features could improve performance.
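The OrdinalEncoder limitation can be seen in a small example. With default settings, scikit-learn's `OrdinalEncoder` sorts the categories and assigns integer codes in that order, so the codes reflect alphabetical position rather than any property of the subject. The subject names below are invented, and `OneHotEncoder` appears only as a commonly used alternative for comparison; the card does not say the project uses it.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Invented subject values for illustration.
subjects = [["math"], ["science"], ["english"]]

# Default OrdinalEncoder: categories are sorted, then numbered 0..n-1.
ordinal = OrdinalEncoder().fit(subjects)
categories = ordinal.categories_[0].tolist()
codes = ordinal.transform(subjects).ravel().tolist()

# OneHotEncoder gives each subject its own column, with no implied order.
one_hot = OneHotEncoder().fit(subjects)
matrix = one_hot.transform(subjects).toarray()
```

Here "english" receives the lowest code and "science" the highest purely because of sorting, which is the arbitrary ordering the limitation describes.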
Fallback Behavior
When the model is not loaded or its prediction confidence is below the threshold, the system falls back to rule-based difficulty estimation that uses bloom_score and grade-level heuristics.
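The fallback decision described above might look like the following sketch. Everything specific here is hypothetical: the function names, the 0.5 threshold, the heuristic weights, and the assumed feature ranges (bloom_score from 1 to 6, grade from 1 to 12) are placeholders, because the card does not specify the actual rules or how confidence is measured.

```python
from typing import Callable, Mapping, Optional, Tuple


def rule_based_difficulty(bloom_score: float, grade: float) -> float:
    """Hypothetical heuristic; the real rules are not given in the card."""
    # Assumed ranges: bloom_score in 1..6, grade in 1..12.
    bloom_part = (bloom_score - 1) / 5
    grade_part = (grade - 1) / 11
    raw = 0.7 * bloom_part + 0.3 * grade_part  # placeholder weights
    return min(1.0, max(0.0, raw))


def estimate_difficulty(
    features: Mapping[str, float],
    model_predict: Optional[Callable[[Mapping[str, float]], float]] = None,
    confidence: Optional[float] = None,
    threshold: float = 0.5,  # placeholder value
) -> Tuple[float, str]:
    """Return (score, source), where source is 'model' or 'fallback'."""
    # A missing model or missing confidence is treated as a reason to fall back.
    if model_predict is None or confidence is None or confidence < threshold:
        score = rule_based_difficulty(features["bloom_score"], features["grade"])
        return score, "fallback"
    # Clamp the model output to the documented [0, 1] range.
    return min(1.0, max(0.0, model_predict(features))), "model"


question = {"bloom_score": 1, "grade": 1}
no_model = estimate_difficulty(question)
low_conf = estimate_difficulty(question, lambda f: 0.42, confidence=0.1)
high_conf = estimate_difficulty(question, lambda f: 0.42, confidence=0.9)
```

Returning the source alongside the score lets callers log how often the fallback path is taken, which is useful for spotting a model that failed to load.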