Model Card: Difficulty Model

Model Details

  • Model Name: difficulty_model
  • Model Version: difficulty_model_v2_baseline_001
  • Algorithm: RandomForestRegressor
  • Framework: scikit-learn
  • Trained At: 2026-05-21T05:59:09.943332+00:00
  • Seed: 42

Intended Use

Estimates question difficulty as a continuous score in [0, 1] from four question features (bloom_score, grade, subject, question_type). The model backs the difficulty estimation endpoint, which predicts how hard a question is for a given grade level.

Training Data

  • Source: training_lo_tagging.csv + questions.csv (for question_type)
  • Split Counts: train=3912, validation=1033, test=875
  • Features: bloom_score (numeric), grade (numeric), subject (ordinal-encoded), question_type (ordinal-encoded)
  • Target: difficulty_score (continuous [0, 1])
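
The feature handling listed above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the project's training script: the toy rows, the value ranges for bloom_score and grade, the category names, the helper name build_difficulty_pipeline, and the n_estimators setting are all assumptions made for the example. Only the four feature names, the target name, the algorithm, and the seed come from this card.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

NUMERIC = ["bloom_score", "grade"]
CATEGORICAL = ["subject", "question_type"]


def build_difficulty_pipeline(seed: int = 42) -> Pipeline:
    """Hypothetical helper: ordinal-encode categoricals, pass numerics through."""
    # Unseen categories at inference time are mapped to -1 instead of raising.
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    preprocess = ColumnTransformer(
        transformers=[
            ("num", "passthrough", NUMERIC),
            ("cat", encoder, CATEGORICAL),
        ]
    )
    # n_estimators is an assumed value; the card does not list hyperparameters.
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Made-up toy data standing in for the real CSV sources.
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame(
    {
        "bloom_score": rng.integers(1, 7, size=n),
        "grade": rng.integers(1, 13, size=n),
        "subject": rng.choice(["math", "science", "english"], size=n),
        "question_type": rng.choice(["mcq", "short_answer"], size=n),
    }
)
toy["difficulty_score"] = rng.uniform(0, 1, size=n)

pipeline = build_difficulty_pipeline()
pipeline.fit(toy[NUMERIC + CATEGORICAL], toy["difficulty_score"])
predictions = pipeline.predict(toy[NUMERIC + CATEGORICAL].head(5))
```

Because a random forest regressor averages training targets, its predictions stay inside the range of the training labels, so a target confined to [0, 1] yields scores in [0, 1] without extra clipping.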

Metrics

Validation Set

  • MAE: 0.3475
  • R-squared: 0.5003
  • Per-bucket MAE: {'easy': 0.3058, 'medium': 0.2934, 'hard': 0.6563}

Test Set

  • MAE: 0.3519
  • R-squared: 0.4685
  • Per-bucket MAE: {'easy': 0.325, 'medium': 0.2885, 'hard': 0.6797}
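
For readers who want to reproduce the three reported quantities, the sketch below computes MAE, R-squared, and per-bucket MAE in plain Python. The sample values and the function names are invented for illustration; the card does not say how the evaluation code is written. scikit-learn's mean_absolute_error and r2_score compute the same overall metrics.

```python
from collections import defaultdict


def mae(y_true, y_pred):
    """Mean absolute error over paired true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Undefined when every true value is identical (ss_tot == 0).
    return 1.0 - ss_res / ss_tot


def per_bucket_mae(y_true, y_pred, buckets):
    """MAE computed separately for each difficulty label (easy/medium/hard)."""
    errors = defaultdict(list)
    for t, p, b in zip(y_true, y_pred, buckets):
        errors[b].append(abs(t - p))
    return {b: sum(e) / len(e) for b, e in errors.items()}


# Invented sample values, not taken from the evaluation sets.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.1, 0.5, 0.6, 0.6]
buckets = ["easy", "medium", "medium", "hard"]

overall_mae = mae(y_true, y_pred)
overall_r2 = r_squared(y_true, y_pred)
bucket_mae = per_bucket_mae(y_true, y_pred, buckets)
```

Grouping errors by the string label rather than by thresholds on the score is one possible reading of "per-bucket"; it matches the card's note that this metric depends on the quality of the difficulty labels.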

Known Limitations

  • Trained on synthetic data only; performance on real questions is unknown.
  • The synthetic difficulty_score distribution may not reflect real-world difficulty.
  • OrdinalEncoder maps subject and question_type to integers, which implies an ordering that may not be meaningful for these categories.
  • Per-bucket MAE depends on the quality of the difficulty string labels.
  • Limited feature set (4 features); text-based features could improve performance.
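
The ordering concern in the OrdinalEncoder limitation can be seen on a small example. The subject names below are made up. With default settings, OrdinalEncoder sorts the categories it sees and assigns integer codes in that order, so the codes reflect alphabetical position rather than anything about the subjects. OneHotEncoder is shown only as one common alternative, not as what this model uses.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical subject values, one per row.
subjects = [["math"], ["science"], ["english"]]

# Default OrdinalEncoder: categories are sorted, then numbered from 0.
ordinal = OrdinalEncoder()
ordinal_codes = ordinal.fit_transform(subjects).ravel().tolist()
learned_order = ordinal.categories_[0].tolist()

# One-hot encoding gives each category its own column, with no implied order.
one_hot = OneHotEncoder()
one_hot_matrix = one_hot.fit_transform(subjects).toarray()
```

Here "english" receives the lowest code only because it sorts first. Tree-based models can still split on such codes, but the numeric distances between them carry no meaning.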

Fallback Behavior

When the model is not loaded, or when its confidence is below the threshold, the system falls back to rule-based difficulty estimation that uses bloom_score and grade-level heuristics.
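
The control flow of this fallback can be sketched as follows. Everything specific here is an assumption for illustration: the function names, the 0.5 threshold, the way the model reports confidence, the assumed ranges (bloom_score 1 to 6, grade 1 to 12), and the weights in the heuristic. The card specifies only that the fallback is rule-based and uses bloom_score and grade.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model interface: features in, (score, confidence) out.
ModelFn = Callable[[dict], Tuple[float, float]]


def rule_based_difficulty(bloom_score: float, grade: int) -> float:
    """Assumed heuristic: weighted mix of normalized bloom_score and grade."""
    bloom_part = (bloom_score - 1) / 5   # assumes bloom_score in 1..6
    grade_part = (grade - 1) / 11        # assumes grade in 1..12
    score = 0.7 * bloom_part + 0.3 * grade_part
    return max(0.0, min(1.0, score))     # clamp to [0, 1]


def estimate_difficulty(
    features: dict,
    model: Optional[ModelFn] = None,
    confidence_threshold: float = 0.5,   # assumed value
) -> Tuple[float, str]:
    """Return (score, source), where source is 'model' or 'rule_based'."""
    if model is not None:
        score, confidence = model(features)
        if confidence >= confidence_threshold:
            return score, "model"
    # Model not loaded, or its confidence was below the threshold.
    fallback = rule_based_difficulty(features["bloom_score"], features["grade"])
    return fallback, "rule_based"


question = {"bloom_score": 6, "grade": 12, "subject": "math", "question_type": "mcq"}

no_model = estimate_difficulty(question)
confident = estimate_difficulty(question, model=lambda f: (0.42, 0.9))
unsure = estimate_difficulty(question, model=lambda f: (0.42, 0.1))
```

Returning the source alongside the score lets a caller log how often the fallback path is taken, which is useful given that the model's performance on real questions is unknown.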