Commit 8ece6f3 by work-sejal: "Add models and dataset" (18 days ago)

Model Card: Difficulty Model

Model Details

Model Name: difficulty_model
Model Version: difficulty_model_v2_baseline_001
Algorithm: RandomForestRegressor
Framework: scikit-learn
Trained At: 2026-05-21T05:59:09.943332+00:00
Seed: 42

Intended Use

Estimate question difficulty as a continuous score in [0, 1] from question features (bloom_score, grade, subject, question_type). The difficulty estimation endpoint uses it to predict how hard a question is for a given grade level.

Training Data

Source: training_lo_tagging.csv + questions.csv (for question_type)
Split Counts: train=3912, validation=1033, test=875
Features: bloom_score (numeric), grade (numeric), subject (ordinal-encoded with OrdinalEncoder), question_type (ordinal-encoded with OrdinalEncoder)
Target: difficulty_score (continuous [0, 1])
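The card does not include the training code. A minimal scikit-learn sketch of a pipeline consistent with the listed features, algorithm, and seed might look like the following. The column lists, the `handle_unknown="use_encoded_value"` setting, the default forest hyperparameters, the function name `build_difficulty_pipeline`, and the toy data frame are all assumptions for illustration, not the actual training script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

NUMERIC = ["bloom_score", "grade"]
CATEGORICAL = ["subject", "question_type"]


def build_difficulty_pipeline(seed: int = 42) -> Pipeline:
    """Hypothetical reconstruction of a difficulty_model-style pipeline."""
    preprocess = ColumnTransformer(
        [
            # Numeric features pass through unchanged.
            ("num", "passthrough", NUMERIC),
            # Assumption: categories unseen at training time map to -1 instead of raising.
            (
                "cat",
                OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
                CATEGORICAL,
            ),
        ]
    )
    # Default hyperparameters; only the seed comes from the card.
    model = RandomForestRegressor(random_state=seed)
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Toy synthetic frame standing in for training_lo_tagging.csv + questions.csv.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(
    {
        "bloom_score": rng.integers(1, 7, n),
        "grade": rng.integers(1, 13, n),
        "subject": rng.choice(["math", "science", "english"], n),
        "question_type": rng.choice(["mcq", "short_answer"], n),
    }
)
# Invented target clipped to [0, 1], for demonstration only.
df["difficulty_score"] = np.clip(
    (df["bloom_score"] / 6 + df["grade"] / 12) / 2 + rng.normal(0, 0.05, n), 0, 1
)

pipe = build_difficulty_pipeline()
pipe.fit(df[NUMERIC + CATEGORICAL], df["difficulty_score"])
preds = pipe.predict(df[NUMERIC + CATEGORICAL].head(5))
```

Because a random forest regressor averages leaf values taken from the training targets, its predictions stay within the range of those targets; with targets in [0, 1], the output score is in [0, 1] without extra clipping.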

Metrics

Validation Set

MAE: 0.3475
R-squared: 0.5003
Per-bucket MAE: easy 0.3058, medium 0.2934, hard 0.6563

Test Set

MAE: 0.3519
R-squared: 0.4685
Per-bucket MAE: easy 0.325, medium 0.2885, hard 0.6797
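The metrics above combine overall MAE and R-squared with an MAE computed separately for each difficulty string label. A minimal sketch of that computation, assuming the bucket of each example is its easy/medium/hard label (the function name `evaluate` and the toy values are hypothetical):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score


def evaluate(y_true, y_pred, labels):
    """Overall MAE and R-squared, plus MAE grouped by difficulty string label."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    labels = np.asarray(labels)
    # One MAE per distinct label, computed only on rows carrying that label.
    per_bucket = {
        str(b): float(mean_absolute_error(y_true[labels == b], y_pred[labels == b]))
        for b in np.unique(labels)
    }
    return {
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "r2": float(r2_score(y_true, y_pred)),
        "per_bucket_mae": per_bucket,
    }


# Toy example: four predictions with their bucket labels.
report = evaluate(
    [0.1, 0.2, 0.5, 0.9],
    [0.2, 0.2, 0.4, 0.5],
    ["easy", "easy", "medium", "hard"],
)
```

In this toy case the hard bucket dominates the error, the same pattern the validation and test numbers show for the real model.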

Known Limitations

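Trained on synthetic data only; performance on real questions is unknown.
The difficulty_score distribution may not reflect real-world difficulty.
OrdinalEncoder maps each subject and question_type category to an integer, imposing an order that may not be meaningful; each tree split on these columns can only separate categories along that order.
Per-bucket MAE groups examples by their difficulty string label (easy, medium, hard), so it is only as reliable as those labels.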
Limited feature set (4 features); text-based features could improve performance.
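To make the OrdinalEncoder limitation concrete: with the default `categories="auto"`, scikit-learn learns the categories from the data and sorts them, so the integer codes follow alphabetical order rather than anything related to difficulty. The subject names below are illustrative; the card does not list the real category set.

```python
from sklearn.preprocessing import OrdinalEncoder

# Illustrative subject values, not the model's actual categories.
subjects = [["science"], ["math"], ["english"]]

enc = OrdinalEncoder()  # categories="auto": learned from the data, sorted
enc.fit(subjects)

# Codes follow sorted (alphabetical) order: english -> 0, math -> 1, science -> 2.
codes = enc.transform(subjects).ravel()
```

A forest split such as `subject <= 1.5` therefore groups english with math against science, whether or not that grouping is meaningful. One-hot encoding (`OneHotEncoder`) avoids imposing such an order at the cost of more columns.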

Fallback Behavior

When the model is not loaded or its confidence is below a threshold, the system falls back to rule-based difficulty estimation using bloom_score and grade-level heuristics.
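The card names neither the confidence measure, the threshold, nor the heuristic rules. The sketch below shows one way such a fallback could be wired, under loudly stated assumptions: confidence is derived from the spread of per-tree predictions in the forest, `CONFIDENCE_THRESHOLD = 0.8` is a placeholder, and `rule_based_difficulty` is an invented heuristic (higher Bloom level is harder, higher grade is easier). All of these names and formulas are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

CONFIDENCE_THRESHOLD = 0.8  # hypothetical value; the card does not state it


def rule_based_difficulty(bloom_score: float, grade: float) -> float:
    """Hypothetical heuristic, assuming bloom_score in 1..6 and grade in 1..12."""
    score = (bloom_score - 1) / 5 - 0.3 * (grade - 1) / 11 + 0.15
    return float(np.clip(score, 0.0, 1.0))


def estimate_difficulty(model, x_row, bloom_score, grade):
    """Return (score, source), where source is "model" or "rules".

    `model` is assumed to be a fitted RandomForestRegressor (or None if not
    loaded) and `x_row` an already-encoded feature vector.
    """
    if model is None:
        return rule_based_difficulty(bloom_score, grade), "rules"
    x = np.asarray(x_row, dtype=float).reshape(1, -1)
    # Individual tree predictions; their mean equals the forest's prediction.
    tree_preds = np.array([tree.predict(x)[0] for tree in model.estimators_])
    # Hypothetical confidence: for targets in [0, 1] the std is at most 0.5,
    # so 1 - 2 * std lies in [0, 1].
    confidence = 1.0 - 2.0 * float(tree_preds.std())
    if confidence < CONFIDENCE_THRESHOLD:
        return rule_based_difficulty(bloom_score, grade), "rules"
    return float(tree_preds.mean()), "model"
```

Per-tree disagreement is only a rough uncertainty proxy for a random forest; a deployed system may use a different signal entirely.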