# Model Name


This model predicts whether a chat message should earn participation points. It was developed for the FEV Participation Points project, which studied an intervention in which elementary and middle school tutors received guidance on awarding participation points during chat-based math tutoring sessions.
|
|
---
|
|
## Training Details


### Base model


BERT base model, fine-tuned for binary sequence classification.
|
|
### Datasets

The dataset consisted of a subset of 1,000 messages whose utterance includes the word "point".


| Dataset | Split | Size | Source | Notes |
|---------|-------|------|--------|-------|
| Tutor math chats | train | 1,000 | Shared by tutoring provider | Contains only utterances with the word "point" |
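The subset construction can be sketched as follows. The example utterances and the exact regular expression are illustrative only; the real dataset and its schema are not shown in this card.

```python
import re

# Illustrative chat utterances standing in for the raw tutor chat log.
messages = [
    "Great job, you earned a point!",
    "What is 3 + 4?",
    "I'll add two participation points for that effort.",
]

# Keep only utterances containing the word "point" (case-insensitive),
# mirroring the filtering step described above.
pattern = re.compile(r"\bpoints?\b", re.IGNORECASE)
subset = [m for m in messages if pattern.search(m)]
print(len(subset))  # 2
```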
|
|
|
|
### Hyperparameters


| Parameter | Value |
|-----------|-------|
| Learning rate | 1e-5 |
| Batch size | 8 |
| Optimizer | AdamW (beta1=0.9, beta2=0.999, epsilon=1e-8) |
| Epochs / Steps | 20 epochs with early stopping on minority-class F1 |
| Warmup | 0 |
| Weight decay | 0.01 |
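The early-stopping criterion above can be sketched as the loop below. The patience value and the validation F1 scores are made-up assumptions for illustration; the card does not state the actual patience used.

```python
# Track the minority-class ("no") F1 on a validation split after each epoch and
# stop once it fails to improve for PATIENCE consecutive epochs.
# PATIENCE and the scores below are illustrative assumptions.
PATIENCE = 3
MAX_EPOCHS = 20

val_f1_no = [0.80, 0.85, 0.90, 0.89, 0.91, 0.91, 0.90, 0.90]

best_f1, best_epoch, bad_epochs = 0.0, 0, 0
for epoch in range(1, MAX_EPOCHS + 1):
    if epoch > len(val_f1_no):
        break
    f1 = val_f1_no[epoch - 1]
    if f1 > best_f1:
        best_f1, best_epoch, bad_epochs = f1, epoch, 0  # new best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= PATIENCE:
            break  # stop early and keep the best checkpoint

print(best_epoch, best_f1)  # 5 0.91
```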

---

## Evaluation

### Results

| Model | Dataset | Split | Metric | Score |
|-------|---------|-------|--------|-------|
| This model | Subset of math messages with points awarded | test | F1 (yes) | 0.9943 |
| This model | Subset of math messages with points awarded | test | F1 (no) | 0.9583 |
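The reported scores are per-class F1. A minimal sketch of the computation, on made-up labels rather than the project's actual test set:

```python
def f1_for_class(y_true, y_pred, cls):
    """Per-class F1: harmonic mean of precision and recall for one label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative labels only.
y_true = ["yes", "yes", "no", "no", "yes"]
y_pred = ["yes", "yes", "no", "yes", "yes"]
print(round(f1_for_class(y_true, y_pred, "yes"), 4))  # 0.8571
print(round(f1_for_class(y_true, y_pred, "no"), 4))   # 0.6667
```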
| |
### Limitations and Caveats

- The model is highly specific to tasks related to FEV Participation Points.
- The model was trained only on messages that include the word "point", so predictions on other utterances may be unreliable.

---

## How to Use

### Message Structure

The classifier predicts on a single message in isolation, with no preceding context or following utterances.
| |
| |
### Running instructions

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_dir = "model_outputs"  # or a specific checkpoint folder
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()  # disable dropout for inference

text = "Your message here"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # no gradients needed at inference time
    logits = model(**inputs).logits
pred_id = logits.argmax(dim=-1).item()
label = {0: "no", 1: "yes"}[pred_id]  # id-to-label mapping assumed from training
print(label)
```
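To report a confidence alongside the predicted label, the logits can be passed through a softmax. A minimal sketch with made-up logits standing in for `model(**inputs).logits[0].tolist()`:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits; the real values come from the model call above.
logits = [-1.2, 2.3]
probs = softmax(logits)
pred_id = probs.index(max(probs))
label = {0: "no", 1: "yes"}[pred_id]
print(label, round(probs[pred_id], 3))  # yes 0.971
```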

---

## Code and Maintainers

**Repository:** https://github.com/scale-nssa/fev_partpoints_nlp
**Maintainers / Contributors:** FEV Participation Points team (lead: JP Martinez)
|
|
---


## Bias and Fairness


The dataset does not include demographic information about tutors or students, so fairness across demographic groups could not be evaluated.
|
|
---


## License


This model is released under [License Name](https://example.com/license).


---
|
|