Update README.md
README.md CHANGED

@@ -7,7 +7,7 @@ tags:
 - CodeScaler
 license: mit
 datasets:
-- LARK-Lab/CodeScalerPair-
+- LARK-Lab/CodeScalerPair-51K
 language:
 - en
 base_model:

@@ -44,7 +44,7 @@ base_model:

 We propose **CodeScaler**, an execution-free reward model designed to scale both reinforcement learning training and test-time inference for code generation. **CodeScaler** is trained on carefully curated preference data derived from verified code problems and incorporates syntax-aware code extraction and validity-preserving reward shaping to ensure stable and robust optimization.

-This model is the official CodeScaler-1.7B trained from Skywork/Skywork-Reward-V2-Qwen3-1.7B on [LARK-Lab/CodeScalerPair-
+This model is the official CodeScaler-1.7B trained from Skywork/Skywork-Reward-V2-Qwen3-1.7B on [LARK-Lab/CodeScalerPair-51K](https://huggingface.co/datasets/LARK-Lab/CodeScalerPair-51K).

 ## Performance on RM-Bench
 | Model | Code | Chat | Math | Safety | Easy | Normal | Hard | Avg |
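The updated description mentions syntax-aware code extraction and validity-preserving reward shaping, but the README excerpt does not show how either step is implemented. The sketch below only illustrates the general idea under stated assumptions: Python completions wrapped in markdown fences, an AST-based parse check, and an illustrative penalty constant. The helper names are hypothetical and are not part of CodeScaler.

```python
import ast
import re

# Hypothetical helpers sketching "syntax-aware code extraction" and
# "validity-preserving reward shaping". CodeScaler's actual pipeline is not
# documented here, so these names and constants are placeholders.

def extract_code(response: str) -> str:
    """Return the first fenced code block in a response, or the raw text if none."""
    match = re.search(r"```(?:python)?\n(.*?)```", response, re.DOTALL)
    return match.group(1) if match else response

def parses_as_python(code: str) -> bool:
    """Syntax-level validity check: does the extracted code parse at all?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def shape_reward(raw_score: float, response: str, invalid_penalty: float = -1.0) -> float:
    """Keep the reward model's score for parseable code; penalize unparseable output."""
    code = extract_code(response)
    return raw_score if parses_as_python(code) else raw_score + invalid_penalty
```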
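Since CodeScaler-1.7B is initialized from Skywork/Skywork-Reward-V2-Qwen3-1.7B, it can presumably be queried like other Skywork-style sequence-classification reward models. The snippet below is a usage sketch under that assumption, not an official example; the repository id is a guess and the loading class and chat-template behavior should be checked against the actual model page.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumption: the model exposes a single-logit sequence-classification head,
# as its Skywork-Reward-V2 base does. The repo id below is a placeholder.
model_id = "LARK-Lab/CodeScaler-1.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    num_labels=1,
)

conversation = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
    {"role": "assistant", "content": "def reverse(s: str) -> str:\n    return s[::-1]"},
]

# Score the assistant turn; a higher logit indicates a preferred response.
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    reward = model(input_ids).logits[0][0].item()
print(reward)
```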