
GRM-7b is a general-purpose, reasoning-focused 7B model fine-tuned to improve multi-domain reasoning (math, logic, coding, and broad problem-solving). It is designed to be a strong, practical "daily driver" for general reasoning tasks and a solid base for further fine-tuning.


Key features

  • Dedicated reasoning behavior for general tasks (stepwise problem solving, better consistency).
  • Strong 7B-scale model — practical for local inference and experimentation.
  • Multi-domain mixture: reasoning + code + math + (some) medical reasoning data.
  • Fine-tune friendly: intended as a good starting point for your own SFT/GRPO/DPO pipelines.
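
Since the card highlights local inference, here is a minimal sketch using Hugging Face `transformers`. The chat-template usage assumes GRM-7b inherits Qwen2.5's chat format (its listed base model); the prompt and generation settings are illustrative, not official recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionLLM/GRM-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint
    device_map="auto",
)

# Assumption: the model uses the Qwen2.5-style chat template shipped with its tokenizer.
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Reasoning models often emit long step-by-step traces, so a generous `max_new_tokens` budget is usually worthwhile.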

Benchmarks

| Model | AIME24 | AIME25 | AMC23 | MATH500 | HMMT 02/25 | LCB 06/24–01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenThinker-7B | 30.7 | 22.0 | 72.5 | 82.8 | 15.7 | 26.1 | 11.1 | 14.9 | 38.6 | 45.3 |
| GRM-7b | 69.0 | 53.3 | 93.5 | 90.0 | 42.7 | 51.7 | 31.0 | 32.2 | 53.7 | 72.4 |
| DeepSeek-R1-Distill-Qwen-32B | 51.3 | 38.0 | 92.0 | 88.0 | 25.0 | 34.5 | 19.9 | 21.1 | 33.2 | 50.4 |
| OpenR1-Distill-7B | 57.7 | 39.7 | 87.0 | 88.0 | 25.7 | 30.7 | 30.1 | 29.3 | 58.9 | 68.7 |
| Llama-3.1-Nemotron-Nano-8B-v1 | 62.0 | 48.0 | 94.0 | 89.4 | 26.7 | 50.9 | 30.9 | 32.9 | 52.9 | 70.7 |
| AceReason-Nemotron-7B | 71.0 | 50.7 | 93.8 | 89.8 | 33.3 | 44.3 | 32.9 | 30.9 | 52.9 | 64.3 |
Model details

  • Format: Safetensors
  • Model size: 8B params
  • Tensor type: BF16

Model tree for OrionLLM/GRM-7b

  • Base model: Qwen/Qwen2.5-7B (GRM-7b is a fine-tune of this base)
  • Quantizations: 2 models
