
GRM-7b is a general-purpose, reasoning-focused 7B model fine-tuned to improve multi-domain reasoning (math, logic, coding, and broad problem-solving). It is designed to be a strong, practical "daily driver" for general reasoning tasks and a solid base for further fine-tuning.


Key features

  • Dedicated reasoning behavior for general tasks (stepwise problem solving, better consistency).
  • Strong 7B-scale model — practical for local inference and experimentation.
  • Multi-domain mixture: reasoning + code + math + (some) medical reasoning data.
  • Fine-tune friendly: intended as a good starting point for your own SFT/GRPO/DPO pipelines.
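
Since the card highlights local inference, here is a minimal sketch using Hugging Face `transformers`. The chat-template usage assumes GRM-7b inherits Qwen2.5's chat format (its listed base model); the prompt and generation settings are illustrative, not official recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OrionLLM/GRM-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 checkpoint
    device_map="auto",
)

# Assumption: the model uses the Qwen2.5-style chat template shipped with its tokenizer.
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Reasoning models often emit long step-by-step traces, so a generous `max_new_tokens` budget is usually worthwhile.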

Benchmarks

| Model | AIME24 | AIME25 | AMC23 | MATH500 | HMMT 02/25 | LCB 06/24–01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenThinker-7B | 30.7 | 22.0 | 72.5 | 82.8 | 15.7 | 26.1 | 11.1 | 14.9 | 38.6 | 45.3 |
| GRM-7b | 69.0 | 53.3 | 93.5 | 90.0 | 42.7 | 51.7 | 31.0 | 32.2 | 53.7 | 72.4 |
| DeepSeek-R1-Distill-Qwen-32B | 51.3 | 38.0 | 92.0 | 88.0 | 25.0 | 34.5 | 19.9 | 21.1 | 33.2 | 50.4 |
| OpenR1-Distill-7B | 57.7 | 39.7 | 87.0 | 88.0 | 25.7 | 30.7 | 30.1 | 29.3 | 58.9 | 68.7 |
| Llama-3.1-Nemotron-Nano-8B-v1 | 62.0 | 48.0 | 94.0 | 89.4 | 26.7 | 50.9 | 30.9 | 32.9 | 52.9 | 70.7 |
| AceReason-Nemotron-7B | 71.0 | 50.7 | 93.8 | 89.8 | 33.3 | 44.3 | 32.9 | 30.9 | 52.9 | 64.3 |
Model details

  • Format: Safetensors
  • Model size: 8B params
  • Tensor type: BF16

Model tree for OrionLLM/GRM-7b

  • Base model: Qwen/Qwen2.5-7B (GRM-7b is a fine-tune of this base)
  • Quantizations: 2 models
