QuantumStackOverflow / ASTER_4B_RL

	---
	license: apache-2.0
	base_model: Qwen/Qwen3-4B-Thinking-2507
	tags:
	- aster
	- reinforcement-learning
	- sft
	- reproduction
	metrics:
	- accuracy
	model-index:
	- name: ASTER_4B
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AIME 2025
	      type: aime2025
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 87.7
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HMMT 2025 Feb
	      type: hmmt_2025_feb
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 77.1
	---

	# ASTER_4B (Independent Reproduction)

	[![Paper](https://img.shields.io/badge/Paper-ArXiv.2602.01204-B31B1B.svg)](https://arxiv.org/pdf/2602.01204)
	[![GitHub](https://img.shields.io/badge/GitHub-Reproduction_Code-black)](https://github.com/Rainyrou/ASTER)
	[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://huggingface.co/datasets/choosealicense/licenses/apache-2.0)

	## Model Description

	ASTER_4B is an independent reproduction of the ASTER framework. It is fine-tuned from [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) and strictly follows the experimental details and hyperparameter settings described in the original ASTER paper.

	> ⚠️ Note: This is a reproduction project. We aim to verify the effectiveness of the ASTER method by strictly following the official paper's details.

	## Training Data (SFT)

	The model was trained on our reproduced dataset, ASTER_SFT4K.

	It is a small SFT set constructed to replicate the data distribution and formatting used in the original ASTER experiments. Dataset details:
	* Dataset Repo: [ASTER_SFT4K](https://huggingface.co/datasets/QuantumStackOverflow/ASTER_SFT4K)

	## Evaluation Results

	We evaluated the model on two competition-math benchmarks, AIME 2025 and HMMT 2025 (Feb), using the generation configuration specified in the ASTER paper so that the results are directly comparable.

	Generation Config:
	* Temperature: `1.0`
	* Top_p: `1.0`
	* Max_context_length: `96256`
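
As a rough illustration, the settings above could be turned into keyword arguments for a sampling call. This is a minimal sketch, not the evaluation code used for this card: `GenerationSettings` and `sampling_kwargs` are hypothetical helpers, the returned key names mirror the arguments of Hugging Face `transformers` `generate`, and it assumes `Max_context_length` bounds prompt plus completion tokens together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Sampling settings listed in this card (hypothetical container)."""

    temperature: float = 1.0
    top_p: float = 1.0
    # Assumed to cover prompt tokens plus generated tokens.
    max_context_length: int = 96256


def sampling_kwargs(settings: GenerationSettings, prompt_tokens: int) -> dict:
    """Build keyword arguments for a sampling-based generate call.

    The completion budget is whatever the prompt leaves free in the
    context window (an assumption about how the limit is applied).
    """
    budget = settings.max_context_length - prompt_tokens
    if budget <= 0:
        raise ValueError("prompt already fills the context window")
    return {
        "do_sample": True,
        "temperature": settings.temperature,
        "top_p": settings.top_p,
        "max_new_tokens": budget,
    }


if __name__ == "__main__":
    # A 256-token prompt leaves 96000 tokens for the completion.
    print(sampling_kwargs(GenerationSettings(), prompt_tokens=256))
```

With `temperature` and `top_p` both at `1.0`, sampling draws from the model's unmodified output distribution, so results vary from run to run.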

	| Benchmark | Score (%) |
	| :--- | :--- |
	| AIME 2025 | 87.7 |
	| HMMT 2025 (Feb) | 77.1 |
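
This card does not state how the percentages were computed or how many samples were drawn per problem. One common convention for sampled evaluation, assumed here purely for illustration, is to average correctness over several samples per problem and then over problems. The function name and the toy data below are hypothetical and are not the evaluation outputs behind the table.

```python
from statistics import mean


def average_accuracy(correct: list[list[bool]]) -> float:
    """Return mean per-problem accuracy as a percentage.

    correct[i][j] is True when sample j for problem i matched the
    reference answer. Every problem carries equal weight.
    """
    if not correct or any(not samples for samples in correct):
        raise ValueError("every problem needs at least one sample")
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * mean(per_problem)


if __name__ == "__main__":
    # Toy data: 3 problems, 4 samples each (not real evaluation output).
    toy = [
        [True, True, True, True],
        [True, False, True, True],
        [False, False, True, False],
    ]
    print(round(average_accuracy(toy), 1))  # 66.7
```

Under this convention a score need not be a multiple of 100 divided by the number of problems, because each problem contributes a fractional accuracy.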