ASTER_4B_RL / README.md

QuantumStackOverflow

metadata

license: apache-2.0
base_model: Qwen/Qwen3-4B-Thinking-2507
tags:
  - aster
  - reinforcement-learning
  - sft
  - reproduction
metrics:
  - accuracy
model-index:
  - name: ASTER_4B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AIME 2025
          type: aime2025
        metrics:
          - name: Accuracy
            type: accuracy
            value: 87.7
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HMMT 2025 Feb
          type: hmmt_2025_feb
        metrics:
          - name: Accuracy
            type: accuracy
            value: 77.1

ASTER_4B (Independent Reproduction)

Model Description

ASTER_4B is an independent reproduction of the ASTER framework. The model is fine-tuned from Qwen/Qwen3-4B-Thinking-2507 and strictly follows the experimental details and hyperparameter settings described in the original ASTER paper.

⚠️ Note: This is a reproduction project. Its goal is to verify the effectiveness of the ASTER method by following the details given in the original paper.

Training Data (SFT)

The model was trained on our reproduced dataset, ASTER_SFT4K.

This is a tiny yet effective SFT set, constructed to replicate the data distribution and formatting used in the original ASTER experiments. Dataset details are available here:

Dataset Repo: ASTER_SFT4K

Evaluation Results

We evaluated the model on challenging mathematical benchmarks, using the exact generation configuration specified in the ASTER paper to ensure a fair comparison.

Generation Config:

Temperature: 1.0
Top_p: 1.0
Max_context_length: 96256
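
With temperature and top_p both at 1.0, sampling draws from the model's unmodified output distribution, with no sharpening and no nucleus truncation. The following is a minimal sketch of how these settings could be held in code. The class name `GenerationSettings` and the helper `max_new_tokens` are hypothetical, and the sketch assumes that Max_context_length counts prompt tokens plus generated tokens, which this README does not state.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenerationSettings:
    """Sampling settings listed in this README (names are illustrative)."""
    temperature: float = 1.0
    top_p: float = 1.0
    # Assumption: this budget covers the prompt plus all generated tokens.
    max_context_length: int = 96256


def max_new_tokens(settings: GenerationSettings, prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt is counted against the budget."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(settings.max_context_length - prompt_tokens, 0)


settings = GenerationSettings()
print(asdict(settings))
print(max_new_tokens(settings, prompt_tokens=256))  # 96256 - 256 = 96000
```

The field values can be passed on to whichever inference library is used for evaluation; the mapping to that library's parameter names is left to the reader.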

| Benchmark | Score (%) |
| --- | --- |
| AIME 2025 | 87.7 |
| HMMT 2025 (Feb) | 77.1 |
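
This README does not say how many answers were sampled per problem. AIME 2025 has 30 problems, so a single pass can only produce multiples of 1/30 (about 3.33 points). Since 87.7 is not such a multiple, the score is presumably averaged over several sampled answers per problem. The sketch below shows that kind of averaging under this assumption; the function name `average_accuracy` and the toy data are hypothetical and are not the evaluation code used here.

```python
from statistics import mean


def average_accuracy(correct: list[list[bool]]) -> float:
    """Mean per-problem accuracy in percent.

    correct[i][j] is True when the j-th sampled answer to problem i is right.
    """
    if not correct or any(not samples for samples in correct):
        raise ValueError("every problem needs at least one sample")
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return 100.0 * mean(per_problem)


# Toy data: 3 problems, 4 samples each (not real evaluation output).
toy = [
    [True, True, True, True],
    [True, False, True, True],
    [False, False, True, False],
]
print(round(average_accuracy(toy), 1))  # (1.0 + 0.75 + 0.25) / 3 -> 66.7
```

Averaging per problem first and then across problems gives the same result as pooling all samples only when every problem has the same number of samples.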