🧠 Darwin-2B-Opus

The 2B lightweight model of the Darwin V8 series: a Qwen3.5-2B-based model distilled with the reasoning style of Claude Opus 4.5/4.6 and Sonnet 4.6.


๐Ÿงฌ ๊ฐ€๊ณ„๋„ (Pedigree)


๐Ÿ† Darwin V8 ์‹œ๋ฆฌ์ฆˆ ์ •๋ณด

| Item | Value |
|---|---|
| Model size | 2.3B parameters |
| Architecture | Qwen3.5 (hybrid attention) |
| Training method | SFT with LoRA (all-linear, rank=16) |
| Training data | 9,762 samples (Claude Opus/Sonnet + Korean reasoning) |
| Training time | 29 minutes (8×B200 GPUs) |
| Final loss | 0.837 |
| Token accuracy | 76.6% |

๐Ÿ“Š ๋ฒค์น˜๋งˆํฌ (GPQA Diamond 198)

  • ์ •ํ™•๋„: 37.37% (74/198)
  • ๋‹ต๋ณ€ ์ถ”์ถœ ์„ฑ๊ณต๋ฅ  ๊ธฐ์ค€ ์ •๋‹ต๋ฅ : 50.7%

🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/Darwin-2B-Opus"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Korean prompt: "The 2024 Korean minimum wage is 9,860 KRW.
# What is the wage for 40 hours/week × 4 weeks?"
messages = [
    {"role": "user", "content": "2024년 한국 최저시급 9,860원이다. 주 40시간 × 4주 임금은?"}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

🧬 Darwin V8 Training Pipeline

```
[Qwen/Qwen3.5-2B] ──── base model (frozen)
        +
[9,762 Claude Opus/Sonnet + Korean reasoning samples]
        ↓
[SFT training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: 10× the full fine-tuning LR)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 0.991 → 0.837 (-15%)
  - Token accuracy: 73.9% → 76.6% (+2.7 pp)
        ↓
[LoRA merge into base weights]
        ↓
[Darwin-2B-Opus] ← this model
```
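The final merge step folds the adapter back into the base weights as W' = W + (α/r)·B·A, which is why inference needs no PEFT dependency. A toy NumPy illustration of that arithmetic (r and α match the card; matrix sizes and values are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 16, 32   # r/alpha per the card; dims are toy

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.01   # LoRA up-projection (zero-init before training)

# Merge: W' = W + (alpha / r) * B @ A, after which the adapter can be discarded.
W_merged = W + (alpha / r) * (B @ A)

# The merged weight reproduces the base + adapter forward pass.
x = rng.normal(size=(d_in,))
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))  # True
```

Because the merged matrix has the same shape as the original, the checkpoint ships as plain safetensors with no runtime adapter logic.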

📊 Training Data Composition

| Category | Samples | % | Source |
|---|---|---|---|
| General Reasoning | 4,422 | 45% | Opus 4.5/4.6, Sonnet 4.6 |
| Math (English) | 1,960 | 20% | DeepSeek-v3.2 OpenR1-Math |
| Code (English) | 1,680 | 17% | DeepSeek-v3.2 CodeReasoning + GPT-5 Codex |
| Korean Thinking | 200 | 2% | Multilingual-Thinking-Korean |
| Korean Math | 1,500 | 15% | orca-math-word-problems-korean |
| Total (after filtering) | 9,762 | 100% | - |
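The percentages in the table follow directly from the sample counts (rounded to whole percent), which is easy to recompute:

```python
# Sample counts taken from the composition table above.
mixture = {
    "General Reasoning": 4_422,
    "Math (English)": 1_960,
    "Code (English)": 1_680,
    "Korean Thinking": 200,
    "Korean Math": 1_500,
}
total = sum(mixture.values())
print(total)  # 9762
for name, n in mixture.items():
    print(f"{name}: {n / total:.0%}")
```

Roughly a third of the mixture (≈17%) is Korean, split between short thinking traces and math word problems.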

🎯 Darwin V8 Design Philosophy

  1. LoRA Without Regret — all-linear targets, LR ×10; rank=16 is sufficient
  2. Response Distillation — cost-efficient distillation from pre-generated Opus traces
  3. Reinforced Korean reasoning — Claude reasoning traces instead of simple KoAlpaca Q&A
  4. Merge-and-Deploy — LoRA adapter merged into the base, deployed with no extra dependencies
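Point 1 maps directly onto a PEFT `LoraConfig`. A sketch of what that configuration likely looks like, assuming the `peft` library; only r, α, and the all-linear target are stated on the card, so the dropout and bias settings below are assumptions:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                         # rank, per the card
    lora_alpha=32,                # alpha, per the card
    target_modules="all-linear",  # attach adapters to every linear layer
    lora_dropout=0.0,             # assumption: not stated on the card
    bias="none",                  # assumption: not stated on the card
    task_type="CAUSAL_LM",
)
```

With `target_modules="all-linear"`, PEFT wraps every linear layer (except the LM head), which is what makes the ×10 learning-rate rule viable at low rank.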

๐Ÿ“ ์ƒ˜ํ”Œ ํ…Œ์ŠคํŠธ ๊ฒฐ๊ณผ (5๋ฌธ์ œ)

| Type | Answer | Notes |
|---|---|---|
| English math (train speed) | ✅ 80 km/h | Step-by-step LaTeX solution |
| English logic (height comparison) | ✅ Carol | Transitivity stated explicitly |
| English code (primality check) | ✅ | Accurate docstring + complexity analysis |
| Korean wage calculation | ✅ 1,577,600 KRW | Step-by-step explanation in Korean |
| Korean simultaneous equations | ✅ 1,200 KRW | Textbook solution + verification |

5/5 correct — perfect on both English and Korean ⭐
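The Korean wage item above is easy to verify by hand: at 9,860 KRW/hour, 40 hours/week for 4 weeks is 9,860 × 160 hours.

```python
hourly_wage = 9_860      # 2024 Korean minimum wage, KRW/hour
hours = 40 * 4           # 40 hours/week for 4 weeks
total = hourly_wage * hours
print(f"{total:,} KRW")  # 1,577,600 KRW
```

This matches the model's answer in the table (and the Quick Start prompt asks the same question).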


โš ๏ธ ์ œํ•œ ์‚ฌํ•ญ

  • Scale: 2.3B parameters (the smallest in the Darwin series)
  • GPQA Diamond: 37.37% (lower than large models, but strong for the 2B class)
  • Long context: trained with max_length=4,096
  • Knowledge: as a 2B model, its encyclopedic knowledge is limited

🔗 Related Models

  • 🧩 FINAL-Bench/Darwin-2B-Opus-LoRA — standalone LoRA adapter for this model (67 MB)
  • ⚡ FINAL-Bench/Darwin-2B-Opus-ONNX — quantized ONNX version for browser/WebGPU (planned)

๐Ÿ† Darwin ์‹œ๋ฆฌ์ฆˆ


๐Ÿชช ๋ผ์ด์„ ์Šค

  • Base model: Apache 2.0 (Qwen)
  • Training data: see each dataset's individual license
  • This model: Apache 2.0

๐Ÿ™ ํฌ๋ ˆ๋”ง

  • Base: Qwen team (Alibaba)
  • Teacher: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
  • Data releases: nohurry, TeichAI, kuotient, PoSTMEDIA
  • Training & release: FINAL-Bench / VIDRAFT_LAB

Darwin V8 · Part of the evolutionary model series by FINAL-Bench
