---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- qwen
- qwen3.5
- reasoning
- distillation
- claude-opus
- darwin-v8
- sft
- lora
- merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 lastbrain – Darwin V8

A Claude Opus distillation model based on Darwin V8 (2B parameters).

- 👨 **Father (Base):** `Qwen/Qwen3.5-2B`
- 👩 **Mother (LoRA Adapter):** `FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`
- 👶 **Child (This model):** `FINAL-Bench/lastbrain` – merged full-weight standalone
## 📦 Features

- **Base:** Qwen3.5-2B (2.3B parameters, hybrid attention)
- **Training:** SFT + LoRA (`all-linear`, rank=16, α=32)
- **Teachers:** Claude Opus 4.5 / 4.6, Claude Sonnet 4.6 (pre-generated reasoning traces)
- **Data:** 4,451 high-quality reasoning traces (4 public datasets)
- **Merged:** LoRA adapter fully merged into the base weights; runs standalone
## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/lastbrain"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Example output:

> To find the speed of the train in km/h, we need to convert the given time from minutes to hours.
>
> **Given:**
> - Distance = 60 km
> - Time = 45 minutes
>
> **Step 1: Convert time to hours**
> Since there are 60 minutes in 1 hour:
> 45 minutes = 45/60 = 0.75 hours
>
> **Step 2: Calculate speed**
> Speed = Distance ÷ Time = 60 km ÷ 0.75 h = 80 km/h
>
> **Final Answer:** The speed of the train is **80 km/h**.
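The arithmetic in the sample answer can be verified directly:

```python
# Check the worked example: 60 km covered in 45 minutes.
hours = 45 / 60        # 0.75 h
speed = 60 / hours     # km/h
print(speed)           # → 80.0
```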
## 🧬 Darwin V8 Training Pipeline

```
[Qwen/Qwen3.5-2B]  ← Base model (frozen)
        +
[4,451 Claude Opus/Sonnet reasoning traces]
        ↓
[SFT Training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: ×10 FullFT)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 1.33 → 1.10 (-17%)
  - Token accuracy: 68% → 72% (+4%p)
        ↓
[LoRA merge into base weights]
        ↓
[lastbrain]  ← this model
```
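The SFT hyperparameters above map onto a `peft` configuration roughly as follows. This is a minimal config sketch, not the actual training script; the `LoraConfig` fields are standard `peft` API, while everything else (variable names, omitted dataset/trainer wiring) is an assumption:

```python
from peft import LoraConfig

# Hyperparameters from the pipeline diagram above.
# Darwin V8 rule of thumb: LoRA learning rate ≈ 10× the full-fine-tune rate.
lora_cfg = LoraConfig(
    r=16,                         # LoRA rank
    lora_alpha=32,                # α (effective scaling = α/r = 2.0)
    target_modules="all-linear",  # adapt every linear layer
    task_type="CAUSAL_LM",
)
learning_rate = 2e-4
num_train_epochs = 2
# Trainer setup, bf16 flags, and 8×B200 DDP launch are omitted here.
```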
## 📊 Training Data Composition

| Dataset | Samples | Teacher |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Claude Opus 4.6 |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 |
| TeichAI/Claude-Sonnet-4.6-Reasoning-1100x | 1,100 | Claude Sonnet 4.6 |
| **Total (after filtering)** | **4,451** | - |
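Note that the per-dataset counts sum to 4,563, slightly above the 4,451 total, which suggests roughly 112 samples were dropped during filtering (an inference from the table, not stated explicitly):

```python
# Per-dataset sample counts from the table above.
counts = {
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "TeichAI/Claude-Opus-4.6-Reasoning-887x": 887,
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "TeichAI/Claude-Sonnet-4.6-Reasoning-1100x": 1100,
}
raw_total = sum(counts.values())
print(raw_total)            # → 4563
print(raw_total - 4451)     # → 112 samples removed by filtering
```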
## 🎯 Design Philosophy (Darwin V8)

- **LoRA Without Regret** → `all-linear` target, high LR; a small rank is fine
- **Response Distillation** → cost-efficient distillation from pre-generated Opus traces
- **Merge-and-Deploy** → after merging the LoRA adapter, deploy with no extra dependencies
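Merge-and-Deploy works because a LoRA update is just a low-rank delta that can be folded into the frozen weight once, after which the layer is an ordinary dense layer. A minimal numpy sketch (shapes and values are illustrative, not the model's):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4.0            # toy dimensions; this model uses r=16, α=32

W = rng.standard_normal((d, d))    # frozen base weight
A = rng.standard_normal((r, d))    # LoRA down-projection
B = rng.standard_normal((d, r))    # LoRA up-projection
x = rng.standard_normal(d)

# Adapter-style forward: base path plus scaled low-rank path.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merged forward: fold the delta into W once, then run a plain matmul.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # → True
```

This equivalence is what `merge_and_unload()` in `peft` exploits: the merged checkpoint needs no adapter code at inference time.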
## 🔁 Reproduction

This model was created by merging the following two components:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(
    base, "FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1"
)
merged = model.merge_and_unload()
merged.save_pretrained("./lastbrain")
```
## 📈 Sample Test Results (4 problems)

| Type | Correct | Response length |
|---|---|---|
| Math (train speed) | ✅ 80 km/h | 771 chars |
| Logic (height comparison) | ✅ Carol | 354 chars |
| Code (primality check) | ✅ Python function | 1,712 chars |
| Korean (minimum wage) | ✅ 1,577,600 won | 142 chars |

Naturally produces structured answers with Markdown/LaTeX/step-by-step formatting.
## ⚠️ Limitations

- **Scale:** 2.3B parameters (small model)
- **Korean numerical accuracy:** occasional numeric errors (small-model limitation)
- **Long context:** trained with max_length=4,096
- **`<think>` tags:** rarely used explicitly (reasoning is integrated into the response body)
## 🪪 License

- Base model: Apache 2.0 (Qwen)
- Training data: see each dataset's individual license
- This model: Apache 2.0
## 🙏 Credits

- Base: Qwen team (Alibaba)
- Teacher: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- Data releases: nohurry, TeichAI
- Training & Release: FINAL-Bench / VIDRAFT_LAB
## 🔗 Related Models

- 🧠 `FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1` – the standalone LoRA adapter version of this model
- ⚡ `FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1` – Phase 4 SDPO self-distillation reinforced version

*Darwin V8 · Part of the evolutionary model merging series by VIDRAFT_LAB*