---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- qwen
- qwen3.5
- reasoning
- distillation
- claude-opus
- darwin-v8
- sft
- lora
- merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
---
# 🧠 lastbrain — Darwin V8
**A Claude Opus distillation model based on Darwin V8 (2B parameters)**
- 👨 **Father (Base)**: [`Qwen/Qwen3.5-2B`](https://huggingface.co/Qwen/Qwen3.5-2B)
- 👩 **Mother (LoRA Adapter)**: [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1)
- 👶 **Child (This model)**: `FINAL-Bench/lastbrain` — merged full-weight standalone
---
## 📦 Features
- **Base**: Qwen3.5-2B (2.3B parameters, hybrid attention)
- **Training**: SFT + LoRA (`all-linear`, rank=16, α=32)
- **Teachers**: Claude Opus 4.5 / 4.6, Claude Sonnet 4.6 (pre-generated reasoning traces)
- **Data**: 4,451 high-quality reasoning traces (from 4 public datasets)
- **Merged**: the LoRA adapter is fully merged into the base weights, so the model **runs standalone**
---
## 🚀 Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/lastbrain"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )

print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
**Example output**:
```
To find the speed of the train in km/h, we need to convert the given time from minutes to hours.
**Given:**
- Distance = 60 km
- Time = 45 minutes
**Step 1: Convert time to hours**
Since there are 60 minutes in 1 hour:
$$\text{Time in hours} = \frac{45}{60} = 0.75 \text{ hours}$$
**Step 2: Calculate speed**
$$\text{Speed} = \frac{60}{0.75} = 80 \text{ km/h}$$
**Final Answer:** The speed of the train is **80 km/h**.
```
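The arithmetic in the example output can be checked directly:

```python
# Verify the worked example: 60 km covered in 45 minutes.
distance_km = 60
time_hours = 45 / 60  # 0.75 h

speed_kmh = distance_km / time_hours
print(speed_kmh)  # 80.0
```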
---
## 🧬 Darwin V8 Training Pipeline
```
[Qwen/Qwen3.5-2B] ──── base model (frozen)
        +
[4,451 Claude Opus/Sonnet reasoning traces]
        ↓
[SFT Training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: ×10 FullFT)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 1.33 → 1.10 (-17%)
  - Token accuracy: 68% → 72% (+4 pp)
        ↓
[LoRA merge into base weights]
        ↓
[lastbrain] ← this model
```
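The final merge step folds the low-rank update into the base weights as W' = W + (α/r)·B·A, which is why inference needs no adapter at runtime. A minimal pure-Python sketch of that arithmetic, using toy 2×2 matrices (illustrative only, not real model weights):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def merge_lora(w, a, b, rank, alpha):
    """Fold the LoRA update into the base weight: W' = W + (alpha/rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(wrow, drow)]
            for wrow, drow in zip(w, delta)]

# Toy example: 2x2 base weight W, rank-1 LoRA factors B (2x1) and A (1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
merged = merge_lora(W, A, B, rank=1, alpha=2)
print(merged)  # [[2.0, 1.0], [2.0, 3.0]]
```

After this fold, the merged matrix behaves identically to running the base weight plus the scaled adapter, with no extra matrices to load.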
---
## 📊 Training Data Composition
| Dataset | Samples | Teacher |
|---------|---------|---------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Claude Opus 4.6 |
| [TeichAI/Claude-Opus-4.6-Reasoning-887x](https://huggingface.co/datasets/TeichAI/Claude-Opus-4.6-Reasoning-887x) | 887 | Claude Opus 4.6 |
| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | 250 | Claude Opus 4.5 |
| [TeichAI/Claude-Sonnet-4.6-Reasoning-1100x](https://huggingface.co/datasets/TeichAI/Claude-Sonnet-4.6-Reasoning-1100x) | 1,100 | Claude Sonnet 4.6 |
| **Total (after filtering)** | **4,451** | - |
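The four per-dataset counts sum to 4,563 raw samples, while the stated post-filter total is 4,451, so filtering retained roughly 97.5%. A quick sanity check of those numbers:

```python
# Per-dataset sample counts from the table above (pre-filter).
counts = {
    "nohurry Opus 4.6": 2326,
    "TeichAI Opus 4.6": 887,
    "TeichAI Opus 4.5": 250,
    "TeichAI Sonnet 4.6": 1100,
}
raw_total = sum(counts.values())
print(raw_total)  # 4563

filtered_total = 4451
print(f"retained after filtering: {filtered_total / raw_total:.1%}")  # retained after filtering: 97.5%
```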
---
## 🎯 Design Philosophy (Darwin V8)
1. **LoRA Without Regret** — `all-linear` targets and a high learning rate; a small rank suffices
2. **Response Distillation** — cost-efficient distillation from pre-generated Opus traces
3. **Merge-and-Deploy** — after merging the LoRA adapter, deploy with no extra dependencies
---
## 🔁 Reproduction
This model was created by merging the following two components:
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Father: the frozen base model.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B", torch_dtype=torch.bfloat16
)
# Mother: the LoRA adapter trained on Claude reasoning traces.
model = PeftModel.from_pretrained(
    base, "FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1"
)
# Child: fold the adapter into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("./lastbrain")
```
---
## 📝 Sample Test Results (4 problems)
| Type | Correct | Response Length |
|------|---------|-----------------|
| Math (train speed) | ✅ 80 km/h | 771 chars |
| Logic (height comparison) | ✅ Carol | 354 chars |
| Code (primality check) | ✅ Python function | 1,712 chars |
| Korean (minimum wage) | ✅ 1,577,600 KRW | 142 chars |

**Naturally produces answers structured with Markdown, LaTeX, and step-by-step reasoning**
---
## ⚠️ Limitations
- **Size**: 2.3B parameters (a small model)
- **Korean arithmetic accuracy**: occasional numeric errors are possible (a small-model limitation)
- **Long context**: trained with max_length=4,096
- **`<think>` tags**: rarely emitted explicitly (reasoning is folded into the response body)
---
## 🪪 License
- Base model: Apache 2.0 (Qwen)
- Training data: see each dataset's individual license
- This model: Apache 2.0
---
## 🙏 Credits
- **Base**: Qwen team (Alibaba)
- **Teacher**: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- **Data release**: nohurry, TeichAI
- **Training & Release**: FINAL-Bench / VIDRAFT_LAB
---
## 🔗 Related Models
- 🧠 [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1) — the **standalone LoRA adapter** version of this model
- ⚡ [`FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1) — Phase 4 SDPO self-distillation enhanced variant
---
*Darwin V8 · Part of the evolutionary model merging series by VIDRAFT_LAB*