---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- qwen
- qwen3.5
- reasoning
- distillation
- claude-opus
- darwin-v8
- sft
- lora
- merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
---

# lastbrain: Darwin V8

**Claude Opus distillation model based on Darwin V8 (2B parameters)**

- 👨 **Father (Base)**: [`Qwen/Qwen3.5-2B`](https://huggingface.co/Qwen/Qwen3.5-2B)
- 👩 **Mother (LoRA Adapter)**: [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1)
- 👶 **Child (This model)**: `FINAL-Bench/lastbrain`, a merged, full-weight, standalone model

---

## Features

- **Base**: Qwen3.5-2B (2.3B parameters, hybrid attention)
- **Training**: SFT + LoRA (`all-linear`, rank=16, α=32)
- **Teachers**: Claude Opus 4.5 / 4.6, Claude Sonnet 4.6 (pre-generated reasoning traces)
- **Data**: 4,451 high-quality reasoning traces (from 4 public datasets)
- **Merged**: The LoRA adapter is fully merged into the base weights, so the model **runs standalone**
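
To make the LoRA settings above concrete, the following minimal sketch shows how rank and α translate into a scaling factor and an added-parameter count for a single linear layer. It assumes the conventional LoRA scaling of α/rank; the 2048×2048 layer shape is a hypothetical placeholder chosen for illustration, not the actual Qwen3.5-2B dimensions.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Conventional LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank x d_in) and B has shape (d_out x rank).
    """
    return rank * (d_in + d_out)


# Values stated in this model card.
RANK, ALPHA = 16, 32

# Hypothetical layer shape, for illustration only (an assumption).
D_IN, D_OUT = 2048, 2048

scale = lora_scaling(ALPHA, RANK)
added = lora_added_params(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"scaling factor: {scale}")
print(f"LoRA params for one {D_IN}x{D_OUT} layer: {added:,} "
      f"({added / full:.2%} of the full weight matrix)")
```

With rank=16 the adapter stays small relative to the layer it modifies, which is why a low rank applied to every linear layer can still be cheap to train.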

---

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/lastbrain"

# Load the merged model; PEFT is not needed because the LoRA weights are already merged.
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"}
]

# Render the chat template to a prompt string, then tokenize it.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) gives a deterministic answer.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

**Example output**:

```
To find the speed of the train in km/h, we need to convert the given time from minutes to hours.

**Given:**
- Distance = 60 km
- Time = 45 minutes

**Step 1: Convert time to hours**
Since there are 60 minutes in 1 hour:
$$\text{Time in hours} = \frac{45}{60} = 0.75 \text{ hours}$$

**Step 2: Calculate speed**
$$\text{Speed} = \frac{60}{0.75} = 80 \text{ km/h}$$

**Final Answer:** The speed of the train is **80 km/h**.
```

---

## Darwin V8 Training Pipeline

```
[Qwen/Qwen3.5-2B] ──── Base model (frozen)
        +
[4,451 Claude Opus/Sonnet reasoning traces]
        ↓
[SFT Training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: 10× the full fine-tuning LR)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 1.33 → 1.10 (-17%)
  - Token accuracy: 68% → 72% (+4 pp)
        ↓
[LoRA merge into base weights]
        ↓
[lastbrain] ← this model
```
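
The figures quoted in the pipeline can be checked with a few lines of arithmetic. The sketch below recomputes the relative loss change and the accuracy gain in percentage points from the reported start and end values. It also derives the full fine-tuning learning rate implied by the "×10" rule; that last value is an inference from the rule as written here, not a number stated in this card.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Values reported in the pipeline summary.
loss_before, loss_after = 1.33, 1.10
acc_before, acc_after = 0.68, 0.72
lora_lr = 2e-4

loss_delta = relative_change(loss_before, loss_after)
acc_delta_pp = (acc_after - acc_before) * 100  # percentage points, not percent

# Assumption: "x10 FullFT" means the LoRA LR is ten times the full fine-tuning LR.
implied_full_ft_lr = lora_lr / 10

print(f"loss change: {loss_delta:+.1%}")
print(f"token accuracy change: {acc_delta_pp:+.1f} pp")
print(f"implied full fine-tuning LR: {implied_full_ft_lr:.0e}")
```

The accuracy gain is an absolute difference in percentage points, while the loss change is relative to the starting loss, so the two numbers are not directly comparable.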

---

## Training Data Composition

| Dataset | Samples | Source teacher |
|---------|--------|------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Claude Opus 4.6 |
| [TeichAI/Claude-Opus-4.6-Reasoning-887x](https://huggingface.co/datasets/TeichAI/Claude-Opus-4.6-Reasoning-887x) | 887 | Claude Opus 4.6 |
| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | 250 | Claude Opus 4.5 |
| [TeichAI/Claude-Sonnet-4.6-Reasoning-1100x](https://huggingface.co/datasets/TeichAI/Claude-Sonnet-4.6-Reasoning-1100x) | 1,100 | Claude Sonnet 4.6 |
| **Total (after filtering)** | **4,451** | - |

---

## Design Philosophy (Darwin V8)

1. **LoRA Without Regret**: `all-linear` target, high LR; a small rank is sufficient
2. **Response Distillation**: cost-efficient distillation from pre-generated Opus traces
3. **Merge-and-Deploy**: once the LoRA adapter is merged, the model deploys with no extra dependencies

---

## Reproduction

This model was created by merging the following two components:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Father: the base model.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B", torch_dtype=torch.bfloat16
)

# Mother: the LoRA adapter, attached on top of the base model.
model = PeftModel.from_pretrained(
    base, "FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1"
)

# Fold the adapter into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("./lastbrain")
```
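
Conceptually, merging a plain LoRA adapter replaces each adapted weight `W` with `W + (α / r) · B · A`. The pure-Python sketch below illustrates that formula on a tiny hand-made example and checks that the merged weight gives the same output as the base weight plus the adapter path. The matrices, rank, and α are hypothetical toy values; this is an illustration of the standard formula, not a reproduction of PEFT's internals.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply(w, x):
    """Matrix-vector product y = W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def merge_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy values (assumptions for illustration): d_in=3, d_out=2, rank=1, alpha=2.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[1.0, 2.0, 3.0]]   # rank x d_in
B = [[0.5], [-1.0]]     # d_out x rank
ALPHA, RANK = 2, 1
x = [1.0, 1.0, 1.0]

merged = merge_lora(W, A, B, ALPHA, RANK)
y_merged = apply(merged, x)

# Unmerged path: base output plus the scaled adapter output.
adapter_out = apply(B, apply(A, x))
y_adapter = [base + (ALPHA / RANK) * ad for base, ad in zip(apply(W, x), adapter_out)]

print("merged weight:", merged)
print("outputs match:", y_merged == y_adapter)
```

Because the merged weight reproduces the adapter's effect exactly, inference after merging needs only the base architecture.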

---

## Sample Test Results (4 problems)

| Type | Correct? | Response length |
|-----|---------|---------|
| Math (train speed) | ✅ 80 km/h | 771 characters |
| Logic (height comparison) | ✅ Carol | 354 characters |
| Code (primality test) | ✅ Python function | 1,712 characters |
| Korean (minimum wage) | ✅ 1,577,600 won | 142 characters |

**The model naturally produces structured answers using Markdown, LaTeX, and step-by-step reasoning.**

---

## Limitations

- **Scale**: 2.3B parameters (a small model)
- **Korean arithmetic accuracy**: numeric errors can occasionally occur (a small-model limitation)
- **Long context**: training used `max_length=4,096`
- **`<think>` tags**: rarely emitted explicitly (reasoning is integrated into the main response)
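
Because explicit `<think>` tags are rare but not impossible, downstream code that post-processes responses may want to handle both cases. The helper below is a hypothetical sketch (the function name and return shape are assumptions, not part of this model's tooling): it returns the reasoning and the answer separately when a tagged block is present, and treats the whole response as the answer otherwise.

```python
import re
from typing import Tuple

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a response into (reasoning, answer).

    If no <think> block is present, reasoning is empty and the full
    text is returned as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


tagged = "<think>45 min = 0.75 h</think>\nThe speed is 80 km/h."
untagged = "The speed is 80 km/h."

print(split_reasoning(tagged))
print(split_reasoning(untagged))
```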

---

## License

- Base model: Apache 2.0 (Qwen)
- Training data: see each dataset's individual license
- This model: Apache 2.0

---

## Credits

- **Base**: Qwen team (Alibaba)
- **Teacher**: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- **Dataset release**: nohurry, TeichAI
- **Training & Release**: FINAL-Bench / VIDRAFT_LAB

---

## Related Models

- [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1): the **standalone LoRA adapter version** of this model
- [`FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1): enhanced version with Phase 4 SDPO self-distillation

---

*Darwin V8 · Part of the evolutionary model merging series by VIDRAFT_LAB*