---
license: apache-2.0
base_model: Qwen/Qwen3.5-2B
tags:
- qwen
- qwen3.5
- reasoning
- distillation
- claude-opus
- darwin-v8
- sft
- lora
- merged
language:
- en
- ko
- zh
- ja
pipeline_tag: text-generation
library_name: transformers
---

# 🧠 lastbrain – Darwin V8

**Claude Opus distillation model built on Darwin V8 (2B parameters)**

- 👨 **Father (Base)**: [`Qwen/Qwen3.5-2B`](https://huggingface.co/Qwen/Qwen3.5-2B)
- 👩 **Mother (LoRA Adapter)**: [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1)
- 👶 **Child (This model)**: `FINAL-Bench/lastbrain` – merged full-weight standalone

---

## 📦 Features

- **Base**: Qwen3.5-2B (2.3B parameters, hybrid attention)
- **Training**: SFT + LoRA (`all-linear`, rank=16, α=32)
- **Teachers**: Claude Opus 4.5 / 4.6, Claude Sonnet 4.6 (pre-generated reasoning traces)
- **Data**: 4,451 high-quality reasoning traces (4 public datasets)
- **Merged**: the LoRA adapter is fully folded into the base weights, so the model **runs standalone**

---

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/lastbrain"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its speed in km/h?"}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=800,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

**Example output**:

```
To find the speed of the train in km/h, we need to convert the given time from minutes to hours.

**Given:**
- Distance = 60 km
- Time = 45 minutes

**Step 1: Convert time to hours**
Since there are 60 minutes in 1 hour:
$$\text{Time in hours} = \frac{45}{60} = 0.75 \text{ hours}$$

**Step 2: Calculate speed**
$$\text{Speed} = \frac{60}{0.75} = 80 \text{ km/h}$$

**Final Answer:** The speed of the train is **80 km/h**.
```

---

## 🧬 Darwin V8 Training Pipeline

```
[Qwen/Qwen3.5-2B] ──── Base model (frozen)
        +
[4,451 Claude Opus/Sonnet reasoning traces]
        ↓
[SFT Training]
  - LoRA (all-linear, r=16, α=32)
  - Learning rate: 2e-4 (V8 rule: ×10 the FullFT rate)
  - 2 epochs, bf16, 8×B200 DDP
  - Loss: 1.33 → 1.10 (-17%)
  - Token accuracy: 68% → 72% (+4 pp)
        ↓
[LoRA merge into base weights]
        ↓
[lastbrain] ← this model
```

---

## 📊 Training Data Composition

| Dataset | Samples | Teacher |
|---------|---------|---------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Claude Opus 4.6 |
| [TeichAI/Claude-Opus-4.6-Reasoning-887x](https://huggingface.co/datasets/TeichAI/Claude-Opus-4.6-Reasoning-887x) | 887 | Claude Opus 4.6 |
| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | 250 | Claude Opus 4.5 |
| [TeichAI/Claude-Sonnet-4.6-Reasoning-1100x](https://huggingface.co/datasets/TeichAI/Claude-Sonnet-4.6-Reasoning-1100x) | 1,100 | Claude Sonnet 4.6 |
| **Total (after filtering)** | **4,451** | - |

---

## 🎯 Design Philosophy (Darwin V8)

1. **LoRA Without Regret** – target `all-linear`, use a high learning rate; a small rank is sufficient
2. **Response Distillation** – cost-efficient distillation from pre-generated Opus traces
3.
**Merge-and-Deploy** – once the LoRA adapter is merged, deploy with no additional dependencies

---

## 🔁 Reproduction

This model was created by merging the following two components:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# 1) Load the frozen base model
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B",
    torch_dtype=torch.bfloat16
)

# 2) Attach the distillation LoRA adapter
model = PeftModel.from_pretrained(
    base,
    "FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1"
)

# 3) Fold the adapter into the base weights and save a standalone model
merged = model.merge_and_unload()
merged.save_pretrained("./lastbrain")
```

---

## 📝 Sample Test Results (4 problems)

| Type | Correct | Response length |
|------|---------|-----------------|
| Math (train speed) | ✅ 80 km/h | 771 chars |
| Logic (height comparison) | ✅ Carol | 354 chars |
| Code (primality check) | ✅ Python function | 1,712 chars |
| Korean (minimum wage) | ✅ 1,577,600 KRW | 142 chars |

**Naturally produces answers structured with Markdown, LaTeX, and step-by-step reasoning**

---

## ⚠️ Limitations

- **Scale**: 2.3B parameters (small model)
- **Korean arithmetic accuracy**: occasional numeric errors (a small-model limitation)
- **Long context**: trained with max_length=4,096
- **`` tag**: rarely used explicitly (reasoning is integrated into the answer body)

---

## 🪪 License

- Base model: Apache 2.0 (Qwen)
- Training data: see each dataset's individual license
- This model: Apache 2.0

---

## 🙏 Credits

- **Base**: Qwen team (Alibaba)
- **Teacher**: Anthropic (Claude Opus 4.5/4.6, Sonnet 4.6)
- **Data releases**: nohurry, TeichAI
- **Training & release**: FINAL-Bench / VIDRAFT_LAB

---

## 🔗 Related Models

- 🧠 [`FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-Distill-v1) – the **standalone LoRA adapter** behind this model
- ⚡ [`FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1`](https://huggingface.co/FINAL-Bench/Qwen3.5-2B-Opus-SDPO-v1) – Phase 4 SDPO self-distillation variant

---

*Darwin V8 · Part of the evolutionary model merging series by VIDRAFT_LAB*
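Conceptually, the `merge_and_unload()` call in the reproduction recipe folds each LoRA adapter pair back into its frozen base matrix as `W' = W + (α/r)·B·A` (standard PEFT LoRA scaling). A minimal pure-Python sketch of that arithmetic, with toy 2×2 shapes; the names `W`, `A`, `B` are illustrative, not taken from the repository:

```python
# Illustrative LoRA merge: W' = W + (alpha / r) * (B @ A)
# Real layers are far larger; this card trains with r=16, alpha=32 (scale 2.0).

def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(x * Y[k][j] for k, x in enumerate(row))
             for j in range(len(Y[0]))] for row in X]

def merge_lora(W, A, B, r, alpha):
    """Fold a low-rank LoRA update into the base weight matrix."""
    scale = alpha / r
    BA = matmul(B, A)          # rank-r update with the same shape as W
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]

W = [[1.0, 0.0],
     [0.0, 1.0]]               # base weight (frozen during SFT)
B = [[0.5], [0.0]]             # d x r factor (toy r = 1)
A = [[0.0, 1.0]]               # r x d factor
merged = merge_lora(W, A, B, r=1, alpha=2)
print(merged)                  # [[1.0, 1.0], [0.0, 1.0]]
```

After this fold the low-rank factors are discarded, which is why the merged checkpoint needs neither `peft` nor the adapter repo at inference time.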
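The per-dataset counts in the training-data table sum to 4,563 raw samples, so filtering removed 112 to reach the reported 4,451. A small bookkeeping sketch; the counts come from the table, while the variable names are illustrative:

```python
# Raw sample counts from the training-data table above.
raw_counts = {
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "TeichAI/Claude-Opus-4.6-Reasoning-887x": 887,
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "TeichAI/Claude-Sonnet-4.6-Reasoning-1100x": 1100,
}

raw_total = sum(raw_counts.values())   # samples before quality filtering
kept_total = 4451                      # reported size after filtering
dropped = raw_total - kept_total

print(raw_total, kept_total, dropped)  # 4563 4451 112
```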