3amthoughts committed on
Commit 871f00e · verified · 1 Parent(s): 1398fdd

Update README.md

Files changed (1): README.md +91 −54
README.md CHANGED
@@ -1,17 +1,90 @@
- 🌌 DeepLink-R1
- <div align="center">
- <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2.png" width="200" alt="DeepLink-R1 Logo Concept"/>
- </div>
- DeepLink-R1 is a reasoning-focused Large Language Model built on the Qwen2.5-7B architecture and distilled from DeepSeek-R1. Engineered to embody the persona of a "Logical Architect," this model doesn't just provide answers—it constructs transparent, mathematically rigorous blueprints of thought.
- By utilizing the <think> tag, DeepLink-R1 exposes its internal reasoning process before delivering its final, refined response.
- 🔗 Quick Links
- Primary Model (BF16/FP16): 3amthoughts/DeepLink-R1
- Quantized Model (GGUF): 3amthoughts/DeepLink-R1-GGUF
- 🧠 The "Logical Architect" Persona
- DeepLink-R1 is designed for complex problem-solving, coding, and mathematical reasoning. When prompted, the model will output a structured thought process enclosed in <think> ... </think> tags, allowing users to follow the logical steps taken to arrive at the conclusion.
- 💻 Usage & Inference
- DeepLink-R1 uses the ChatML prompt format.
- Option 1: Using transformers (Python)
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -23,7 +96,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.bfloat16,
- device_map="auto"
  )

  messages = [
@@ -31,42 +104,6 @@ messages = [
  {"role": "user", "content": "How many 'r's are in the word strawberry?"}
  ]

- text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- inputs = tokenizer(text, return_tensors="pt").to("cuda")
-
- outputs = model.generate(**inputs, max_new_tokens=1024)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- Option 2: Using llama.cpp or Ollama (GGUF)
- For local, CPU-friendly, or low-VRAM inference, use the GGUF version.
- ```bash
- # Example using llama.cpp
- ./main -m Qwen3.5-4B.Q4_K_M.gguf -n 1024 -p "<|im_start|>system\nYou are a logical architect.<|im_end|>\n<|im_start|>user\nSolve this math problem...<|im_end|>\n<|im_start|>assistant\n<think>\n"
- ```
- 🏗️ Training Methodology: The Forge
- DeepLink-R1 was trained using Unsloth for 2x faster, memory-efficient fine-tuning, successfully navigating the constraints of a single Tesla T4 (16GB VRAM) GPU.
- Hardware & Framework Optimizations
- Framework: Unsloth & Hugging Face trl
- Hardware: 1x NVIDIA Tesla T4 (16GB)
- Memory Management:
- Loaded in 4-bit quantization via bitsandbytes.
- Enabled Unsloth's optimized Gradient Checkpointing.
- Dynamic Max Sequence Length (2048 - 4096) to maintain stability during specific training phases.
- LoRA Configuration
- We utilized Low-Rank Adaptation (LoRA) to efficiently update the model's weights:
- Target Modules: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- Rank (r): 16
- Alpha: 16
- Dropout: 0
- Hyperparameters
- Optimizer: AdamW 8-bit
- Learning Rate: 2e-4
- Global Batch Size: 8 (1 per device × 8 gradient accumulation steps)
- Training Steps: 350
- Note: Training successfully managed a runtime restart by resuming from an uploaded adapter state, ensuring zero progress loss.
- 📚 Dataset Engineering: The Knowledge
- To forge the "Logical Architect," we engineered a high-fidelity intelligence mixture by streaming and combining three elite reasoning datasets. All data was strictly aligned to the ChatML template to ensure seamless integration.
- ServiceNow-AI/R1-Distill-SFT: Provided the foundational reasoning logic and structured thought generation.
- open-r1/Mixture-of-Thoughts: Introduced highly diverse cognitive patterns and problem-solving approaches.
- bespokelabs/Bespoke-Stratos-17k: Applied for high-tier refinement, mathematical rigor, and complex multi-step logic.
- 🏆 The Result
- DeepLink-R1 stands as a testament to efficient distillation. It proves that with precise dataset curation, ChatML alignment, and aggressive memory optimization (Unsloth + 4-bit LoRA), a 7B parameter model can achieve elite logical depth on highly accessible hardware.
 
+ ---
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+ library_name: transformers
+ tags:
+ - reasoning
+ - chain-of-thought
+ - deepseek
+ - qwen
+ - unsloth
+ - lora
+ - gguf
+ - chatml
+ - agent
+ - code
+ - thinking
+ license: apache-2.0
+ ---
+
+ # 🌌 DeepLink-R1
+
+ **DeepLink-R1** is a highly specialized, reasoning-focused large language model designed to act as a **"Logical Architect."** Built on top of the **`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`** architecture, this model doesn't just provide answers—it constructs transparent, mathematically rigorous blueprints of thought.
+
+ It is trained to "think" before it speaks using `<think>` tags, exposing its internal logical deduction process before delivering a final, refined response.
+
+ Created by **3amthoughts**.
+
+ ### ⚡ Model Highlights
+ * **Architecture:** 7B parameters (Base: `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`)
+ * **Format:** Available in BF16/FP16 (Transformers) and GGUF (Q4_K_M for local execution via llama.cpp/Ollama)
+ * **Capabilities:** Deep logical reasoning, mathematical rigor, complex problem-solving, and transparent thought generation.
+ * **Training:** Fine-tuned using QLoRA via Unsloth on a high-fidelity mixture of reasoning datasets.
+
+ ### 🎯 Key Capabilities
+ * **Transparent Reasoning:** Trained to output its internal thought process inside `<think>` tags, which helps reduce hallucinations and unexplained logical leaps.
+ * **High-Tier Logic:** Excels at math, coding, and multi-step deduction by breaking complex problems down into manageable architectural blueprints.
+ * **Lightweight & Fast:** Distilled into a 7B-parameter footprint, making strong reasoning runnable on consumer hardware (MacBooks, RTX 3090/4090s).
+
+ ---
+
+ ### 🛠️ How it Works (Prompting & Output)
+ To trigger the model's reasoning engine, use the standard ChatML template and prompt it to think step-by-step.
+
+ **System Prompt:**
+ > "You are a logical architect. Think step-by-step."
+
+ **Example Interaction**
+ **User:**
+ > "If I have a 5-liter jug and a 3-liter jug, how can I measure exactly 4 liters of water?"
+
+ **DeepLink-R1 Output:**
+ ```text
+ <think>
+ To measure exactly 4 liters using a 5-liter jug (J5) and a 3-liter jug (J3):
+ 1. Fill J5 to the top (J5 = 5L, J3 = 0L).
+ 2. Pour water from J5 into J3 until J3 is full. This takes 3 liters. (J5 = 2L, J3 = 3L).
+ 3. Empty J3 completely. (J5 = 2L, J3 = 0L).
+ 4. Pour the remaining 2 liters from J5 into J3. (J5 = 0L, J3 = 2L).
+ 5. Fill J5 to the top again. (J5 = 5L, J3 = 2L).
+ 6. Pour water from J5 into J3 until J3 is full. Since J3 already has 2 liters, it only takes 1 liter to fill it. (J5 = 4L, J3 = 3L).
+ Now, J5 contains exactly 4 liters.
+ </think>
+ To measure exactly 4 liters, follow these steps:
+ 1. Fill the 5-liter jug completely.
+ 2. Pour water from the 5-liter jug into the 3-liter jug until it is full, leaving exactly 2 liters in the 5-liter jug.
+ 3. Empty the 3-liter jug.
+ 4. Pour the 2 liters from the 5-liter jug into the empty 3-liter jug.
+ 5. Refill the 5-liter jug completely.
+ 6. Carefully pour water from the 5-liter jug into the 3-liter jug until the 3-liter jug is full. Since it already contains 2 liters, it will only take 1 liter to fill.
+ You now have exactly 4 liters of water remaining in the 5-liter jug.
+ ```
+
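Downstream code usually needs the reasoning and the final answer separately. A minimal sketch of splitting the two, assuming well-formed `<think> ... </think>` tags as in the example above (the `split_think` helper is hypothetical, not part of the model or any library):

```python
import re

def split_think(response: str):
    """Split a DeepLink-R1-style response into (reasoning, answer).

    Assumes at most one well-formed <think>...</think> block.
    Returns (None, response) when no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return None, response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_think(
    "<think>\nFill J5, pour into J3, ...\n</think>\nYou now have exactly 4 liters."
)
print(answer)  # → You now have exactly 4 liters.
```

The same helper can be applied to decoded `generate` output before showing it to end users.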
+ ### 💻 Prompt Format (ChatML)
+ DeepLink-R1 strictly uses the ChatML prompt format.
+ ```text
+ <|im_start|>system
+ You are a logical architect. Think step-by-step.<|im_end|>
+ <|im_start|>user
+ How many 'r's are in the word strawberry?<|im_end|>
+ <|im_start|>assistant
+ <think>
+ ...
+ </think>
+ ...<|im_end|>
+ ```
+
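For illustration, the ChatML layout above can be reproduced in a few lines of plain Python. This is only a sketch: in practice `tokenizer.apply_chat_template` produces this rendering for you, and the `render_chatml` helper below is hypothetical:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into ChatML text."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues with <think>.
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a logical architect. Think step-by-step."},
    {"role": "user", "content": "How many 'r's are in the word strawberry?"},
]
print(render_chatml(messages))
```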
+ ### 🚀 Usage
+ #### Using `transformers` (Python)
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      torch_dtype=torch.bfloat16,
+ device_map="auto",
  )

  messages = [
  {"role": "user", "content": "How many 'r's are in the word strawberry?"}
  ]

+ inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")
+ outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
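As a quick sanity check on the usage example, the question it sends has a deterministic ground truth that the model's final answer can be compared against:

```python
# Ground truth for the example prompt "How many 'r's are in the word strawberry?"
word = "strawberry"
print(word.count("r"))  # → 3
```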