3amthoughts/DeepLink-R1

@@ -1,17 +1,90 @@
-🌌 DeepLink-R1
-<div align="center">
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2.png" width="200" alt="DeepLink-R1 Logo Concept"/>
-</div>
-DeepLink-R1 is a reasoning-focused Large Language Model built on the Qwen2.5-7B architecture and distilled from DeepSeek-R1. Engineered to embody the persona of a "Logical Architect," this model doesn't just provide answers—it constructs transparent, mathematically rigorous blueprints of thought.
-By utilizing the <think> tag, DeepLink-R1 exposes its internal reasoning process before delivering its final, refined response.
-🔗 Quick Links
-Primary Model (BF16/FP16): 3amthoughts/DeepLink-R1
-Quantized Model (GGUF): 3amthoughts/DeepLink-R1-GGUF
-🧠 The "Logical Architect" Persona
-DeepLink-R1 is designed for complex problem-solving, coding, and mathematical reasoning. When prompted, the model will output a structured thought process enclosed in <think> ... </think> tags, allowing users to follow the logical steps taken to arrive at the conclusion.
-💻 Usage & Inference
-DeepLink-R1 uses the ChatML prompt format.
-Option 1: Using transformers (Python)
 code
 Python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -23,7 +96,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     torch_dtype=torch.bfloat16,
-    device_map="auto"
 )
 messages = [
@@ -31,42 +104,6 @@ messages = [
     {"role": "user", "content": "How many 'r's are in the word strawberry?"}
 ]
-text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(text, return_tensors="pt").to("cuda")
-outputs = model.generate(**inputs, max_new_tokens=1024)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-Option 2: Using llama.cpp or Ollama (GGUF)
-For local, CPU-friendly, or low-VRAM inference, use the GGUF version.
-code
-Bash
-# Example using llama.cpp
-./main -m Qwen3.5-4B.Q4_K_M.gguf -n 1024 -p "<|im_start|>system\nYou are a logical architect.<|im_end|>\n<|im_start|>user\nSolve this math problem...<|im_end|>\n<|im_start|>assistant\n<think>\n"
-🏗️ Training Methodology: The Forge
-DeepLink-R1 was trained using Unsloth for 2x faster, memory-efficient fine-tuning, successfully navigating the constraints of a single Tesla T4 (16GB VRAM) GPU.
-Hardware & Framework Optimizations
-Framework: Unsloth & Hugging Face trl
-Hardware: 1x NVIDIA Tesla T4 (16GB)
-Memory Management:
-Loaded in 4-bit quantization via bitsandbytes.
-Enabled Unsloth's optimized Gradient Checkpointing.
-Dynamic Max Sequence Length (2048 - 4096) to maintain stability during specific training phases.
-LoRA Configuration
-We utilized Low-Rank Adaptation (LoRA) to efficiently update the model's weights:
-Target Modules: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
-Rank (r): 16
-Alpha: 16
-Dropout: 0
-Hyperparameters
-Optimizer: AdamW 8-bit
-Learning Rate: 2e-4
-Global Batch Size: 8 (1 per device × 8 gradient accumulation steps)
-Training Steps: 350
-Note: Training successfully managed a runtime restart by resuming from an uploaded adapter state, ensuring zero progress loss.
-📚 Dataset Engineering: The Knowledge
-To forge the "Logical Architect," we engineered a high-fidelity intelligence mixture by streaming and combining three elite reasoning datasets. All data was strictly aligned to the ChatML template to ensure seamless integration.
-ServiceNow-AI/R1-Distill-SFT: Provided the foundational reasoning logic and structured thought generation.
-open-r1/Mixture-of-Thoughts: Introduced highly diverse cognitive patterns and problem-solving approaches.
-bespokelabs/Bespoke-Stratos-17k: Applied for high-tier refinement, mathematical rigor, and complex multi-step logic.
-🏆 The Result
-DeepLink-R1 stands as a testament to efficient distillation. It proves that with precise dataset curation, ChatML alignment, and aggressive memory optimization (Unsloth + 4-bit LoRA), a 7B parameter model can achieve elite logical depth on highly accessible hardware.

+---
+base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+library_name: transformers
+tags:
+- reasoning
+- chain-of-thought
+- deepseek
+- qwen
+- unsloth
+- lora
+- gguf
+- chatml
+- agent
+- code
+- thinking
+license: apache-2.0
+---
+# 🌌 DeepLink-R1
+**DeepLink-R1** is a reasoning-focused large language model designed to act as a **"Logical Architect."** Fine-tuned from **`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`**, it does not just provide answers: it lays out a transparent, step-by-step blueprint of the reasoning behind them.
+It is trained to "think" before it speaks using `<think>` tags, exposing its internal logical deduction process before delivering a final, refined response.
+Created by **3amthoughts**.
+### ⚡ Model Highlights
+* **Architecture:** 7B Parameters (Base: `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`)
+* **Format:** Available in BF16/FP16 (Transformers) and GGUF (Q4_K_M for local execution via llama.cpp/Ollama)
+* **Capabilities:** Deep logical reasoning, mathematical rigor, complex problem-solving, and transparent thought generation.
+* **Training:** Fine-tuned using QLoRA via Unsloth on a high-fidelity mixture of elite reasoning datasets.
+### 🎯 Key Capabilities
+* **Transparent Reasoning:** Trained to output its internal thought process inside `<think>` tags, which makes logical leaps and unsupported claims easier to spot.
+* **High-Tier Logic:** Excels at math, coding, and multi-step deduction by breaking down complex problems into manageable architectural blueprints.
+* **Lightweight & Fast:** Distilled into a 7B parameter footprint, making elite-level reasoning runnable on consumer hardware (MacBooks, RTX 3090/4090s).
+---
+### 🛠️ How it Works (Prompting & Output)
+To trigger the model's reasoning behavior, use the ChatML template and instruct it to think step by step.
+**System Prompt:**
+> "You are a logical architect. Think step-by-step."
+**Example Interaction**
+**User:**
+> "If I have a 5-liter jug and a 3-liter jug, how can I measure exactly 4 liters of water?"
+**DeepLink-R1 Output:**
+```text
+<think>
+To measure exactly 4 liters using a 5-liter jug (J5) and a 3-liter jug (J3):
+1. Fill J5 to the top (J5 = 5L, J3 = 0L).
+2. Pour water from J5 into J3 until J3 is full. This takes 3 liters. (J5 = 2L, J3 = 3L).
+3. Empty J3 completely. (J5 = 2L, J3 = 0L).
+4. Pour the remaining 2 liters from J5 into J3. (J5 = 0L, J3 = 2L).
+5. Fill J5 to the top again. (J5 = 5L, J3 = 2L).
+6. Pour water from J5 into J3 until J3 is full. Since J3 already has 2 liters, it only takes 1 liter to fill it. (J5 = 4L, J3 = 3L).
+Now, J5 contains exactly 4 liters.
+</think>
+To measure exactly 4 liters, follow these steps:
+1. Fill the 5-liter jug completely.
+2. Pour water from the 5-liter jug into the 3-liter jug until it is full, leaving exactly 2 liters in the 5-liter jug.
+3. Empty the 3-liter jug.
+4. Pour the 2 liters from the 5-liter jug into the empty 3-liter jug.
+5. Refill the 5-liter jug completely.
+6. Carefully pour water from the 5-liter jug into the 3-liter jug until the 3-liter jug is full. Since it already contains 2 liters, it will only take 1 liter to fill.
+You now have exactly 4 liters of water remaining in the 5-liter jug.
+```
+### 💻 Prompt Format (ChatML)
+DeepLink-R1 strictly uses the ChatML prompt format.
+```text
+<|im_start|>system
+You are a logical architect. Think step-by-step.<|im_end|>
+<|im_start|>user
+How many 'r's are in the word strawberry?<|im_end|>
+<|im_start|>assistant
+<think>
+...
+</think>
+...<|im_end|>
+```
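The layout above can also be assembled by hand, which is handy for checking what the tokenizer's chat template is expected to produce. This is a minimal sketch that assumes exactly the format shown above; `build_chatml_prompt` is a hypothetical helper for illustration, not part of any library, and the tokenizer's own chat template should be preferred for real inference.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    Hypothetical helper for illustration only; it mirrors the layout
    shown in the model card rather than any official implementation.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a logical architect. Think step-by-step."},
    {"role": "user", "content": "How many 'r's are in the word strawberry?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to confirm that the shipped template matches the documented format.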
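+### 🚀 Usage
+**Using transformers (Python)**
+```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_id = "3amthoughts/DeepLink-R1"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     torch_dtype=torch.bfloat16,
+    device_map="auto",
 )
 messages = [
     {"role": "user", "content": "How many 'r's are in the word strawberry?"}
 ]
+# return_dict=True yields input_ids and attention_mask; move them to the model's device.
+inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True, return_dict=True).to(model.device)
+# temperature only takes effect when sampling is enabled.
+outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```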
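Since the model emits its reasoning inside `<think>` tags before the final response, downstream code often wants the two parts separately. The sketch below shows one way to split them. It assumes the tags survive decoding as plain text and that the output contains at most one reasoning block; `split_reasoning` and the sample string are hypothetical and made up for illustration, not actual model output.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a decoded generation.

    If no complete <think>...</think> block is found, reasoning is an
    empty string and the whole text is returned as the answer.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = text[match.end():].strip()
    return reasoning, answer


# Illustrative string only, not real model output.
sample = (
    "<think>\ns-t-r-a-w-b-e-r-r-y has 'r' at positions 3, 8 and 9.\n</think>\n"
    "There are 3 'r's in \"strawberry\"."
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

If generation stops before `</think>` is produced (for example when `max_new_tokens` is too small), the helper returns the raw text as the answer, so callers may want to check for an empty `reasoning` value.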