README.md · 3amthoughts/DeepLink-R1-GGUF at main


	---
	base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
	library_name: transformers
	tags:
	- reasoning
	- chain-of-thought
	- deepseek
	- qwen
	- gguf
	- bnb
	- 4bit
	- unsloth
	- chatml
	- agent
	- code
	- thinking
	- distilled
	license: apache-2.0
	---

	# 🌌 DeepLink-R1

	DeepLink-R1 is a specialized, reasoning-focused large language model designed to act as a "Logical Architect." Built on top of `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`, it does not just provide answers: it lays out its reasoning as a transparent, step-by-step blueprint.

	It is trained to "think" before it speaks using `<think>` tags, exposing its internal logical deduction process before delivering a final, refined response.

	Created by 3amthoughts.

	### ⚡ Model Highlights
	* Architecture: 7B Parameters (Base: `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`)
	* Format: Available in BF16/FP16 (Transformers) and GGUF (Q4_K_M for local execution via llama.cpp/Ollama)
	* Capabilities: Deep logical reasoning, mathematical rigor, complex problem-solving, and transparent thought generation.
	* Training: Fine-tuned using QLoRA via Unsloth on a high-fidelity mixture of elite reasoning datasets.

	### 🎯 Key Capabilities
	* Transparent Reasoning: Trained to output its internal thought process inside `<think>` tags, with the aim of reducing hallucinations and unexplained logical leaps.
	* High-Tier Logic: Excels at math, coding, and multi-step deduction by breaking down complex problems into manageable architectural blueprints.
	* Lightweight & Fast: Distilled into a 7B parameter footprint, making elite-level reasoning runnable on consumer hardware (MacBooks, RTX 3090/4090s).

	---

	### 🛠️ How it Works (Prompting & Output)
	To trigger the model's reasoning, use the standard ChatML template and prompt it to think step by step.

	System Prompt:
	> "You are a logical architect. Think step-by-step."

	Example Interaction:

	User:
	> "If I have a 5-liter jug and a 3-liter jug, how can I measure exactly 4 liters of water?"

	DeepLink-R1 Output:
	```text
	<think>
	To measure exactly 4 liters using a 5-liter jug (J5) and a 3-liter jug (J3):
	1. Fill J5 to the top (J5 = 5L, J3 = 0L).
	2. Pour water from J5 into J3 until J3 is full. This takes 3 liters. (J5 = 2L, J3 = 3L).
	3. Empty J3 completely. (J5 = 2L, J3 = 0L).
	4. Pour the remaining 2 liters from J5 into J3. (J5 = 0L, J3 = 2L).
	5. Fill J5 to the top again. (J5 = 5L, J3 = 2L).
	6. Pour water from J5 into J3 until J3 is full. Since J3 already has 2 liters, it only takes 1 liter to fill it. (J5 = 4L, J3 = 3L).
	Now, J5 contains exactly 4 liters.
	</think>
	To measure exactly 4 liters, follow these steps:
	1. Fill the 5-liter jug completely.
	2. Pour water from the 5-liter jug into the 3-liter jug until it is full, leaving exactly 2 liters in the 5-liter jug.
	3. Empty the 3-liter jug.
	4. Pour the 2 liters from the 5-liter jug into the empty 3-liter jug.
	5. Refill the 5-liter jug completely.
	6. Carefully pour water from the 5-liter jug into the 3-liter jug until the 3-liter jug is full. Since it already contains 2 liters, it will only take 1 liter to fill.
	You now have exactly 4 liters of water remaining in the 5-liter jug.
	```
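The six pouring steps in the example output can be checked mechanically. The following is a minimal sketch that simulates them; the helper names (`fill`, `empty`, `pour`, `run_steps`) are illustrative and not part of the model or any library.

```python
# Capacities from the example: a 5-liter jug (J5) and a 3-liter jug (J3).
CAPACITY = {"J5": 5, "J3": 3}


def fill(state, jug):
    """Return a new state with `jug` filled to capacity."""
    new = dict(state)
    new[jug] = CAPACITY[jug]
    return new


def empty(state, jug):
    """Return a new state with `jug` emptied."""
    new = dict(state)
    new[jug] = 0
    return new


def pour(state, src, dst):
    """Pour from `src` into `dst` until `src` is empty or `dst` is full."""
    amount = min(state[src], CAPACITY[dst] - state[dst])
    new = dict(state)
    new[src] -= amount
    new[dst] += amount
    return new


def run_steps():
    """Replay the six steps and return the (J5, J3) levels after each one."""
    steps = [
        lambda s: fill(s, "J5"),        # 1. fill J5
        lambda s: pour(s, "J5", "J3"),  # 2. pour J5 into J3
        lambda s: empty(s, "J3"),       # 3. empty J3
        lambda s: pour(s, "J5", "J3"),  # 4. pour the rest of J5 into J3
        lambda s: fill(s, "J5"),        # 5. refill J5
        lambda s: pour(s, "J5", "J3"),  # 6. top up J3 from J5
    ]
    state = {"J5": 0, "J3": 0}
    history = []
    for step in steps:
        state = step(state)
        history.append((state["J5"], state["J3"]))
    return history
```

Calling `run_steps()` returns the jug levels after each step; the final pair should show 4 liters in the 5-liter jug, matching the model's answer.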


	### 💻 Prompt Format (ChatML)
	DeepLink-R1 strictly uses the ChatML prompt format.
	```text
	<|im_start|>system
	You are a logical architect. Think step-by-step.<|im_end|>
	<|im_start|>user
	How many 'r's are in the word strawberry?<|im_end|>
	<|im_start|>assistant
	<think>
	...
	</think>
	...<|im_end|>
	```
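When a runtime expects a raw prompt string rather than a list of messages, the layout above can be assembled by hand. This is a minimal sketch, assuming the exact whitespace shown in the template; `build_chatml_prompt` is a hypothetical helper, and the tokenizer's own chat template remains the authoritative source where it is available.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>{role} ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a logical architect. Think step-by-step."},
    {"role": "user", "content": "How many 'r's are in the word strawberry?"},
]
example_prompt = build_chatml_prompt(example_messages)
```

Here `example_prompt` ends with an open assistant turn, so generation should begin with the `<think>` block.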

	### 🚀 Usage
	Using transformers (Python)
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "3amthoughts/DeepLink-R1"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	messages = [
	    {"role": "system", "content": "You are a logical architect. Think step-by-step."},
	    {"role": "user", "content": "How many 'r's are in the word strawberry?"},
	]

	# Build the prompt with the tokenizer's chat template and move it to the
	# device the model was placed on (not necessarily "cuda").
	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt",
	    return_dict=True,
	).to(model.device)

	# temperature only takes effect when sampling is enabled.
	outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
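The decoded text contains both the reasoning and the final answer. The following sketch separates the two, assuming the `<think>` and `</think>` tags survive decoding as plain text; `split_reasoning` is a hypothetical helper, not part of `transformers`.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a decoded model response.

    If no complete <think>...</think> block is present, the reasoning is
    returned as an empty string and the whole text is treated as the answer.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = "<think>\nCount each letter in turn.\n</think>\nThere are 3 'r's."
reasoning, answer = split_reasoning(sample)
```

If generation stops at `max_new_tokens` before the closing tag is produced, this sketch returns the entire text as the answer, so callers may want to handle that case separately.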