ArunkumarVR committed · verified · Commit c89abb5 · 1 Parent(s): a29defa

Update README: Canonical DeepBrainz Model Card

Files changed (1): README.md (+93 −6)

README.md CHANGED
@@ -2,14 +2,101 @@
 license: apache-2.0
 language:
 - en
 tags:
- - deepbrainz
- - reasoning
- - 4b
 - qwen3
 ---

 # DeepBrainz-R1-4B-16K

- **DeepBrainz-R1-4B-16K** is a 4B parameter reasoning model trained by DeepBrainz AI.
- - **Context:** 16,384
- - **Architecture:** Qwen3-4B (Hybrid Sharding Reconstruction)
 license: apache-2.0
 language:
 - en
+ pipeline_tag: text-generation
 tags:
 - qwen3
+ - reasoning
+ - long-context
+ - distillation
+ - math
+ - enterprise
+ - research
+ base_model: Qwen/Qwen3-4B
 ---
+
 # DeepBrainz-R1-4B-16K

+ **DeepBrainz-R1-4B-16K** is a high-performance reasoning model in the **DeepBrainz-R series**, designed for structured problem-solving, analysis, and enterprise research workflows.
+
+ It is distilled from the **Qwen3-32B** teacher model into a compact **4B** architecture using **Online Policy Distillation (OPD)**, emphasizing reasoning quality and instruction robustness over a **16K context window**.
+
+ ---
+
+ ## Model Highlights
+
+ - **4B Parameters**: Optimized balance of performance and inference cost.
+ - **16K Context Length**: Capable of processing medium-to-long documents and reasoning chains.
+ - **Distilled Precision**: Trained via NeMo-RL OPD from a **Qwen3-32B** teacher.
+ - **Architecture**: Standard Qwen3 (dense), optimized for modern GPU inference.
+
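As a rough guide to the inference-cost claim above, here is a back-of-envelope memory estimate. The attention dimensions (36 layers, 8 KV heads, head dim 128) are assumed Qwen3-4B-like values for illustration, not figures from this card:

```python
# Back-of-envelope memory estimate for a 4B-parameter model at 16K context.
# Assumptions (illustrative, not from the model card): bf16 weights, and
# Qwen3-4B-like GQA dimensions: 36 layers, 8 KV heads, head_dim 128.
params = 4e9
bytes_per_param = 2  # bf16

weights_gb = params * bytes_per_param / 1024**3

layers, kv_heads, head_dim = 36, 8, 128
# Per token, the cache stores one K and one V vector per layer per KV head.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_param
kv_cache_gb = 16_384 * kv_bytes_per_token / 1024**3

print(f"weights ~ {weights_gb:.1f} GiB, KV cache @16K ~ {kv_cache_gb:.2f} GiB")
```

Under these assumptions the weights fit in roughly 7.5 GiB and a full 16K-token KV cache adds a little over 2 GiB, which is why a single consumer GPU suffices.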
+ ---
+
+ ## Intended Use
+
+ - **Complex Reasoning**: Multi-step math, logic puzzles, and code analysis.
+ - **Agentic Workflows**: Reliable planning and tool use within the 16K context.
+ - **Research**: Investigating distillation scaling laws (32B → 4B).
+ - **Efficient Deployment**: Fits easily on consumer GPUs and edge servers.
+
+ *Note: This model is optimized for reasoning tasks. For general conversational use, apply the model's chat/instruction template rather than a raw prompt.*
+
+ ---
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "DeepBrainz/DeepBrainz-R1-4B-16K"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Example: math reasoning
+ prompt = "Solve step by step: If 3x + 7 = 22, what is x?"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=512,
+     temperature=0.6,
+     top_p=0.95,
+     do_sample=True,
+ )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
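Qwen3-based reasoning models typically emit their chain of thought inside `<think>…</think>` tags. Assuming this checkpoint follows that convention (the card does not say), the final answer can be separated from the reasoning with a small helper:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split Qwen3-style output into (reasoning, final_answer).

    Assumes the chain of thought is wrapped in <think>...</think>;
    if no such block is present, reasoning is returned empty.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Hypothetical decoded output, for illustration only:
raw = "<think>3x = 22 - 7 = 15, so x = 5.</think>\nx = 5"
reasoning, answer = split_reasoning(raw)
print(answer)  # → x = 5
```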
+
+ ---
+
+ ## Training Summary
+
+ The model was produced using a **multi-stage optimization process** involving large-scale supervision and iterative refinement to improve reasoning quality and robustness.
+
+ - **Teacher**: Qwen3-32B (dense)
+ - **Student**: Qwen3-4B
+ - **Method**: Online Policy Distillation (OPD)
+ - **Context**: 16,384 tokens
+
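The card does not spell out the OPD objective. As an illustrative sketch of the general idea (an assumption, not NeMo-RL's actual implementation), online policy distillation samples tokens from the student and minimizes the reverse KL divergence D_KL(student ‖ teacher) at each sampled position:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """D_KL(student || teacher) at one token position.

    On-policy distillation evaluates this on sequences sampled from the
    *student*, so the student is penalized most where it is confidently
    different from the teacher.
    """
    p = softmax(student_logits)  # student distribution
    q = softmax(teacher_logits)  # teacher distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy 3-token vocabulary: the student has drifted from the teacher.
student = [2.0, 0.5, -1.0]
teacher = [0.5, 2.0, -1.0]
print(reverse_kl(student, teacher))  # positive; zero iff the distributions match
print(reverse_kl(teacher, teacher))  # → 0.0
```

In practice this per-position loss is averaged over student-sampled sequences and backpropagated through the student only; the 32B teacher is frozen.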
+ ---
+
+ ## Limitations
+
+ Performance depends on task complexity and inference configuration. While significantly stronger than smaller models, it may still hallucinate on obscure facts more often than 30B+ models.
+
+ ---
+
+ ## License
+
+ Apache 2.0
+
+ ---
+
+ ## About DeepBrainz
+
+ DeepBrainz builds reasoning-first AI systems focused on efficiency, structure, and real-world problem-solving.