Matrix 2
Model Description
Matrix 2 is a fine-tuned version of DeepSeek-R1-Distill-Qwen-7B, trained on a focused mixture of chain-of-thought reasoning, math, coding, and logic data. It is the flagship reasoning model of the Inelly lineup, built for deep, accurate, step-by-step problem solving.
- Developed by: Bry (GenueAI)
- Base model: DeepSeek-R1-Distill-Qwen-7B
- Fine-tuning method: QLoRA (4-bit NF4, rank 16)
- Parameters: 7.62B (base) + ~6.5M trainable (LoRA adapters)
- License: MIT (inherited from DeepSeek-R1)
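The QLoRA setup listed above could be expressed with the Hugging Face `transformers` and `peft` libraries roughly as follows. This is a sketch, not the actual training script: the alpha and dropout values come from the Training Hyperparameters table, while `target_modules` and the compute dtype are assumptions, since this card does not state them.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base weights (bitsandbytes backend).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: not stated in this card
)

# Rank-16 LoRA adapters; only these weights are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],  # assumption: not stated in this card
    task_type="CAUSAL_LM",
)
```

Both objects would typically be passed on to model loading and adapter wrapping; those steps are omitted here because the card gives no detail on them.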
Intended Use
Matrix 2 is intended for:
- Deep Chain-of-Thought reasoning – Multi-step problem solving with clear logic
- Mathematics – Algebra, arithmetic, word problems, multi-step calculations
- Code generation – Python functions with proper logic and comments
- Logical deduction – Syllogisms, puzzles, transitive reasoning
- Scientific explanations – Physics, biology, general science
- Complex instruction following – Multi-part tasks requiring structured thinking
Out of Scope
- Not intended for production deployment without further safety evaluation
- Safety alignment inherited from DeepSeek-R1 base; fine-tuning data did not include adversarial safety examples
- Larger memory footprint than 1.5B/3B variants (~5.2GB)
Training Data
Matrix 2 was fine-tuned for 1 epoch on ~5,225 samples drawn from:
| Dataset | Samples | Purpose |
|---|---|---|
| Bespoke-Stratos-35k | 3,000 | Chain-of-thought math & reasoning |
| OpenThoughts-114k | 2,500 | Code generation with reasoning |
| dolphin-r1 | 2,000 | General reasoning (DeepSeek-R1 distill) |
All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples). Maximum sequence length: 512 tokens.
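The deduplication and 2x reasoning-weighting step can be pictured with a short sketch. It is illustrative only: the record fields `text` and `is_cot`, the case and whitespace normalization used for duplicate detection, and the function name `dedupe_and_oversample` are assumptions, not details of the actual preprocessing pipeline.

```python
import hashlib


def dedupe_and_oversample(records, cot_weight=2):
    """Drop duplicates (after normalization), then repeat CoT records.

    `records` is a list of dicts with hypothetical keys "text" and "is_cot".
    """
    seen = set()
    unique = []
    for rec in records:
        # Normalize case and whitespace so trivially different copies collide.
        normalized = " ".join(rec["text"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)

    weighted = []
    for rec in unique:
        # CoT records appear `cot_weight` times, everything else once.
        weighted.extend([rec] * (cot_weight if rec["is_cot"] else 1))
    return weighted


samples = [
    {"text": "Solve 2 + 2 step by step.", "is_cot": True},
    {"text": "solve 2 + 2   step by step.", "is_cot": True},  # duplicate after normalization
    {"text": "What is the capital of France?", "is_cot": False},
]
result = dedupe_and_oversample(samples)
print(len(result))
```

In this toy input the second record is dropped as a duplicate, the remaining CoT record is emitted twice, and the non-CoT record once.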
Training Hyperparameters
| Parameter | Value |
|---|---|
| Base model | DeepSeek-R1-Distill-Qwen-7B |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (effective, via gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~74 min |
| Hardware | RTX 3090 (24GB VRAM) |
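To make the schedule concrete, the sketch below computes a learning rate with linear warmup followed by cosine decay. The step count assumes ~5,225 samples at an effective batch size of 8 for one epoch; the linear warmup shape and the decay to zero are assumptions modeled on common cosine schedulers, not details confirmed by this card.

```python
import math

PEAK_LR = 2e-4
WARMUP_RATIO = 0.05
NUM_SAMPLES = 5225   # approximate, from the Training Data section
BATCH_SIZE = 8       # assumed to be the effective batch size

total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE)   # optimizer steps in 1 epoch
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / warmup_steps
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run has 654 optimizer steps, of which the first 33 are warmup.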
Model Architecture
| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 3,584 |
| Layers | 28 |
| Attention heads | 28 |
| Head dim | 128 |
| Intermediate size | 18,944 |
| Vocab size | 152,064 |
| Context length | 131,072 |
| Total parameters | ~7.62B |
| Trainable parameters | ~6.5M (LoRA) |
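The trainable-parameter count of a LoRA setup follows from the shapes in the table: each adapted matrix with `d_in` inputs and `d_out` outputs adds `rank * (d_in + d_out)` parameters. The sketch below applies that formula to a few candidate module sets. The exact figure depends on which projections were adapted, which this card does not list, so the sets shown are illustrative. The number of key/value heads (`KV_HEADS = 4`) is also an assumption, since the table does not include it.

```python
HIDDEN = 3584
LAYERS = 28
HEADS = 28
HEAD_DIM = 128
INTERMEDIATE = 18944
KV_HEADS = 4   # assumption: grouped-query attention value, not listed in the table
RANK = 16

# (input features, output features) of each linear projection in one layer.
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(target_modules, rank=RANK):
    """Trainable LoRA parameters: rank * (d_in + d_out) per adapted matrix."""
    per_layer = sum(rank * sum(PROJ_SHAPES[name]) for name in target_modules)
    return per_layer * LAYERS


for targets in (["q_proj", "v_proj"], ["q_proj", "k_proj", "v_proj", "o_proj"]):
    print(targets, f"{lora_params(targets):,}")
```

The same table also gives a quick consistency check: 28 attention heads times a head dimension of 128 equals the hidden size of 3,584.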
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the weights in FP16; device_map="auto" places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained("path/to/matrix-2", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path/to/matrix-2")
messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Sampling must be enabled for temperature and top_p to take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
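Models distilled from DeepSeek-R1 generally wrap their reasoning in `<think>` and `</think>` tags before the final answer. Assuming Matrix 2 preserves that format (this card does not say), a small hypothetical helper such as `split_reasoning` below can separate the reasoning from the answer. The example string is made up for illustration and is not real model output.

```python
def split_reasoning(response):
    """Split a response into (reasoning, answer) around a closing </think> tag."""
    marker = "</think>"
    if marker not in response:
        # No reasoning block found: treat the whole response as the answer.
        return "", response.strip()
    reasoning, answer = response.split(marker, 1)
    # The opening tag may be absent if the chat template already emitted it.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


example = "<think>3x = 22 - 7 = 15, so x = 5.</think>\nx = 5"
reasoning, answer = split_reasoning(example)
print(answer)
```

If generation stops before the closing tag is produced, for instance because `max_new_tokens` is too small for the reasoning trace, the helper returns the whole text as the answer, so a larger token budget may be needed for long problems.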
Performance
Informal, qualitative GPU testing across 8 categories (six are summarized below; no benchmark scores are reported):
| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Excellent multi-step logic |
| Math | ✅ Accurate with detailed work shown |
| Code generation | ✅ Clean, well-commented Python |
| Logic puzzles | ✅ Thorough deductive reasoning |
| General knowledge | ✅ Accurate, detailed explanations |
| Complex reasoning | ✅ Handles multi-step word problems well |
Inelly / GenueAI Model Family
| Model | Size | Focus |
|---|---|---|
| Matrix 2 (this model) | 7B | Deep CoT reasoning, math, coding |
| Inelly 4.5 | 3B | Conversation + politeness + CoT |
| Inelly 4.5 Blaze | 1.5B | Fast reasoning + CoT |
Limitations
- Safety: Inherited from DeepSeek-R1 base; not specifically safety-tuned. May occasionally follow harmful instructions.
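- Memory: ~5.2GB VRAM for inference, a figure consistent with 4-bit quantized weights rather than FP16; FP16 weights alone take roughly 15GB (7.62B parameters × 2 bytes)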
- Context length: Fine-tuned on 512-token sequences; base supports 128K but fine-tuned performance is optimized for shorter contexts
- Factual accuracy: May hallucinate in specialized domains (law, medicine, finance)
- Speed: Slower than 1.5B/3B variants due to size
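Weight memory can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation only: it ignores activations, the KV cache, and quantization overhead such as scaling constants, so real usage will be higher than these numbers.

```python
PARAMS = 7.62e9  # total parameters, from the Model Architecture table

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>5}: {weight_memory_gb(PARAMS, nbytes):.2f} GB")
```

By this estimate FP16 weights come to about 15.2 GB and 4-bit weights to about 3.8 GB before any runtime overhead.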
Acknowledgments
- DeepSeek-R1-Distill-Qwen-7B by DeepSeek AI (base model)
- Bespoke Labs for Stratos dataset
- OpenThoughts team
- Cognitive Computations for dolphin-r1
Citation
@misc{matrix2,
title = {Matrix 2: A 7B Chain-of-Thought Reasoning Model},
author = {Bry},
organization = {GenueAI},
year = {2026},
note = {Fine-tuned from DeepSeek-R1-Distill-Qwen-7B using QLoRA},
}