# Matrix 2

## Model Description

**Matrix 2** is a fine-tuned version of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), trained on a focused mixture of chain-of-thought reasoning, math, coding, and logic data. It is the flagship reasoning model of the Inelly lineup, built for deep, accurate, step-by-step problem solving.

- **Developed by:** Bry (GenueAI)
- **Base model:** DeepSeek-R1-Distill-Qwen-7B
- **Fine-tuning method:** QLoRA (4-bit NF4, rank 16)
- **Parameters:** 7.62B (base) + ~6.5M trainable (LoRA adapters)
- **License:** MIT (inherited from DeepSeek-R1)

---

## Intended Use

Matrix 2 is intended for:

- **Deep Chain-of-Thought reasoning** – Multi-step problem solving with clear logic
- **Mathematics** – Algebra, arithmetic, word problems, multi-step calculations
- **Code generation** – Python functions with proper logic and comments
- **Logical deduction** – Syllogisms, puzzles, transitive reasoning
- **Scientific explanations** – Physics, biology, general science
- **Complex instruction following** – Multi-part tasks requiring structured thinking

### Out of Scope

- Not intended for production deployment without further safety evaluation
- Safety alignment inherited from DeepSeek-R1 base; fine-tuning data did not include adversarial safety examples
- Larger memory footprint than 1.5B/3B variants (~5.2GB)

---

## Training Data

Matrix 2 was fine-tuned for 1 epoch on ~5,225 samples drawn from:

| Dataset | Samples | Purpose |
|---|---|---|
| [Bespoke-Stratos-35k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-35k) | 3,000 | Chain-of-thought math & reasoning |
| [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) | 2,500 | Code generation with reasoning |
| [dolphin-r1](https://huggingface.co/datasets/cognitivecomputations/dolphin-r1) | 2,000 | General reasoning (DeepSeek-R1 distill) |

All samples were deduplicated and reasoning-weighted (2x oversample for CoT examples).
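
The card does not say how deduplication or the 2x weighting was implemented, so the following is only a minimal sketch of one way to do it. It assumes exact-match deduplication on the sample text and a hypothetical `is_cot` flag marking chain-of-thought examples; the field names and the function name are illustrative, not taken from the actual training pipeline.

```python
def dedup_and_oversample(samples, cot_factor=2):
    """Drop exact-duplicate texts, then repeat CoT samples cot_factor times.

    `samples` is a list of dicts with a "text" key and an optional
    boolean "is_cot" key (both hypothetical field names).
    """
    seen = set()
    result = []
    for sample in samples:
        key = sample["text"].strip()
        if key in seen:
            continue  # skip exact duplicates, keeping the first occurrence
        seen.add(key)
        copies = cot_factor if sample.get("is_cot") else 1
        result.extend([sample] * copies)
    return result


raw = [
    {"text": "Prove that 2 + 2 = 4 step by step.", "is_cot": True},
    {"text": "Prove that 2 + 2 = 4 step by step.", "is_cot": True},
    {"text": "What is the capital of France?", "is_cot": False},
]
weighted = dedup_and_oversample(raw)
print(len(weighted))
```

In practice the oversampled list would be shuffled before training so that repeated examples do not appear back to back.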
Maximum sequence length: 512 tokens.

---

## Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | DeepSeek-R1-Distill-Qwen-7B |
| Quantization | 4-bit NF4 (bitsandbytes) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 8 (gradient accumulation) |
| Epochs | 1 |
| Max seq length | 512 |
| Optimizer | AdamW 8-bit |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Training time | ~74 min |
| Hardware | RTX 3090 (24GB VRAM) |

---

## Model Architecture

| Property | Value |
|---|---|
| Model type | Qwen2ForCausalLM |
| Hidden size | 3,584 |
| Layers | 28 |
| Attention heads | 28 |
| Head dim | 128 |
| Intermediate size | 18,944 |
| Vocab size | 152,064 |
| Context length | 131,072 |
| Total parameters | ~7.62B |
| Trainable parameters | ~6.5M (LoRA) |

---

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in FP16 and let device_map="auto" place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/matrix-2",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("path/to/matrix-2")

messages = [{"role": "user", "content": "Solve for x: 3x + 7 = 22. Show all steps."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Set do_sample=True explicitly so that temperature and top_p are applied
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

---

## Performance

Informal GPU testing across 8 categories (the table lists six; results are qualitative impressions, not benchmark scores):

| Category | Result |
|---|---|
| Chain-of-Thought reasoning | ✅ Excellent multi-step logic |
| Math | ✅ Accurate with detailed work shown |
| Code generation | ✅ Clean, well-commented Python |
| Logic puzzles | ✅ Thorough deductive reasoning |
| General knowledge | ✅ Accurate, detailed explanations |
| Complex reasoning | ✅ Handles multi-step word problems well |

---

## Inelly / GenueAI Model Family

| Model | Size | Focus |
|---|---|---|
| **Matrix 2** (this model) | 7B | Deep CoT reasoning, math, coding |
| Inelly 4.5 | 3B | Conversation + politeness + CoT |
| Inelly 4.5 Blaze | 1.5B | Fast reasoning + CoT |

---

## Limitations

- **Safety:** Inherited from DeepSeek-R1 base; not specifically safety-tuned. May occasionally follow harmful instructions.
- **Memory:** ~5.2GB VRAM for inference, a figure consistent with 4-bit quantized loading; FP16 weights alone take roughly 15GB (7.62B parameters × 2 bytes)
- **Context length:** Fine-tuned on 512-token sequences; the base model supports 128K, but fine-tuned performance is optimized for shorter contexts
- **Factual accuracy:** May hallucinate in specialized domains (law, medicine, finance)
- **Speed:** Slower than 1.5B/3B variants due to size

---

## Acknowledgments

- [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) by DeepSeek AI (base model)
- [Bespoke Labs](https://huggingface.co/bespokelabs) for the Stratos dataset
- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) team
- [Cognitive Computations](https://huggingface.co/cognitivecomputations) for dolphin-r1

---

## Citation

```
@misc{matrix2,
  title = {Matrix 2: A 7B Chain-of-Thought Reasoning Model},
  author = {Bry},
  organization = {GenueAI},
  year = {2026},
  note = {Fine-tuned from DeepSeek-R1-Distill-Qwen-7B using QLoRA},
}
```
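
As a rough cross-check on the memory figure under Limitations, weight storage can be estimated as parameter count times bytes per parameter. The sketch below rests on stated assumptions: it uses the ~7.62B parameter count from this card, counts weights only (no KV cache, activations, or quantization metadata), and `weight_memory_gb` is a hypothetical helper name, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7.62e9  # total parameters, from the Model Architecture table

fp16_gb = weight_memory_gb(PARAMS, 16)  # 2 bytes per parameter
nf4_gb = weight_memory_gb(PARAMS, 4)    # 0.5 bytes per parameter

print(f"FP16 weights:  {fp16_gb:.2f} GB")
print(f"4-bit weights: {nf4_gb:.2f} GB")
```

Actual VRAM use will be higher than either number once the KV cache and activations are included, and it grows with prompt and generation length.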