Instructions to use Matrix-Corp/Zenith-7b-V1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers

How to use Matrix-Corp/Zenith-7b-V1 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Matrix-Corp/Zenith-7b-V1")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Matrix-Corp/Zenith-7b-V1", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM

How to use Matrix-Corp/Zenith-7b-V1 with vLLM:

Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Matrix-Corp/Zenith-7b-V1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Matrix-Corp/Zenith-7b-V1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- SGLang

How to use Matrix-Corp/Zenith-7b-V1 with SGLang:

Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Matrix-Corp/Zenith-7b-V1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Matrix-Corp/Zenith-7b-V1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Matrix-Corp/Zenith-7b-V1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Matrix-Corp/Zenith-7b-V1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner

How to use Matrix-Corp/Zenith-7b-V1 with Docker Model Runner:

```shell
docker model run hf.co/Matrix-Corp/Zenith-7b-V1
```
Zenith-7B V1
Standard GPU-optimized language model with code generation and emotional intelligence capabilities.
Features
- 7B Parameter Model: Efficient for consumer GPUs (8-16GB VRAM)
- Code Generation: Fine-tuned on Qwen2.5-Coder base for exceptional programming abilities
- Emotional Intelligence: EQ adapter for recognizing and responding to emotions
- OpenThoughts Integration: Trained on high-quality reasoning data
- LoRA/QLoRA Support: Efficient fine-tuning with 4-bit quantization
- Ollama Compatible: Ready-to-use Modelfile for easy deployment
Quick Start
Installation
```shell
# Enter the project directory and install dependencies
cd Zenith/V1/7B
pip install -r requirements.txt
```
Training
```shell
# Full fine-tuning
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5

# LoRA fine-tuning (recommended for most users)
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --epochs 3 \
  --batch_size 8
```
Inference
```shell
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Write a Python function to reverse a linked list" \
  --max_new_tokens 512
```
Ollama Deployment
```shell
# Build and run with Ollama
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Explain quantum computing in simple terms"
```
Project Structure
```
Zenith/V1/7B/
├── configs/                     # Configuration files
│   ├── zenith_config.py         # Model architecture config
│   ├── data_config.py           # Data processing config
│   └── training_config.py       # Training hyperparameters
├── data/                        # Data processing modules
│   ├── openthoughts_processor.py
│   ├── quality_filter.py
│   ├── curriculum_sampler.py
│   ├── advanced_tokenizer.py
│   └── preprocessing.py
├── src/                         # Source code
│   ├── models/
│   │   ├── zenith_model.py
│   │   ├── dense_layer.py
│   │   └── moe_layer.py
│   └── utils/
├── scripts/                     # Utility scripts
├── tests/                       # Test suite
├── train.py                     # Main training script
├── inference.py                 # Inference and generation
├── test_model.py                # Model validation tests
├── finetune_qwen.py             # Qwen fine-tuning guide
├── Modelfile                    # Ollama configuration
├── requirements.txt             # Python dependencies
└── README.md                    # This file
```
Configuration
The model uses a unified configuration system in configs/zenith_config.py:
```python
from configs.zenith_config import get_7b_config

config = get_7b_config()

# Parameters:
# - hidden_size: 4096
# - num_layers: 32
# - num_heads: 32
# - num_experts: 0 (dense only, set >1 for MoE)
# - use_eq_adapter: True (emotional intelligence)
# - max_seq_len: 8192
```
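The exact contents of configs/zenith_config.py are project-specific; as a rough sketch of what get_7b_config() might return, with field names assumed from the parameter list above:

```python
from dataclasses import dataclass

@dataclass
class ZenithConfig:
    # Field names mirror the parameter list above; the real
    # configs/zenith_config.py may differ.
    hidden_size: int = 4096
    num_layers: int = 32
    num_heads: int = 32
    num_experts: int = 0        # 0 = dense; set > 1 to enable MoE layers
    use_eq_adapter: bool = True
    max_seq_len: int = 8192

def get_7b_config() -> ZenithConfig:
    """Return the default 7B configuration."""
    return ZenithConfig()

config = get_7b_config()
print(config.hidden_size, config.num_experts)  # 4096 0
```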
Data Processing
OpenThoughts Integration
The data pipeline supports the OpenThoughts3-1.2M dataset:
```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    quality_filtering=True,
    curriculum_learning=True,
    augmentation=True,
)

processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```
Quality Filtering
Multi-dimensional quality assessment:
- Length appropriateness
- Language detection (English only)
- Repetition detection
- Coherence scoring
- Structure validation
- Thought quality (for CoT data)
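These dimensions live in data/quality_filter.py; a simplified, hypothetical sketch of just the length and repetition checks (thresholds and function names are illustrative assumptions):

```python
def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that are duplicates (1.0 = fully repetitive)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

def passes_quality(text: str,
                   min_words: int = 5,
                   max_words: int = 4096,
                   max_repetition: float = 0.3) -> bool:
    """Length appropriateness + repetition detection; the real filter adds
    language detection, coherence scoring, and structure validation."""
    n_words = len(text.split())
    if not (min_words <= n_words <= max_words):
        return False
    return repetition_ratio(text) <= max_repetition

print(passes_quality("too short"))  # False: fails the length check
print(passes_quality("spam spam spam spam spam spam spam spam"))  # False: repetitive
```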
Curriculum Learning
Progressive training stages:
- Foundation: High-quality, well-structured samples
- Reasoning: Chain-of-thought and problem-solving
- Code: Programming and technical content
- Full: Complete dataset with all samples
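This schedule is driven by data/curriculum_sampler.py; a hypothetical sketch of stage-based sampling (the stage boundaries and per-stage filters below are assumptions, not the project's actual criteria):

```python
STAGES = ["foundation", "reasoning", "code", "full"]

def stage_for_step(step: int, steps_per_stage: int = 1000) -> str:
    """Advance through the curriculum as training progresses."""
    idx = min(step // steps_per_stage, len(STAGES) - 1)
    return STAGES[idx]

def select(samples: list, stage: str) -> list:
    """Keep only samples admitted at the current stage."""
    if stage == "full":
        return samples
    if stage == "foundation":
        return [s for s in samples if s.get("quality", 0) >= 0.9]
    return [s for s in samples if s.get("category") == stage]

print(stage_for_step(0))     # foundation
print(stage_for_step(3500))  # full
```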
Advanced Features
MoE (Mixture of Experts)
Enable sparse activation for better performance:
```shell
python train.py --use_moe --num_experts 8
```
- Top-2 routing with load balancing
- 60% of layers use MoE (middle layers)
- Shared router groups for efficiency
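Top-2 routing picks the two highest-scoring experts per token and normalizes their gate weights; a minimal sketch of the selection step (the real moe_layer.py operates on tensors and adds the load-balancing loss):

```python
import math

def top2_route(logits: list) -> list:
    """Return (expert_index, gate_weight) for the two best experts,
    with weights renormalized via softmax over just those two."""
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    exps = [math.exp(logits[i] - max(logits)) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]

# One token's router logits over 8 experts:
routes = top2_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.2])
print(routes)  # experts 1 and 4 carry this token
```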
EQ Adapter
Emotional intelligence module:
```shell
python train.py --use_eq_adapter --eq_loss_weight 0.1
```
- Frustration detection (regression)
- 8-emotion classification
- Fused with attention mechanism
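With the adapter enabled, training combines the language-modeling loss with the auxiliary EQ losses, scaled by --eq_loss_weight; a sketch of the combination (how train.py actually splits the weight between the regression and classification heads is an assumption):

```python
def combined_loss(lm_loss: float,
                  frustration_loss: float,
                  emotion_loss: float,
                  eq_loss_weight: float = 0.1) -> float:
    """Weighted multi-task objective: LM loss plus scaled EQ-head losses."""
    return lm_loss + eq_loss_weight * (frustration_loss + emotion_loss)

print(combined_loss(2.5, 0.8, 1.2))  # 2.5 + 0.1 * 2.0 = 2.7
```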
LoRA/QLoRA
Efficient fine-tuning with low-rank adaptation:
```shell
# LoRA
python train.py --use_lora --lora_r 16 --lora_alpha 32

# QLoRA (4-bit quantization)
python train.py --use_qlora --use_lora --lora_r 8
```
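Under the hood, LoRA freezes the base weight W and learns a low-rank update, so the effective weight is W + (alpha/r) * B @ A with A of shape (r, d) and B of shape (d, r). A toy numeric sketch of that arithmetic (the numbers are arbitrary):

```python
def matmul(a, b):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(W, A, B, r, alpha):
    """W: frozen (d, d) weight; A: (r, d) and B: (d, r) trainable adapters."""
    scale = alpha / r
    delta = matmul(B, A)  # rank-r update, shape (d, d)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# d=2, r=1 toy example:
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]             # (1, 2)
B = [[0.5], [0.25]]          # (2, 1)
W_eff = lora_effective_weight(W, A, B, r=1, alpha=2)
print(W_eff)  # [[2.0, 2.0], [0.5, 2.0]]
```

Because only A and B are trained (2·d·r parameters instead of d²), memory and optimizer state shrink dramatically; QLoRA additionally keeps the frozen W in 4-bit precision.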
Testing
Run the test suite:
```shell
python test_model.py
```
Tests include:
- Model creation and initialization
- Forward pass and gradient flow
- Text generation
- Multi-task outputs (EQ adapter)
- Loss computation
Requirements
See requirements.txt for full dependencies. Key packages:
- torch>=2.0.0
- transformers>=4.35.0
- datasets>=2.14.0
- accelerate>=0.24.0
- peft>=0.6.0 (for LoRA)
- bitsandbytes>=0.41.0 (for QLoRA)
- tensorboard>=2.14.0
Performance Tips
- Mixed Precision: Use `--mixed_precision bf16` for faster training (Ampere+ GPUs)
- Gradient Checkpointing: Enabled by default to reduce memory
- Batch Size: Adjust based on VRAM (4-8 for 7B full, 16-32 for LoRA)
- Sequence Length: Longer sequences use more memory; adjust `--max_seq_length`
Troubleshooting
Out of Memory
- Reduce batch size
- Use gradient accumulation
- Enable LoRA/QLoRA
- Use mixed precision
- Reduce sequence length
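Gradient accumulation preserves the effective batch size while cutting per-step memory: effective_batch = per_device_batch × accumulation_steps. A quick sanity check of the arithmetic:

```python
def accumulation_steps(target_batch: int, per_device_batch: int) -> int:
    """How many micro-batches to accumulate before each optimizer step."""
    if target_batch % per_device_batch:
        raise ValueError("target batch must be divisible by per-device batch")
    return target_batch // per_device_batch

# Keep an effective batch of 32 while fitting only 4 samples in VRAM:
print(accumulation_steps(32, 4))  # 8
```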
Slow Training
- Increase batch size if possible
- Use more gradient accumulation steps
- Ensure data loading is not the bottleneck
- Use mixed precision
Poor Quality Outputs
- Train longer (more epochs)
- Use higher quality data
- Adjust learning rate (try 1e-5 to 5e-5)
- Enable curriculum learning
- Use quality filtering
Citation
If you use Zenith-7B in your research, please cite:
```bibtex
@misc{zenith-7b-2025,
  title={Zenith-7B: A Hybrid MoE Model for Code and Emotional Intelligence},
  year={2025},
  publisher={Zenith Project}
}
```
License
[Specify your license here]
Contact
For issues and questions, please open an issue on the project repository.