# Enhanced Hybrid Transformer 416M

A **416,417,792-parameter** transformer with modern optimizations.
## Features

- **24 layers** × **16 heads**
- **GQA-4** (Grouped-Query Attention)
- **SwiGLU** activation
- **RMSNorm** normalization
- **RoPE** positional embeddings
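The grouped-query attention listed above can be illustrated with a short PyTorch sketch. The 24 layers and 16 heads come from the feature list; the hidden size and the reading of "GQA-4" as 4 key/value heads are assumptions for illustration only, and `config.json` holds the authoritative values.

```python
# Minimal grouped-query attention sketch (illustrative assumptions, see lead-in).
import torch
import torch.nn.functional as F

batch, seq_len = 2, 128
hidden_size = 1024               # hypothetical; not stated in this README
n_q_heads, n_kv_heads = 16, 4    # 16 heads from Features; "GQA-4" read as 4 KV heads
head_dim = hidden_size // n_q_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Each group of 16 // 4 = 4 query heads shares one key/value head.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)   # -> (batch, 16, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 16, 128, 64])
```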
## Contents

- `pytorch_model.bin` - Model weights
- `config.json` - Model configuration
- `tokenizer.json` - Tokenizer file
- `README.md` - This file
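A quick way to inspect the shipped files is sketched below. It assumes `tokenizer.json` is in the Hugging Face `tokenizers` format, which this README does not explicitly guarantee.

```python
# Inspect config.json and tokenizer.json (assumes `tokenizers`-format tokenizer).
import json
from tokenizers import Tokenizer

with open("config.json") as f:
    config = json.load(f)        # layer count, head count, hidden size, etc.
print(config)

tokenizer = Tokenizer.from_file("tokenizer.json")
print(tokenizer.encode("Hello, world!").ids)
```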
## Usage

Load the checkpoint with the original repository code for full functionality. A minimal inspection sketch is shown below.
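As a rough starting point, the raw state dict can be inspected with PyTorch alone. This only loads tensors; building and running the model still requires the model class from the original repository.

```python
# Inspect the raw checkpoint without the original model class.
import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))

total_params = sum(t.numel() for t in state_dict.values())
print(f"total parameters: {total_params:,}")   # expected: 416,417,792
```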
---

Generated with [Claude Code](https://claude.ai/code)