JiRack
JiRack: a Java Intelligent Rack System with PyTorch models, ML scripts, and chatbots.
JiRack LLM by CMS Manhattan, with open-source model code.
Web services with ONNX Runtime inference images, with OpenAI and Ollama REST API support, available on Docker Hub.
ONNX Runtime supports running models on many GPU cards and in data centers.
Docker repo: https://hub.docker.com/u/cmsmanhattan
Creating the world's first 405B parameter ternary model
Democratizing access to massive language models through extreme efficiency. Training state-of-the-art LLMs on accessible hardware using 1.58-bit (ternary) precision.
A family of efficient large language models based on BitNet architecture with ternary weights {-1, 0, 1}.
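The quantization recipe itself isn't shown in this card; as a minimal sketch, assuming a BitNet b1.58-style "absmean" scheme (the function name below is illustrative), each weight tensor is scaled by its mean absolute value and rounded to {-1, 0, 1}:

```python
import numpy as np

def ternarize_absmean(w: np.ndarray):
    """Quantize a weight tensor to {-1, 0, 1} plus one per-tensor scale
    (absmean scheme: scale = mean absolute value of the weights)."""
    scale = float(np.abs(w).mean()) + 1e-8
    w_ternary = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_ternary, scale  # dequantize as w_ternary * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = ternarize_absmean(w)
assert set(np.unique(q).tolist()).issubset({-1, 0, 1})
```

Weights smaller in magnitude than half the scale round to 0, so the quantized tensor is naturally sparse; that sparsity is part of what ternary inference kernels can exploit.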
| Model | Parameters | Size | Status | Link |
|---|---|---|---|---|
| JiRackTernary_1b | 1B | ~350MB | ✅ Released | Download |
| JiRackTernary_8b | 8B | ~3GB | ✅ Released | Download |
| Model | Parameters | Size | Status | ETA |
|---|---|---|---|---|
| JiRackTernary_70b | 70B | ~25GB | 🚧 Training (Step 15,600+) | Q2 2026 |
| JiRackTernary_405b | 405B | ~115GB | 🔥 WORLD'S FIRST 405B TERNARY | Q3 2026 |
Traditional LLaMA-3 70B: ~140 GB (FP16)
JiRackTernary 70B: ~25 GB (1.58-bit)
Compression ratio: ~5.6x smaller! 🔥
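The figures above can be sanity-checked with simple arithmetic (the helper below is illustrative; ternary weights packed at 2 bits each come to ~17.5 GB, so the ~25 GB checkpoint presumably also covers unquantized embeddings, norms, and metadata):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight-storage footprint only: ignores activations and KV cache."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = model_size_gb(70e9, 16)   # 70B params at 16 bits -> 140.0 GB
packed_gb = model_size_gb(70e9, 2)  # 70B params at 2 bits  -> 17.5 GB
ratio = fp16_gb / 25                # 140 GB vs the ~25 GB checkpoint -> 5.6x
```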
- Base: LLaMA-3 architecture
- Precision: 1.58-bit ternary weights {-1, 0, 1}
- Layers: Custom JiRackBitLinear with weight packing
- Normalization: RMSNorm
- Training: Layer-by-layer with gradient accumulation
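JiRackBitLinear's weight-packing format isn't published here; the sketch below assumes the common layout where each ternary weight takes 2 bits, so four weights pack into one byte (function names are hypothetical):

```python
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, 1} four-per-byte (2 bits each).
    Values are offset to {0, 1, 2} so they fit in unsigned 2-bit fields."""
    assert w.size % 4 == 0
    u = (w.astype(np.int8) + 1).astype(np.uint8).reshape(-1, 4)
    return u[:, 0] | (u[:, 1] << 2) | (u[:, 2] << 4) | (u[:, 3] << 6)

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_ternary: recover int8 weights in {-1, 0, 1}."""
    u = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return u.astype(np.int8).reshape(-1) - 1

w = np.array([-1, 0, 1, 1, 0, 0, -1, 1], dtype=np.int8)
assert np.array_equal(unpack_ternary(pack_ternary(w)), w)
```

At 2 bits per weight this is a 4x reduction over int8 storage and is what makes the ~25 GB 70B checkpoint size plausible.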
70B Model:
- Hardware: A100 80GB (Colab Pro+)
- Method: Layer-by-layer training
- Batch size: 1 (micro)
- Sequence length: 768 tokens
- Cost: ~$50/month

405B Model:
- Hardware: H200 141GB (Colab Enterprise)
- Method: Advanced layer-by-layer
- Optimized for massive scale
- World's first 405B ternary model 🚀
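The actual training loop isn't shown; the generator below is a hypothetical illustration of the layer-by-layer schedule at micro-batch size 1, where only one layer is trainable per phase (keeping optimizer and gradient memory at a single layer's footprint) and an optimizer step fires once every `accum` micro-batches:

```python
def layerwise_schedule(n_layers: int, steps_per_layer: int, accum: int):
    """Yield (layer_index, do_optimizer_step) once per micro-batch."""
    for layer in range(n_layers):          # train one layer at a time
        for step in range(steps_per_layer):
            for micro in range(accum):     # accumulate gradients
                yield layer, micro == accum - 1

events = list(layerwise_schedule(n_layers=2, steps_per_layer=2, accum=4))
# 2 layers x 2 steps x 4 micro-batches = 16 micro-batches,
# of which 2 x 2 = 4 trigger an optimizer step.
assert len(events) == 16
assert sum(do_step for _, do_step in events) == 4
```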
| Model | Size | Precision | Hardware | Training Cost |
|---|---|---|---|---|
| LLaMA-3 70B | 140GB | FP16 | Massive cluster | $$$$$$ |
| LLaMA-3 70B (4-bit) | 35GB | 4-bit | 2-4x A100 | N/A (PTQ) |
| JiRackTernary 70B | 25GB | 1.58-bit | 1x A100 | $150-200 |
- Technical Paper (In Preparation)
- Benchmark Suite
- Open Source Release
- ✅ Train massive models without supercomputers
- ✅ Reproduce frontier research on Colab
- ✅ Enable new compression research directions
- ✅ Deploy 405B-class models on fewer GPUs
- ✅ Faster inference with ternary operations
- ✅ Lower hosting costs (~5.6x smaller)
- ✅ Democratization of large language models
- ✅ Accessible AI for everyone
- ✅ Open research methodology
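"Faster inference with ternary operations" rests on the fact that a ternary matrix-vector product needs no multiplications: +1 weights add the input element, -1 weights subtract it, and 0 weights are skipped. A toy NumPy sketch (illustrative, not an optimized kernel):

```python
import numpy as np

def ternary_matvec(w_ternary: np.ndarray, x: np.ndarray, scale: float):
    """Compute y = (w_ternary * scale) @ x using only adds and subtracts:
    sum inputs under +1 weights, subtract inputs under -1 weights."""
    y = np.zeros(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        y[i] = x[row == 1].sum() - x[row == -1].sum()
    return y * scale

w = np.array([[1, -1, 0], [0, 1, 1]], dtype=np.int8)
x = np.array([2.0, 3.0, 5.0])
assert np.allclose(ternary_matvec(w, x, scale=0.5), [-0.5, 4.0])
```

Real kernels operate directly on the packed 2-bit representation, but the arithmetic advantage is the same: no floating-point multiplies in the inner loop.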
@misc{jiracternary2026,
  author    = {CMSManhattan (kgrabko)},
  title     = {JiRackTernary: Scaling Ternary Neural Networks to 405 Billion Parameters},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/CMSManhattan}
}
World's First 405B Ternary Model 🔥
Proving that massive language models can be trained efficiently on accessible hardware
Track our journey:
Making AI accessible, one ternary weight at a time. ✨
Last updated: 2026-02-09
Phone: 516-777-0945
Demo: JiRack LLM and CMS Manhattan RAG System

Download the RAG System: git clone https://grabko1@bitbucket.org/cmsmanhattan/rag.git

Deployment kit for Docker or Kubernetes, with an API gateway and service discovery in a microservice architecture: https://www.youtube.com/watch?v=M4Q8_Dr35Cc
Deployment script for the RAG System: https://bitbucket.org/cmsmanhattan/rag