AI & ML interests

I work on LLMs for my projects


Organization Card
     ___   _  ____             _
    |_  | |_| |  _ \ __ _  ___| | __
      | | | | | |_) / _` |/ __| |/ /
  /\__/ / | | |  _ < (_| | (__|   <
  \____/  |_| |_| \_\__,_|\___|_|\_\

  Java Intelligent Rack System: PyTorch models with ML scripts and chatbots.
  JiRack LLM by CMS Manhattan, with open-source model code.

  Web services with ONNX Runtime inference images, with OpenAI- and Ollama-compatible REST API support, available on Docker Hub.
  ONNX Runtime supports running models on many GPU cards and in data centers.
  Docker repo: https://hub.docker.com/u/cmsmanhattan

🔄 CMSManhattan - Frontier Ternary Neural Networks

Creating the world's first 405B parameter ternary model

🎯 Mission

Democratizing access to massive language models through extreme efficiency. Training state-of-the-art LLMs on accessible hardware using 1.58-bit (ternary) precision.


šŸ† JiRackTernary Series

A family of efficient large language models based on BitNet architecture with ternary weights {-1, 0, 1}.
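The ternary constraint can be illustrated with an absmean-style quantizer, as described for BitNet b1.58: scale weights by their mean absolute value, then round into {-1, 0, 1}. A minimal NumPy sketch (function names are illustrative, not taken from the JiRack codebase):

```python
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-5):
    """Absmean quantization: scale by the mean |weight|, round to {-1, 0, 1}."""
    scale = np.abs(w).mean() + eps                # per-tensor scale factor
    wq = np.clip(np.round(w / scale), -1, 1)      # ternary codes
    return wq.astype(np.int8), scale              # w is approximated by wq * scale

w = np.array([[0.9, -0.05, -1.2],
              [0.4,  0.0,  -0.6]])
wq, s = ternarize(w)
# wq holds only values from {-1, 0, 1}
```

The original full-precision tensor is approximated by `wq * s`, so only the int8 codes and one scalar need to be stored per tensor.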

🌐 Public Models

| Model | Parameters | Size | Status | Link |
|-------|------------|------|--------|------|
| JiRackTernary_1b | 1B | ~350MB | ✅ Released | Download |
| JiRackTernary_8b | 8B | ~3GB | ✅ Released | Download |

🔒 Private Models (In Training)

| Model | Parameters | Size | Status | ETA |
|-------|------------|------|--------|-----|
| JiRackTernary_70b | 70B | ~25GB | 🚧 Training (Step 15,600+) | Q2 2026 |
| JiRackTernary_405b | 405B | ~115GB | 🔄 World's first 405B ternary | Q3 2026 |

⚔ Key Innovations

~5.6x Compression

Traditional LLaMA-3 70B:  ~140 GB (FP16)
JiRackTernary 70B:        ~25 GB  (1.58-bit)
Compression ratio:        ~5.6x smaller! 🔄
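As a back-of-envelope check (weights only, ignoring embeddings, activations, and any layers kept in higher precision):

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (decimal): params * bits / 8 bytes / 1e9."""
    return params * bits_per_weight / 8 / 1e9

fp16_gb = weight_gb(70e9, 16)    # 140.0 GB for FP16
packed_gb = weight_gb(70e9, 2)   # 17.5 GB at 2 bits/weight (4 weights per byte)
```

Ternary weights need log2(3) ≈ 1.58 bits in theory; a practical 4-per-byte packing spends 2 bits per weight, and the gap between 17.5 GB and the ~25 GB checkpoint presumably comes from non-ternary components such as embeddings and norms.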

Accessible Training

  • 70B trained on single A100 80GB (Colab Pro+ - $50/month)
  • 405B trained on single H200 141GB (Colab Enterprise)
  • Novel layer-by-layer training approach
  • No supercomputer clusters required!

Production-Ready Architecture

  • LLaMA-based with BitLinear layers
  • Ultra-lean memory offloading
  • 4-in-1 weight packing
  • Optimized for inference speed
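The "4-in-1 weight packing" above presumably stores four 2-bit ternary codes per byte. A hedged NumPy sketch of one such scheme (the actual JiRack bit layout is not documented here):

```python
import numpy as np

def pack4(tern: np.ndarray) -> np.ndarray:
    """Pack ternary weights 4-per-byte: map {-1,0,1} -> codes {0,1,2}, 2 bits each."""
    codes = (tern.astype(np.int16) + 1).astype(np.uint8).reshape(-1, 4)
    return (codes[:, 0] | (codes[:, 1] << 2)
            | (codes[:, 2] << 4) | (codes[:, 3] << 6)).astype(np.uint8)

def unpack4(packed: np.ndarray) -> np.ndarray:
    """Inverse: extract the four 2-bit codes and map back to {-1, 0, 1}."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return codes.astype(np.int8).reshape(-1) - 1

w = np.array([-1, 0, 1, 1, 0, -1, 0, 1], dtype=np.int8)
packed = pack4(w)            # 8 weights -> 2 bytes
restored = unpack4(packed)   # round-trips exactly
```

Packing at 2 bits per weight yields the 4x density behind the memory figures above; inference kernels can unpack on the fly or operate on the codes directly.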

🔬 Technical Highlights

Architecture Details

- Base: LLaMA-3 architecture
- Precision: 1.58-bit ternary weights {-1, 0, 1}
- Layers: Custom JiRackBitLinear with weight packing
- Normalization: RMSNorm
- Training: Layer-by-layer with gradient accumulation
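RMSNorm, used above, scales activations by their root-mean-square instead of subtracting a mean and dividing by a standard deviation as LayerNorm does. A minimal NumPy sketch:

```python
import numpy as np

def rmsnorm(x: np.ndarray, gain: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: x / rms(x) * gain -- no mean-centering, unlike LayerNorm."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.array([3.0, 4.0])
y = rmsnorm(x, np.ones_like(x))
# the output has unit RMS (up to eps)
```

Dropping the mean-centering step saves one reduction per call, which is one reason RMSNorm is the default in LLaMA-style architectures.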

Training Infrastructure

70B Model:
├── Hardware: A100 80GB (Colab Pro+)
├── Method: Layer-by-layer training
├── Batch size: 1 (micro)
├── Sequence length: 768 tokens
└── Cost: ~$50/month
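With a micro-batch of 1, gradient accumulation sums per-sample gradients and applies one optimizer step every N samples, matching a batch-of-N update while holding only one sample's activations in memory. A toy plain-Python sketch using the loss 0.5*(w - x)^2 (not the actual training loop):

```python
def sgd_accumulate(w: float, samples, lr: float = 0.1, accum: int = 4) -> float:
    """One-parameter SGD with gradient accumulation over `accum` micro-batches."""
    grad_sum, count = 0.0, 0
    for x in samples:
        grad_sum += (w - x)          # gradient of 0.5 * (w - x)**2 w.r.t. w
        count += 1
        if count == accum:           # one optimizer step per `accum` samples
            w -= lr * grad_sum / accum
            grad_sum, count = 0.0, 0
    return w

w = sgd_accumulate(0.0, [1.0, 1.0, 1.0, 1.0])
# identical to one batch-of-4 step: w = 0.0 - 0.1 * (-1.0) = 0.1
```

The same trade (time for memory) is what makes a micro-batch of 1 with a 768-token sequence fit on a single A100.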

405B Model:
├── Hardware: H200 141GB (Colab Enterprise)
├── Method: Advanced layer-by-layer
├── Optimized for massive scale
└── World's first 405B ternary model 🏆

📊 Performance

Model Comparison

| Model | Size | Precision | Hardware | Training Cost |
|-------|------|-----------|----------|---------------|
| LLaMA-3 70B | 140GB | FP16 | Massive cluster | $$$$$$ |
| LLaMA-3 70B (4-bit) | 35GB | 4-bit | 2-4x A100 | N/A (PTQ) |
| JiRackTernary 70B | 25GB | 1.58-bit | 1x A100 | $150-200 |

Current Training Status (Updated: 2026-02-09)

70B Model:

  • Step: 15,600+
  • Loss: ~7-9
  • PPL: ~3,000-5,000
  • Status: Early training phase (target: 100k+ steps)
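For reference, perplexity is the exponential of the mean per-token cross-entropy (in nats), so the loss and PPL figures above can be cross-checked:

```python
import math

def perplexity(mean_nll: float) -> float:
    """PPL = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll)

perplexity(8.0)   # ~2981
perplexity(8.5)   # ~4915
```

A loss around 8-8.5 nats corresponds to the reported PPL of ~3,000-5,000.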

405B Model:

  • Status: Active training on H200
  • Target: World's first converged 405B ternary model
  • Timeline: Q3 2026 estimated completion

🎓 Research & Publications

Upcoming

šŸ“ Technical Paper (In Preparation)

  • Title: "JiRackTernary-405B: Scaling Ternary Neural Networks to 405 Billion Parameters"
  • Target: NeurIPS 2026 / ICML 2027

📊 Benchmark Suite

  • MMLU, HellaSwag, HumanEval
  • Comparison with LLaMA-3, Mixtral, DeepSeek
  • Efficiency metrics (inference speed, memory)

🎤 Open Source Release

  • Training code & documentation
  • Layer-by-layer methodology
  • Reproducibility guidelines

🚀 Why This Matters

For Researchers

✅ Train massive models without supercomputers
✅ Reproduce frontier research on Colab
✅ Enable new compression research directions

For Industry

✅ Deploy 405B-class models on fewer GPUs
✅ Faster inference with ternary operations
✅ Lower hosting costs (~5.6x smaller)

For Community

✅ Democratization of large language models
✅ Accessible AI for everyone
✅ Open research methodology


📖 Learn More

Blog Posts (Coming Soon)

  • 🔧 "Training 70B on a Single A100: Our Layer-by-Layer Approach"
  • 📊 "Ternary Weights at Scale: Lessons from 15,000 Steps"
  • 🚀 "Road to 405B: The Journey to World's First Ternary Mega-Model"

Technical Documentation

  • 📚 Architecture deep-dive
  • 🛠️ Training methodology
  • 💻 Code examples & tutorials

šŸ¤ Community

Get Involved

  • ⭐ Star our models on HuggingFace
  • 💬 Join discussions in model repos
  • 🐛 Report issues or suggestions
  • 📧 Contact: [Your contact method]

Citation

@misc{jirackternary2026,
  author = {CMSManhattan (kgrabko)},
  title = {JiRackTernary: Scaling Ternary Neural Networks to 405 Billion Parameters},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/CMSManhattan}
}

šŸ† Recognition

World's First 405B Ternary Model 🔄
Proving that massive language models can be trained efficiently on accessible hardware


📊 Follow Progress

Track our journey:

  • 🔄 Regular updates in model repos
  • 📈 Training metrics & visualizations
  • 🎯 Milestone announcements
  • 🎓 Research publications

Making AI accessible, one ternary weight at a time. ✨


Last updated: 2026-02-09

Trademark information: https://uspto.report/TM/90579072

  • SERIAL NUMBER 90579072