PrismML's Ternary Bonsai models in Trillim's TRNQ format
AI & ML interests: Running AI on consumer hardware
Trillim
We're building local AI that runs on the hardware you already have.
Trillim builds infrastructure for running models on consumer CPUs and edge devices — no GPU required. We train and fine-tune ternary ({-1, 0, 1}) models designed to run efficiently on commodity hardware, and build the tooling to deploy them.
What we believe
- GPUs are powerful but expensive, power-hungry, and scarce.
- Ternary quantization changes the equation: models with {-1, 0, 1} weights don't need floating-point multipliers at all.
- The right software can make CPUs fast enough for real-time inference.
- AI should run anywhere — laptops, Raspberry Pis, edge devices — not just in datacenters.
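To make the no-multiplier claim concrete, here is a minimal sketch (illustrative only, not Trillim's DarkNet kernel; the function name is ours) of a ternary matrix-vector product. Each {-1, 0, 1} weight only selects, negates, or drops an input, so the whole product reduces to additions and subtractions:

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """W has entries in {-1, 0, 1}; x is a float vector.
    Computed with adds and subtracts only -- no multiplies."""
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i, row in enumerate(W):
        # +1 weights add the input, -1 weights subtract it, 0 weights drop it
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))           # ternary weights
x = rng.standard_normal(8).astype(np.float32)
assert np.allclose(ternary_matvec(W, x), W @ x, atol=1e-6)
```

A real engine would additionally pack the weights (2 bits each) and vectorize the adds, but the arithmetic identity is the same.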
What we're building
- DarkNet — our proprietary CPU inference engine, purpose-built for ternary models, with hand-tuned SIMD kernels for x86 (AVX2) and ARM (NEON); more architectures are coming soon
- Tooling — an OpenAI-compatible API server, a CLI chat interface, LoRA adapter hot-swap, and an integrated voice pipeline (STT + TTS); see the client sketch after this list
- Models — ternary models fine-tuned and pre-quantized for efficient CPU inference, hosted here on Hugging Face. Look for the -TRNQ suffix.
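Because the server speaks the OpenAI API, any standard OpenAI client should work against it. A hedged sketch using the official openai Python package; the endpoint URL, port, and model id below are illustrative assumptions, not documented values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local endpoint; use your server's actual address
    api_key="not-needed-locally",         # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="Bonsai-4B-TRNQ",  # hypothetical model id; check client.models.list() for real ones
    messages=[{"role": "user", "content": "Explain ternary quantization in one sentence."}],
)
print(resp.choices[0].message.content)
```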
Supported model architectures
BitNet, Llama, Qwen2, Mistral
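BitNet-family models are trained ternary from the start; for the other architectures, full-precision weights have to be ternarized. One published recipe is the absmean scheme from the BitNet b1.58 paper, sketched below; whether Trillim's TRNQ conversion uses exactly this scheme is an assumption on our part:

```python
import numpy as np

def absmean_ternarize(W: np.ndarray, eps: float = 1e-8):
    """BitNet b1.58-style ternarization: scale by the mean absolute
    weight, then round and clip each entry to {-1, 0, 1}."""
    gamma = np.abs(W).mean()                                  # per-tensor scale
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1).astype(np.int8)
    return Wq, gamma                                          # dequantize as Wq * gamma

W = np.random.default_rng(1).standard_normal((4, 4)).astype(np.float32)
Wq, gamma = absmean_ternarize(W)
print(Wq)           # entries in {-1, 0, 1}
print(Wq * gamma)   # coarse reconstruction of W
```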
Models (12)
- Trillim/Bonsai-1.7B-TRNQ · Text Generation · 170 downloads
- Trillim/Bonsai-4B-TRNQ · Text Generation · 163 downloads
- Trillim/Bonsai-8B-TRNQ · Text Generation · 169 downloads
- Trillim/Bonsai-8BT-TRNQ · Text Generation · 57 downloads · 2 likes
- Trillim/Bonsai-4BT-TRNQ · Text Generation · 31 downloads
- Trillim/Bonsai-1.7BT-TRNQ · Text Generation · 32 downloads
- Trillim/BitNet-Search-LoRA-TRNQ
- Trillim/BitNet-GenZ-LoRA-TRNQ
- Trillim/Llama3-TRNQ · 17 downloads
- Trillim/BitNet-Large-TRNQ · 11 downloads
Datasets (0)
None public yet