---
title: README
emoji: ⚡
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
license: mit
---

# AtomGradient — Bringing AI to the Edge

**We are an independent research group dedicated to making AI run efficiently on edge devices.** We believe powerful AI should be private, accessible, and free from cloud dependency. All our research is open-source.

🌐 [atomgradient.com](https://atomgradient.com) · 🐙 [GitHub](https://github.com/AtomGradient) · 🚀 [EchoStream AI](https://www.echostream-ai.com/)

---

## Research

### [Prism — Cross-Domain Personal Data Integration on Consumer Hardware](https://atomgradient.github.io/Prism/)

Integrating finance, diet, mood, and reading data entirely on consumer Apple Silicon, producing emergent cross-domain insights with zero data leakage.

- 📈 **1.48x** cross-domain insight emergence (IIR)
- 🔒 **125.5x** federation compression, zero data leakage
- ⚡ **49.9 TPS** real-time inference (35B on M2 Ultra)

[[GitHub]](https://github.com/AtomGradient/Prism) · [[Paper]](https://atomgradient.github.io/Prism/)

---

### [ANE Batch Prefill — On-Device Parallel LLM Inference](https://atomgradient.github.io/hybird-batch-prefill-on-ane/)

Fused matrix-vector kernels enabling concurrent ANE batch prefill + GPU decode on Apple Silicon for Qwen3.5 models.

- 🚀 **11.3x** ANE batch prefill speedup (268 tok/s)
- 🔋 **79%** power reduction for the prefill component
- ⏱️ **<30 ms** state transfer overhead

[[GitHub]](https://github.com/AtomGradient/hybird-batch-prefill-on-ane) · [[Paper]](https://atomgradient.github.io/hybird-batch-prefill-on-ane/)

---

### [hybrid-ane-mlx-bench — Disaggregated LLM Inference on Apple Silicon](https://atomgradient.github.io/hybrid-ane-mlx-bench/)

Benchmarking CoreML ANE prefill + MLX GPU decode for Qwen3.5 on Apple Silicon, with four inference strategies compared.

- 🔄 ANE prefill matches GPU at **~410 tokens**
- 🔋 **282x** GPU power reduction during prefill
- 📊 4 inference pipelines benchmarked

[[GitHub]](https://github.com/AtomGradient/hybrid-ane-mlx-bench) · [[Paper]](https://atomgradient.github.io/hybrid-ane-mlx-bench/)

---

### [swift-qwen3-tts — On-Device Text-to-Speech](https://atomgradient.github.io/swift-qwen3-tts/)

Native Swift implementation of Qwen3 TTS 0.6B for real-time, on-device speech synthesis.

- 📦 **67%** model compression (2.35 GB → 808 MB)
- 🎙️ Real-time synthesis (**RTF 0.68x**)
- 🌍 12 languages supported

[[GitHub]](https://github.com/AtomGradient/swift-qwen3-tts) · [[Paper]](https://atomgradient.github.io/swift-qwen3-tts/)

---

### [Gemma-Prune — On-Device Vision Language Model](https://atomgradient.github.io/swift-gemma-cli/)

Multi-stage compression pipeline for deploying the Gemma 3 4B VLM on consumer hardware.

- 📦 **25%** model compression (2.8 GB → 2.1 GB)
- 📝 **110 tok/s** text generation
- 🖼️ **3.4x** image processing speedup

[[GitHub]](https://github.com/AtomGradient/swift-gemma-cli) · [[Paper]](https://atomgradient.github.io/swift-gemma-cli/)

---

### [OptMLX — MLX Memory Optimization Research](https://atomgradient.github.io/OptMLX/)

Exploring memory optimization techniques for the MLX framework on Apple Silicon.

- ⚡ Up to **20x** faster mmap loading
- 🔄 Zero-copy model loading
- 📊 Comprehensive benchmarks

[[GitHub]](https://github.com/AtomGradient/OptMLX) · [[Paper]](https://atomgradient.github.io/OptMLX/)

---

## About

AtomGradient is an independent research group dedicated to making AI run efficiently on edge devices. Our research powers [EchoStream AI](https://www.echostream-ai.com/) — a product line bringing on-device AI capabilities to real-world applications.

`Edge AI` · `Privacy-First` · `Open Research`