---
title: README
emoji: ⚡
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
license: mit
---
# AtomGradient — Bringing AI to the Edge

**We are an independent research group dedicated to making AI run efficiently on edge devices.**

We believe powerful AI should be private, accessible, and free from cloud dependency. All our research is open-source.

🌐 [atomgradient.com](https://atomgradient.com) · 🐙 [GitHub](https://github.com/AtomGradient) · 🚀 [EchoStream AI](https://www.echostream-ai.com/)

---

## Research
### [Prism — Cross-Domain Personal Data Integration on Consumer Hardware](https://atomgradient.github.io/Prism/)

Integrating finance, diet, mood, and reading data entirely on consumer Apple Silicon, producing emergent cross-domain insights with zero data leakage.

- 📈 **1.48x** cross-domain insight emergence (IIR)
- 🔒 **125.5x** federation compression, zero data leakage
- ⚡ **49.9 TPS** real-time inference (35B on M2 Ultra)

[[GitHub]](https://github.com/AtomGradient/Prism) · [[Paper]](https://atomgradient.github.io/Prism/)

---
### [ANE Batch Prefill — On-Device Parallel LLM Inference](https://atomgradient.github.io/hybird-batch-prefill-on-ane/)

Fused matrix-vector kernels enabling concurrent ANE batch prefill + GPU decode on Apple Silicon for Qwen3.5 models.

- 🚀 **11.3x** ANE batch prefill speedup (268 tok/s)
- 🔋 **79%** power reduction for the prefill component
- ⏱️ **<30 ms** state transfer overhead

[[GitHub]](https://github.com/AtomGradient/hybird-batch-prefill-on-ane) · [[Paper]](https://atomgradient.github.io/hybird-batch-prefill-on-ane/)

---
### [hybrid-ane-mlx-bench — Disaggregated LLM Inference on Apple Silicon](https://atomgradient.github.io/hybrid-ane-mlx-bench/)

Benchmarking CoreML ANE prefill + MLX GPU decode for Qwen3.5 on Apple Silicon, with four inference strategies compared.

- 🔄 ANE prefill matches GPU at **~410 tokens**
- 🔋 **282x** GPU power reduction during prefill
- 📊 4 inference pipelines benchmarked

[[GitHub]](https://github.com/AtomGradient/hybrid-ane-mlx-bench) · [[Paper]](https://atomgradient.github.io/hybrid-ane-mlx-bench/)

---
### [swift-qwen3-tts — On-Device Text-to-Speech](https://atomgradient.github.io/swift-qwen3-tts/)

Native Swift implementation of Qwen3 TTS 0.6B for real-time, on-device speech synthesis.

- 📦 **67%** model compression (2.35 GB → 808 MB)
- 🎙️ Real-time synthesis (**RTF 0.68x**)
- 🌍 12 languages supported

[[GitHub]](https://github.com/AtomGradient/swift-qwen3-tts) · [[Paper]](https://atomgradient.github.io/swift-qwen3-tts/)
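For readers unfamiliar with the metric: RTF (real-time factor) is synthesis time divided by the duration of the audio produced, so values below 1.0 mean the engine runs faster than real time. A minimal sketch of the calculation (the timings below are illustrative, not measured from swift-qwen3-tts):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock time spent synthesizing / duration of audio produced.

    RTF < 1.0 means speech is generated faster than it plays back,
    i.e. the synthesizer can keep up with real-time streaming.
    """
    return synthesis_seconds / audio_seconds

# Illustrative: 6.8 s of compute to produce 10 s of audio -> RTF 0.68x
rtf = real_time_factor(6.8, 10.0)
print(f"RTF {rtf:.2f}x")  # RTF 0.68x
```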
---
### [Gemma-Prune — On-Device Vision Language Model](https://atomgradient.github.io/swift-gemma-cli/)

Multi-stage compression pipeline for deploying the Gemma 3 4B VLM on consumer hardware.

- 📦 **25%** model compression (2.8 GB → 2.1 GB)
- 📝 **110 tok/s** text generation
- 🖼️ **3.4x** image processing speedup

[[GitHub]](https://github.com/AtomGradient/swift-gemma-cli) · [[Paper]](https://atomgradient.github.io/swift-gemma-cli/)

---
### [OptMLX — MLX Memory Optimization Research](https://atomgradient.github.io/OptMLX/)

Exploring memory optimization techniques for the MLX framework on Apple Silicon.

- ⚡ Up to **20x** faster mmap loading
- 🔄 Zero-copy model loading
- 📊 Comprehensive benchmarks

[[GitHub]](https://github.com/AtomGradient/OptMLX) · [[Paper]](https://atomgradient.github.io/OptMLX/)
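The idea behind mmap-based, zero-copy loading is that the OS maps the weight file into the process address space and pages bytes in lazily on first access, so "loading" is near-instant and no copy into the process heap is made. A generic stdlib sketch of this pattern (not OptMLX's actual API; file names here are hypothetical):

```python
import mmap
import os
import tempfile

def mmap_weights(path: str) -> mmap.mmap:
    """Memory-map a weight file read-only.

    The kernel pages bytes in on demand, so opening is O(1) regardless
    of file size and the data is never copied into the Python heap.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # mmap keeps its own reference to the file, so fd can be closed.
        return mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)

# Illustrative usage with a small throwaway file:
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(bytes(range(16)))

view = mmap_weights(path)
print(view[0], view[15])  # bytes are faulted in on access, not at open time
```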
---
## About

AtomGradient is an independent research group dedicated to making AI run efficiently on edge devices. Our research powers [EchoStream AI](https://www.echostream-ai.com/) — a product line bringing on-device AI capabilities to real-world applications.

`Edge AI` · `Privacy-First` · `Open Research`