We are excited to announce Sipp.sh: a high-performance library for running AI inference locally and in the cloud through a unified API.
We came to realize that an LLM isn't just a chat interface for information retrieval. It can be integrated directly into web apps, games, or productivity apps to handle continuous monitoring and decision-making. It can act as a sort of "second brain," the silent hand that guides and helps users without them even realizing it. We see this as the next frontier of UX design, but it is only possible if developers have access to low-cost, near-zero-latency compute and absolute data privacy.
That's why we created Sipp. It's an opinionated library that lets developers integrate local AI into any application, giving them the power to completely rethink user experiences across web, games, and desktop.
To achieve this, we built an entirely new stack in Rust and C++, working alongside the llama.cpp project. Along the way, we contributed back to that community by helping upgrade the GGML WebGPU backend. This deep optimization is what enables fast, responsive decode speeds directly in the browser. Sipp ships as a zero-dependency library for desktop and web, achieving a 3x to 5x speedup in token decode compared to popular alternatives.
We are already seeing some incredible use cases emerge, ranging from continuous monitoring using local vision to dynamic generation of game elements in a real-time wizard vs. wizard game.
The best part? It's fully open source!
We see this as the start of a dialogue about what the future of user interaction is going to look like, and we built Sipp to lay the foundation for that exciting future. Check out the live demos on our site, run your own benchmarks, or come hang out with us in our Discord.
🚀 Introducing PerceptionDLM — the first multimodal diffusion LLM for parallel region perception!
Most MLLMs are autoregressive, so captioning N regions costs N sequential passes. PerceptionDLM instead describes ALL masked regions in a single denoising process. 🧩
✨ Highlights
• ⚡ Up to 3.4× faster on dense multi-region captioning, with stable per-image latency
• 🏆 PerceptionDLM-Base beats LLaDA-V on 15/16 multimodal benchmarks (new SOTA among open diffusion VLMs)
• 📊 New benchmark: ParaDLC-Bench, which jointly evaluates caption quality AND inference efficiency
• 🔓 Code, models & benchmark all open-sourced
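To make the sequential-versus-parallel contrast concrete, here is a toy cost model in Python. It is only a sketch under loud assumptions: the function names, the fixed per-step cost, and every number below are hypothetical, and it does not reproduce the measured 3.4× figure or any detail of PerceptionDLM itself.

```python
def sequential_latency_ms(num_regions, tokens_per_caption, step_ms):
    """Autoregressive captioning: regions are handled one after another,
    and each caption is decoded token by token."""
    return num_regions * tokens_per_caption * step_ms


def parallel_latency_ms(denoise_steps, step_ms):
    """Diffusion-style captioning: all regions are denoised together, so the
    cost tracks the number of denoising steps, not the region count.

    Assumption: one denoising step costs the same as one decode step and
    does not grow with the number of regions (a simplification).
    """
    return denoise_steps * step_ms


if __name__ == "__main__":
    # Hypothetical settings, chosen only to show the shape of the curves.
    tokens_per_caption, denoise_steps, step_ms = 20, 32, 20
    for n in (1, 4, 16):
        seq = sequential_latency_ms(n, tokens_per_caption, step_ms)
        par = parallel_latency_ms(denoise_steps, step_ms)
        print(f"regions={n:2d}  sequential={seq} ms  parallel={par} ms")
```

Under these assumptions the sequential cost grows linearly with the number of regions while the parallel cost stays flat, which is the intuition behind stable per-image latency. With a single region the sequential path can still be cheaper.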
Our preprint is out! We attempt to model human teaching behaviors in agents, yielding a unified framework that enables adaptive, personalized learning experiences. LectūraAgents addresses the prevailing limitations of current AI learning systems with three essential capabilities:
(1) A hierarchical multi-agent architecture modeled on academic standards. We observe that agents collaborating across hierarchies yield better personalized learning outcomes.
(2) An adaptive embodied teaching mechanism, in which the instructor agent executes visible, pedagogically motivated teaching actions (e.g., handwriting, highlighting, circling) on content in a teaching environment while speaking.
(3) A novel teaching action-speech alignment algorithm (TASA) that makes this possible by dynamically aligning speech with visual teaching actions. Specifically, TASA temporally splits speech segments into word-level tokens, performs heuristic salience analysis on the learning content (text, images, etc.), and then identifies relevant regions where pedagogical teaching actions are applied to guide attention and augment understanding.
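The alignment idea can be illustrated with a toy Python sketch. This is not the TASA algorithm from the preprint: the `Region` and `TimedAction` types, the length-proportional word timing, and the keyword-overlap salience heuristic are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A region of the learning content (hypothetical structure)."""
    name: str
    text: str


@dataclass
class TimedAction:
    time: float   # seconds from the start of the speech segment
    word: str     # spoken word that triggers the action
    action: str   # e.g. "highlight"
    region: str   # name of the targeted content region


def normalize(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.strip(".,;:!?()\"'").lower()


def word_timestamps(speech, start, end):
    """Split a speech segment into word-level tokens with estimated start times.

    Assumption: time is distributed in proportion to word length; a real
    system would more likely use timings from a TTS engine or forced aligner.
    """
    words = speech.split()
    total_chars = sum(len(w) for w in words)
    t, out = start, []
    for w in words:
        out.append((w, t))
        t += (end - start) * len(w) / total_chars
    return out


def align(speech, start, end, regions, action="highlight"):
    """Attach one action to each region the first time a salient word hits it."""
    actions, used = [], set()
    for word, t in word_timestamps(speech, start, end):
        key = normalize(word)
        if len(key) < 4:  # crude salience heuristic: skip short function words
            continue
        for region in regions:
            region_words = {normalize(w) for w in region.text.split()}
            if key in region_words and region.name not in used:
                actions.append(TimedAction(round(t, 2), key, action, region.name))
                used.add(region.name)
                break
    return actions


if __name__ == "__main__":
    regions = [
        Region("title", "Pythagorean theorem"),
        Region("formula", "a^2 + b^2 = c^2 for the hypotenuse c"),
    ]
    speech = "The theorem relates the hypotenuse to the other sides"
    for a in align(speech, 0.0, 4.5, regions):
        print(a)
```

Each emitted action carries the time at which its trigger word is spoken, so a renderer could fire the visual action in sync with the audio.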
We conducted several experiments to assess these capabilities: a pedagogical evaluation of the individual components under frontier models, a comparative analysis against existing frameworks, and an efficacy study with real students.
Results show consistent gains over baseline systems on standard instructional metrics (curated by expert educators) spanning lecture content quality, embodied teaching quality, assessment, and personalization, positioning LectūraAgents as a pedagogically grounded framework for personalized learning at scale.