Spaces:
Running
A newer version of the Gradio SDK is available: 6.18.0
title: CodeFlow
emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart π!
tags:
- build-small-hackathon
- backyard-ai
- llama-cpp
- field-notes
- sharing-is-caring
- off-brand
- off-the-grid
- code
- mermaid.js
- flowchart
- small-models
- seq2seq
- gradio
- agentic
π CodeFlow
Paste code β read its logic as a flowchart. A 30B coder model runs entirely on CPU via llama.cpp to translate source code into a clean, animated Mermaid.js control-flow diagram β with each node wired back to the exact lines it came from.
π Links
π Live Space Β· βΆοΈ Demo Video Β· π¦ Social Post Β· π Field Notes (blog) Β· π Agent Traces
β The Problem
Reading unfamiliar code means simulating its control flow in your head β chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code β diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).
CodeFlow turns any snippet into a scannable flowchart you can audit at a glance β generated by a real language model that runs 100% locally, so nothing is sent to an external API.
βοΈ How It Works
Paste code βββΆ Generate βββΆ POST /generate_flowchart (Gradio API)
β
number the source lines + structured system prompt
β
Qwen3-Coder-30B-A3B (llama.cpp Β· CPU)
β
<thinking> β¦reasoningβ¦ </thinking>
graph TD β¦ nodes & edges β¦
<linemap> A:1 B:2 C:3-4 </linemap>
β
strip reasoning Β· parse + validate the line-map Β· sanitize labels
β
{ mermaid, linemap } βββΆ append agent_traces.jsonl
β
Mermaid render + "trace-the-path" reveal + node β code linking
- You paste code (or pick a pre-rendered example) into the CodeMirror editor and hit Generate.
- The backend numbers the source lines and sends them with a strict system prompt to Qwen3-Coder running on llama.cpp.
- The model returns hidden
<thinking>, the Mermaidgraph, and a<linemap>mapping every node to its source line(s). - The server strips the reasoning, validates the line-map against the source, sanitizes labels for Mermaid, and returns
{ mermaid, linemap }. - The frontend renders the diagram with a trace-the-path reveal that flows out of a persistent Start node while the canvas scrolls along in real time.
- Node β code linking: hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
- Every generation is captured as a structured agent trace (
/traces).
π§° Tech Stack
| Layer | What it is | Used for |
|---|---|---|
| Model | Qwen3-Coder-30B-A3B-Instruct (Mixture-of-Experts) | Code β Mermaid + line-map generation |
| Quantization | Unsloth Dynamic UD-Q3_K_XL GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
| Inference | llama-cpp-python (llama.cpp) |
Local CPU inference (n_ctx=4096) |
| Model fetch | huggingface_hub |
Downloads the GGUF on first run |
| Server | Gradio gr.Server + FastAPI |
/generate_flowchart API, / UI, /traces |
| Frontend | A single self-contained frontend.html (vanilla JS + CSS custom properties) |
Editor, diagram, animation, theming |
| Editor | CodeMirror 6 β vendored bundle (static/cm.bundle.js) |
Syntax-highlighted code input |
| Diagrams | Mermaid.js 10 β vendored UMD (static/mermaid.min.js) |
Flowchart rendering |
| Animation | Web Animations API | Trace-the-path reveal + theme crossfade |
| Type | Fraunces Β· Hanken Grotesk Β· JetBrains Mono β vendored woff2 (static/fonts/) |
Custom, non-default look |
| Assets | All JS/CSS/fonts bundled into static/ (no CDN at runtime) |
True offline operation |
| Observability | Hand-rolled JSONL agent traces | One trace per generation, served at /traces |
| Tests | smoke-test.sh (headless Chrome) |
13 build/render checks |
| Deploy | Hugging Face Spaces | Hosting |
π’ Total Parameters
CodeFlow is driven by Qwen3-Coder-30B-A3B-Instruct β a Mixture-of-Experts model with:
- β 30.5 billion total parameters
- β 3.3 billion active parameters per token (128 experts, 8 activated)
It's served as an Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) β letting a 30B-class model generate diagrams off the grid, with no GPU and no external API.
π Badges (5 / 6)
These map to the Space tags above.
| Badge | How CodeFlow earns it |
|---|---|
| π Off the Grid | No external API or CDN at runtime β period. The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and every frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into static/. The Gradio share tunnel is off (share=False). The only network call in the whole project is the one-time model download at startup. The UI even runs fully offline from file://. |
| π¨ Off-Brand | Zero default-Gradio look. A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation β deliberately designed not to look templated. |
| π Field Notes | See the blog post. |
| π€ Sharing is Caring | Open-source under MIT, a public Space, plus a social post sharing the process and learnings. |
| π€ Agentic | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at /traces. |
π₯ Demo
βΆοΈ Click above, or use the Demo Video link at the top.
π» Run It Locally
First launch downloads the ~13 GB GGUF from Hugging Face. CPU inference is slow (cold generations can take minutes) β the built-in examples render instantly because their diagrams are pre-computed.
# 1. Clone
git clone REPLACE_ME_repo_url CodeFlow
cd CodeFlow
# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt
# 4. Run β opens a local Gradio URL
python app.py
Then open the printed URL. Preview the UI without the model by opening frontend.html directly in a browser (file://) β fully offline, since all assets are vendored in static/; the example presets render their diagrams instantly.
Rebuilding the vendored bundles (optional): the CodeMirror + Gradio-client bundles in
static/are produced bybuild/build.sh(needs Node). Mermaid and the fonts are downloaded intostatic/as well. You never need this to run the app β only to regenerate the bundles.
Endpoints: / (UI) Β· /generate_flowchart (API) Β· /traces (download all agent traces as JSONL).
ποΈ Repository Structure
CodeFlow/
βββ app.py # Gradio + FastAPI server: loads the model and exposes
β # /generate_flowchart (API), / (UI), /static, /traces
βββ frontend.html # Self-contained UI β CodeMirror editor, Mermaid render,
β # trace-the-path animation, nodeβcode linking, theming
βββ static/ # Vendored frontend assets β NO CDN at runtime
β βββ mermaid.min.js # Mermaid (UMD, ~3.2 MB)
β βββ cm.bundle.js # CodeMirror 6 (single IIFE bundle)
β βββ gradio-client.js # @gradio/client (IIFE bundle)
β βββ fonts.css # @font-face β local woff2
β βββ fonts/ # Fraunces Β· Hanken Grotesk Β· JetBrains Mono (woff2)
βββ build/ # Reproducible bundle build (Node) β build.sh + entry files
βββ requirements.txt # Python deps (CPU llama-cpp-python wheel, gradio, hub)
βββ smoke-test.sh # Headless-Chrome smoke test (13 checks)
βββ notes-for-blog.md # Field Notes β the full build log
βββ README.md # You are here
βββ LICENSE # MIT
βββ agent_traces.jsonl # (created at runtime) one JSON line per generation
β οΈ Limitations
- CPU inference is slow. A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
- 3-bit quantization trades some fidelity for the ability to run a 30B model at all β occasional imperfect diagrams.
- 4096-token context β very large files won't fit; works best on functions/snippets.
- Line-map depends on the model. The
<linemap>is LLM-generated; the server validates and drops bad entries, so nodeβcode links can be partial on tricky code. - Paraphrased labels. Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
- Mermaid parse failures on unusual syntax are possible (the raw output is shown so nothing is lost).
- Ephemeral traces on Spaces.
agent_traces.jsonllives on the runtime filesystem and resets on restart/rebuild β download it before then.
π Credits
- Model: Qwen3-Coder (Qwen Team, Alibaba) β GGUF quant by Unsloth.
- Inference: llama.cpp via
llama-cpp-python(Andrei Betlen). - App framework: Gradio (Hugging Face).
- Diagrams: Mermaid.js Β· Editor: CodeMirror.
- Type: Fraunces, Hanken Grotesk, JetBrains Mono (Google Fonts, SIL OFL).
- Built for the Build Small Hackathon.
π License
Released under the MIT License β see LICENSE. Β© 2026 Rishi Jain.