---
title: CodeFlow
emoji: π
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart π!
tags:
- build-small-hackathon
- backyard-ai
- llama-cpp
- field-notes
- sharing-is-caring
- off-brand
- off-the-grid
- code
- mermaid.js
- flowchart
- small-models
- seq2seq
- gradio
- agentic
---
# π CodeFlow
**Paste code β read its logic as a flowchart.** A 30B coder model runs entirely on **CPU via llama.cpp** to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram β with each node wired back to the exact lines it came from.
### π Links
[π **Live Space**][space] Β· [βΆοΈ **Demo Video**][video] Β· [π¦ **Social Post**][social] Β· [π **Field Notes (blog)**][blog] Β· [π **Agent Traces**][traces]
[space]: REPLACE_ME "Hugging Face Space"
[video]: REPLACE_ME "Demo video"
[social]: REPLACE_ME "Social post"
[blog]: REPLACE_ME "Field notes / blog post"
[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"
---
## β The Problem
Reading unfamiliar code means simulating its control flow in your head β chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code β diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).
**CodeFlow** turns any snippet into a scannable flowchart you can audit at a glance β generated by a real language model that runs **100% locally**, so nothing is sent to an external API.
## βοΈ How It Works
```
Paste code βββΆ Generate βββΆ POST /generate_flowchart (Gradio API)
β
number the source lines + structured system prompt
β
Qwen3-Coder-30B-A3B (llama.cpp Β· CPU)
β
β¦reasoningβ¦
graph TD β¦ nodes & edges β¦
A:1 B:2 C:3-4
β
strip reasoning Β· parse + validate the line-map Β· sanitize labels
β
{ mermaid, linemap } βββΆ append agent_traces.jsonl
β
Mermaid render + "trace-the-path" reveal + node β code linking
```
1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**.
3. The model returns hidden ``, the Mermaid `graph`, and a `` mapping every node to its source line(s).
4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
6. **Node β code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
7. Every generation is captured as a structured **agent trace** (`/traces`).
## π§° Tech Stack
| Layer | What it is | Used for |
|---|---|---|
| **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code β Mermaid + line-map generation |
| **Quantization** | [Unsloth](https://huggingface.co/unsloth) Dynamic **UD-Q3_K_XL** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
| **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
| **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
| **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
| **Frontend** | A single self-contained `frontend.html` (vanilla JS + CSS custom properties) | Editor, diagram, animation, theming |
| **Editor** | [CodeMirror 6](https://codemirror.net/) β **vendored** bundle (`static/cm.bundle.js`) | Syntax-highlighted code input |
| **Diagrams** | [Mermaid.js 10](https://mermaid.js.org/) β **vendored** UMD (`static/mermaid.min.js`) | Flowchart rendering |
| **Animation** | Web Animations API | Trace-the-path reveal + theme crossfade |
| **Type** | Fraunces Β· Hanken Grotesk Β· JetBrains Mono β **vendored** woff2 (`static/fonts/`) | Custom, non-default look |
| **Assets** | All JS/CSS/fonts bundled into `static/` (no CDN at runtime) | True offline operation |
| **Observability** | Hand-rolled JSONL agent traces | One trace per generation, served at `/traces` |
| **Tests** | `smoke-test.sh` (headless Chrome) | 13 build/render checks |
| **Deploy** | Hugging Face Spaces | Hosting |
## π’ Total Parameters
CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct** β a **Mixture-of-Experts** model with:
- **β 30.5 billion total parameters**
- **β 3.3 billion active parameters per token** (128 experts, 8 activated)
It's served as an **Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) β letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.
## π
Badges (5 / 6)
These map to the Space tags above.
| Badge | How CodeFlow earns it |
|---|---|
| π **Off the Grid** | **No external API or CDN at runtime β period.** The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and *every* frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The **only** network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. |
| π¨ **Off-Brand** | **Zero default-Gradio look.** A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation β deliberately designed *not* to look templated. |
| π **Field Notes** | See the [blog post][blog]. |
| π€ **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
| π€ **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |
## π₯ Demo
[][video]
> βΆοΈ Click above, or use the [Demo Video][video] link at the top.
## π» Run It Locally
> First launch downloads the **~13 GB GGUF** from Hugging Face. CPU inference is slow (cold generations can take minutes) β the built-in **examples render instantly** because their diagrams are pre-computed.
```bash
# 1. Clone
git clone REPLACE_ME_repo_url CodeFlow
cd CodeFlow
# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt
# 4. Run β opens a local Gradio URL
python app.py
```
Then open the printed URL. **Preview the UI without the model** by opening `frontend.html` directly in a browser (`file://`) β fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly.
> **Rebuilding the vendored bundles** (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to *run* the app β only to regenerate the bundles.
**Endpoints:** `/` (UI) Β· `/generate_flowchart` (API) Β· `/traces` (download all agent traces as JSONL).
## ποΈ Repository Structure
```
CodeFlow/
βββ app.py # Gradio + FastAPI server: loads the model and exposes
β # /generate_flowchart (API), / (UI), /static, /traces
βββ frontend.html # Self-contained UI β CodeMirror editor, Mermaid render,
β # trace-the-path animation, nodeβcode linking, theming
βββ static/ # Vendored frontend assets β NO CDN at runtime
β βββ mermaid.min.js # Mermaid (UMD, ~3.2 MB)
β βββ cm.bundle.js # CodeMirror 6 (single IIFE bundle)
β βββ gradio-client.js # @gradio/client (IIFE bundle)
β βββ fonts.css # @font-face β local woff2
β βββ fonts/ # Fraunces Β· Hanken Grotesk Β· JetBrains Mono (woff2)
βββ build/ # Reproducible bundle build (Node) β build.sh + entry files
βββ requirements.txt # Python deps (CPU llama-cpp-python wheel, gradio, hub)
βββ smoke-test.sh # Headless-Chrome smoke test (13 checks)
βββ notes-for-blog.md # Field Notes β the full build log
βββ README.md # You are here
βββ LICENSE # MIT
βββ agent_traces.jsonl # (created at runtime) one JSON line per generation
```
## β οΈ Limitations
- **CPU inference is slow.** A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
- **3-bit quantization** trades some fidelity for the ability to run a 30B model at all β occasional imperfect diagrams.
- **4096-token context** β very large files won't fit; works best on functions/snippets.
- **Line-map depends on the model.** The `` is LLM-generated; the server validates and drops bad entries, so nodeβcode links can be partial on tricky code.
- **Paraphrased labels.** Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
- **Mermaid parse failures** on unusual syntax are possible (the raw output is shown so nothing is lost).
- **Ephemeral traces on Spaces.** `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild β download it before then.
## π Credits
- **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba) β GGUF quant by [Unsloth](https://huggingface.co/unsloth).
- **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
- **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
- **Diagrams:** [Mermaid.js](https://mermaid.js.org/) Β· **Editor:** [CodeMirror](https://codemirror.net/).
- **Type:** Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL).
- **Built for** the Build Small Hackathon.
## π License
Released under the **MIT License** β see [`LICENSE`](LICENSE). Β© 2026 Rishi Jain.