---
title: CodeFlow
emoji: 📊
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart 📊!
tags:
- build-small-hackathon
- backyard-ai
- llama-cpp
- field-notes
- sharing-is-caring
- off-brand
- off-the-grid
- code
- mermaid.js
- flowchart
- small-models
- seq2seq
- gradio
- agentic
---

# 📊 CodeFlow

**Paste code → read its logic as a flowchart.** A 30B coder model runs entirely on **CPU via llama.cpp** to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram — with each node wired back to the exact lines it came from.

### 🔗 Links

[🚀 **Live Space**][space] · [▶️ **Demo Video**][video] · [🐦 **Social Post**][social] · [📓 **Field Notes (blog)**][blog] · [🔍 **Agent Traces**][traces]

<!-- ╔═══════════════════════════════════════════════════════════════╗
     ║  FILL THESE IN — replace each REPLACE_ME with your real URL.   ║
     ╚═══════════════════════════════════════════════════════════════╝ -->
[space]:  REPLACE_ME  "Hugging Face Space"
[video]:  REPLACE_ME  "Demo video"
[social]: REPLACE_ME  "Social post"
[blog]:   REPLACE_ME  "Field notes / blog post"
[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces  "Agent traces dataset"

---

## ❓ The Problem

Reading unfamiliar code means simulating its control flow in your head — chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code → diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).

**CodeFlow** turns any snippet into a scannable flowchart you can audit at a glance — generated by a real language model that runs **100% locally**, so nothing is sent to an external API.

## ⚙️ How It Works

```
 Paste code ──▶ Generate ──▶ POST /generate_flowchart        (Gradio API)
                                    │
                    number the source lines + structured system prompt
                                    │
                     Qwen3-Coder-30B-A3B   (llama.cpp · CPU)
                                    │
                 <thinking> …reasoning… </thinking>
                 graph TD … nodes & edges …
                 <linemap> A:1  B:2  C:3-4 </linemap>
                                    │
        strip reasoning · parse + validate the line-map · sanitize labels
                                    │
                  { mermaid, linemap }  ──▶  append agent_traces.jsonl
                                    │
   Mermaid render + "trace-the-path" reveal + node ↔ code linking
```

1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**.
3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
7. Every generation is captured as a structured **agent trace** (`/traces`).

## 🧰 Tech Stack

| Layer | What it is | Used for |
|---|---|---|
| **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code → Mermaid + line-map generation |
| **Quantization** | [Unsloth](https://huggingface.co/unsloth) Dynamic **UD-Q3_K_XL** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
| **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
| **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
| **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
| **Frontend** | A single self-contained `frontend.html` (vanilla JS + CSS custom properties) | Editor, diagram, animation, theming |
| **Editor** | [CodeMirror 6](https://codemirror.net/) — **vendored** bundle (`static/cm.bundle.js`) | Syntax-highlighted code input |
| **Diagrams** | [Mermaid.js 10](https://mermaid.js.org/) — **vendored** UMD (`static/mermaid.min.js`) | Flowchart rendering |
| **Animation** | Web Animations API | Trace-the-path reveal + theme crossfade |
| **Type** | Fraunces · Hanken Grotesk · JetBrains Mono — **vendored** woff2 (`static/fonts/`) | Custom, non-default look |
| **Assets** | All JS/CSS/fonts bundled into `static/` (no CDN at runtime) | True offline operation |
| **Observability** | Hand-rolled JSONL agent traces | One trace per generation, served at `/traces` |
| **Tests** | `smoke-test.sh` (headless Chrome) | 13 build/render checks |
| **Deploy** | Hugging Face Spaces | Hosting |

## 🔢 Total Parameters

CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct** — a **Mixture-of-Experts** model with:

- **≈ 30.5 billion total parameters**
- **≈ 3.3 billion active parameters per token** (128 experts, 8 activated)

It's served as an **Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) — letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.

## 🏅 Badges (5 / 6)

These map to the Space tags above.

| Badge | How CodeFlow earns it |
|---|---|
| 🔌 **Off the Grid** | **No external API or CDN at runtime — period.** The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and *every* frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The **only** network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. |
| 🎨 **Off-Brand** | **Zero default-Gradio look.** A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation — deliberately designed *not* to look templated. |
| 📓 **Field Notes** | See the [blog post][blog]. |
| 🤝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
| 🤖 **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |

## 🎥 Demo

[![Watch the demo](REPLACE_ME_thumbnail.png)][video]

> ▶️ Click above, or use the [Demo Video][video] link at the top.

## 💻 Run It Locally

> First launch downloads the **~13 GB GGUF** from Hugging Face. CPU inference is slow (cold generations can take minutes) — the built-in **examples render instantly** because their diagrams are pre-computed.

```bash
# 1. Clone
git clone REPLACE_ME_repo_url CodeFlow
cd CodeFlow

# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt

# 4. Run — opens a local Gradio URL
python app.py
```

Then open the printed URL. **Preview the UI without the model** by opening `frontend.html` directly in a browser (`file://`) — fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly.

> **Rebuilding the vendored bundles** (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to *run* the app — only to regenerate the bundles.

**Endpoints:** `/` (UI) · `/generate_flowchart` (API) · `/traces` (download all agent traces as JSONL).

## 🗂️ Repository Structure

```
CodeFlow/
├── app.py             # Gradio + FastAPI server: loads the model and exposes
│                      #   /generate_flowchart (API), / (UI), /static, /traces
├── frontend.html      # Self-contained UI — CodeMirror editor, Mermaid render,
│                      #   trace-the-path animation, node↔code linking, theming
├── static/            # Vendored frontend assets — NO CDN at runtime
│   ├── mermaid.min.js #   Mermaid (UMD, ~3.2 MB)
│   ├── cm.bundle.js   #   CodeMirror 6 (single IIFE bundle)
│   ├── gradio-client.js #  @gradio/client (IIFE bundle)
│   ├── fonts.css      #   @font-face → local woff2
│   └── fonts/         #   Fraunces · Hanken Grotesk · JetBrains Mono (woff2)
├── build/             # Reproducible bundle build (Node) — build.sh + entry files
├── requirements.txt   # Python deps (CPU llama-cpp-python wheel, gradio, hub)
├── smoke-test.sh      # Headless-Chrome smoke test (13 checks)
├── notes-for-blog.md  # Field Notes — the full build log
├── README.md          # You are here
├── LICENSE            # MIT
└── agent_traces.jsonl # (created at runtime) one JSON line per generation
```

## ⚠️ Limitations

- **CPU inference is slow.** A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
- **3-bit quantization** trades some fidelity for the ability to run a 30B model at all — occasional imperfect diagrams.
- **4096-token context** — very large files won't fit; works best on functions/snippets.
- **Line-map depends on the model.** The `<linemap>` is LLM-generated; the server validates and drops bad entries, so node↔code links can be partial on tricky code.
- **Paraphrased labels.** Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
- **Mermaid parse failures** on unusual syntax are possible (the raw output is shown so nothing is lost).
- **Ephemeral traces on Spaces.** `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild — download it before then.

## 🙏 Credits

- **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba) — GGUF quant by [Unsloth](https://huggingface.co/unsloth).
- **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
- **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
- **Diagrams:** [Mermaid.js](https://mermaid.js.org/) · **Editor:** [CodeMirror](https://codemirror.net/).
- **Type:** Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL).
- **Built for** the Build Small Hackathon.

## 📄 License

Released under the **MIT License** — see [`LICENSE`](LICENSE). © 2026 Rishi Jain.