---
title: CodeFlow
emoji: 📊
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart 📊!
tags:
  - build-small-hackathon
  - backyard-ai
  - llama-cpp
  - field-notes
  - sharing-is-caring
  - off-brand
  - off-the-grid
  - code
  - mermaid.js
  - flowchart
  - small-models
  - seq2seq
  - gradio
  - agentic
---

# 📊 CodeFlow

**Paste code → read its logic as a flowchart.** A 30B coder model runs entirely on **CPU via llama.cpp** to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram, with each node wired back to the exact lines it came from.

### 🔗 Links

[🚀 **Live Space**][space] · [▶️ **Demo Video**][video] · [🐦 **Social Post**][social] · [📓 **Field Notes (blog)**][blog] · [🔍 **Agent Traces**][traces]

[space]: REPLACE_ME "Hugging Face Space"
[video]: REPLACE_ME "Demo video"
[social]: REPLACE_ME "Social post"
[blog]: REPLACE_ME "Field notes / blog post"
[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"

---

## ❓ The Problem

Reading unfamiliar code means simulating its control flow in your head: chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code → diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).

**CodeFlow** turns any snippet into a scannable flowchart you can audit at a glance, generated by a real language model that runs **100% locally**, so nothing is sent to an external API.
## ⚙️ How It Works

```
Paste code ──▶ Generate ──▶ POST /generate_flowchart (Gradio API)
                                  │
             number the source lines + structured system prompt
                                  │
             Qwen3-Coder-30B-A3B (llama.cpp · CPU)
                                  │
             …reasoning… graph TD … nodes & edges … A:1 B:2 C:3-4
                                  │
             strip reasoning · parse + validate the line-map · sanitize labels
                                  │
             { mermaid, linemap } ──▶ append agent_traces.jsonl
                                  │
             Mermaid render + "trace-the-path" reveal + node ↔ code linking
```

1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**.
3. The model returns hidden reasoning, the Mermaid `graph`, and a line-map that maps every node to its source line(s).
4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
7. Every generation is captured as a structured **agent trace** (`/traces`).
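
The numbering and line-map validation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `app.py`: the function names `number_lines` and `parse_linemap` are hypothetical, the `N: ` prefix format is an assumption, and the entry format (`A:1 B:2 C:3-4`) is taken from the flow diagram. The validation rule shown here (drop anything malformed or outside the source's line range) is one plausible reading of "validates the line-map against the source".

```python
import re

# One line-map entry: a node id, a colon, then a line number or a range.
# Format assumed from the README diagram: "A:1 B:2 C:3-4".
_ENTRY = re.compile(r"([A-Za-z]\w*):(\d+)(?:-(\d+))?")


def number_lines(source: str) -> str:
    """Prefix each source line with its 1-based line number (hypothetical format)."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(source.splitlines(), start=1)
    )


def parse_linemap(raw: str, n_lines: int) -> dict[str, tuple[int, int]]:
    """Parse 'A:1 B:2 C:3-4' into {node: (start, end)}, dropping bad entries.

    An entry is dropped if it is malformed, if start > end, or if it points
    outside the range 1..n_lines.
    """
    linemap: dict[str, tuple[int, int]] = {}
    for token in raw.split():
        match = _ENTRY.fullmatch(token)
        if match is None:
            continue  # not in node:line or node:start-end form
        node = match.group(1)
        start = int(match.group(2))
        end = int(match.group(3)) if match.group(3) else start
        if 1 <= start <= end <= n_lines:
            linemap[node] = (start, end)
    return linemap


source = "def f(x):\n    if x:\n        return 1\n    return 0"
print(number_lines(source))
print(parse_linemap("A:1 B:2 C:3-4 D:9 junk", n_lines=4))
# prints {'A': (1, 1), 'B': (2, 2), 'C': (3, 4)}
```

Dropping invalid entries rather than rejecting the whole response means a partly wrong line-map still yields a usable diagram, at the cost of some nodes having no link back to the code.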
## 🧰 Tech Stack

| Layer | What it is | Used for |
|---|---|---|
| **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code → Mermaid + line-map generation |
| **Quantization** | [Unsloth](https://huggingface.co/unsloth) Dynamic **UD-Q3_K_XL** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
| **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
| **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
| **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
| **Frontend** | A single self-contained `frontend.html` (vanilla JS + CSS custom properties) | Editor, diagram, animation, theming |
| **Editor** | [CodeMirror 6](https://codemirror.net/), **vendored** bundle (`static/cm.bundle.js`) | Syntax-highlighted code input |
| **Diagrams** | [Mermaid.js 10](https://mermaid.js.org/), **vendored** UMD (`static/mermaid.min.js`) | Flowchart rendering |
| **Animation** | Web Animations API | Trace-the-path reveal + theme crossfade |
| **Type** | Fraunces · Hanken Grotesk · JetBrains Mono, **vendored** woff2 (`static/fonts/`) | Custom, non-default look |
| **Assets** | All JS/CSS/fonts bundled into `static/` (no CDN at runtime) | True offline operation |
| **Observability** | Hand-rolled JSONL agent traces | One trace per generation, served at `/traces` |
| **Tests** | `smoke-test.sh` (headless Chrome) | 13 build/render checks |
| **Deploy** | Hugging Face Spaces | Hosting |

## 🔢 Total Parameters

CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct**, a **Mixture-of-Experts** model with:

- **≈ 30.5 billion total parameters**
- **≈ 3.3 billion active parameters per token** (128 experts, 8 activated)

It's served as an **Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk),
letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.

## 🏅 Badges (5 / 6)

These map to the Space tags above.

| Badge | How CodeFlow earns it |
|---|---|
| 🔌 **Off the Grid** | **No external API or CDN at runtime, period.** The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and *every* frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The **only** network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. |
| 🎨 **Off-Brand** | **Zero default-Gradio look.** A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation, deliberately designed *not* to look templated. |
| 📓 **Field Notes** | See the [blog post][blog]. |
| 🤝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
| 🤖 **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |

## 🎥 Demo

[![Watch the demo](REPLACE_ME_thumbnail.png)][video]

> ▶️ Click above, or use the [Demo Video][video] link at the top.

## 💻 Run It Locally

> First launch downloads the **~13 GB GGUF** from Hugging Face. CPU inference is slow (cold generations can take minutes); the built-in **examples render instantly** because their diagrams are pre-computed.

```bash
# 1. Clone
git clone REPLACE_ME_repo_url CodeFlow
cd CodeFlow

# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt

# 4. Run (opens a local Gradio URL)
python app.py
```

Then open the printed URL. **Preview the UI without the model** by opening `frontend.html` directly in a browser (`file://`). This works fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly.

> **Rebuilding the vendored bundles** (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to *run* the app, only to regenerate the bundles.

**Endpoints:** `/` (UI) · `/generate_flowchart` (API) · `/traces` (download all agent traces as JSONL).

## 🗂️ Repository Structure

```
CodeFlow/
├── app.py                # Gradio + FastAPI server: loads the model and exposes
│                         # /generate_flowchart (API), / (UI), /static, /traces
├── frontend.html         # Self-contained UI: CodeMirror editor, Mermaid render,
│                         # trace-the-path animation, node↔code linking, theming
├── static/               # Vendored frontend assets, NO CDN at runtime
│   ├── mermaid.min.js    # Mermaid (UMD, ~3.2 MB)
│   ├── cm.bundle.js      # CodeMirror 6 (single IIFE bundle)
│   ├── gradio-client.js  # @gradio/client (IIFE bundle)
│   ├── fonts.css         # @font-face → local woff2
│   └── fonts/            # Fraunces · Hanken Grotesk · JetBrains Mono (woff2)
├── build/                # Reproducible bundle build (Node): build.sh + entry files
├── requirements.txt      # Python deps (CPU llama-cpp-python wheel, gradio, hub)
├── smoke-test.sh         # Headless-Chrome smoke test (13 checks)
├── notes-for-blog.md     # Field Notes: the full build log
├── README.md             # You are here
├── LICENSE               # MIT
└── agent_traces.jsonl    # (created at runtime) one JSON line per generation
```

## ⚠️ Limitations

- **CPU inference is slow.** A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
- **3-bit quantization** trades some fidelity for the ability to run a 30B model at all, so expect occasional imperfect diagrams.
- **4096-token context.** Very large files won't fit; CodeFlow works best on functions and snippets.
- **Line-map depends on the model.** The line-map is LLM-generated; the server validates it and drops bad entries, so node↔code links can be partial on tricky code.
- **Paraphrased labels.** Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
- **Mermaid parse failures** on unusual syntax are possible (the raw output is shown so nothing is lost).
- **Ephemeral traces on Spaces.** `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild, so download it before then.

## 🙏 Credits

- **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba), GGUF quant by [Unsloth](https://huggingface.co/unsloth).
- **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
- **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
- **Diagrams:** [Mermaid.js](https://mermaid.js.org/) · **Editor:** [CodeMirror](https://codemirror.net/).
- **Type:** Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL).
- **Built for** the Build Small Hackathon.

## 📄 License

Released under the **MIT License**; see [`LICENSE`](LICENSE). © 2026 Rishi Jain.