Spaces:
Running
Running
| title: CodeFlow | |
| emoji: π | |
| colorFrom: indigo | |
| colorTo: blue | |
| sdk: gradio | |
| python_version: '3.13' | |
| sdk_version: 6.16.0 | |
| app_file: app.py | |
| pinned: true | |
| license: mit | |
| short_description: Turn code into a readable Mermaid.js flowchart π! | |
| tags: | |
| - build-small-hackathon | |
| - backyard-ai | |
| - llama-cpp | |
| - field-notes | |
| - sharing-is-caring | |
| - off-brand | |
| - off-the-grid | |
| - code | |
| - mermaid.js | |
| - flowchart | |
| - small-models | |
| - seq2seq | |
| - gradio | |
| - agentic | |
| # π CodeFlow | |
| **Paste code β read its logic as a flowchart.** A 30B coder model runs entirely on **CPU via llama.cpp** to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram β with each node wired back to the exact lines it came from. | |
| ### π Links | |
| [π **Live Space**][space] Β· [βΆοΈ **Demo Video**][video] Β· [π¦ **Social Post**][social] Β· [π **Field Notes (blog)**][blog] Β· [π **Agent Traces**][traces] | |
| <!-- βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β FILL THESE IN β replace each REPLACE_ME with your real URL. β | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ --> | |
| [space]: REPLACE_ME "Hugging Face Space" | |
| [video]: REPLACE_ME "Demo video" | |
| [social]: REPLACE_ME "Social post" | |
| [blog]: REPLACE_ME "Field notes / blog post" | |
| [traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset" | |
| --- | |
| ## β The Problem | |
| Reading unfamiliar code means simulating its control flow in your head β chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code β diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building). | |
| **CodeFlow** turns any snippet into a scannable flowchart you can audit at a glance β generated by a real language model that runs **100% locally**, so nothing is sent to an external API. | |
| ## βοΈ How It Works | |
| ``` | |
| Paste code βββΆ Generate βββΆ POST /generate_flowchart (Gradio API) | |
| β | |
| number the source lines + structured system prompt | |
| β | |
| Qwen3-Coder-30B-A3B (llama.cpp Β· CPU) | |
| β | |
| <thinking> β¦reasoningβ¦ </thinking> | |
| graph TD β¦ nodes & edges β¦ | |
| <linemap> A:1 B:2 C:3-4 </linemap> | |
| β | |
| strip reasoning Β· parse + validate the line-map Β· sanitize labels | |
| β | |
| { mermaid, linemap } βββΆ append agent_traces.jsonl | |
| β | |
| Mermaid render + "trace-the-path" reveal + node β code linking | |
| ``` | |
| 1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**. | |
| 2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**. | |
| 3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s). | |
| 4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`. | |
| 5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time. | |
| 6. **Node β code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node. | |
| 7. Every generation is captured as a structured **agent trace** (`/traces`). | |
| ## π§° Tech Stack | |
| | Layer | What it is | Used for | | |
| |---|---|---| | |
| | **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code β Mermaid + line-map generation | | |
| | **Quantization** | [Unsloth](https://huggingface.co/unsloth) Dynamic **UD-Q3_K_XL** GGUF (~3-bit) | Shrinks the 30B model to run on CPU | | |
| | **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) | | |
| | **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run | | |
| | **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` | | |
| | **Frontend** | A single self-contained `frontend.html` (vanilla JS + CSS custom properties) | Editor, diagram, animation, theming | | |
| | **Editor** | [CodeMirror 6](https://codemirror.net/) β **vendored** bundle (`static/cm.bundle.js`) | Syntax-highlighted code input | | |
| | **Diagrams** | [Mermaid.js 10](https://mermaid.js.org/) β **vendored** UMD (`static/mermaid.min.js`) | Flowchart rendering | | |
| | **Animation** | Web Animations API | Trace-the-path reveal + theme crossfade | | |
| | **Type** | Fraunces Β· Hanken Grotesk Β· JetBrains Mono β **vendored** woff2 (`static/fonts/`) | Custom, non-default look | | |
| | **Assets** | All JS/CSS/fonts bundled into `static/` (no CDN at runtime) | True offline operation | | |
| | **Observability** | Hand-rolled JSONL agent traces | One trace per generation, served at `/traces` | | |
| | **Tests** | `smoke-test.sh` (headless Chrome) | 13 build/render checks | | |
| | **Deploy** | Hugging Face Spaces | Hosting | | |
| ## π’ Total Parameters | |
| CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct** β a **Mixture-of-Experts** model with: | |
| - **β 30.5 billion total parameters** | |
| - **β 3.3 billion active parameters per token** (128 experts, 8 activated) | |
| It's served as an **Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) β letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API. | |
| ## π Badges (5 / 6) | |
| These map to the Space tags above. | |
| | Badge | How CodeFlow earns it | | |
| |---|---| | |
| | π **Off the Grid** | **No external API or CDN at runtime β period.** The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and *every* frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The **only** network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. | | |
| | π¨ **Off-Brand** | **Zero default-Gradio look.** A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation β deliberately designed *not* to look templated. | | |
| | π **Field Notes** | See the [blog post][blog]. | | |
| | π€ **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. | | |
| | π€ **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. | | |
| ## π₯ Demo | |
| [][video] | |
| > βΆοΈ Click above, or use the [Demo Video][video] link at the top. | |
| ## π» Run It Locally | |
| > First launch downloads the **~13 GB GGUF** from Hugging Face. CPU inference is slow (cold generations can take minutes) β the built-in **examples render instantly** because their diagrams are pre-computed. | |
| ```bash | |
| # 1. Clone | |
| git clone REPLACE_ME_repo_url CodeFlow | |
| cd CodeFlow | |
| # 2. Create a virtual env | |
| python -m venv .venv | |
| source .venv/bin/activate # Windows: .venv\Scripts\activate | |
| # 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python) | |
| pip install -r requirements.txt | |
| # 4. Run β opens a local Gradio URL | |
| python app.py | |
| ``` | |
| Then open the printed URL. **Preview the UI without the model** by opening `frontend.html` directly in a browser (`file://`) β fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly. | |
| > **Rebuilding the vendored bundles** (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to *run* the app β only to regenerate the bundles. | |
| **Endpoints:** `/` (UI) Β· `/generate_flowchart` (API) Β· `/traces` (download all agent traces as JSONL). | |
| ## ποΈ Repository Structure | |
| ``` | |
| CodeFlow/ | |
| βββ app.py # Gradio + FastAPI server: loads the model and exposes | |
| β # /generate_flowchart (API), / (UI), /static, /traces | |
| βββ frontend.html # Self-contained UI β CodeMirror editor, Mermaid render, | |
| β # trace-the-path animation, nodeβcode linking, theming | |
| βββ static/ # Vendored frontend assets β NO CDN at runtime | |
| β βββ mermaid.min.js # Mermaid (UMD, ~3.2 MB) | |
| β βββ cm.bundle.js # CodeMirror 6 (single IIFE bundle) | |
| β βββ gradio-client.js # @gradio/client (IIFE bundle) | |
| β βββ fonts.css # @font-face β local woff2 | |
| β βββ fonts/ # Fraunces Β· Hanken Grotesk Β· JetBrains Mono (woff2) | |
| βββ build/ # Reproducible bundle build (Node) β build.sh + entry files | |
| βββ requirements.txt # Python deps (CPU llama-cpp-python wheel, gradio, hub) | |
| βββ smoke-test.sh # Headless-Chrome smoke test (13 checks) | |
| βββ notes-for-blog.md # Field Notes β the full build log | |
| βββ README.md # You are here | |
| βββ LICENSE # MIT | |
| βββ agent_traces.jsonl # (created at runtime) one JSON line per generation | |
| ``` | |
| ## β οΈ Limitations | |
| - **CPU inference is slow.** A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback. | |
| - **3-bit quantization** trades some fidelity for the ability to run a 30B model at all β occasional imperfect diagrams. | |
| - **4096-token context** β very large files won't fit; works best on functions/snippets. | |
| - **Line-map depends on the model.** The `<linemap>` is LLM-generated; the server validates and drops bad entries, so nodeβcode links can be partial on tricky code. | |
| - **Paraphrased labels.** Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim. | |
| - **Mermaid parse failures** on unusual syntax are possible (the raw output is shown so nothing is lost). | |
| - **Ephemeral traces on Spaces.** `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild β download it before then. | |
| ## π Credits | |
| - **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba) β GGUF quant by [Unsloth](https://huggingface.co/unsloth). | |
| - **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen). | |
| - **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face). | |
| - **Diagrams:** [Mermaid.js](https://mermaid.js.org/) Β· **Editor:** [CodeMirror](https://codemirror.net/). | |
| - **Type:** Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL). | |
| - **Built for** the Build Small Hackathon. | |
| ## π License | |
| Released under the **MIT License** β see [`LICENSE`](LICENSE). Β© 2026 Rishi Jain. | |