CodeFlow / README.md
Rishi-Jain-27's picture
Removed get traces temp and updated readme
3dffa87
---
title: CodeFlow
emoji: πŸ“Š
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart πŸ“Š!
tags:
- build-small-hackathon
- backyard-ai
- llama-cpp
- field-notes
- sharing-is-caring
- off-brand
- off-the-grid
- code
- mermaid.js
- flowchart
- small-models
- seq2seq
- gradio
- agentic
---
# πŸ“Š CodeFlow
**Paste code β†’ read its logic as a flowchart.** A 30B coder model runs entirely on **CPU via llama.cpp** to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram β€” with each node wired back to the exact lines it came from.
### πŸ”— Links
[πŸš€ **Live Space**][space] Β· [▢️ **Demo Video**][video] Β· [🐦 **Social Post**][social] Β· [πŸ““ **Field Notes (blog)**][blog] Β· [πŸ” **Agent Traces**][traces]
<!-- ╔═══════════════════════════════════════════════════════════════╗
β•‘ FILL THESE IN β€” replace each REPLACE_ME with your real URL. β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• -->
[space]: REPLACE_ME "Hugging Face Space"
[video]: REPLACE_ME "Demo video"
[social]: REPLACE_ME "Social post"
[blog]: REPLACE_ME "Field notes / blog post"
[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"
---
## ❓ The Problem
Reading unfamiliar code means simulating its control flow in your head β€” chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code β†’ diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).
**CodeFlow** turns any snippet into a scannable flowchart you can audit at a glance β€” generated by a real language model that runs **100% locally**, so nothing is sent to an external API.
## βš™οΈ How It Works
```
Paste code ──▢ Generate ──▢ POST /generate_flowchart (Gradio API)
β”‚
number the source lines + structured system prompt
β”‚
Qwen3-Coder-30B-A3B (llama.cpp Β· CPU)
β”‚
<thinking> …reasoning… </thinking>
graph TD … nodes & edges …
<linemap> A:1 B:2 C:3-4 </linemap>
β”‚
strip reasoning Β· parse + validate the line-map Β· sanitize labels
β”‚
{ mermaid, linemap } ──▢ append agent_traces.jsonl
β”‚
Mermaid render + "trace-the-path" reveal + node ↔ code linking
```
1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**.
3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
7. Every generation is captured as a structured **agent trace** (`/traces`).
## 🧰 Tech Stack
| Layer | What it is | Used for |
|---|---|---|
| **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code β†’ Mermaid + line-map generation |
| **Quantization** | [Unsloth](https://huggingface.co/unsloth) Dynamic **UD-Q3_K_XL** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
| **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
| **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
| **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
| **Frontend** | A single self-contained `frontend.html` (vanilla JS + CSS custom properties) | Editor, diagram, animation, theming |
| **Editor** | [CodeMirror 6](https://codemirror.net/) β€” **vendored** bundle (`static/cm.bundle.js`) | Syntax-highlighted code input |
| **Diagrams** | [Mermaid.js 10](https://mermaid.js.org/) β€” **vendored** UMD (`static/mermaid.min.js`) | Flowchart rendering |
| **Animation** | Web Animations API | Trace-the-path reveal + theme crossfade |
| **Type** | Fraunces Β· Hanken Grotesk Β· JetBrains Mono β€” **vendored** woff2 (`static/fonts/`) | Custom, non-default look |
| **Assets** | All JS/CSS/fonts bundled into `static/` (no CDN at runtime) | True offline operation |
| **Observability** | Hand-rolled JSONL agent traces | One trace per generation, served at `/traces` |
| **Tests** | `smoke-test.sh` (headless Chrome) | 13 build/render checks |
| **Deploy** | Hugging Face Spaces | Hosting |
## πŸ”’ Total Parameters
CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct** β€” a **Mixture-of-Experts** model with:
- **β‰ˆ 30.5 billion total parameters**
- **β‰ˆ 3.3 billion active parameters per token** (128 experts, 8 activated)
It's served as an **Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) β€” letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.
## πŸ… Badges (5 / 6)
These map to the Space tags above.
| Badge | How CodeFlow earns it |
|---|---|
| πŸ”Œ **Off the Grid** | **No external API or CDN at runtime β€” period.** The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and *every* frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The **only** network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. |
| 🎨 **Off-Brand** | **Zero default-Gradio look.** A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation β€” deliberately designed *not* to look templated. |
| πŸ““ **Field Notes** | See the [blog post][blog]. |
| 🀝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
| πŸ€– **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |
## πŸŽ₯ Demo
[![Watch the demo](REPLACE_ME_thumbnail.png)][video]
> ▢️ Click above, or use the [Demo Video][video] link at the top.
## πŸ’» Run It Locally
> First launch downloads the **~13 GB GGUF** from Hugging Face. CPU inference is slow (cold generations can take minutes) β€” the built-in **examples render instantly** because their diagrams are pre-computed.
```bash
# 1. Clone
git clone REPLACE_ME_repo_url CodeFlow
cd CodeFlow
# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt
# 4. Run β€” opens a local Gradio URL
python app.py
```
Then open the printed URL. **Preview the UI without the model** by opening `frontend.html` directly in a browser (`file://`) β€” fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly.
> **Rebuilding the vendored bundles** (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to *run* the app β€” only to regenerate the bundles.
**Endpoints:** `/` (UI) Β· `/generate_flowchart` (API) Β· `/traces` (download all agent traces as JSONL).
## πŸ—‚οΈ Repository Structure
```
CodeFlow/
β”œβ”€β”€ app.py # Gradio + FastAPI server: loads the model and exposes
β”‚ # /generate_flowchart (API), / (UI), /static, /traces
β”œβ”€β”€ frontend.html # Self-contained UI β€” CodeMirror editor, Mermaid render,
β”‚ # trace-the-path animation, node↔code linking, theming
β”œβ”€β”€ static/ # Vendored frontend assets β€” NO CDN at runtime
β”‚ β”œβ”€β”€ mermaid.min.js # Mermaid (UMD, ~3.2 MB)
β”‚ β”œβ”€β”€ cm.bundle.js # CodeMirror 6 (single IIFE bundle)
β”‚ β”œβ”€β”€ gradio-client.js # @gradio/client (IIFE bundle)
β”‚ β”œβ”€β”€ fonts.css # @font-face β†’ local woff2
β”‚ └── fonts/ # Fraunces Β· Hanken Grotesk Β· JetBrains Mono (woff2)
β”œβ”€β”€ build/ # Reproducible bundle build (Node) β€” build.sh + entry files
β”œβ”€β”€ requirements.txt # Python deps (CPU llama-cpp-python wheel, gradio, hub)
β”œβ”€β”€ smoke-test.sh # Headless-Chrome smoke test (13 checks)
β”œβ”€β”€ notes-for-blog.md # Field Notes β€” the full build log
β”œβ”€β”€ README.md # You are here
β”œβ”€β”€ LICENSE # MIT
└── agent_traces.jsonl # (created at runtime) one JSON line per generation
```
## ⚠️ Limitations
- **CPU inference is slow.** A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
- **3-bit quantization** trades some fidelity for the ability to run a 30B model at all β€” occasional imperfect diagrams.
- **4096-token context** β€” very large files won't fit; works best on functions/snippets.
- **Line-map depends on the model.** The `<linemap>` is LLM-generated; the server validates and drops bad entries, so node↔code links can be partial on tricky code.
- **Paraphrased labels.** Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
- **Mermaid parse failures** on unusual syntax are possible (the raw output is shown so nothing is lost).
- **Ephemeral traces on Spaces.** `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild β€” download it before then.
## πŸ™ Credits
- **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba) β€” GGUF quant by [Unsloth](https://huggingface.co/unsloth).
- **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
- **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
- **Diagrams:** [Mermaid.js](https://mermaid.js.org/) Β· **Editor:** [CodeMirror](https://codemirror.net/).
- **Type:** Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL).
- **Built for** the Build Small Hackathon.
## πŸ“„ License
Released under the **MIT License** β€” see [`LICENSE`](LICENSE). Β© 2026 Rishi Jain.