Spaces:

build-small-hackathon
/

CodeFlow

Running

App Files Files Community

CodeFlow / README.md

Rishi-Jain-27

Removed get traces temp and updated readme

3dffa87 about 7 hours ago

preview code

raw

history blame contribute delete

11.7 kB

	---
	title: CodeFlow
	emoji: 📊
	colorFrom: indigo
	colorTo: blue
	sdk: gradio
	python_version: '3.13'
	sdk_version: 6.16.0
	app_file: app.py
	pinned: true
	license: mit
	short_description: Turn code into a readable Mermaid.js flowchart 📊!
	tags:
	- build-small-hackathon
	- backyard-ai
	- llama-cpp
	- field-notes
	- sharing-is-caring
	- off-brand
	- off-the-grid
	- code
	- mermaid.js
	- flowchart
	- small-models
	- seq2seq
	- gradio
	- agentic
	---

	# 📊 CodeFlow

	Paste code → read its logic as a flowchart. A 30B coder model runs entirely on CPU via llama.cpp to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram — with each node wired back to the exact lines it came from.

	### 🔗 Links

	[🚀 Live Space][space] · [▶️ Demo Video][video] · [🐦 Social Post][social] · [📓 Field Notes (blog)][blog] · [🔍 Agent Traces][traces]

	<!-- ╔═══════════════════════════════════════════════════════════════╗
	║ FILL THESE IN — replace each REPLACE_ME with your real URL. ║
	╚═══════════════════════════════════════════════════════════════╝ -->
	[space]: REPLACE_ME "Hugging Face Space"
	[video]: REPLACE_ME "Demo video"
	[social]: REPLACE_ME "Social post"
	[blog]: REPLACE_ME "Field notes / blog post"
	[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"

	---

	## ❓ The Problem

	Reading unfamiliar code means simulating its control flow in your head — chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code → diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).

	CodeFlow turns any snippet into a scannable flowchart you can audit at a glance — generated by a real language model that runs 100% locally, so nothing is sent to an external API.

	## ⚙️ How It Works

	```
	Paste code ──▶ Generate ──▶ POST /generate_flowchart (Gradio API)
	│
	number the source lines + structured system prompt
	│
	Qwen3-Coder-30B-A3B (llama.cpp · CPU)
	│
	<thinking> …reasoning… </thinking>
	graph TD … nodes & edges …
	<linemap> A:1 B:2 C:3-4 </linemap>
	│
	strip reasoning · parse + validate the line-map · sanitize labels
	│
	{ mermaid, linemap } ──▶ append agent_traces.jsonl
	│
	Mermaid render + "trace-the-path" reveal + node ↔ code linking
	```

	1. You paste code (or pick a pre-rendered example) into the CodeMirror editor and hit Generate.
	2. The backend numbers the source lines and sends them with a strict system prompt to Qwen3-Coder running on llama.cpp.
	3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
	4. The server strips the reasoning, validates the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
	5. The frontend renders the diagram with a trace-the-path reveal that flows out of a persistent Start node while the canvas scrolls along in real time.
	6. Node ↔ code linking: hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
	7. Every generation is captured as a structured agent trace (`/traces`).

	## 🧰 Tech Stack

	\| Layer \| What it is \| Used for \|
	\|---\|---\|---\|
	\| Model \| [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) \| Code → Mermaid + line-map generation \|
	\| Quantization \| [Unsloth](https://huggingface.co/unsloth) Dynamic UD-Q3_K_XL GGUF (~3-bit) \| Shrinks the 30B model to run on CPU \|
	\| Inference \| [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) \| Local CPU inference (`n_ctx=4096`) \|
	\| Model fetch \| `huggingface_hub` \| Downloads the GGUF on first run \|
	\| Server \| [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI \| `/generate_flowchart` API, `/` UI, `/traces` \|
	\| Frontend \| A single self-contained `frontend.html` (vanilla JS + CSS custom properties) \| Editor, diagram, animation, theming \|
	\| Editor \| [CodeMirror 6](https://codemirror.net/) — vendored bundle (`static/cm.bundle.js`) \| Syntax-highlighted code input \|
	\| Diagrams \| [Mermaid.js 10](https://mermaid.js.org/) — vendored UMD (`static/mermaid.min.js`) \| Flowchart rendering \|
	\| Animation \| Web Animations API \| Trace-the-path reveal + theme crossfade \|
	\| Type \| Fraunces · Hanken Grotesk · JetBrains Mono — vendored woff2 (`static/fonts/`) \| Custom, non-default look \|
	\| Assets \| All JS/CSS/fonts bundled into `static/` (no CDN at runtime) \| True offline operation \|
	\| Observability \| Hand-rolled JSONL agent traces \| One trace per generation, served at `/traces` \|
	\| Tests \| `smoke-test.sh` (headless Chrome) \| 13 build/render checks \|
	\| Deploy \| Hugging Face Spaces \| Hosting \|

	## 🔢 Total Parameters

	CodeFlow is driven by Qwen3-Coder-30B-A3B-Instruct — a Mixture-of-Experts model with:

	- ≈ 30.5 billion total parameters
	- ≈ 3.3 billion active parameters per token (128 experts, 8 activated)

	It's served as an Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) — letting a 30B-class model generate diagrams off the grid, with no GPU and no external API.

	## 🏅 Badges (5 / 6)

	These map to the Space tags above.

	\| Badge \| How CodeFlow earns it \|
	\|---\|---\|
	\| 🔌 Off the Grid \| No external API or CDN at runtime — period. The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and every frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The only network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. \|
	\| 🎨 Off-Brand \| Zero default-Gradio look. A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation — deliberately designed not to look templated. \|
	\| 📓 Field Notes \| See the [blog post][blog]. \|
	\| 🤝 Sharing is Caring \| Open-source under MIT, a public Space, plus a [social post][social] sharing the process and learnings. \|
	\| 🤖 Agentic \| Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. \|

	## 🎥 Demo

	[![Watch the demo](REPLACE_ME_thumbnail.png)][video]

	> ▶️ Click above, or use the [Demo Video][video] link at the top.

	## 💻 Run It Locally

	> First launch downloads the ~13 GB GGUF from Hugging Face. CPU inference is slow (cold generations can take minutes) — the built-in examples render instantly because their diagrams are pre-computed.

	```bash
	# 1. Clone
	git clone REPLACE_ME_repo_url CodeFlow
	cd CodeFlow

	# 2. Create a virtual env
	python -m venv .venv
	source .venv/bin/activate # Windows: .venv\Scripts\activate

	# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
	pip install -r requirements.txt

	# 4. Run — opens a local Gradio URL
	python app.py
	```

	Then open the printed URL. Preview the UI without the model by opening `frontend.html` directly in a browser (`file://`) — fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly.

	> Rebuilding the vendored bundles (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to run the app — only to regenerate the bundles.

	Endpoints: `/` (UI) · `/generate_flowchart` (API) · `/traces` (download all agent traces as JSONL).

	## 🗂️ Repository Structure

	```
	CodeFlow/
	├── app.py # Gradio + FastAPI server: loads the model and exposes
	│ # /generate_flowchart (API), / (UI), /static, /traces
	├── frontend.html # Self-contained UI — CodeMirror editor, Mermaid render,
	│ # trace-the-path animation, node↔code linking, theming
	├── static/ # Vendored frontend assets — NO CDN at runtime
	│ ├── mermaid.min.js # Mermaid (UMD, ~3.2 MB)
	│ ├── cm.bundle.js # CodeMirror 6 (single IIFE bundle)
	│ ├── gradio-client.js # @gradio/client (IIFE bundle)
	│ ├── fonts.css # @font-face → local woff2
	│ └── fonts/ # Fraunces · Hanken Grotesk · JetBrains Mono (woff2)
	├── build/ # Reproducible bundle build (Node) — build.sh + entry files
	├── requirements.txt # Python deps (CPU llama-cpp-python wheel, gradio, hub)
	├── smoke-test.sh # Headless-Chrome smoke test (13 checks)
	├── notes-for-blog.md # Field Notes — the full build log
	├── README.md # You are here
	├── LICENSE # MIT
	└── agent_traces.jsonl # (created at runtime) one JSON line per generation
	```

	## ⚠️ Limitations

	- CPU inference is slow. A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
	- 3-bit quantization trades some fidelity for the ability to run a 30B model at all — occasional imperfect diagrams.
	- 4096-token context — very large files won't fit; works best on functions/snippets.
	- Line-map depends on the model. The `<linemap>` is LLM-generated; the server validates and drops bad entries, so node↔code links can be partial on tricky code.
	- Paraphrased labels. Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
	- Mermaid parse failures on unusual syntax are possible (the raw output is shown so nothing is lost).
	- Ephemeral traces on Spaces. `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild — download it before then.

	## 🙏 Credits

	- Model: [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba) — GGUF quant by [Unsloth](https://huggingface.co/unsloth).
	- Inference: [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
	- App framework: [Gradio](https://www.gradio.app/) (Hugging Face).
	- Diagrams: [Mermaid.js](https://mermaid.js.org/) · Editor: [CodeMirror](https://codemirror.net/).
	- Type: Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL).
	- Built for the Build Small Hackathon.

	## 📄 License

	Released under the MIT License — see [`LICENSE`](LICENSE). © 2026 Rishi Jain.