Spaces:

build-small-hackathon
/

CodeFlow

Running

App Files Files Community

CodeFlow / README.md

Rishi-Jain-27

Removed get traces temp and updated readme

3dffa87 about 5 hours ago

preview code

raw

history blame contribute delete

11.7 kB

A newer version of the Gradio SDK is available: 6.18.0

Upgrade

metadata

title: CodeFlow
emoji: 📊
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart 📊!
tags:
  - build-small-hackathon
  - backyard-ai
  - llama-cpp
  - field-notes
  - sharing-is-caring
  - off-brand
  - off-the-grid
  - code
  - mermaid.js
  - flowchart
  - small-models
  - seq2seq
  - gradio
  - agentic

📊 CodeFlow

Paste code → read its logic as a flowchart. A 30B coder model runs entirely on CPU via llama.cpp to translate source code into a clean, animated Mermaid.js control-flow diagram — with each node wired back to the exact lines it came from.

🔗 Links

🚀 Live Space · ▶️ Demo Video · 🐦 Social Post · 📓 Field Notes (blog) · 🔍 Agent Traces

❓ The Problem

Reading unfamiliar code means simulating its control flow in your head — chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code → diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).

CodeFlow turns any snippet into a scannable flowchart you can audit at a glance — generated by a real language model that runs 100% locally, so nothing is sent to an external API.

⚙️ How It Works

 Paste code ──▶ Generate ──▶ POST /generate_flowchart        (Gradio API)
                                    │
                    number the source lines + structured system prompt
                                    │
                     Qwen3-Coder-30B-A3B   (llama.cpp · CPU)
                                    │
                 <thinking> …reasoning… </thinking>
                 graph TD … nodes & edges …
                 <linemap> A:1  B:2  C:3-4 </linemap>
                                    │
        strip reasoning · parse + validate the line-map · sanitize labels
                                    │
                  { mermaid, linemap }  ──▶  append agent_traces.jsonl
                                    │
   Mermaid render + "trace-the-path" reveal + node ↔ code linking

You paste code (or pick a pre-rendered example) into the CodeMirror editor and hit Generate.
The backend numbers the source lines and sends them with a strict system prompt to Qwen3-Coder running on llama.cpp.
The model returns hidden <thinking>, the Mermaid graph, and a <linemap> mapping every node to its source line(s).
The server strips the reasoning, validates the line-map against the source, sanitizes labels for Mermaid, and returns { mermaid, linemap }.
The frontend renders the diagram with a trace-the-path reveal that flows out of a persistent Start node while the canvas scrolls along in real time.
Node ↔ code linking: hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
Every generation is captured as a structured agent trace (/traces).

🧰 Tech Stack

Layer	What it is	Used for
Model	Qwen3-Coder-30B-A3B-Instruct (Mixture-of-Experts)	Code → Mermaid + line-map generation
Quantization	Unsloth Dynamic UD-Q3_K_XL GGUF (~3-bit)	Shrinks the 30B model to run on CPU
Inference	`llama-cpp-python` (llama.cpp)	Local CPU inference (`n_ctx=4096`)
Model fetch	`huggingface_hub`	Downloads the GGUF on first run
Server	Gradio `gr.Server` + FastAPI	`/generate_flowchart` API, `/` UI, `/traces`
Frontend	A single self-contained `frontend.html` (vanilla JS + CSS custom properties)	Editor, diagram, animation, theming
Editor	CodeMirror 6 — vendored bundle (`static/cm.bundle.js`)	Syntax-highlighted code input
Diagrams	Mermaid.js 10 — vendored UMD (`static/mermaid.min.js`)	Flowchart rendering
Animation	Web Animations API	Trace-the-path reveal + theme crossfade
Type	Fraunces · Hanken Grotesk · JetBrains Mono — vendored woff2 (`static/fonts/`)	Custom, non-default look
Assets	All JS/CSS/fonts bundled into `static/` (no CDN at runtime)	True offline operation
Observability	Hand-rolled JSONL agent traces	One trace per generation, served at `/traces`
Tests	`smoke-test.sh` (headless Chrome)	13 build/render checks
Deploy	Hugging Face Spaces	Hosting

🔢 Total Parameters

CodeFlow is driven by Qwen3-Coder-30B-A3B-Instruct — a Mixture-of-Experts model with:

≈ 30.5 billion total parameters
≈ 3.3 billion active parameters per token (128 experts, 8 activated)

It's served as an Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) — letting a 30B-class model generate diagrams off the grid, with no GPU and no external API.

🏅 Badges (5 / 6)

These map to the Space tags above.

Badge	How CodeFlow earns it
🔌 Off the Grid	No external API or CDN at runtime — period. The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and every frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The only network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`.
🎨 Off-Brand	Zero default-Gradio look. A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation — deliberately designed not to look templated.
📓 Field Notes	See the blog post.
🤝 Sharing is Caring	Open-source under MIT, a public Space, plus a social post sharing the process and learnings.
🤖 Agentic	Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at `/traces`.

🎥 Demo

▶️ Click above, or use the Demo Video link at the top.

💻 Run It Locally

First launch downloads the ~13 GB GGUF from Hugging Face. CPU inference is slow (cold generations can take minutes) — the built-in examples render instantly because their diagrams are pre-computed.

# 1. Clone
git clone REPLACE_ME_repo_url CodeFlow
cd CodeFlow

# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt

# 4. Run — opens a local Gradio URL
python app.py

Then open the printed URL. Preview the UI without the model by opening frontend.html directly in a browser (file://) — fully offline, since all assets are vendored in static/; the example presets render their diagrams instantly.

Rebuilding the vendored bundles (optional): the CodeMirror + Gradio-client bundles in static/ are produced by build/build.sh (needs Node). Mermaid and the fonts are downloaded into static/ as well. You never need this to run the app — only to regenerate the bundles.

Endpoints: / (UI) · /generate_flowchart (API) · /traces (download all agent traces as JSONL).

🗂️ Repository Structure

CodeFlow/
├── app.py             # Gradio + FastAPI server: loads the model and exposes
│                      #   /generate_flowchart (API), / (UI), /static, /traces
├── frontend.html      # Self-contained UI — CodeMirror editor, Mermaid render,
│                      #   trace-the-path animation, node↔code linking, theming
├── static/            # Vendored frontend assets — NO CDN at runtime
│   ├── mermaid.min.js #   Mermaid (UMD, ~3.2 MB)
│   ├── cm.bundle.js   #   CodeMirror 6 (single IIFE bundle)
│   ├── gradio-client.js #  @gradio/client (IIFE bundle)
│   ├── fonts.css      #   @font-face → local woff2
│   └── fonts/         #   Fraunces · Hanken Grotesk · JetBrains Mono (woff2)
├── build/             # Reproducible bundle build (Node) — build.sh + entry files
├── requirements.txt   # Python deps (CPU llama-cpp-python wheel, gradio, hub)
├── smoke-test.sh      # Headless-Chrome smoke test (13 checks)
├── notes-for-blog.md  # Field Notes — the full build log
├── README.md          # You are here
├── LICENSE            # MIT
└── agent_traces.jsonl # (created at runtime) one JSON line per generation

⚠️ Limitations

CPU inference is slow. A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
3-bit quantization trades some fidelity for the ability to run a 30B model at all — occasional imperfect diagrams.
4096-token context — very large files won't fit; works best on functions/snippets.
Line-map depends on the model. The <linemap> is LLM-generated; the server validates and drops bad entries, so node↔code links can be partial on tricky code.
Paraphrased labels. Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
Mermaid parse failures on unusual syntax are possible (the raw output is shown so nothing is lost).
Ephemeral traces on Spaces. agent_traces.jsonl lives on the runtime filesystem and resets on restart/rebuild — download it before then.

🙏 Credits

Model: Qwen3-Coder (Qwen Team, Alibaba) — GGUF quant by Unsloth.
Inference: llama.cpp via llama-cpp-python (Andrei Betlen).
App framework: Gradio (Hugging Face).
Diagrams: Mermaid.js · Editor: CodeMirror.
Type: Fraunces, Hanken Grotesk, JetBrains Mono (Google Fonts, SIL OFL).
Built for the Build Small Hackathon.