CodeFlow / README.md
Rishi-Jain-27's picture
Removed get traces temp and updated readme
3dffa87

A newer version of the Gradio SDK is available: 6.18.0

Upgrade
metadata
title: CodeFlow
emoji: πŸ“Š
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart πŸ“Š!
tags:
  - build-small-hackathon
  - backyard-ai
  - llama-cpp
  - field-notes
  - sharing-is-caring
  - off-brand
  - off-the-grid
  - code
  - mermaid.js
  - flowchart
  - small-models
  - seq2seq
  - gradio
  - agentic

πŸ“Š CodeFlow

Paste code β†’ read its logic as a flowchart. A 30B coder model runs entirely on CPU via llama.cpp to translate source code into a clean, animated Mermaid.js control-flow diagram β€” with each node wired back to the exact lines it came from.

πŸ”— Links

πŸš€ Live Space Β· ▢️ Demo Video Β· 🐦 Social Post Β· πŸ““ Field Notes (blog) Β· πŸ” Agent Traces


❓ The Problem

Reading unfamiliar code means simulating its control flow in your head β€” chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code β†’ diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).

CodeFlow turns any snippet into a scannable flowchart you can audit at a glance β€” generated by a real language model that runs 100% locally, so nothing is sent to an external API.

βš™οΈ How It Works

 Paste code ──▢ Generate ──▢ POST /generate_flowchart        (Gradio API)
                                    β”‚
                    number the source lines + structured system prompt
                                    β”‚
                     Qwen3-Coder-30B-A3B   (llama.cpp Β· CPU)
                                    β”‚
                 <thinking> …reasoning… </thinking>
                 graph TD … nodes & edges …
                 <linemap> A:1  B:2  C:3-4 </linemap>
                                    β”‚
        strip reasoning Β· parse + validate the line-map Β· sanitize labels
                                    β”‚
                  { mermaid, linemap }  ──▢  append agent_traces.jsonl
                                    β”‚
   Mermaid render + "trace-the-path" reveal + node ↔ code linking
  1. You paste code (or pick a pre-rendered example) into the CodeMirror editor and hit Generate.
  2. The backend numbers the source lines and sends them with a strict system prompt to Qwen3-Coder running on llama.cpp.
  3. The model returns hidden <thinking>, the Mermaid graph, and a <linemap> mapping every node to its source line(s).
  4. The server strips the reasoning, validates the line-map against the source, sanitizes labels for Mermaid, and returns { mermaid, linemap }.
  5. The frontend renders the diagram with a trace-the-path reveal that flows out of a persistent Start node while the canvas scrolls along in real time.
  6. Node ↔ code linking: hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
  7. Every generation is captured as a structured agent trace (/traces).

🧰 Tech Stack

Layer What it is Used for
Model Qwen3-Coder-30B-A3B-Instruct (Mixture-of-Experts) Code β†’ Mermaid + line-map generation
Quantization Unsloth Dynamic UD-Q3_K_XL GGUF (~3-bit) Shrinks the 30B model to run on CPU
Inference llama-cpp-python (llama.cpp) Local CPU inference (n_ctx=4096)
Model fetch huggingface_hub Downloads the GGUF on first run
Server Gradio gr.Server + FastAPI /generate_flowchart API, / UI, /traces
Frontend A single self-contained frontend.html (vanilla JS + CSS custom properties) Editor, diagram, animation, theming
Editor CodeMirror 6 β€” vendored bundle (static/cm.bundle.js) Syntax-highlighted code input
Diagrams Mermaid.js 10 β€” vendored UMD (static/mermaid.min.js) Flowchart rendering
Animation Web Animations API Trace-the-path reveal + theme crossfade
Type Fraunces Β· Hanken Grotesk Β· JetBrains Mono β€” vendored woff2 (static/fonts/) Custom, non-default look
Assets All JS/CSS/fonts bundled into static/ (no CDN at runtime) True offline operation
Observability Hand-rolled JSONL agent traces One trace per generation, served at /traces
Tests smoke-test.sh (headless Chrome) 13 build/render checks
Deploy Hugging Face Spaces Hosting

πŸ”’ Total Parameters

CodeFlow is driven by Qwen3-Coder-30B-A3B-Instruct β€” a Mixture-of-Experts model with:

  • β‰ˆ 30.5 billion total parameters
  • β‰ˆ 3.3 billion active parameters per token (128 experts, 8 activated)

It's served as an Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) β€” letting a 30B-class model generate diagrams off the grid, with no GPU and no external API.

πŸ… Badges (5 / 6)

These map to the Space tags above.

Badge How CodeFlow earns it
πŸ”Œ Off the Grid No external API or CDN at runtime β€” period. The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and every frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into static/. The Gradio share tunnel is off (share=False). The only network call in the whole project is the one-time model download at startup. The UI even runs fully offline from file://.
🎨 Off-Brand Zero default-Gradio look. A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation β€” deliberately designed not to look templated.
πŸ““ Field Notes See the blog post.
🀝 Sharing is Caring Open-source under MIT, a public Space, plus a social post sharing the process and learnings.
πŸ€– Agentic Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at /traces.

πŸŽ₯ Demo

Watch the demo

▢️ Click above, or use the Demo Video link at the top.

πŸ’» Run It Locally

First launch downloads the ~13 GB GGUF from Hugging Face. CPU inference is slow (cold generations can take minutes) β€” the built-in examples render instantly because their diagrams are pre-computed.

# 1. Clone
git clone REPLACE_ME_repo_url CodeFlow
cd CodeFlow

# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt

# 4. Run β€” opens a local Gradio URL
python app.py

Then open the printed URL. Preview the UI without the model by opening frontend.html directly in a browser (file://) β€” fully offline, since all assets are vendored in static/; the example presets render their diagrams instantly.

Rebuilding the vendored bundles (optional): the CodeMirror + Gradio-client bundles in static/ are produced by build/build.sh (needs Node). Mermaid and the fonts are downloaded into static/ as well. You never need this to run the app β€” only to regenerate the bundles.

Endpoints: / (UI) Β· /generate_flowchart (API) Β· /traces (download all agent traces as JSONL).

πŸ—‚οΈ Repository Structure

CodeFlow/
β”œβ”€β”€ app.py             # Gradio + FastAPI server: loads the model and exposes
β”‚                      #   /generate_flowchart (API), / (UI), /static, /traces
β”œβ”€β”€ frontend.html      # Self-contained UI β€” CodeMirror editor, Mermaid render,
β”‚                      #   trace-the-path animation, node↔code linking, theming
β”œβ”€β”€ static/            # Vendored frontend assets β€” NO CDN at runtime
β”‚   β”œβ”€β”€ mermaid.min.js #   Mermaid (UMD, ~3.2 MB)
β”‚   β”œβ”€β”€ cm.bundle.js   #   CodeMirror 6 (single IIFE bundle)
β”‚   β”œβ”€β”€ gradio-client.js #  @gradio/client (IIFE bundle)
β”‚   β”œβ”€β”€ fonts.css      #   @font-face β†’ local woff2
β”‚   └── fonts/         #   Fraunces Β· Hanken Grotesk Β· JetBrains Mono (woff2)
β”œβ”€β”€ build/             # Reproducible bundle build (Node) β€” build.sh + entry files
β”œβ”€β”€ requirements.txt   # Python deps (CPU llama-cpp-python wheel, gradio, hub)
β”œβ”€β”€ smoke-test.sh      # Headless-Chrome smoke test (13 checks)
β”œβ”€β”€ notes-for-blog.md  # Field Notes β€” the full build log
β”œβ”€β”€ README.md          # You are here
β”œβ”€β”€ LICENSE            # MIT
└── agent_traces.jsonl # (created at runtime) one JSON line per generation

⚠️ Limitations

  • CPU inference is slow. A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
  • 3-bit quantization trades some fidelity for the ability to run a 30B model at all β€” occasional imperfect diagrams.
  • 4096-token context β€” very large files won't fit; works best on functions/snippets.
  • Line-map depends on the model. The <linemap> is LLM-generated; the server validates and drops bad entries, so node↔code links can be partial on tricky code.
  • Paraphrased labels. Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
  • Mermaid parse failures on unusual syntax are possible (the raw output is shown so nothing is lost).
  • Ephemeral traces on Spaces. agent_traces.jsonl lives on the runtime filesystem and resets on restart/rebuild β€” download it before then.

πŸ™ Credits

πŸ“„ License

Released under the MIT License β€” see LICENSE. Β© 2026 Rishi Jain.