Spaces:
Running
Running
Commit ·
dc88fe3
1
Parent(s): f7edb7f
Updated README
Browse files
README.md
CHANGED
|
@@ -17,6 +17,7 @@ tags:
|
|
| 17 |
- achievement:offbrand
|
| 18 |
- achievement:llama
|
| 19 |
- achievement:fieldnotes
|
|
|
|
| 20 |
- build-small-hackathon
|
| 21 |
- backyard-ai
|
| 22 |
- llama-cpp
|
|
@@ -39,13 +40,14 @@ tags:
|
|
| 39 |
|
| 40 |
### 🔗 Links
|
| 41 |
|
| 42 |
-
[🚀 **Live Space**][space] · [▶️ **Demo Video**][video] · [🐦 **Social Post**][social] · [📓 **Field Notes (blog)**][blog] · [🔍 **Agent Traces**][traces]
|
| 43 |
|
| 44 |
[space]: https://huggingface.co/spaces/build-small-hackathon/CodeFlow "Hugging Face Space"
|
| 45 |
[video]: https://youtu.be/R5GbpN9FVxo "Demo video"
|
| 46 |
[social]: https://www.linkedin.com/feed/update/urn:li:share:7471327684539785217/ "Social post"
|
| 47 |
[blog]: https://huggingface.co/blog/build-small-hackathon/codeflow-field-notes "Field notes / blog post"
|
| 48 |
[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"
|
|
|
|
| 49 |
|
| 50 |
---
|
| 51 |
|
|
@@ -62,7 +64,7 @@ Reading unfamiliar code means simulating its control flow in your head — chasi
|
|
| 62 |
│
|
| 63 |
number the source lines + structured system prompt
|
| 64 |
│
|
| 65 |
-
|
| 66 |
│
|
| 67 |
<thinking> …reasoning… </thinking>
|
| 68 |
graph TD … nodes & edges …
|
|
@@ -76,19 +78,30 @@ Reading unfamiliar code means simulating its control flow in your head — chasi
|
|
| 76 |
```
|
| 77 |
|
| 78 |
1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
|
| 79 |
-
2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**.
|
| 80 |
3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
|
| 81 |
4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
|
| 82 |
5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
|
| 83 |
6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
|
| 84 |
7. Every generation is captured as a structured **agent trace** (`/traces`).
|
| 85 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
## 🧰 Tech Stack
|
| 87 |
|
| 88 |
| Layer | What it is | Used for |
|
| 89 |
|---|---|---|
|
| 90 |
-
| **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code → Mermaid + line-map generation |
|
| 91 |
-
| **
|
|
|
|
| 92 |
| **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
|
| 93 |
| **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
|
| 94 |
| **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
|
|
@@ -104,14 +117,14 @@ Reading unfamiliar code means simulating its control flow in your head — chasi
|
|
| 104 |
|
| 105 |
## 🔢 Total Parameters
|
| 106 |
|
| 107 |
-
CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct** — a **Mixture-of-Experts** model with:
|
| 108 |
|
| 109 |
-
- **≈ 30.5 billion total parameters**
|
| 110 |
- **≈ 3.3 billion active parameters per token** (128 experts, 8 activated)
|
| 111 |
|
| 112 |
-
It's served as
|
| 113 |
|
| 114 |
-
## 🏅 Badges (
|
| 115 |
|
| 116 |
These map to the Space tags above.
|
| 117 |
|
|
@@ -122,6 +135,7 @@ These map to the Space tags above.
|
|
| 122 |
| 📓 **Field Notes** | See the [blog post][blog]. |
|
| 123 |
| 🤝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
|
| 124 |
| 🤖 **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |
|
|
|
|
| 125 |
|
| 126 |
## 🎥 Demo
|
| 127 |
|
|
@@ -187,7 +201,7 @@ CodeFlow/
|
|
| 187 |
|
| 188 |
## 🙏 Credits
|
| 189 |
|
| 190 |
-
- **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba)
|
| 191 |
- **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
|
| 192 |
- **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
|
| 193 |
- **Diagrams:** [Mermaid.js](https://mermaid.js.org/) · **Editor:** [CodeMirror](https://codemirror.net/).
|
|
|
|
| 17 |
- achievement:offbrand
|
| 18 |
- achievement:llama
|
| 19 |
- achievement:fieldnotes
|
| 20 |
+
- achievement:welltuned
|
| 21 |
- build-small-hackathon
|
| 22 |
- backyard-ai
|
| 23 |
- llama-cpp
|
|
|
|
| 40 |
|
| 41 |
### 🔗 Links
|
| 42 |
|
| 43 |
+
[🚀 **Live Space**][space] · [▶️ **Demo Video**][video] · [🐦 **Social Post**][social] · [📓 **Field Notes (blog)**][blog] · [🔍 **Agent Traces**][traces] · [🎛️ **Fine-Tuned Model**][model]
|
| 44 |
|
| 45 |
[space]: https://huggingface.co/spaces/build-small-hackathon/CodeFlow "Hugging Face Space"
|
| 46 |
[video]: https://youtu.be/R5GbpN9FVxo "Demo video"
|
| 47 |
[social]: https://www.linkedin.com/feed/update/urn:li:share:7471327684539785217/ "Social post"
|
| 48 |
[blog]: https://huggingface.co/blog/build-small-hackathon/codeflow-field-notes "Field notes / blog post"
|
| 49 |
[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"
|
| 50 |
+
[model]: https://huggingface.co/build-small-hackathon/codeflow-qwen-3-finetuning "Fine-tuned model"
|
| 51 |
|
| 52 |
---
|
| 53 |
|
|
|
|
| 64 |
│
|
| 65 |
number the source lines + structured system prompt
|
| 66 |
│
|
| 67 |
+
CodeFlow fine-tune of Qwen3-Coder-30B-A3B (llama.cpp · CPU)
|
| 68 |
│
|
| 69 |
<thinking> …reasoning… </thinking>
|
| 70 |
graph TD … nodes & edges …
|
|
|
|
| 78 |
```
|
| 79 |
|
| 80 |
1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
|
| 81 |
+
2. The backend numbers the source lines and sends them with a strict system prompt to the **CodeFlow fine-tune of Qwen3-Coder** running on **llama.cpp**.
|
| 82 |
3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
|
| 83 |
4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
|
| 84 |
5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
|
| 85 |
6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
|
| 86 |
7. Every generation is captured as a structured **agent trace** (`/traces`).
|
| 87 |
|
| 88 |
+
## 🎛️ Fine-Tuning
|
| 89 |
+
|
| 90 |
+
CodeFlow runs a [**LoRA fine-tune**][model] of **Qwen3-Coder-30B-A3B-Instruct** (≈30.5B params), specialized for the code → Mermaid + `<linemap>` task rather than relying on the base model's general coding ability.
|
| 91 |
+
|
| 92 |
+
- **Data:** **2,400 synthetic examples** (2,208 train / 192 val — 8% holdout), built from **22 control-flow templates** across **Python, JavaScript, C++, and C**.
|
| 93 |
+
- **Method:** LoRA `r=16, α=32` on the attention + MLP projections, **bf16**, cosine schedule — then merged and exported to a **Q3_K_L GGUF** for CPU inference.
|
| 94 |
+
- **Validation:** the holdout is **hard-validated** — generated outputs are syntax-checked / compiled, not just eyeballed.
|
| 95 |
+
|
| 96 |
+
See the [model card][model] for the full data engine, `finetune.py` options, and dataset preview.
|
| 97 |
+
|
| 98 |
## 🧰 Tech Stack
|
| 99 |
|
| 100 |
| Layer | What it is | Used for |
|
| 101 |
|---|---|---|
|
| 102 |
+
| **Model** | [**CodeFlow fine-tune**][model] of [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code → Mermaid + line-map generation |
|
| 103 |
+
| **Fine-tuning** | LoRA SFT (`r=16, α=32`) on attention + MLP projections, merged to GGUF | Specializes the base model for the code → Mermaid + line-map task |
|
| 104 |
+
| **Quantization** | **Q3_K_L** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
|
| 105 |
| **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
|
| 106 |
| **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
|
| 107 |
| **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
|
|
|
|
| 117 |
|
| 118 |
## 🔢 Total Parameters
|
| 119 |
|
| 120 |
+
CodeFlow is driven by a [**LoRA fine-tune**][model] of **Qwen3-Coder-30B-A3B-Instruct** — a **Mixture-of-Experts** model with:
|
| 121 |
|
| 122 |
+
- **≈ 30.5 billion total parameters** (well under the 32B cap)
|
| 123 |
- **≈ 3.3 billion active parameters per token** (128 experts, 8 activated)
|
| 124 |
|
| 125 |
+
It's served as a **~3-bit (Q3_K_L) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) — letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.
|
| 126 |
|
| 127 |
+
## 🏅 Badges (6 / 6)
|
| 128 |
|
| 129 |
These map to the Space tags above.
|
| 130 |
|
|
|
|
| 135 |
| 📓 **Field Notes** | See the [blog post][blog]. |
|
| 136 |
| 🤝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
|
| 137 |
| 🤖 **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |
|
| 138 |
+
| 🎛️ **Well-Tuned** | A [**LoRA fine-tune**][model] of Qwen3-Coder-30B-A3B-Instruct (**≈30.5B params — under the 32B cap**), specialized for the code → Mermaid + `<linemap>` task and shipped as the GGUF the Space actually runs. |
|
| 139 |
|
| 140 |
## 🎥 Demo
|
| 141 |
|
|
|
|
| 201 |
|
| 202 |
## 🙏 Credits
|
| 203 |
|
| 204 |
+
- **Model:** [CodeFlow fine-tune][model] of [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba), built with [Unsloth](https://huggingface.co/unsloth).
|
| 205 |
- **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
|
| 206 |
- **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
|
| 207 |
- **Diagrams:** [Mermaid.js](https://mermaid.js.org/) · **Editor:** [CodeMirror](https://codemirror.net/).
|