Rishi-Jain-27 committed on
Commit
dc88fe3
·
1 Parent(s): f7edb7f

Updated README

  1. README.md +24 -10
README.md CHANGED
@@ -17,6 +17,7 @@ tags:
17
  - achievement:offbrand
18
  - achievement:llama
19
  - achievement:fieldnotes
 
20
  - build-small-hackathon
21
  - backyard-ai
22
  - llama-cpp
@@ -39,13 +40,14 @@ tags:
39
 
40
  ### 🔗 Links
41
 
42
- [🚀 **Live Space**][space] · [▶️ **Demo Video**][video] · [🐦 **Social Post**][social] · [📓 **Field Notes (blog)**][blog] · [🔍 **Agent Traces**][traces]
43
 
44
  [space]: https://huggingface.co/spaces/build-small-hackathon/CodeFlow "Hugging Face Space"
45
  [video]: https://youtu.be/R5GbpN9FVxo "Demo video"
46
  [social]: https://www.linkedin.com/feed/update/urn:li:share:7471327684539785217/ "Social post"
47
  [blog]: https://huggingface.co/blog/build-small-hackathon/codeflow-field-notes "Field notes / blog post"
48
  [traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"
 
49
 
50
  ---
51
 
@@ -62,7 +64,7 @@ Reading unfamiliar code means simulating its control flow in your head — chasi
62
 
63
  number the source lines + structured system prompt
64
 
65
- Qwen3-Coder-30B-A3B (llama.cpp · CPU)
66
 
67
  <thinking> …reasoning… </thinking>
68
  graph TD … nodes & edges …
@@ -76,19 +78,30 @@ Reading unfamiliar code means simulating its control flow in your head — chasi
76
  ```
77
 
78
  1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
79
- 2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**.
80
  3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
81
  4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
82
  5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
83
  6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
84
  7. Every generation is captured as a structured **agent trace** (`/traces`).
85
 
86
  ## 🧰 Tech Stack
87
 
88
  | Layer | What it is | Used for |
89
  |---|---|---|
90
- | **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code → Mermaid + line-map generation |
91
- | **Quantization** | [Unsloth](https://huggingface.co/unsloth) Dynamic **UD-Q3_K_XL** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
 
92
  | **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
93
  | **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
94
  | **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
@@ -104,14 +117,14 @@ Reading unfamiliar code means simulating its control flow in your head — chasi
104
 
105
  ## 🔢 Total Parameters
106
 
107
- CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct** — a **Mixture-of-Experts** model with:
108
 
109
- - **≈ 30.5 billion total parameters**
110
  - **≈ 3.3 billion active parameters per token** (128 experts, 8 activated)
111
 
112
- It's served as an **Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) — letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.
113
 
114
- ## 🏅 Badges (5 / 6)
115
 
116
  These map to the Space tags above.
117
 
@@ -122,6 +135,7 @@ These map to the Space tags above.
122
  | 📓 **Field Notes** | See the [blog post][blog]. |
123
  | 🤝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
124
  | 🤖 **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |
 
125
 
126
  ## 🎥 Demo
127
 
@@ -187,7 +201,7 @@ CodeFlow/
187
 
188
  ## 🙏 Credits
189
 
190
- - **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba) · GGUF quant by [Unsloth](https://huggingface.co/unsloth).
191
  - **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
192
  - **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
193
  - **Diagrams:** [Mermaid.js](https://mermaid.js.org/) · **Editor:** [CodeMirror](https://codemirror.net/).
 
17
  - achievement:offbrand
18
  - achievement:llama
19
  - achievement:fieldnotes
20
+ - achievement:welltuned
21
  - build-small-hackathon
22
  - backyard-ai
23
  - llama-cpp
 
40
 
41
  ### 🔗 Links
42
 
43
+ [🚀 **Live Space**][space] · [▶️ **Demo Video**][video] · [🐦 **Social Post**][social] · [📓 **Field Notes (blog)**][blog] · [🔍 **Agent Traces**][traces] · [🎛️ **Fine-Tuned Model**][model]
44
 
45
  [space]: https://huggingface.co/spaces/build-small-hackathon/CodeFlow "Hugging Face Space"
46
  [video]: https://youtu.be/R5GbpN9FVxo "Demo video"
47
  [social]: https://www.linkedin.com/feed/update/urn:li:share:7471327684539785217/ "Social post"
48
  [blog]: https://huggingface.co/blog/build-small-hackathon/codeflow-field-notes "Field notes / blog post"
49
  [traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset"
50
+ [model]: https://huggingface.co/build-small-hackathon/codeflow-qwen-3-finetuning "Fine-tuned model"
51
 
52
  ---
53
 
 
64
 
65
  number the source lines + structured system prompt
66
 
67
+ CodeFlow fine-tune of Qwen3-Coder-30B-A3B (llama.cpp · CPU)
68
 
69
  <thinking> …reasoning… </thinking>
70
  graph TD … nodes & edges …
 
78
  ```
79
 
80
  1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
81
+ 2. The backend numbers the source lines and sends them with a strict system prompt to the **CodeFlow fine-tune of Qwen3-Coder** running on **llama.cpp**.
82
  3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
83
  4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
84
  5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
85
  6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
86
  7. Every generation is captured as a structured **agent trace** (`/traces`).
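
Steps 2 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the Space's actual code: the function names `number_lines` and `validate_linemap`, the `N: ` prefix format, and the representation of the line-map as a dict from node id to line numbers are all assumptions made for the example.

```python
def number_lines(source: str) -> str:
    """Prefix each source line with its 1-based line number (assumed format: 'N: code')."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(source.splitlines(), start=1)
    )


def validate_linemap(linemap: dict[str, list[int]], source: str) -> dict[str, list[int]]:
    """Keep only line numbers that exist in the source; drop nodes left with none."""
    total = len(source.splitlines())
    cleaned: dict[str, list[int]] = {}
    for node, lines in linemap.items():
        # Deduplicate, sort, and discard anything outside 1..total.
        valid = sorted({n for n in lines if isinstance(n, int) and 1 <= n <= total})
        if valid:
            cleaned[node] = valid
    return cleaned


if __name__ == "__main__":
    src = "a = 1\nb = 2\nprint(a + b)"
    print(number_lines(src))
    # Hypothetical model output: node B points at a line that does not exist.
    print(validate_linemap({"A": [1], "B": [2, 3, 9], "C": [0, 7]}, src))
```

Checking every mapped line against the real source length means a hallucinated line number cannot reach the frontend's hover and click handlers.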
87
 
88
+ ## 🎛️ Fine-Tuning
89
+
90
+ CodeFlow runs a [**LoRA fine-tune**][model] of **Qwen3-Coder-30B-A3B-Instruct** (≈30.5B params), specialized for the code → Mermaid + `<linemap>` task rather than relying on the base model's general coding ability.
91
+
92
+ - **Data:** **2,400 synthetic examples** (2,208 train / 192 val — 8% holdout), built from **22 control-flow templates** across **Python, JavaScript, C++, and C**.
93
+ - **Method:** LoRA `r=16, α=32` on the attention + MLP projections, **bf16**, cosine schedule — then merged and exported to a **Q3_K_L GGUF** for CPU inference.
94
+ - **Validation:** the holdout is **hard-validated** — generated outputs are syntax-checked / compiled, not just eyeballed.
95
+
96
+ See the [model card][model] for the full data engine, `finetune.py` options, and dataset preview.
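
The numbers in this section can be sanity-checked with a little arithmetic. The sketch below is illustrative only and does not reproduce `finetune.py`: the helper names are hypothetical, the scaling factor assumes the standard LoRA convention of `alpha / r`, and the 1024 × 1024 projection size is a made-up example, not the model's real dimension.

```python
def holdout_split(total: int, val_fraction: float) -> tuple[int, int]:
    """Return (train, val) sizes for a given holdout fraction."""
    val = round(total * val_fraction)
    return total - val, val


def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update under the standard alpha / r convention."""
    return alpha / r


def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in projection.

    The update is B @ A with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * (d_in + d_out)


if __name__ == "__main__":
    print(holdout_split(2400, 0.08))          # sizes of the train and val sets
    print(lora_scaling(r=16, alpha=32))       # update scaling factor
    print(lora_added_params(1024, 1024, 16))  # illustrative projection size only
```

With `r=16` and `α=32` the low-rank update is scaled by 2, and an 8% holdout of 2,400 examples gives the 2,208 / 192 split stated above.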
97
+
98
  ## 🧰 Tech Stack
99
 
100
  | Layer | What it is | Used for |
101
  |---|---|---|
102
+ | **Model** | [**CodeFlow fine-tune**][model] of [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code → Mermaid + line-map generation |
103
+ | **Fine-tuning** | LoRA SFT (`r=16, α=32`) on attention + MLP projections, merged to GGUF | Specializes the base model for the code → Mermaid + line-map task |
104
+ | **Quantization** | **Q3_K_L** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
105
  | **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
106
  | **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
107
  | **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
 
117
 
118
  ## 🔢 Total Parameters
119
 
120
+ CodeFlow is driven by a [**LoRA fine-tune**][model] of **Qwen3-Coder-30B-A3B-Instruct** — a **Mixture-of-Experts** model with:
121
 
122
+ - **≈ 30.5 billion total parameters** (under the 32B cap)
123
  - **≈ 3.3 billion active parameters per token** (128 experts, 8 activated)
124
 
125
+ It's served as a **~3-bit (Q3_K_L) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) — letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.
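
As a back-of-the-envelope check on the footprint, the sketch below relates parameter count, bits per weight, and file size. It assumes 1 GB = 10^9 bytes and ignores GGUF metadata; the helper names are hypothetical. The effective figure comes out somewhat above 3 bits, which is plausible because K-quant mixes generally keep some tensors at higher precision.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB (10^9 bytes) for a given average bit width."""
    return n_params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(size_bytes: float, n_params: float) -> float:
    """Average bits per weight implied by a file size and a parameter count."""
    return size_bytes * 8 / n_params


if __name__ == "__main__":
    n_params = 30.5e9
    print(estimated_size_gb(n_params, 16))            # bf16 baseline
    print(estimated_size_gb(n_params, 3.0))           # ideal 3-bit lower bound
    print(effective_bits_per_weight(13e9, n_params))  # implied by ~13 GB on disk
```

Against a bf16 baseline of roughly 61 GB, the ~13 GB file is a reduction of about 4.7×.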
126
 
127
+ ## 🏅 Badges (6 / 6)
128
 
129
  These map to the Space tags above.
130
 
 
135
  | 📓 **Field Notes** | See the [blog post][blog]. |
136
  | 🤝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
137
  | 🤖 **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |
138
+ | 🎛️ **Well-Tuned** | A [**LoRA fine-tune**][model] of Qwen3-Coder-30B-A3B-Instruct (**≈30.5B params — under the 32B cap**), specialized for the code → Mermaid + `<linemap>` task and shipped as the GGUF the Space actually runs. |
139
 
140
  ## 🎥 Demo
141
 
 
201
 
202
  ## 🙏 Credits
203
 
204
+ - **Model:** [CodeFlow fine-tune][model] of [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba), built with [Unsloth](https://huggingface.co/unsloth).
205
  - **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
206
  - **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
207
  - **Diagrams:** [Mermaid.js](https://mermaid.js.org/) · **Editor:** [CodeMirror](https://codemirror.net/).