---
language:
- en
- he
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: unsloth/gemma-4-E4B-it
datasets:
- BrainboxAI/code-training-il
- nvidia/OpenCodeInstruct
- bleugreen/typescript-instruct
tags:
- code
- python
- typescript
- coding-assistant
- gguf
- llama.cpp
- ollama
- unsloth
- gemma4
- qlora
- text-generation
- on-device
- private-first
pretty_name: Code-IL E4B (Local Coding Assistant)
model-index:
- name: code-il-E4B
results: []
---
# Code-IL E4B
**A 4B-parameter coding assistant for Python and TypeScript that runs entirely on-device. No code ever leaves your machine.**
[GGUF model](https://huggingface.co/BrainboxAI/code-il-E4B)
[Training dataset](https://huggingface.co/datasets/BrainboxAI/code-training-il)
[Safetensors weights](https://huggingface.co/BrainboxAI/code-il-E4B-safetensors)
[License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
---
## Model overview
`code-il-E4B` is a 4-billion-parameter coding assistant fine-tuned from Google's Gemma-4 E4B. It is trained on a curated set of Python and TypeScript instruction pairs, filtered by test-pass rate, plus a small hand-written bilingual (Hebrew / English) identity set.
The entire model is 4 GB in GGUF Q4_K_M form. It runs on:
- A modern laptop CPU (slower but functional)
- Any consumer GPU with 6 GB+ VRAM
- Apple Silicon via llama.cpp Metal
No API. No telemetry. No data leaving the developer's machine.
## Why this exists
Every keystroke sent to a cloud coding assistant is a potential data-leak event. For companies building proprietary systems, especially in regulated industries such as finance, healthcare, and defense, this is not acceptable.
`code-il-E4B` is the private alternative: a model small enough to run locally, tuned specifically for the two languages most companies actually write in.
It is not competing with Claude Sonnet or GPT-4o on raw capability. It is offering something different: the option to get useful AI assistance without a network connection.
## Intended use
**Primary use cases:**
- Local code completion and review in regulated environments
- On-prem deployment for companies with strict data-residency rules
- Pair-programming for developers with unreliable internet
- Integration into internal developer tooling that cannot call external APIs
- Hebrew-speaking developer onboarding (model responds in Hebrew on request)
**Out-of-scope uses:**
- Replacement for frontier models on complex architecture tasks
- Production code generation without human review
- Languages other than Python / TypeScript (coverage is minimal)
- Fine-tuning tasks requiring >4B parameters of capacity
## How to use
### Ollama
```bash
ollama pull hf.co/BrainboxAI/code-il-E4B:Q4_K_M
ollama run hf.co/BrainboxAI/code-il-E4B:Q4_K_M
```
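Both commands assume the Ollama CLI. The same local model can also be queried over Ollama's HTTP API (default port 11434) using only the standard library. A minimal sketch; the `options` values mirror the recommended generation parameters further down this card:

```python
# Query a locally running Ollama server (`ollama serve`) over its REST API.
import json
import urllib.request

MODEL = "hf.co/BrainboxAI/code-il-E4B:Q4_K_M"

def build_payload(prompt: str, model: str = MODEL) -> dict:
    # "options" mirrors the recommended generation parameters for this model
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.2, "top_p": 0.95, "num_predict": 1024},
    }

def ask(prompt: str, host: str = "http://localhost:11434") -> str:
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the Ollama server to be running):
# print(ask("Write a Python function that reverses a string."))
```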
### llama.cpp
```bash
./llama-cli -m code-il-E4B.Q4_K_M.gguf \
  -p "Write a Python function that parses ISO-8601 dates with timezones." \
  --temp 0.2 --top-p 0.95 -n 1024
```
### Python (transformers)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BrainboxAI/code-il-E4B-safetensors")
model = AutoModelForCausalLM.from_pretrained(
    "BrainboxAI/code-il-E4B-safetensors",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Implement binary search in TypeScript with full edge-case handling."},
]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
# do_sample=True is required for temperature / top_p to take effect
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.2, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Recommended generation parameters
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| `temperature` | 0.2 | Low creativity for deterministic code |
| `top_p` | 0.95 | Slightly higher than our legal model (`law-il-E2B`) to allow idiomatic variety |
| `max_new_tokens` | 1024 | Enough for most function-level completions |
| `repetition_penalty` | 1.0 | Penalizing repetition hurts code structure |
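The same settings expressed as `generate()` keyword arguments. Note that `transformers` only applies `temperature` and `top_p` when sampling is enabled:

```python
# The table above as generate() keyword arguments.
GEN_KWARGS = {
    "do_sample": True,          # temperature / top_p are ignored without this
    "temperature": 0.2,
    "top_p": 0.95,
    "max_new_tokens": 1024,
    "repetition_penalty": 1.0,  # 1.0 means no penalty
}

# usage: outputs = model.generate(inputs, **GEN_KWARGS)
```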
### Recommended System Prompt: Semi-Formal Reasoning
This 4B model produces dramatically better code when required to work through 5 explicit steps before writing. Free-form prompts often yield code that compiles but fails on edge cases, ships without tests, or hides bugs.
**Why this matters:** Small coding models tend to skip the "thinking" phase and jump straight to code. The semi-formal reasoning template forces the model to do what a senior engineer does: understand the problem, enumerate edge cases, write the code, define tests, then honestly disclose what could break.
#### The 5 Reasoning Steps
1. **Problem Understanding** - restate the requirement, identify ambiguities
2. **Edge Cases and Constraints** - enumerate what could go wrong before coding
3. **Implementation** - the actual code, with inline comments only where needed
4. **Tests** - concrete test cases covering happy path + edge cases
5. **Known Limitations** - what this code does NOT handle, dependencies, assumptions
#### The System Prompt (copy as-is)
````text
DEFINITIONS:
success: Working code that handles the stated requirement plus enumerated edge cases, includes tests proving correctness, and honestly discloses what is out of scope. No invented APIs, no hallucinated library functions.
scope: in-scope - Python and TypeScript code (functions, classes, modules), code review, refactoring, debugging, test writing, algorithm implementation. out-of-scope - Languages other than Python/TypeScript (model is weak there), full-application architecture, infrastructure design, code that requires runtime testing the model cannot perform.
hallucination risk: This model was trained on public code with a cutoff in early 2026. Library APIs change. The model may invent function signatures that do not exist. Every API call must either be from a stable, well-known library OR explicitly marked as "verify in docs."
edge case: A specific input value or condition that breaks naive implementations - empty inputs, null/None, single-element collections, duplicates, boundary values (0, MAX_INT, negative numbers), Unicode/encoding issues, concurrent access, etc.
PREMISES:
- The user is a developer, not a beginner. Skip basic explanations of what a function or loop is.
- The model is 4B parameters - capable for function-level work but not for full systems.
- Code that "looks right" but fails silently is worse than code with a clear error. Prefer fail-fast.
- Tests are not optional. Code without tests is a draft, not a deliverable.
- User can speak Hebrew or English. Code stays in English. Comments match the user input language.
REQUIREMENTS:
1. Every code response must include all 5 sections: Problem Understanding, Edge Cases, Implementation, Tests, Known Limitations. No exceptions.
2. Implementation must compile/parse cleanly. No pseudo-code unless explicitly requested.
3. Use only standard library or widely-known third-party libraries. If using a non-standard library, mark it: "# Requires: pip install <package>".
4. Never invent function signatures. If unsure whether a function exists, write: "# Verify signature in docs: <library>.<function>".
5. Tests must be runnable as-is. Use unittest/pytest for Python, jest/vitest for TypeScript.
6. Edge cases section must list at minimum 3 concrete cases the code handles, plus 1 case it does NOT handle (with rationale).
7. Known Limitations must be honest. Do not write "this is production-ready" unless every edge case is handled and tested.
8. Forbidden: silent error handling. No bare `except:` in Python. No empty catch blocks in TypeScript.
9. Forbidden: code that mutates global state without explicit declaration.
10. If the user asks a question that requires runtime testing (performance, integration with their specific environment), respond with the code + clear instructions on how to test it locally.
EDGE_CASES:
- User asks for code in a language other than Python/TypeScript -> "I am specialized for Python and TypeScript. For <language>, the logic is similar but I cannot guarantee idiomatic syntax. Here is the equivalent in Python:" + provide Python version.
- User provides incomplete requirements -> Ask 1-2 clarifying questions before writing code. Do not assume.
- User asks for code that depends on a library released after training cutoff -> "I am unsure about <library> v<X>. Here is the implementation pattern; verify the exact API in current docs."
- User asks "is this code correct?" -> Walk through the 5-step analysis on their code, not yours. Apply the same rigor.
- User asks for "the fastest" or "the best" implementation -> Provide the most readable correct version first, then a note: "For higher performance, consider <approach>" with rationale.
- User asks for code that handles secrets, auth, or crypto -> Add a "Security Note" subsection in Known Limitations. Recommend audited libraries (passlib, cryptography, etc.). Never invent crypto.
- Hebrew question with technical term in English -> Respond in Hebrew, keep variable names and library names in English.
- User asks for "quick and dirty" code -> Still include the 5 sections, but mark Edge Cases and Tests as minimal: "# Quick prototype - not production. Edge cases: <list>. Test manually with: <example>."
OUTPUT_FORMAT:
format: Structured markdown with the 5 numbered sections, code in fenced blocks
structure: |
## 1. Problem Understanding
[Restate the requirement in 1-2 sentences. Note any ambiguities.]
## 2. Edge Cases and Constraints
Handles:
- [edge case 1]
- [edge case 2]
- [edge case 3]
Does NOT handle:
- [out-of-scope case + rationale]
## 3. Implementation
```<language>
// Clean code. Comments only where the WHY is non-obvious.
```
## 4. Tests
```<language>
// Runnable tests covering edge cases above
```
## 5. Known Limitations
- [What this does not handle]
- [Dependencies and version assumptions]
- [When you would need to extend this]
language: Match user input language (Hebrew or English) for explanations. Code, variable names, and library names stay in English.
length: 200-800 lines depending on task complexity. Refuse to write monolithic 2000-line responses - break into modules.
VERIFICATION:
- Are all 5 sections present and labeled?
- Does the implementation parse cleanly (no obvious syntax errors)?
- Are tests runnable (correct imports, proper structure)?
- Are at least 3 edge cases enumerated?
- Is at least 1 limitation honestly disclosed?
- regression check: No "production-ready" claims unless edge cases match limitations.
````
#### Usage Example with the System Prompt
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BrainboxAI/code-il-E4B-safetensors")
model = AutoModelForCausalLM.from_pretrained(
    "BrainboxAI/code-il-E4B-safetensors",
    torch_dtype="auto",
    device_map="auto",
)

# Paste the full DEFINITIONS/PREMISES/REQUIREMENTS prompt above
SYSTEM_PROMPT = """[paste the full prompt from the code block above]"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Implement binary search in Python with full edge case handling."},
]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
# do_sample=True is required for temperature / top_p to take effect
outputs = model.generate(inputs, max_new_tokens=1500, do_sample=True, temperature=0.2, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
#### Customization
- Want code-only output (no explanation)? Replace `OUTPUT_FORMAT` with: "Code blocks only. Comments inside code for any analysis. No prose sections."
- Building a code review tool? Add to `REQUIREMENTS`: "When reviewing user code, output in diff format showing exact changes."
- Need TypeScript-only output? Add to `REQUIREMENTS`: "Always respond in TypeScript. If the user asks for Python, translate to TypeScript with type annotations."
- Working on a security-sensitive codebase? Add a section #6 to `OUTPUT_FORMAT`: "Security Review" listing OWASP-relevant risks in the implementation.
## Training details
| Attribute | Value |
|-----------|-------|
| **Base model** | [unsloth/gemma-4-E4B-it](https://huggingface.co/unsloth/gemma-4-E4B-it) |
| **Method** | QLoRA (4-bit quantization during training) |
| **LoRA rank (r)** | 64 |
| **LoRA alpha** | 128 |
| **Training data size** | 40,330 curated examples |
| **Train / validation split** | 95% / 5%, seed 3407 |
| **Hardware** | NVIDIA RTX 5090 (RunPod) |
| **Framework** | Unsloth Studio |
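Because the split is seeded, it can be reproduced deterministically. A minimal sketch of such a split; the card does not publish the actual preprocessing code, so treat this as illustrative rather than the exact recipe:

```python
import random

def train_val_split(examples: list, val_fraction: float = 0.05, seed: int = 3407):
    """Deterministic shuffle-and-split, mirroring the 95% / 5% split above."""
    shuffled = examples[:]                  # do not mutate the caller's list
    random.Random(seed).shuffle(shuffled)   # seeded, hence reproducible
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

train, val = train_val_split(list(range(1000)))
# 950 train / 50 validation examples, identical on every run with seed 3407
```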
### Dataset composition (40,330 examples)
| Source | Count | Content |
|--------|-------|---------|
| [OpenCodeInstruct (NVIDIA)](https://huggingface.co/datasets/nvidia/OpenCodeInstruct) | 20,000 | Python, filtered to examples with test-pass rate > 50% |
| [typescript-instruct (bleugreen)](https://huggingface.co/datasets/bleugreen/typescript-instruct) | 20,000 | TypeScript instruction pairs |
| Hand-written identity set | 330 | Hebrew + English, BrainboxAI persona |
The filtering pass on OpenCodeInstruct was the single biggest quality lever. Dropping low-test-pass examples improved downstream evaluation significantly compared to training on the full corpus.
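A sketch of that filtering pass; the `pass_rate` field name here is an assumption for illustration, not the actual OpenCodeInstruct schema:

```python
def filter_by_pass_rate(examples: list, threshold: float = 0.5) -> list:
    """Keep only examples whose unit-test pass rate exceeds the threshold."""
    return [ex for ex in examples if ex.get("pass_rate", 0.0) > threshold]

corpus = [
    {"instruction": "reverse a list", "pass_rate": 0.9},
    {"instruction": "parse a date",   "pass_rate": 0.3},   # below threshold: dropped
    {"instruction": "sort tuples"},                        # no score: dropped
]
kept = filter_by_pass_rate(corpus)  # only the 0.9 example survives
```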
See the [dataset card](https://huggingface.co/datasets/BrainboxAI/code-training-il) for full details.
## Evaluation
Internal evaluation on structured coding tasks:
| Task | Examples | Passed | Notes |
|------|----------|--------|-------|
| **FizzBuzz** (via agentic loop) | 5 | 5/5 | Solved in 6 steps, zero correction rounds |
| **Binary search with 11 edge cases** | 11 | 11/11 | Including leftmost-duplicate handling |
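The leftmost-duplicate case in the table is the classic lower-bound variant of binary search; for reference, a standard implementation:

```python
def binary_search_leftmost(arr: list, target) -> int:
    """Return the index of the FIRST occurrence of target, or -1 if absent."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid          # keep searching left even on a match
    return lo if lo < len(arr) and arr[lo] == target else -1

# Edge cases: empty list, single element, duplicates, absent target
assert binary_search_leftmost([], 3) == -1
assert binary_search_leftmost([3], 3) == 0
assert binary_search_leftmost([1, 3, 3, 3, 5], 3) == 1
assert binary_search_leftmost([1, 2, 4], 3) == -1
```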
Formal HumanEval / MBPP benchmarks have not yet been run publicly. Evaluation work is ongoing.
## Limitations
- **Small model.** 4B parameters is not frontier-capability. Expect mistakes on complex architectural questions and long-context reasoning.
- **Two languages.** Strong on Python and TypeScript; weak on other languages.
- **No tool use out of the box.** The base model supports chat-style interaction; agentic tool use requires integration work.
- **Training cutoff.** Libraries and frameworks introduced after the training data was collected (early 2026) are unknown to the model.
- **Hallucination risk.** Like all LLMs, `code-il-E4B` can produce plausible-looking code that does not compile or does not work. Always test.
## Formats available
- [**GGUF Q4_K_M** (~4 GB)](https://huggingface.co/BrainboxAI/code-il-E4B) for Ollama, llama.cpp, LM Studio
- [**Safetensors 16-bit**](https://huggingface.co/BrainboxAI/code-il-E4B-safetensors) for further fine-tuning and HF transformers
## License
Apache 2.0. Use commercially, modify, and redistribute with attribution.
## Citation
```bibtex
@misc{elyasi2026codeil,
  title        = {Code-IL E4B: A Small, On-Device Coding Assistant for Private Environments},
  author       = {Elyasi, Netanel},
  year         = {2026},
  publisher    = {BrainboxAI},
  howpublished = {\url{https://huggingface.co/BrainboxAI/code-il-E4B}},
  note         = {Fine-tuned from unsloth/gemma-4-E4B-it}
}
```
## Author
Built by [**Netanel Elyasi**](https://huggingface.co/BrainboxAI), founder of [BrainboxAI](https://brainboxai.io), an applied-AI studio focused on small, private, domain-specialized models.
For custom coding-model fine-tuning on private company codebases, contact: **netanele@brainboxai.io**.
---
*Part of the BrainboxAI family of on-device models. See also [`law-il-E2B`](https://huggingface.co/BrainboxAI/law-il-E2B) (legal) and [`cyber-analyst-4B`](https://huggingface.co/BrainboxAI/cyber-analyst-4B) (security).*