---
language:
- en
- he
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: unsloth/gemma-4-E4B-it
datasets:
- BrainboxAI/code-training-il
- nvidia/OpenCodeInstruct
- bleugreen/typescript-instruct
tags:
- code
- python
- typescript
- coding-assistant
- gguf
- llama.cpp
- ollama
- unsloth
- gemma4
- qlora
- text-generation
- on-device
- private-first
pretty_name: Code-IL E4B (Local Coding Assistant)
model-index:
- name: code-il-E4B
  results: []
---

# Code-IL E4B

**A 4B-parameter coding assistant for Python and TypeScript — runs entirely on-device, no code ever leaves your machine.**

[GGUF model](https://huggingface.co/BrainboxAI/code-il-E4B)
[Training data](https://huggingface.co/datasets/BrainboxAI/code-training-il)
[Safetensors](https://huggingface.co/BrainboxAI/code-il-E4B-safetensors)
[License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

---

## Model overview

`code-il-E4B` is a 4-billion-parameter coding assistant fine-tuned from Google's Gemma-4 E4B. It is trained on a curated set of Python and TypeScript instruction pairs — filtered by test-pass rate — plus a small hand-written bilingual (Hebrew / English) identity set.

The entire model is 4 GB in GGUF Q4_K_M form. It runs on:

- A modern laptop CPU (slower but functional)
- Any consumer GPU with 6 GB+ VRAM
- Apple Silicon via llama.cpp Metal

No API. No telemetry. No data leaving the developer's machine.

## Why this exists

Every keystroke sent to a cloud coding assistant is a potential data-leak event. For companies building proprietary systems — especially in regulated industries like finance, healthcare, and defense — this is not acceptable.

`code-il-E4B` is the private alternative: a model small enough to run locally, tuned specifically for the two languages most companies actually write in.

It is not competing with Claude Sonnet or GPT-4o on raw capability. It offers something different: useful AI assistance without a network connection.

## Intended use

**Primary use cases:**

- Local code completion and review in regulated environments
- On-prem deployment for companies with strict data-residency rules
- Pair-programming for developers with unreliable internet
- Integration into internal developer tooling that cannot call external APIs
- Hebrew-speaking developer onboarding (the model responds in Hebrew on request)

**Out-of-scope uses:**

- Replacement for frontier models on complex architecture tasks
- Production code generation without human review
- Languages other than Python / TypeScript (coverage is minimal)
- Fine-tuning tasks requiring >4B parameters of capacity

## How to use

### Ollama

```bash
ollama pull hf.co/BrainboxAI/code-il-E4B:Q4_K_M
ollama run hf.co/BrainboxAI/code-il-E4B:Q4_K_M
```
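
To pin sampling settings in Ollama, you can wrap the model in a Modelfile — a sketch; the values match the recommended generation parameters later in this card:

```text
# Modelfile sketch: bake the recommended sampling settings into a local model tag
FROM hf.co/BrainboxAI/code-il-E4B:Q4_K_M
PARAMETER temperature 0.2
PARAMETER top_p 0.95
```

Build and run it with `ollama create code-il -f Modelfile && ollama run code-il`.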

### llama.cpp

```bash
./llama-cli -m code-il-E4B.Q4_K_M.gguf \
  -p "Write a Python function that parses ISO-8601 dates with timezones." \
  --temp 0.2 --top-p 0.95 -n 1024
```

### Python (transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BrainboxAI/code-il-E4B-safetensors")
model = AutoModelForCausalLM.from_pretrained(
    "BrainboxAI/code-il-E4B-safetensors",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Implement binary search in TypeScript with full edge-case handling."},
]
# Move inputs to the model's device; do_sample=True is required for
# temperature/top_p to take effect (they are ignored under greedy decoding).
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
outputs = model.generate(
    inputs, max_new_tokens=1024, do_sample=True, temperature=0.2, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Recommended generation parameters

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| `temperature` | 0.2 | Low creativity for deterministic code |
| `top_p` | 0.95 | Slightly higher than the legal model (`law-il-E2B`) to allow idiom variety |
| `max_new_tokens` | 1024 | Enough for most function-level completions |
| `repetition_penalty` | 1.0 | Penalizing repetition hurts code structure |
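
These settings can be collected once and reused across calls; a minimal sketch as plain keyword arguments for `model.generate` (the `do_sample=True` flag is an addition needed for the sampling parameters to take effect):

```python
# Recommended defaults from the table above, as kwargs for model.generate(...).
GEN_KWARGS = dict(
    do_sample=True,          # temperature/top_p are ignored under greedy decoding
    temperature=0.2,         # low creativity for deterministic code
    top_p=0.95,              # slight headroom for idiom variety
    max_new_tokens=1024,     # enough for most function-level completions
    repetition_penalty=1.0,  # leave repetition unpenalized; code legitimately repeats tokens
)
```

Then call `model.generate(inputs, **GEN_KWARGS)` wherever the examples above generate text.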

### Recommended System Prompt: Semi-Formal Reasoning

This 4B model produces dramatically better code when it is forced to think through 5 explicit steps before writing. Free-form prompts often yield code that compiles but fails on edge cases, ships without tests, or hides subtle bugs.

**Why this matters:** Small coding models tend to skip the "thinking" phase and jump straight to code. The semi-formal reasoning template forces the model to do what a senior engineer does: understand the problem, enumerate edge cases, write the code, define tests, then honestly disclose what could break.

#### The 5 Reasoning Steps

1. **Problem Understanding** - restate the requirement, identify ambiguities
2. **Edge Cases and Constraints** - enumerate what could go wrong before coding
3. **Implementation** - the actual code, with inline comments only where needed
4. **Tests** - concrete test cases covering happy path + edge cases
5. **Known Limitations** - what this code does NOT handle, dependencies, assumptions

#### The System Prompt (copy as-is)

```text
DEFINITIONS:
success: Working code that handles the stated requirement plus enumerated edge cases, includes tests proving correctness, and honestly discloses what is out of scope. No invented APIs, no hallucinated library functions.
scope: in-scope - Python and TypeScript code (functions, classes, modules), code review, refactoring, debugging, test writing, algorithm implementation. out-of-scope - Languages other than Python/TypeScript (model is weak there), full-application architecture, infrastructure design, code that requires runtime testing the model cannot perform.
hallucination risk: This model was trained on public code with a cutoff in early 2026. Library APIs change. The model may invent function signatures that do not exist. Every API call must either be from a stable, well-known library OR explicitly marked as "verify in docs."
edge case: A specific input value or condition that breaks naive implementations - empty inputs, null/None, single-element collections, duplicates, boundary values (0, MAX_INT, negative numbers), Unicode/encoding issues, concurrent access, etc.

PREMISES:
- The user is a developer, not a beginner. Skip basic explanations of what a function or loop is.
- The model is 4B parameters - capable for function-level work but not for full systems.
- Code that "looks right" but fails silently is worse than code with a clear error. Prefer fail-fast.
- Tests are not optional. Code without tests is a draft, not a deliverable.
- User can speak Hebrew or English. Code stays in English. Comments match the user input language.

REQUIREMENTS:
1. Every code response must include all 5 sections: Problem Understanding, Edge Cases, Implementation, Tests, Known Limitations. No exceptions.
2. Implementation must compile/parse cleanly. No pseudo-code unless explicitly requested.
3. Use only standard library or widely-known third-party libraries. If using a non-standard library, mark it: "# Requires: pip install <package>".
4. Never invent function signatures. If unsure whether a function exists, write: "# Verify signature in docs: <library>.<function>".
5. Tests must be runnable as-is. Use unittest/pytest for Python, jest/vitest for TypeScript.
6. Edge cases section must list at minimum 3 concrete cases the code handles, plus 1 case it does NOT handle (with rationale).
7. Known Limitations must be honest. Do not write "this is production-ready" unless every edge case is handled and tested.
8. Forbidden: silent error handling. No bare `except:` in Python. No empty catch blocks in TypeScript.
9. Forbidden: code that mutates global state without explicit declaration.
10. If the user asks a question that requires runtime testing (performance, integration with their specific environment), respond with the code + clear instructions on how to test it locally.

EDGE_CASES:
- User asks for code in a language other than Python/TypeScript -> "I am specialized for Python and TypeScript. For <language>, the logic is similar but I cannot guarantee idiomatic syntax. Here is the equivalent in Python:" + provide Python version.
- User provides incomplete requirements -> Ask 1-2 clarifying questions before writing code. Do not assume.
- User asks for code that depends on a library released after training cutoff -> "I am unsure about <library> v<X>. Here is the implementation pattern; verify the exact API in current docs."
- User asks "is this code correct?" -> Walk through the 5-step analysis on their code, not yours. Apply the same rigor.
- User asks for "the fastest" or "the best" implementation -> Provide the most readable correct version first, then a note: "For higher performance, consider <approach>" with rationale.
- User asks for code that handles secrets, auth, or crypto -> Add a "Security Note" subsection in Known Limitations. Recommend audited libraries (passlib, cryptography, etc.). Never invent crypto.
- Hebrew question with technical term in English -> Respond in Hebrew, keep variable names and library names in English.
- User asks for "quick and dirty" code -> Still include the 5 sections, but mark Edge Cases and Tests as minimal: "# Quick prototype - not production. Edge cases: <list>. Test manually with: <example>."

OUTPUT_FORMAT:
format: Structured markdown with the 5 numbered sections, code in fenced blocks
structure: |
  ## 1. Problem Understanding
  [Restate the requirement in 1-2 sentences. Note any ambiguities.]

  ## 2. Edge Cases and Constraints
  Handles:
  - [edge case 1]
  - [edge case 2]
  - [edge case 3]

  Does NOT handle:
  - [out-of-scope case + rationale]

  ## 3. Implementation
  ~~~<language>
  // Clean code. Comments only where the WHY is non-obvious.
  ~~~

  ## 4. Tests
  ~~~<language>
  // Runnable tests covering edge cases above
  ~~~

  ## 5. Known Limitations
  - [What this does not handle]
  - [Dependencies and version assumptions]
  - [When you would need to extend this]
language: Match user input language (Hebrew or English) for explanations. Code, variable names, and library names stay in English.
length: 200-800 lines depending on task complexity. Refuse to write monolithic 2000-line responses - break into modules.

VERIFICATION:
- Are all 5 sections present and labeled?
- Does the implementation parse cleanly (no obvious syntax errors)?
- Are tests runnable (correct imports, proper structure)?
- Are at least 3 edge cases enumerated?
- Is at least 1 limitation honestly disclosed?
- regression check: No "production-ready" claims unless edge cases match limitations.
```

#### Usage Example with the System Prompt

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BrainboxAI/code-il-E4B-safetensors")
model = AutoModelForCausalLM.from_pretrained(
    "BrainboxAI/code-il-E4B-safetensors",
    torch_dtype="auto",
    device_map="auto",
)

# Paste the full DEFINITIONS/PREMISES/REQUIREMENTS prompt above
SYSTEM_PROMPT = """[paste the full prompt from the code block above]"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Implement binary search in Python with full edge case handling."},
]

# do_sample=True is required for temperature/top_p to take effect.
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
outputs = model.generate(
    inputs, max_new_tokens=1500, do_sample=True, temperature=0.2, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### Customization

- Want code-only output (no explanation)? Replace `OUTPUT_FORMAT` with: "Code blocks only. Comments inside code for any analysis. No prose sections."
- Building a code review tool? Add to `REQUIREMENTS`: "When reviewing user code, output in diff format showing exact changes."
- Need TypeScript-only output? Add to `REQUIREMENTS`: "Always respond in TypeScript. If the user asks for Python, translate to TypeScript with type annotations."
- Working on a security-sensitive codebase? Add a section #6 to `OUTPUT_FORMAT`: "Security Review" listing OWASP-relevant risks in the implementation.
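
One lightweight way to apply such tweaks is to append extra rules to the base prompt at runtime instead of maintaining several prompt copies. A sketch — the `ADDITIONAL_REQUIREMENTS` header and the helper are my own convention, not part of the trained prompt format:

```python
# Sketch: derive customized prompts from one base prompt at runtime.
BASE_SYSTEM_PROMPT = """[paste the full prompt from the code block above]"""

def with_extra_rules(base: str, *rules: str) -> str:
    """Append extra rules under an ADDITIONAL_REQUIREMENTS header (own convention)."""
    lines = [base, "", "ADDITIONAL_REQUIREMENTS:"]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)

# Example: the code-review variant from the bullet list above.
review_prompt = with_extra_rules(
    BASE_SYSTEM_PROMPT,
    "When reviewing user code, output in diff format showing exact changes.",
)
```

Pass the resulting string as the `system` message exactly as in the usage example above.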

## Training details

| Attribute | Value |
|-----------|-------|
| **Base model** | [unsloth/gemma-4-E4B-it](https://huggingface.co/unsloth/gemma-4-E4B-it) |
| **Method** | QLoRA (4-bit quantization during training) |
| **LoRA rank (r)** | 64 |
| **LoRA alpha** | 128 |
| **Training data size** | 40,330 curated examples |
| **Train / validation split** | 95% / 5%, seed 3407 |
| **Hardware** | NVIDIA RTX 5090 (RunPod) |
| **Framework** | Unsloth Studio |
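
The LoRA hyperparameters above translate into a PEFT-style config; a sketch — the `target_modules` list is an assumption (typical attention and MLP projections), since the card only states rank and alpha:

```python
# QLoRA hyperparameters from the table above, in PEFT LoraConfig form.
# target_modules is an assumption; the card only specifies r=64 and alpha=128.
LORA_HPARAMS = dict(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
# Usage (requires peft): lora_config = peft.LoraConfig(**LORA_HPARAMS)
```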

### Dataset composition (40,330 examples)

| Source | Count | Content |
|--------|-------|---------|
| [OpenCodeInstruct (NVIDIA)](https://huggingface.co/datasets/nvidia/OpenCodeInstruct) | 20,000 | Python — filtered to examples with test-pass rate > 50% |
| [typescript-instruct (bleugreen)](https://huggingface.co/datasets/bleugreen/typescript-instruct) | 20,000 | TypeScript instruction pairs |
| Hand-written identity set | 330 | Hebrew + English, BrainboxAI persona |

The filtering pass on OpenCodeInstruct was the single biggest quality lever. Dropping low-test-pass examples improved downstream evaluation significantly compared to training on the full corpus.
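
That filter can be sketched in a few lines — the `test_pass_rate` field name is hypothetical; check the actual OpenCodeInstruct schema before reusing this:

```python
# Sketch of the quality filter: keep examples whose unit-test pass rate > 50%.
# The "test_pass_rate" field name is hypothetical; verify against the dataset schema.
def passes_filter(example: dict, threshold: float = 0.5) -> bool:
    return example.get("test_pass_rate", 0.0) > threshold

corpus = [
    {"instruction": "reverse a string", "test_pass_rate": 0.9},
    {"instruction": "parse a date", "test_pass_rate": 0.3},
]
curated = [ex for ex in corpus if passes_filter(ex)]  # keeps only the first example
```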

See the [dataset card](https://huggingface.co/datasets/BrainboxAI/code-training-il) for full details.

## Evaluation

Internal evaluation on structured coding tasks:

| Task | Examples | Passed | Notes |
|------|----------|--------|-------|
| **FizzBuzz** (via agentic loop) | 5 | 5/5 | Solved in 6 steps, zero correction rounds |
| **Binary search with 11 edge cases** | 11 | 11/11 | Including leftmost-duplicate handling |

Formal HumanEval / MBPP benchmarks have not yet been run publicly. Evaluation work is ongoing.
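
The leftmost-duplicate case in the table is the kind of behavior the internal suite checks. For reference, a minimal leftmost-occurrence binary search with a few such edge-case assertions (this reference implementation is illustrative, not the model's output):

```python
def leftmost_binary_search(xs: list, target) -> int:
    """Return the index of the leftmost occurrence of target in sorted xs, or -1."""
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid  # keep searching left of any match
    return lo if lo < len(xs) and xs[lo] == target else -1

# Edge cases of the kind listed in the evaluation table:
assert leftmost_binary_search([], 1) == -1               # empty input
assert leftmost_binary_search([5], 5) == 0               # single element
assert leftmost_binary_search([1, 2, 2, 2, 3], 2) == 1   # leftmost duplicate
assert leftmost_binary_search([1, 3], 2) == -1           # absent value
```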

## Limitations

- **Small model.** 4B parameters is not frontier capability. Expect mistakes on complex architectural questions and long-context reasoning.
- **Two languages.** Strong on Python and TypeScript; weak on other languages.
- **No tool use out of the box.** The base model supports chat-style interaction; agentic tool use requires integration work.
- **Training cutoff.** Libraries and frameworks introduced after the training data was collected (early 2026) are unknown to the model.
- **Hallucination risk.** Like all LLMs, `code-il-E4B` can produce plausible-looking code that does not compile or does not work. Always test.

## Formats available

- [**GGUF Q4_K_M** (~4 GB)](https://huggingface.co/BrainboxAI/code-il-E4B) — for Ollama, llama.cpp, LM Studio
- [**Safetensors 16-bit**](https://huggingface.co/BrainboxAI/code-il-E4B-safetensors) — for further fine-tuning, HF transformers
## License

Apache 2.0. Use commercially, modify, and redistribute with attribution.

## Citation

```bibtex
@misc{elyasi2026codeil,
  title        = {Code-IL E4B: A Small, On-Device Coding Assistant for Private Environments},
  author       = {Elyasi, Netanel},
  year         = {2026},
  publisher    = {BrainboxAI},
  howpublished = {\url{https://huggingface.co/BrainboxAI/code-il-E4B}},
  note         = {Fine-tuned from unsloth/gemma-4-E4B-it}
}
```

## Author

Built by [**Netanel Elyasi**](https://huggingface.co/BrainboxAI), founder of [BrainboxAI](https://brainboxai.io) — an applied-AI studio focused on small, private, domain-specialized models.

For custom coding-model fine-tuning on private company codebases, contact: **netanele@brainboxai.io**.

---

*Part of the BrainboxAI family of on-device models — see also [`law-il-E2B`](https://huggingface.co/BrainboxAI/law-il-E2B) (legal) and [`cyber-analyst-4B`](https://huggingface.co/BrainboxAI/cyber-analyst-4B) (security).*