Instructions to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="MrEngineer/ClinIQ-Edge-gemma-4-e4b-it",
	filename="gguf_gguf/model_weights.BF16-mmproj.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
# Run inference directly in the terminal:
llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
# Run inference directly in the terminal:
llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
# Run inference directly in the terminal:
./llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16

Use Docker

docker model run hf.co/MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16

LM Studio
Jan
Ollama
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Ollama:
```
ollama run hf.co/MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
```

Unsloth Studio new

How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MrEngineer/ClinIQ-Edge-gemma-4-e4b-it to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MrEngineer/ClinIQ-Edge-gemma-4-e4b-it to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MrEngineer/ClinIQ-Edge-gemma-4-e4b-it to start chatting

Docker Model Runner
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Docker Model Runner:
```
docker model run hf.co/MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
```

Lemonade

How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16

Run and chat with the model

lemonade run user.ClinIQ-Edge-gemma-4-e4b-it-BF16

List all available models

lemonade list

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

ClinIQ Edge — Medical Fine-tuning with Gemma 4 E4B

ClinIQ Edge is a highly optimized, state-of-the-art medical language model fine-tuned from the Google Gemma 4 E4B base model. This repository contains the complete pipeline used to train the model, specifically engineered to maximize cost-efficiency and bypass infrastructure bottlenecks by splitting the workflow across Lightning.ai and Modal.

🧠 Model Overview

Base Model: google/gemma-4-e4b-it (Multimodal, 6B parameters)
Optimization: Unsloth QLoRA (4-bit quantization, bfloat16 compute)
Final Format: GGUF (Q4_K_M) for local inference via Ollama
Training Dataset: 34,300 curated medical examples (MedMCQA, MedQA, Wikidoc)
Epochs: 2 (8,576 total steps)

🏗️ Architecture & Training Strategy

To train a model of this scale cost-effectively, we separated the pipeline into two distinct phases. This allowed us to leverage free CPU resources for network-heavy data processing, reserving expensive GPU time strictly for compute.

Phase 1: Data Acquisition (Lightning.ai)

To conserve funds, we utilized the free Lightning.ai (10 free credits) CPU studio for Phase 1 (phase1_download.py).

We downloaded the massive 6B parameter base model weights and all Hugging Face medical datasets (MedMCQA, MedQA-USMLE, Wikidoc) directly to local storage.
Once downloaded, these assets were uploaded to a persistent Modal Volume (cliniq-edge-volume). This completely eliminated network dependency and download times for the subsequent GPU phase.

Phase 2: High-Performance Compute (Modal)

For the actual fine-tuning, we deployed the training script (phase2_train.py wrapped in train_modal.py) to Modal.com, provisioning a high-end NVIDIA RTX PRO 6000 (Blackwell Server Edition) GPU.

By attaching the pre-populated cliniq-edge-volume, the script bypassed all network overhead and loaded data directly from disk.
We utilized Unsloth's 2x faster fine-tuning framework. Because Gemma 4 is a cutting-edge multimodal model, we implemented custom monkey-patches to resolve PEFT adapter injection compatibility (Gemma4ClippableLinear target modules) and processor positional argument mapping bugs.
Training executed with a batch size of 8 (BS 2 x 4 gradient accumulation) and successfully resumed from checkpoints when necessary.

📊 Results & Benchmarks

The model was rigorously evaluated immediately after training completed.

Final Training Loss: 0.1623
Evaluation Benchmark: MedQA USMLE (United States Medical Licensing Examination) 4-choice questions.
Accuracy: 24.5% (49 / 200 correct) on the zero-shot unseen validation set.

While USMLE is an extremely challenging benchmark (random guessing is 25%), the model demonstrated a strong reduction in training loss and successfully internalized the formatting and structure of complex clinical vignettes.

🚀 Running Locally with Ollama

The final output of the pipeline is a highly compressed Q4_K_M GGUF file. The model weights and a custom Modelfile have been automatically generated.

To run ClinIQ Edge locally on your laptop:

Install Ollama.
Navigate to the cliniq-edge/output/ directory containing the GGUF files.
Build and run the model:

ollama create cliniq-edge -f output/Modelfile
ollama run cliniq-edge

📂 Project Structure

phase1_download.py — Pipeline script for downloading Hugging Face models and datasets on CPU environments.
phase2_train.py — Core Unsloth QLoRA training script with custom MedQA evaluation and checkpoint resumption logic.
train_modal.py — Modal deployment wrapper that containerizes Phase 2, injects dependencies (including llama.cpp requirements), and orchestrates the GPU volume mounts.

Downloads last month: 95

GGUF

Model size

8B params

Architecture

gemma4

Hardware compatibility

4-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support