Instructions to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="MrEngineer/ClinIQ-Edge-gemma-4-e4b-it", filename="gguf_gguf/model_weights.BF16-mmproj.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16 # Run inference directly in the terminal: llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16 # Run inference directly in the terminal: llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16 # Run inference directly in the terminal: ./llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
Use Docker
docker model run hf.co/MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
- LM Studio
- Jan
- Ollama
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Ollama:
ollama run hf.co/MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
- Unsloth Studio new
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for MrEngineer/ClinIQ-Edge-gemma-4-e4b-it to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for MrEngineer/ClinIQ-Edge-gemma-4-e4b-it to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for MrEngineer/ClinIQ-Edge-gemma-4-e4b-it to start chatting
- Docker Model Runner
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Docker Model Runner:
docker model run hf.co/MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
- Lemonade
How to use MrEngineer/ClinIQ-Edge-gemma-4-e4b-it with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull MrEngineer/ClinIQ-Edge-gemma-4-e4b-it:BF16
Run and chat with the model
lemonade run user.ClinIQ-Edge-gemma-4-e4b-it-BF16
List all available models
lemonade list
YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
ClinIQ Edge β Medical Fine-tuning with Gemma 4 E4B
ClinIQ Edge is a highly optimized, state-of-the-art medical language model fine-tuned from the Google Gemma 4 E4B base model. This repository contains the complete pipeline used to train the model, specifically engineered to maximize cost-efficiency and bypass infrastructure bottlenecks by splitting the workflow across Lightning.ai and Modal.
π§ Model Overview
- Base Model:
google/gemma-4-e4b-it(Multimodal, 6B parameters) - Optimization: Unsloth QLoRA (4-bit quantization, bfloat16 compute)
- Final Format: GGUF (
Q4_K_M) for local inference via Ollama - Training Dataset: 34,300 curated medical examples (MedMCQA, MedQA, Wikidoc)
- Epochs: 2 (8,576 total steps)
ποΈ Architecture & Training Strategy
To train a model of this scale cost-effectively, we separated the pipeline into two distinct phases. This allowed us to leverage free CPU resources for network-heavy data processing, reserving expensive GPU time strictly for compute.
Phase 1: Data Acquisition (Lightning.ai)
To conserve funds, we utilized the free Lightning.ai (10 free credits) CPU studio for Phase 1 (phase1_download.py).
- We downloaded the massive 6B parameter base model weights and all Hugging Face medical datasets (MedMCQA, MedQA-USMLE, Wikidoc) directly to local storage.
- Once downloaded, these assets were uploaded to a persistent Modal Volume (
cliniq-edge-volume). This completely eliminated network dependency and download times for the subsequent GPU phase.
Phase 2: High-Performance Compute (Modal)
For the actual fine-tuning, we deployed the training script (phase2_train.py wrapped in train_modal.py) to Modal.com, provisioning a high-end NVIDIA RTX PRO 6000 (Blackwell Server Edition) GPU.
- By attaching the pre-populated
cliniq-edge-volume, the script bypassed all network overhead and loaded data directly from disk. - We utilized Unsloth's 2x faster fine-tuning framework. Because Gemma 4 is a cutting-edge multimodal model, we implemented custom monkey-patches to resolve PEFT adapter injection compatibility (
Gemma4ClippableLineartarget modules) and processor positional argument mapping bugs. - Training executed with a batch size of 8 (BS 2 x 4 gradient accumulation) and successfully resumed from checkpoints when necessary.
π Results & Benchmarks
The model was rigorously evaluated immediately after training completed.
- Final Training Loss:
0.1623 - Evaluation Benchmark: MedQA USMLE (United States Medical Licensing Examination) 4-choice questions.
- Accuracy: 24.5% (49 / 200 correct) on the zero-shot unseen validation set.
While USMLE is an extremely challenging benchmark (random guessing is 25%), the model demonstrated a strong reduction in training loss and successfully internalized the formatting and structure of complex clinical vignettes.
π Running Locally with Ollama
The final output of the pipeline is a highly compressed Q4_K_M GGUF file. The model weights and a custom Modelfile have been automatically generated.
To run ClinIQ Edge locally on your laptop:
- Install Ollama.
- Navigate to the
cliniq-edge/output/directory containing the GGUF files. - Build and run the model:
ollama create cliniq-edge -f output/Modelfile
ollama run cliniq-edge
π Project Structure
phase1_download.pyβ Pipeline script for downloading Hugging Face models and datasets on CPU environments.phase2_train.pyβ Core Unsloth QLoRA training script with custom MedQA evaluation and checkpoint resumption logic.train_modal.pyβ Modal deployment wrapper that containerizes Phase 2, injects dependencies (includingllama.cpprequirements), and orchestrates the GPU volume mounts.
- Downloads last month
- 95
4-bit