ClinIQ Edge: Medical Fine-tuning with Gemma 4 E4B

ClinIQ Edge is a medical language model fine-tuned from the Google Gemma 4 E4B base model. This repository contains the complete training pipeline, which splits the workflow across Lightning.ai and Modal to keep costs down and to keep downloads out of paid GPU time.

🧠 Model Overview

  • Base Model: google/gemma-4-e4b-it (Multimodal, 6B parameters)
  • Optimization: Unsloth QLoRA (4-bit quantization, bfloat16 compute)
  • Final Format: GGUF (Q4_K_M) for local inference via Ollama
  • Training Dataset: 34,300 curated medical examples (MedMCQA, MedQA, Wikidoc)
  • Epochs: 2 (8,576 total steps)
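
The step count above follows from the dataset size and the effective batch size of 8 described in Phase 2. The snippet below is a minimal sketch of that arithmetic, assuming a single GPU and that the last partial batch of each epoch is kept; the function name is illustrative and not taken from the repository.

```python
import math

NUM_EXAMPLES = 34_300   # curated training examples
PER_DEVICE_BATCH = 2    # per-step batch size (Phase 2)
GRAD_ACCUM_STEPS = 4    # gradient accumulation steps (Phase 2)
EPOCHS = 2


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Optimizer steps, assuming one GPU and a kept final partial batch."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs


print(total_optimizer_steps(NUM_EXAMPLES, PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, EPOCHS))
# prints 8576
```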

🏗️ Architecture & Training Strategy

To train a model of this scale cost-effectively, we separated the pipeline into two distinct phases. This allowed us to leverage free CPU resources for network-heavy data processing, reserving expensive GPU time strictly for compute.

Phase 1: Data Acquisition (Lightning.ai)

To conserve funds, we ran Phase 1 (phase1_download.py) in a free Lightning.ai CPU studio, using the platform's 10 free credits.

  • We downloaded the 6B-parameter base model weights and all three Hugging Face medical datasets (MedMCQA, MedQA-USMLE, Wikidoc) directly to local storage.
  • Once downloaded, these assets were uploaded to a persistent Modal Volume (cliniq-edge-volume). This removed all model and dataset downloads from the subsequent GPU phase.
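
The two steps above could look roughly like the sketch below. It is not the repository's phase1_download.py: the dataset repository IDs, the local directory layout, and the helper names are assumptions made for illustration, and the Gemma weights may additionally require an authenticated Hugging Face token. The upload step is expressed as the Modal CLI command `modal volume put`.

```python
from pathlib import Path

BASE_MODEL = "google/gemma-4-e4b-it"   # from the model overview
VOLUME_NAME = "cliniq-edge-volume"     # Modal Volume named in the text

# Assumed Hub dataset IDs; the card names the datasets but not their repos.
DATASETS = {
    "medmcqa": "openlifescienceai/medmcqa",
    "medqa": "GBaker/MedQA-USMLE-4-options",
    "wikidoc": "medalpaca/medical_meadow_wikidoc",
}


def download_assets(root: Path) -> None:
    """Fetch model weights and datasets to local disk (needs network access)."""
    # Imported lazily so the helper below works without these packages.
    from datasets import load_dataset
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=BASE_MODEL, local_dir=str(root / "model"))
    for name, repo_id in DATASETS.items():
        load_dataset(repo_id).save_to_disk(str(root / "datasets" / name))


def volume_put_command(local_dir: Path, remote_path: str = "/") -> list[str]:
    """Build the Modal CLI call that copies a local directory into the volume."""
    return ["modal", "volume", "put", VOLUME_NAME, str(local_dir), remote_path]
```

Calling `download_assets(Path("assets"))` and then running the command returned by `volume_put_command(Path("assets"))` with `subprocess.run` would reproduce the download-then-upload flow on a CPU-only machine.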

Phase 2: High-Performance Compute (Modal)

For the actual fine-tuning, we deployed the training script (phase2_train.py wrapped in train_modal.py) to Modal.com, provisioning a high-end NVIDIA RTX PRO 6000 (Blackwell Server Edition) GPU.

  • By attaching the pre-populated cliniq-edge-volume, the script bypassed all network overhead and loaded data directly from disk.
  • We used Unsloth's fine-tuning framework, which advertises roughly 2x faster training. Because Gemma 4 is a recent multimodal model, we added custom monkey-patches to fix two compatibility problems: PEFT adapter injection (Gemma4ClippableLinear target modules) and positional argument mapping in the processor.
  • Training ran with an effective batch size of 8 (per-device batch size 2 x 4 gradient accumulation steps) and resumed from checkpoints when interrupted.
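
Checkpoint resumption usually starts by locating the most recent checkpoint on the volume. The sketch below assumes the Hugging Face Trainer naming convention of `checkpoint-<step>` directories; the card does not show the actual resume logic, and `latest_checkpoint` is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import Optional

CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Path) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, or None."""
    if not output_dir.is_dir():
        return None
    candidates = []
    for child in output_dir.iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if child.is_dir() and match:
            # Compare steps numerically so checkpoint-1500 beats checkpoint-900.
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

With a Hugging Face style trainer, the returned path can be passed as `trainer.train(resume_from_checkpoint=str(path))`; when the function returns None, training starts from scratch.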

📊 Results & Benchmarks

The model was evaluated on a 200-question sample immediately after training completed.

  • Final Training Loss: 0.1623
  • Evaluation Benchmark: MedQA USMLE (United States Medical Licensing Examination) 4-choice questions.
  • Accuracy: 24.5% (49 / 200 correct), zero-shot, on unseen validation questions.
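
As an illustration of how a 4-choice accuracy figure like this can be computed, the sketch below formats a question, pulls the first standalone answer letter out of the model's output, and scores the predictions. The prompt template and the helper names are assumptions; the custom evaluation in phase2_train.py may differ. The letter extraction is deliberately naive and would misread outputs that begin with the article "A".

```python
import re
from typing import Optional


def format_question(question: str, options: dict[str, str]) -> str:
    """Render a 4-choice item as a prompt (hypothetical template)."""
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in sorted(options.items())]
    lines.append("Answer with a single letter (A, B, C, or D).")
    return "\n".join(lines)


def extract_choice(completion: str) -> Optional[str]:
    """Return the first standalone uppercase A-D in the output, if any."""
    match = re.search(r"\b([ABCD])\b", completion)
    return match.group(1) if match else None


def accuracy(predictions: list[Optional[str]], gold: list[str]) -> float:
    """Fraction of predictions that equal the gold letter; None counts as wrong."""
    correct = sum(pred == answer for pred, answer in zip(predictions, gold))
    return correct / len(gold)
```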

USMLE is an extremely challenging benchmark, and the measured accuracy is at the level of random guessing (25% on 4-choice questions). The model did show a strong reduction in training loss and learned the formatting and structure of complex clinical vignettes.
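
To put the 49 / 200 result in context against the 25% chance rate, a quick normal-approximation check can be run. This is a rough sketch, not part of the repository's evaluation code, and the function name is illustrative.

```python
import math


def z_score_vs_chance(correct: int, total: int, chance: float = 0.25) -> float:
    """Normal-approximation z-score of observed accuracy against a chance rate."""
    observed = correct / total
    standard_error = math.sqrt(chance * (1 - chance) / total)
    return (observed - chance) / standard_error


z = z_score_vs_chance(49, 200)
print(f"accuracy={49 / 200:.3f}, z={z:.2f}")
# prints accuracy=0.245, z=-0.16
```

A z-score this close to zero means a 200-question sample cannot distinguish the model from random guessing; a larger evaluation set would be needed to detect small gains.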

🚀 Running Locally with Ollama

The final output of the pipeline is a Q4_K_M-quantized GGUF file. The pipeline also generates a custom Modelfile alongside the model weights.

To run ClinIQ Edge locally on your laptop:

  1. Install Ollama.
  2. Navigate to the cliniq-edge/ directory; the GGUF files and the Modelfile are in its output/ subdirectory.
  3. Build and run the model:
ollama create cliniq-edge -f output/Modelfile
ollama run cliniq-edge
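
Once the model has been created, it can also be queried programmatically through Ollama's local REST API. The sketch below assumes the Ollama server is running on its default port (11434) and that the model was registered under the name cliniq-edge as in the commands above; `build_request` and `ask` are illustrative helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "cliniq-edge") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "cliniq-edge") -> str:
    """Send one prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        return json.loads(response.read())["response"]
```

For example, `ask("List common causes of microcytic anemia.")` returns the model's answer as a string, provided the server is running.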

📂 Project Structure

  • phase1_download.py: Pipeline script for downloading Hugging Face models and datasets on CPU environments.
  • phase2_train.py: Core Unsloth QLoRA training script with custom MedQA evaluation and checkpoint resumption logic.
  • train_modal.py: Modal deployment wrapper that containerizes Phase 2, injects dependencies (including llama.cpp requirements), and orchestrates the GPU volume mounts.