---
license: apple-amlr
license_name: apple-ascl
license_link: https://github.com/apple/ml-mobileclip/blob/main/LICENSE_weights_data
library_name: mobileclip
---

# 📸 MobileCLIP-B Zero-Shot Image Classifier
### Hugging Face Inference Endpoint

> **Production-ready wrapper** around Apple's MobileCLIP-B checkpoint.
> Handles image → text similarity in a single fast call.

---

## 📑 Contents

- [Features](#-features)
- [Repository layout](#-repository-layout)
- [Quick start (local smoke-test)](#-quick-start-local-smoke-test)
- [Calling the deployed endpoint](#-calling-the-deployed-endpoint)
- [How it works](#-how-it-works)
- [Updating the label set](#-updating-the-label-set)
- [License](#-license)

---

## ✨ Features

| | This repo |
|------------------------------|-----------|
| **Model** | MobileCLIP-B (`datacompdr` checkpoint) |
| **Branch fusion** | `reparameterize_model` baked in |
| **Mixed precision** | FP16 on GPU, FP32 on CPU |
| **Pre-computed text feats** | One-time encoding of prompts in `items.json` |
| **Per-request work** | _Only_ image decoding → encode_image → softmax |
| **Latency (A10G)** | < 30 ms once the image arrives |
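
Branch fusion collapses MobileOne's train-time branches into plain convolutions; the core arithmetic is folding BatchNorm statistics into the preceding weights. A scalar NumPy illustration of that folding (a sketch of the idea only, not Apple's `reparameterize_model`):

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into the preceding weight/bias."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

# The fused layer must reproduce conv-then-BN exactly:
w, b = 2.0, 0.5
gamma, beta, mean, var = 1.5, -0.1, 0.3, 4.0
x = 1.2
wf, bf = fold_bn(w, b, gamma, beta, mean, var)
y_fused = wf * x + bf
y_ref = gamma * ((w * x + b) - mean) / np.sqrt(var + 1e-5) + beta
```

After fusion the BN layer disappears, which is what makes the per-request path a single conv stack.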

---

## 📁 Repository layout

| Path | Purpose |
|--------------------|------------------------------------------------------------------|
| `handler.py` | HF entry point (loads model + text cache, serves requests) |
| `reparam.py` | 60-line stand-alone copy of Apple's `reparameterize_model` |
| `requirements.txt` | Minimal dependency set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json` | Your label set (`id`, `name`, `prompt` per entry) |
| `README.md` | This document |

---
## 🚀 Quick start (local smoke-test)

```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

python - <<'PY'
import base64, pathlib

import handler

app = handler.EndpointHandler()

img_b64 = base64.b64encode(pathlib.Path("tests/cat.jpg").read_bytes()).decode()
print(app({"inputs": {"image": img_b64}})[:5])  # top-5 classes
PY
```

---

## 🌐 Calling the deployed endpoint

```bash
export ENDPOINT="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"

python - "$IMG" <<'PY'
import base64, json, os, sys

import requests

url = os.environ["ENDPOINT"]
token = os.environ["TOKEN"]
img = sys.argv[1]

payload = {
    "inputs": {
        "image": base64.b64encode(open(img, "rb").read()).decode()
    }
}
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(json.dumps(resp.json()[:5], indent=2))
PY
```

*Response example*

```json
[
  { "id": 23, "label": "cat", "score": 0.92 },
  { "id": 11, "label": "tiger cat", "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 }
]
```

---
|
| | ## โ๏ธ How it works |
| |
|
| | 1. **Startup (runs once per replica)** |
| |
|
| | * Downloads / loads MobileCLIP-B (`datacompdr`). |
| | * Fuses MobileOne branches via `reparam.py`. |
| | * Reads `items.json` and encodes every prompt โ `[N,512]` tensor. |
| |
|
| | 2. **Per request** |
| |
|
| | * Decodes base-64 JPEG/PNG. |
| | * Applies OpenCLIP preprocessing (224 ร 224 center-crop + normalise). |
| | * Encodes the image, normalises, computes cosine similarity vs. cached text matrix. |
| | * Returns sorted `[{id, label, score}, โฆ]`. |
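
The per-request scoring step is a scaled cosine-similarity softmax over the cached text matrix. A minimal NumPy sketch of that ranking logic (illustrative only; the handler uses torch, and the `logit_scale=100` value is CLIP's usual exponentiated scale, assumed here):

```python
import numpy as np

def rank_labels(image_feat, text_feats, labels, logit_scale=100.0):
    """Rank cached labels against one image embedding.

    image_feat: [D] raw image embedding
    text_feats: [N, D] unit-normalised prompt embeddings (cached at startup)
    labels:     parsed items.json entries
    """
    image_feat = image_feat / np.linalg.norm(image_feat)   # unit-normalise
    logits = logit_scale * (text_feats @ image_feat)       # scaled cosine similarity
    exp = np.exp(logits - logits.max())                    # numerically stable softmax
    probs = exp / exp.sum()
    order = np.argsort(-probs)                             # best score first
    return [{"id": labels[i]["id"], "label": labels[i]["name"], "score": float(probs[i])}
            for i in order]

# Toy demo with random embeddings standing in for the model outputs
rng = np.random.default_rng(0)
T = rng.standard_normal((3, 512))
T /= np.linalg.norm(T, axis=1, keepdims=True)              # pre-normalised like the cache
out = rank_labels(rng.standard_normal(512), T,
                  [{"id": i, "name": f"class{i}"} for i in range(3)])
```

Because the text matrix never changes between requests, this matmul plus softmax is all the label-side work done per call.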
| |
|
| | --- |
| |
|
| | ## ๐ Updating the label set |
| |
|
| | Simply edit `items.json`, push, and redeploy. |
| |
|
| | ```json |
| | [ |
| | { "id": 0, "name": "cat", "prompt": "a photo of a cat" }, |
| | { "id": 1, "name": "dog", "prompt": "a photo of a dog" } |
| | ] |
| | ``` |
| |
|
| | No code changes are required; the handler re-encodes prompts at start-up. |
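
Before redeploying, a quick local sanity check can catch schema slips; this is a hypothetical helper, not part of the repo, and assumes only the `id`/`name`/`prompt` keys shown above:

```python
import json

REQUIRED = {"id", "name", "prompt"}

def validate_items(raw: str):
    """Validate an items.json payload: non-empty array, required keys, unique ids."""
    items = json.loads(raw)
    assert isinstance(items, list) and items, "items.json must be a non-empty array"
    for entry in items:
        missing = REQUIRED - entry.keys()
        assert not missing, f"entry {entry.get('id')} is missing keys: {missing}"
    ids = [entry["id"] for entry in items]
    assert len(ids) == len(set(ids)), "ids must be unique"
    return items

items = validate_items('[{"id": 0, "name": "cat", "prompt": "a photo of a cat"}]')
```

A malformed file would otherwise only surface as a failed replica boot after redeploy.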

---

## ⚖️ License

* **Weights / data** – Apple AMLR (see [`LICENSE_weights_data`](./LICENSE_weights_data))
* **This wrapper code** – MIT

---

<div align="center"><sub>Maintained with ❤️ by Your-Team – Aug 2025</sub></div>