---
title: MobileCLIP Image Classifier
emoji: 📸
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# 📸 MobileCLIP-B Image Classifier

Zero-shot image classification powered by Apple's MobileCLIP-B model, served through an interactive Gradio web interface. This application enables real-time image classification against a dynamic set of text labels, with support for admin-managed label updates and optional Hugging Face Hub persistence.

## 🎯 Key Features

### Core Capabilities
- **🖼️ Zero-Shot Classification**: Upload any image for instant classification without model retraining
- **🏷️ Dynamic Label Management**: Add, remove, and update classification labels on the fly
- **📊 Interactive Results**: Visual confidence scores with sortable data tables
- **⚡ Optimized Performance**: Sub-30 ms inference on GPU with re-parameterized MobileOne blocks
- **🔐 Secure Admin Panel**: Token-protected label management interface
- **☁️ Hub Persistence**: Optional versioned label storage on Hugging Face Hub

### API Access
- **REST API**: Fully accessible via Gradio's automatic API endpoints
- **Base64 Support**: Direct base64 image input for backend integration
- **Batch Processing**: Efficient handling of multiple classification requests

## 🏗️ Architecture

### Components
- **`app.py`**: Main Gradio interface with public/admin tabs and API endpoints
- **`handler.py`**: Core model management, inference logic, and label operations
- **`reparam.py`**: MobileOne re-parameterization for optimized inference
- **`items.json`**: Default label catalog with metadata

### Model Details
- **Architecture**: MobileCLIP-B with re-parameterized MobileOne image encoder
- **Text Encoder**: Optimized CLIP text transformer
- **Embedding Cache**: Pre-computed text embeddings for fast inference
- **Device Support**: Automatic GPU/CPU detection with float16 optimization

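To illustrate how a cache of text embeddings supports zero-shot scoring, here is a minimal sketch in plain Python. It assumes L2-normalized embeddings, a CLIP-style logit scale of 100, and a softmax over the label set; the function names and toy vectors are hypothetical and are not taken from `handler.py`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def classify(image_emb, text_embs, labels, top_k=3, logit_scale=100.0):
    """Rank labels by softmax over scaled cosine similarities.

    text_embs plays the role of the pre-computed embedding cache: it is
    built once per label set, so each request only has to encode the image.
    """
    image_emb = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(image_emb, l2_normalize(t)))
        for t in text_embs
    ]
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    scored = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]


# Toy 3-dimensional embeddings standing in for real model outputs.
labels = ["cat", "dog", "bicycle"]
text_cache = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ranked = classify([0.9, 0.1, 0.0], text_cache, labels, top_k=2)
print(ranked)
```

In a real deployment the same computation would typically be one matrix multiplication on tensors; the list-based version only makes the steps explicit.
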
## 🚀 Quick Start

### Environment Variables

Configure these under your Space's Settings → Variables and secrets:

| Variable | Description | Required |
|----------|-------------|----------|
| `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin) |
| `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
| `HF_WRITE_TOKEN` | Token with write permissions to dataset repo | No |
| `HF_READ_TOKEN` | Token with read permissions (defaults to write token) | No |

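As a rough sketch of how an app might read these variables, the snippet below loads them into a small config object. The `SpaceConfig` class, the `load_config` helper, and the rule that Hub persistence needs both a repo and a write token are assumptions for illustration; only the variable names and the read-token fallback come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    admin_token: Optional[str]
    label_repo: Optional[str]
    write_token: Optional[str]
    read_token: Optional[str]

    @property
    def hub_enabled(self) -> bool:
        # Assumption: persistence needs a repo and a token that can write to it.
        return bool(self.label_repo and self.write_token)


def load_config(env: Mapping[str, str] = os.environ) -> SpaceConfig:
    """Read the documented variables; empty strings count as unset."""

    def get(name: str) -> Optional[str]:
        return env.get(name) or None

    write_token = get("HF_WRITE_TOKEN")
    return SpaceConfig(
        admin_token=get("ADMIN_TOKEN"),
        label_repo=get("HF_LABEL_REPO"),
        write_token=write_token,
        # The read token falls back to the write token, as the table notes.
        read_token=get("HF_READ_TOKEN") or write_token,
    )


# Example with an explicit mapping instead of the real process environment.
config = load_config({"HF_LABEL_REPO": "user/labels", "HF_WRITE_TOKEN": "hf_example"})
print(config.hub_enabled, config.read_token)
```

Passing the environment as a parameter keeps the loader easy to exercise without touching real secrets.
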
### Usage Examples

#### Web Interface
1. Navigate to the Space URL
2. Upload an image in the Classification tab
3. Adjust top-k results (default: 10)
4. View ranked predictions with confidence scores

#### API Usage

**Standard Classification:**
```python
import requests

# Open the image in a context manager so the file handle is closed afterwards.
with open("photo.jpg", "rb") as image_file:
    response = requests.post(
        "YOUR_SPACE_URL/api/classify_image",
        files={"image": image_file},
        data={"top_k": 5},
        timeout=30,
    )
response.raise_for_status()  # Fail loudly on HTTP errors
results = response.json()
```

**Base64 Input:**
```python
import base64
import requests

# Encode the raw image bytes as an ASCII base64 string.
with open("photo.jpg", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "YOUR_SPACE_URL/api/classify_base64",
    json={
        "image": img_base64,
        "top_k": 10,
    },
    timeout=30,
)
response.raise_for_status()  # Fail loudly on HTTP errors
results = response.json()
```

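On the receiving side, a base64 payload has to be decoded and checked before it reaches the model. The sketch below shows one way this could look using only the standard library; the `decode_image_payload` helper, the 10 MiB cap, and the tolerance for a data-URL prefix are all assumptions, not documented behavior of this Space.

```python
import base64
import binascii

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap


def decode_image_payload(image_b64: str) -> bytes:
    """Decode a base64 image string, accepting an optional data-URL prefix."""
    if image_b64.startswith("data:"):
        # Drop a "data:image/jpeg;base64," style prefix if the client sent one.
        _, _, image_b64 = image_b64.partition(",")
    try:
        raw = base64.b64decode(image_b64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image is not valid base64") from exc
    if not raw:
        raise ValueError("image payload is empty")
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError("image payload exceeds the size limit")
    return raw


# Round trip: the first bytes of a JPEG file survive encoding and decoding.
jpeg_header = b"\xff\xd8\xff\xe0"
encoded = base64.b64encode(jpeg_header).decode("ascii")
decoded = decode_image_payload("data:image/jpeg;base64," + encoded)
print(decoded == jpeg_header)
```
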
## 🔧 Admin Operations

### Label Management

Authenticated admins can perform the following operations:

#### Add or Update Labels
```json
{
  "op": "upsert_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "items": [
    {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
    {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
  ]
}
```

#### Reload Specific Version
```json
{
  "op": "reload_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "version": 5
}
```

#### Remove Labels
```json
{
  "op": "remove_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "ids": [100, 101]
}
```

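The payloads above suggest a simple pattern: verify the token, then dispatch on `op`. The following is a minimal sketch of that pattern, assuming labels are held in a dict keyed by ID. The `handle_admin_request` function and its response shape are hypothetical, and `reload_labels` is left out because it depends on Hub access.

```python
import hmac


def handle_admin_request(payload, labels, admin_token):
    """Apply an admin operation to `labels`, a dict mapping id -> item."""
    supplied = str(payload.get("token", ""))
    # compare_digest avoids leaking token contents through timing differences.
    if not admin_token or not hmac.compare_digest(
        supplied.encode("utf-8"), admin_token.encode("utf-8")
    ):
        return {"ok": False, "error": "unauthorized"}

    op = payload.get("op")
    if op == "upsert_labels":
        for item in payload.get("items", []):
            labels[item["id"]] = item  # insert new ids, overwrite existing ones
    elif op == "remove_labels":
        for label_id in payload.get("ids", []):
            labels.pop(label_id, None)  # ignore ids that are not present
    else:
        return {"ok": False, "error": f"unsupported op: {op}"}
    return {"ok": True, "count": len(labels)}


labels = {}
result = handle_admin_request(
    {
        "op": "upsert_labels",
        "token": "s3cret",
        "items": [{"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"}],
    },
    labels,
    admin_token="s3cret",
)
print(result)
```

Rejecting every request when no admin token is configured keeps an unset `ADMIN_TOKEN` from silently opening the admin operations.
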
### Label Deduplication
- Names are deduplicated automatically and case-insensitively
- Prevents duplicate entries (e.g., "cat", "Cat", and "CAT" are treated as the same label)
- IDs are also deduplicated for consistent label management

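The deduplication rules can be sketched as a small merge function. This is an illustration rather than the code in `handler.py`: the `dedupe_labels` name and the "first entry wins" policy are assumptions.

```python
def dedupe_labels(existing, incoming):
    """Merge incoming items into existing ones, skipping duplicates.

    An item counts as a duplicate if its id is already present or if its
    name matches an existing name case-insensitively.
    """
    merged = list(existing)
    seen_ids = {item["id"] for item in merged}
    seen_names = {item["name"].strip().casefold() for item in merged}
    for item in incoming:
        key = item["name"].strip().casefold()
        if item["id"] in seen_ids or key in seen_names:
            continue  # first entry wins (assumed policy)
        merged.append(item)
        seen_ids.add(item["id"])
        seen_names.add(key)
    return merged


current = [{"id": 1, "name": "cat"}]
new_items = [
    {"id": 2, "name": "Cat"},  # duplicate name, different case
    {"id": 1, "name": "dog"},  # duplicate id
    {"id": 3, "name": "Dog"},  # new id and new name
]
deduped = dedupe_labels(current, new_items)
print([item["id"] for item in deduped])
```
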
## 📦 Hub Integration

When `HF_LABEL_REPO` and the Hub tokens are configured, the system automatically:

1. **Saves Snapshots**: Each label update creates a versioned snapshot
   - `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
   - `snapshots/v{N}/meta.json`: Label metadata and model info
   - `snapshots/latest.json`: Points to the current version

2. **Loads on Startup**: Fetches the latest snapshot or a specified version
3. **Fallback**: Uses the local `items.json` if the Hub is unavailable

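The load-and-fallback flow can be sketched without any Hub client by injecting the download step. In the snippet below, the `resolve_label_source` helper, the `fetch_text` callable, and the `{"version": N}` shape of `latest.json` are assumptions; only the snapshot paths and the `items.json` fallback come from the list above.

```python
import json


def snapshot_paths(version: int) -> dict:
    """Return repo-relative paths for one versioned snapshot."""
    base = f"snapshots/v{version}"
    return {
        "embeddings": f"{base}/embeddings.safetensors",
        "meta": f"{base}/meta.json",
    }


def resolve_label_source(fetch_text, version=None):
    """Pick where labels come from: a Hub snapshot or the local fallback.

    fetch_text is any callable that returns a file's text given its
    repo-relative path and raises OSError when the Hub is unreachable.
    """
    try:
        if version is None:
            # Assumed pointer format: {"version": N}.
            version = json.loads(fetch_text("snapshots/latest.json"))["version"]
        return {"source": "hub", "version": version, **snapshot_paths(version)}
    except (OSError, KeyError, ValueError):
        return {"source": "local", "path": "items.json"}


# Stand-ins for a Hub download; a real fetcher would read from HF_LABEL_REPO.
def fake_fetch(path):
    return '{"version": 5}'


def offline_fetch(path):
    raise OSError("hub unreachable")


from_hub = resolve_label_source(fake_fetch)
from_local = resolve_label_source(offline_fetch)
print(from_hub["embeddings"], from_local["path"])
```
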
## 🎨 Default Label Catalog

The bundled `items.json` includes 50+ kid-friendly objects with:
- Unique IDs and display names
- CLIP-optimized prompts
- Category metadata
- Fun facts and rarity ratings

Categories include animals, toys, food, vehicles, nature, and everyday objects.

## ⚡ Performance Optimization

- **GPU Acceleration**: Automatic CUDA detection with float16 inference
- **CPU Fallback**: Graceful degradation with float32 precision
- **Embedding Cache**: Pre-computed text embeddings updated on label changes
- **Re-parameterization**: MobileOne blocks optimized for inference speed
- **Batch Processing**: Efficient matrix operations for multi-label scoring

## 🔒 Security Considerations

- **Token Protection**: Admin operations require `ADMIN_TOKEN`
- **Private Datasets**: Keep label repos private for sensitive applications
- **Input Validation**: Automatic sanitization of uploaded images
- **Memory Management**: Images processed and discarded after inference

## 📄 License

- **Model Weights**: Apple Sample Code License (ASCL)
- **Interface Code**: MIT License

## 🤝 Contributing

Contributions welcome! Areas for improvement:
- Additional label management features
- Performance optimizations
- Extended API capabilities
- Multi-language support

## 📚 Resources

- [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
- [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces](https://huggingface.co/spaces)