| --- |
| title: "OpenAI-Compatible FastAPI Backend" |
emoji: "🤖"
| colorFrom: "blue" |
| colorTo: "green" |
| sdk: "docker" |
| app_port: 7860 |
| pinned: false |
| --- |
| |
| # Hugging Face Spaces: FastAPI OpenAI-Compatible Backend |
|
|
This project is ready to deploy as a Hugging Face Space using FastAPI and Transformers (no vLLM, no llama-cpp/GGUF).
|
|
| ## Features |
|
|
| - OpenAI-compatible `/v1/chat/completions` endpoint |
- Multimodal support (text + image, if the model supports it)
| - Environment variable support via `.env` |
| - Hugging Face Spaces compatible (CPU or T4/RTX GPU) |
|
|
| ## Usage (Local) |
|
|
| ```bash |
| pip install -r requirements.txt |
| python -m uvicorn backend_service:app --host 0.0.0.0 --port 7860 |
| ``` |
|
|
| ## Usage (Hugging Face Spaces) |
|
|
- Push this repo to your Hugging Face Space
- The Space will auto-launch the FastAPI backend
- Point OpenAI-compatible clients at the `/v1/chat/completions` endpoint
|
|
| ## Notes |
|
|
| - Only transformers models are supported (no GGUF/llama-cpp, no vLLM) |
| - Set your model in the `AI_MODEL` environment variable or edit `backend_service.py` |
| - For secrets, use the Hugging Face Spaces Secrets UI or a `.env` file |
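For reference, a minimal sketch of how the model name and a `.env` file could be resolved at startup (illustrative only; the real logic lives in `backend_service.py`, and the default model name below is an assumption):

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Load variables from a local .env file, if one exists.
load_dotenv()

# Fall back to a default when AI_MODEL is unset (the default shown is illustrative).
MODEL_ID = os.getenv("AI_MODEL", "google/gemma-3n-E4B-it")
HF_TOKEN = os.getenv("HF_TOKEN")  # only needed for private models
```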
|
|
| ## Example curl |
|
|
| ```bash |
| curl -X POST https://<your-space>.hf.space/v1/chat/completions \ |
| -H "Content-Type: application/json" \ |
| -d '{"model": "google/gemma-3n-E4B-it", "messages": [{"role": "user", "content": "Hello!"}]}' |
| ``` |
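Because the endpoint follows the OpenAI schema, the official `openai` Python client can talk to the Space as well. A sketch (substitute your Space URL; the API key is a dummy value, since this backend does not check one):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-space>.hf.space/v1",  # placeholder Space URL
    api_key="unused",  # dummy value; the backend does not authenticate
)

resp = client.chat.completions.create(
    model="google/gemma-3n-E4B-it",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```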
|
|
| --- |
|
|
| For more, see Hugging Face Spaces docs: https://huggingface.co/docs/hub/spaces-sdks-docker |
|
|
| # Fallback Logic |
|
|
If vLLM fails to start or respond, the backend automatically falls back to the legacy backend.
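As a sketch of the idea (not the project's actual code; both backend functions below are hypothetical stand-ins), the fallback is a try-first pattern:

```python
import logging

logger = logging.getLogger(__name__)

def generate_with_vllm(prompt: str) -> str:
    raise RuntimeError("vLLM unavailable")  # stand-in for the real vLLM call

def generate_with_legacy(prompt: str) -> str:
    return f"[legacy] {prompt}"  # stand-in for the real legacy call

def generate(prompt: str) -> str:
    """Try the vLLM backend first; fall back to the legacy backend on any failure."""
    try:
        return generate_with_vllm(prompt)
    except Exception as exc:  # startup failure, timeout, OOM, ...
        logger.warning("vLLM backend failed (%s); using legacy backend", exc)
        return generate_with_legacy(prompt)
```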
|
|
| # Fine-tuning Gemma 3n E4B on MacBook M1 (Apple Silicon) with Unsloth |
|
|
| This project supports local fine-tuning of the Gemma 3n E4B model using Unsloth, PEFT/LoRA, and export to GGUF Q4_K_XL for efficient inference. The workflow is optimized for Apple Silicon (M1/M2/M3) and avoids CUDA/bitsandbytes dependencies. |
|
|
| ## Prerequisites |
|
|
| - Python 3.10+ |
| - macOS with Apple Silicon (M1/M2/M3) |
| - PyTorch with MPS backend (install via `pip install torch`) |
| - All dependencies in `requirements.txt` (install with `pip install -r requirements.txt`) |
|
|
| ## Training Script Usage |
|
|
| Run the training script with your dataset (JSON/JSONL or Hugging Face format): |
|
|
| ```bash |
| python training/train_gemma_unsloth.py \ |
| --job-id myjob \ |
| --output-dir training_runs/myjob \ |
| --dataset sample_data/train.jsonl \ |
| --prompt-field prompt --response-field response \ |
| --epochs 1 --batch-size 1 --gradient-accumulation 8 \ |
| --use-fp16 \ |
| --grpo --cpt \ |
| --export-gguf --gguf-out training_runs/myjob/adapter-gguf-q4_k_xl |
| ``` |
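Each dataset record needs the fields named by `--prompt-field`/`--response-field`. A small sketch that writes a compatible JSONL file (record contents are illustrative):

```python
import json

# Illustrative records; the field names match the flags used above.
records = [
    {"prompt": "What is LoRA?",
     "response": "LoRA trains small low-rank adapter matrices instead of the full model."},
    {"prompt": "Summarize GGUF in one sentence.",
     "response": "GGUF is a single-file format for quantized llama.cpp models."},
]

with open("sample_data/train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```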
|
|
| **Flags:** |
|
|
| - `--grpo`: Enable GRPO (if supported by Unsloth) |
| - `--cpt`: Enable CPT (if supported by Unsloth) |
| - `--export-gguf`: Export to GGUF Q4_K_XL after training |
| - `--gguf-out`: Path to save GGUF file |
|
|
| **Notes:** |
|
|
| - On Mac, bitsandbytes/xformers are disabled automatically. |
| - Training is slower than on CUDA GPUs; use small batch sizes and gradient accumulation. |
| - If Unsloth's GGUF export is unavailable, follow the printed instructions to use llama.cpp's `convert-hf-to-gguf.py`. |
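To confirm the MPS backend is actually available before starting a long run, a quick check using the standard PyTorch API:

```python
import torch

# Should print "mps" on Apple Silicon with a recent PyTorch build.
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Training device: {device}")
```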
|
|
| ## Troubleshooting |
|
|
| - If you see errors about missing CUDA or bitsandbytes, ensure you are running on Apple Silicon and have the latest Unsloth/Transformers. |
| - For memory errors, reduce `--batch-size` or `--cutoff-len`. |
| - For best results, use datasets formatted to match the official Gemma 3n chat template. |
|
|
| ## Example: Manual GGUF Export with llama.cpp |
|
|
| If the script prints a message about manual conversion, run: |
|
|
| ```bash |
| python convert-hf-to-gguf.py --outtype q4_k_xl --outfile training_runs/myjob/adapter-gguf-q4_k_xl training_runs/myjob/adapter |
| ``` |
|
|
| ## References |
|
|
| - [Unsloth Documentation](https://unsloth.ai/) |
| - [Gemma 3n E4B Model Card](https://huggingface.co/unsloth/gemma-3n-E4B-it) |
| - [llama.cpp GGUF Export Guide](https://github.com/ggerganov/llama.cpp) |
|
|
| --- |
|
|
| title: Multimodal AI Backend Service |
emoji: 🚀
| colorFrom: yellow |
| colorTo: purple |
| sdk: docker |
| app_port: 8000 |
| pinned: false |
| |
| --- |
| |
# firstAI - Multimodal AI Backend 🚀
| |
A powerful AI backend service with **multimodal capabilities** and **advanced deployment support**, handling both text generation and image analysis through Transformers pipelines.
| |
## 🚀 Features
| |
### 🤖 Configurable AI Models
| |
| - **Default Text Model**: Microsoft DialoGPT-medium (deployment-friendly) |
| - **Advanced Models**: Support for quantized models (Unsloth, 4-bit, GGUF) |
| - **Environment Configuration**: Runtime model selection via environment variables |
| - **Quantization Support**: Automatic 4-bit quantization with fallback mechanisms |
| |
### 🖼️ Multimodal Support
| |
| - Process text-only messages |
| - Analyze images from URLs |
| - Combined image + text conversations |
| - OpenAI Vision API compatible format |
| |
### Production Ready
| |
| - **Enhanced Deployment**: Multi-level fallback for quantized models |
| - **Environment Flexibility**: Works in constrained deployment environments |
| - **Error Resilience**: Comprehensive error handling with graceful degradation |
| - FastAPI backend with automatic docs |
| - Health checks and monitoring |
| - PyTorch with MPS acceleration (Apple Silicon) |
| |
### 🔧 Model Configuration
| |
| Configure models via environment variables: |
| |
| ```bash |
| # Set custom text model (optional) |
| export AI_MODEL="microsoft/DialoGPT-medium" |
|
|
| # Set custom vision model (optional) |
| export VISION_MODEL="Salesforce/blip-image-captioning-base" |
| |
| # For private models (optional) |
| export HF_TOKEN="your_huggingface_token" |
| ``` |
| |
| **Supported Model Types:** |
| |
| - Standard models: `microsoft/DialoGPT-medium`, `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` |
| - Quantized models: `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` |
| - GGUF models: `unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF` |
| |
## 🚀 Quick Start
| |
| ### 1. Install Dependencies |
| |
| ```bash |
| pip install -r requirements.txt |
| ``` |
| |
| ### 2. Start the Service |
| |
| ```bash |
| python backend_service.py |
| ``` |
| |
| ### 3. Test Multimodal Capabilities |
| |
| ```bash |
| python test_final.py |
| ``` |
| |
| The service will start on **http://localhost:8001** with both text and vision models loaded. |
| |
## 💡 Usage Examples
| |
| ### Text-Only Chat |
| |
| ```bash |
| curl -X POST http://localhost:8001/v1/chat/completions \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "model": "microsoft/DialoGPT-medium", |
| "messages": [{"role": "user", "content": "Hello!"}] |
| }' |
| ``` |
| |
| ### Image Analysis |
|
|
| ```bash |
| curl -X POST http://localhost:8001/v1/chat/completions \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "model": "Salesforce/blip-image-captioning-base", |
| "messages": [ |
| { |
| "role": "user", |
| "content": [ |
| { |
| "type": "image", |
| "url": "https://example.com/image.jpg" |
| } |
| ] |
| } |
| ] |
| }' |
| ``` |
|
|
| ### Multimodal (Image + Text) |
|
|
| ```bash |
| curl -X POST http://localhost:8001/v1/chat/completions \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "model": "Salesforce/blip-image-captioning-base", |
| "messages": [ |
| { |
| "role": "user", |
| "content": [ |
| { |
| "type": "image", |
| "url": "https://example.com/image.jpg" |
| }, |
| { |
| "type": "text", |
| "text": "What do you see in this image?" |
| } |
| ] |
| } |
| ] |
| }' |
| ``` |
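The same multimodal request can be issued from Python with `requests` (a sketch; the image URL is a placeholder, and the response is assumed to follow the OpenAI shape):

```python
import requests

payload = {
    "model": "Salesforce/blip-image-captioning-base",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/image.jpg"},
            {"type": "text", "text": "What do you see in this image?"},
        ],
    }],
}

resp = requests.post("http://localhost:8001/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```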
|
|
## 🔧 Technical Details
|
|
| ### Architecture |
|
|
| - **FastAPI** web framework |
| - **Transformers** pipeline for AI models |
| - **PyTorch** backend with GPU/MPS support |
| - **Pydantic** for request/response validation |
|
|
| ### Models |
|
|
| - **Text**: microsoft/DialoGPT-medium |
| - **Vision**: Salesforce/blip-image-captioning-base |
|
|
| ### API Endpoints |
|
|
| - `GET /` - Service information |
| - `GET /health` - Health check |
| - `GET /v1/models` - List available models |
| - `POST /v1/chat/completions` - Chat completions (text/multimodal) |
| - `GET /docs` - Interactive API documentation |
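A quick smoke test of the read-only endpoints (a sketch using `requests`, against the local port mentioned above):

```python
import requests

BASE = "http://localhost:8001"

# Both calls should return HTTP 200 once the models have loaded.
print(requests.get(f"{BASE}/health").json())
print(requests.get(f"{BASE}/v1/models").json())
```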
|
|
## 🚀 Deployment
|
|
| ### Environment Variables |
|
|
| ```bash |
| # Optional: Custom models |
| export AI_MODEL="microsoft/DialoGPT-medium" |
| export VISION_MODEL="Salesforce/blip-image-captioning-base" |
| export HF_TOKEN="your_token_here" # For private models |
| ``` |
|
|
| ### Production Deployment |
|
|
| The service includes enhanced deployment capabilities: |
|
|
| - **Quantized Model Support**: Automatic handling of 4-bit and GGUF models |
| - **Fallback Mechanisms**: Multi-level fallback for constrained environments |
| - **Error Resilience**: Graceful degradation when quantization libraries unavailable |
|
|
| ### Docker Deployment |
|
|
| ```bash |
| # Build and run with Docker |
| docker build -t firstai . |
| docker run -p 8000:8000 firstai |
| ``` |
|
|
| ### Testing Deployment |
|
|
| ```bash |
| # Test quantization detection and fallbacks |
| python test_deployment_fallbacks.py |
| |
| # Test health endpoint |
| curl http://localhost:8000/health |
| ``` |
|
|
| For comprehensive deployment guidance, see `DEPLOYMENT_ENHANCEMENTS.md`. |
|
|
## 🧪 Testing
|
|
| Run the comprehensive test suite: |
|
|
| ```bash |
| python test_final.py |
| ``` |
|
|
| Test individual components: |
|
|
| ```bash |
| python test_multimodal.py # Basic multimodal tests |
| python test_pipeline.py # Pipeline compatibility |
| ``` |
|
|
## 📦 Dependencies
|
|
| Key packages: |
|
|
| - `fastapi` - Web framework |
| - `transformers` - AI model pipelines |
| - `torch` - PyTorch backend |
| - `Pillow` - Image processing |
| - `accelerate` - Model acceleration |
| - `requests` - HTTP client |
|
|
## 🎯 Integration Complete
|
|
| This project successfully integrates: |
✅ **Transformers image-text-to-text pipeline**
✅ **OpenAI Vision API compatibility**
✅ **Multimodal message processing**
✅ **Production-ready FastAPI service**
|
|
| See `MULTIMODAL_INTEGRATION_COMPLETE.md` for detailed integration documentation. |
|
|
---

title: AI Backend Service
emoji: 🚀
| colorFrom: yellow |
| colorTo: purple |
| sdk: fastapi |
| sdk_version: 0.100.0 |
| app_file: backend_service.py |
| pinned: false |
| |
| --- |
| |
# AI Backend Service 🚀
| |
**Status: ✅ CONVERSION COMPLETE!**
| |
| Successfully converted from a non-functioning Gradio HuggingFace app to a production-ready FastAPI backend service with OpenAI-compatible API endpoints. |
| |
| ## Quick Start |
| |
| ### 1. Setup Environment |
| |
| ```bash |
| # Activate the virtual environment |
| source gradio_env/bin/activate |
|
|
| # Install dependencies (already done) |
| pip install -r requirements.txt |
| ``` |
| |
| ### 2. Start the Backend Service |
| |
| ```bash |
| python backend_service.py --port 8000 --reload |
| ``` |
| |
| ### 3. Test the API |
| |
| ```bash |
| # Run comprehensive tests |
| python test_api.py |
|
|
| # Or try usage examples |
| python usage_examples.py |
| ``` |
| |
| ## API Endpoints |
| |
| | Endpoint | Method | Description | |
| | ---------------------- | ------ | ----------------------------------- | |
| | `/` | GET | Service information | |
| | `/health` | GET | Health check | |
| | `/v1/models` | GET | List available models | |
| | `/v1/chat/completions` | POST | Chat completion (OpenAI compatible) | |
| | `/v1/completions` | POST | Text completion | |
| |
| ## Example Usage |
| |
| ### Chat Completion |
| |
| ```bash |
| curl -X POST http://localhost:8000/v1/chat/completions \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "model": "microsoft/DialoGPT-medium", |
| "messages": [ |
| {"role": "user", "content": "Hello! How are you?"} |
| ], |
| "max_tokens": 150, |
| "temperature": 0.7 |
| }' |
| ``` |
| |
| ### Streaming Chat |
|
|
| ```bash |
| curl -X POST http://localhost:8000/v1/chat/completions \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "model": "microsoft/DialoGPT-medium", |
| "messages": [ |
| {"role": "user", "content": "Tell me a joke"} |
| ], |
| "stream": true |
| }' |
| ``` |
|
|
| ## Files |
|
|
| - **`app.py`** - Original Gradio ChatInterface (still functional) |
- **`backend_service.py`** - New FastAPI backend service ⭐
| - **`test_api.py`** - Comprehensive API testing |
| - **`usage_examples.py`** - Simple usage examples |
| - **`requirements.txt`** - Updated dependencies |
| - **`CONVERSION_COMPLETE.md`** - Detailed conversion documentation |
|
|
| ## Features |
|
|
✅ **OpenAI-Compatible API** - Drop-in replacement for OpenAI API
✅ **Async FastAPI** - High-performance async architecture
✅ **Streaming Support** - Real-time response streaming
✅ **Error Handling** - Robust error handling with fallbacks
✅ **Production Ready** - CORS, logging, health checks
✅ **Docker Ready** - Easy containerization
✅ **Auto-reload** - Development-friendly auto-reload
✅ **Type Safety** - Full type hints with Pydantic validation
|
|
| ## Service URLs |
|
|
| - **Backend Service**: http://localhost:8000 |
| - **API Documentation**: http://localhost:8000/docs |
| - **OpenAPI Spec**: http://localhost:8000/openapi.json |
|
|
| ## Model Information |
|
|
| - **Current Model**: `microsoft/DialoGPT-medium` |
| - **Type**: Conversational AI model |
| - **Provider**: HuggingFace Inference API |
| - **Capabilities**: Text generation, chat completion |
|
|
| ## Architecture |
|
|
| ``` |
┌──────────────────────┐      ┌──────────────────────┐      ┌──────────────────────┐
│   Client Request     │─────▶│   FastAPI Backend    │─────▶│   HuggingFace API    │
│   (OpenAI format)    │      │   (backend_service)  │      │   (DialoGPT-medium)  │
└──────────────────────┘      └──────────────────────┘      └──────────────────────┘
                                         │
                                         ▼
                              ┌──────────────────────┐
                              │   OpenAI Response    │
                              │   (JSON/Streaming)   │
                              └──────────────────────┘
| ``` |
|
|
| ## Development |
|
|
| The service includes: |
|
|
| - **Auto-reload** for development |
| - **Comprehensive logging** for debugging |
| - **Type checking** for code quality |
| - **Test suite** for reliability |
| - **Error handling** for robustness |
|
|
| ## Production Deployment |
|
|
| Ready for production with: |
|
|
| - **Environment variables** for configuration |
| - **Health check endpoints** for monitoring |
| - **CORS support** for web applications |
| - **Docker compatibility** for containerization |
| - **Structured logging** for observability |
|
|
| --- |
|
|
**🎉 Conversion Status: COMPLETE!**
Successfully transformed from a broken Gradio app into a production-ready AI backend service.
|
|
| For detailed conversion documentation, see [`CONVERSION_COMPLETE.md`](CONVERSION_COMPLETE.md). |
|
|
| # Gemma 3n GGUF FastAPI Backend (Hugging Face Space) |
|
|
| This Space provides an OpenAI-compatible chat API for Gemma 3n GGUF models, powered by FastAPI. |
|
|
| **Note:** On Hugging Face Spaces, the backend runs in `DEMO_MODE` (no model loaded) for demonstration and endpoint testing. For real inference, run locally with a GGUF model and llama-cpp-python. |
|
|
| ## Endpoints |
|
|
- `/health` - Health check
- `/v1/chat/completions` - OpenAI-style chat completions (returns a demo response)
- `/train/start` - Start a (demo) training job
- `/train/status/{job_id}` - Check training job status
- `/train/logs/{job_id}` - Get training logs
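A sketch of driving the training endpoints from Python; the `/train/start` payload and the `job_id`/`status` field names are assumptions, since the exact schema is defined by the backend:

```python
import time

import requests

BASE = "http://localhost:8000"

# Hypothetical payload; check the backend for the real schema.
job = requests.post(f"{BASE}/train/start",
                    json={"dataset": "sample_data/train.jsonl"}).json()
job_id = job["job_id"]  # assumed field name

# Poll until the (demo) job reports a terminal state.
while True:
    status = requests.get(f"{BASE}/train/status/{job_id}").json()
    print(status)
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(5)

print(requests.get(f"{BASE}/train/logs/{job_id}").text)
```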
|
|
| ## Usage |
|
|
| 1. **Clone this repo** or create a Hugging Face Space (type: FastAPI). |
| 2. All dependencies are in `requirements.txt`. |
| 3. The Space will start in demo mode (no model download required). |
|
|
| ## Local Inference (with GGUF) |
|
|
| To run with a real model locally: |
|
|
| 1. Download a Gemma 3n GGUF model (e.g. from https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF). |
| 2. Set `AI_MODEL` to the local path or repo. |
| 3. Unset `DEMO_MODE`. |
| 4. Run: |
| ```bash |
| pip install -r requirements.txt |
| uvicorn gemma_gguf_backend:app --host 0.0.0.0 --port 8000 |
| ``` |
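Once a GGUF file is available locally, inference can also be sanity-checked directly with `llama-cpp-python` (a sketch; the model path is a placeholder):

```python
from llama_cpp import Llama

# Path to the downloaded GGUF file (placeholder).
llm = Llama(model_path="models/gemma-3n-E4B-it-Q4_K_XL.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```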
|
|
| ## License |
|
|
| Apache 2.0 |
|
|