---
title: Multi-LLM API Gateway
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
pinned: false
license: apache-2.0
short_description: 'Secure Multi-LLM Gateway (Streamable HTTP / SSE)'
---

# Multi-LLM API Gateway

or Universal MCP Hub (Sandboxed), or secure AI wrapper with dual interface: REST + MCP

aka: a clean, secure starting point for your own projects.
Pick the description that fits your use case. They're all correct.

> A production-grade **the-thing** that actually thinks about security.
> Built on [PyFundaments](PyFundaments.md), running on **simpleCity**.

```
No key → no tool → no crash → no exposed secrets
```

> [!WARNING]
> Most MCP servers are prompts dressed up as servers. This one has a real architecture.

---

> [!IMPORTANT]
> This project is under active development. Always use the latest release from [Codey Lab](https://github.com/Codey-LAB/Multi-LLM-API-Gateway) *(more stable builds land here first)*.
> This repo ([DEV](https://github.com/VolkanSah/Multi-LLM-API-Gateway)) is where the chaos happens. 🔬 A ⭐ on the repos would be cool 😙

---

## Why this exists

The AI ecosystem is full of servers with hardcoded keys, `os.environ` scattered everywhere, and zero sandboxing. One misconfigured fork and your API keys are gone.

This is exactly the kind of negligence (and worse: outright fraud) that [Wall of Shames](https://github.com/Wall-of-Shames) documents: fake "AI tools" exploiting non-technical users, such as API wrappers dressed up as custom models, Telegram payment funnels, and bought stars. If you build on open source, you should know this exists.

This hub is the antidote:

- **Structural sandboxing**: `app/*` can never touch `fundaments/` or `.env`. Not by convention. By design.
- **Guardian pattern**: `main.py` is the only process that reads secrets. It injects validated services as a dict. `app/*` never sees the raw environment.
- **Graceful degradation**: No key? Tool doesn't register. Server still starts. No crash, no error, no empty `None` floating around.
- **Single source of truth**: All tool/provider/model config lives in `app/.pyfun`. Adding a provider = edit one file. No code changes.

---

## Two Interfaces, One Server

This hub exposes **two completely independent interfaces** on the same hypercorn instance:

```
POST /api      → REST interface: custom clients, desktop apps, CMS plugins
GET+POST /mcp  → MCP interface: Claude Desktop, Cursor, Windsurf, any MCP client
GET /          → Health check: uptime, status
```

They share the same tool registry, provider config, and fallback chain. Adding a tool once makes it available on both interfaces automatically.

### REST API (`/api`)

Simple JSON POST with no protocol overhead; works with any HTTP client:

```json
POST /api
{"tool": "llm_complete", "params": {"prompt": "Hello", "provider": "anthropic"}}
```

Used by: Desktop Client (`DESKTOP_CLIENT/hub.py`), WordPress plugin, any custom integration.

### MCP Interface (`/mcp`)

Full MCP protocol: tool discovery, structured calls, streaming responses.

- **Primary transport: Streamable HTTP** (MCP spec 2025-11-25)
- **Fallback transport: SSE** (legacy, configurable via `.pyfun`)

Configured via `HUB_TRANSPORT` in `app/.pyfun [HUB]`:

```ini
HUB_TRANSPORT = "streamable-http"   # default, MCP spec 2025-11-25
# HUB_TRANSPORT = "sse"             # legacy fallback for older clients
```

Used by: Claude Desktop, Cursor, Windsurf, any MCP-compatible client.
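
As a rough illustration of the REST call from the REST API section above, the following sketch builds the `POST /api` request using only the Python standard library. The `build_api_request` helper, the local base URL, and the optional Bearer header are assumptions made for illustration. The response format is not specified here, so the sketch only constructs the request and leaves sending it to the caller.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_api_request(
    base_url: str,
    tool: str,
    params: Dict[str, Any],
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api request for the hub.

    The body shape {"tool": ..., "params": ...} follows the REST example
    above; `base_url` and `token` are placeholders supplied by the caller.
    """
    body = json.dumps({"tool": tool, "params": params}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: a private deployment accepts a standard Bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api",
        data=body,
        headers=headers,
        method="POST",
    )


request = build_api_request(
    "http://localhost:7860",  # hypothetical local hub address
    "llm_complete",
    {"prompt": "Hello", "provider": "anthropic"},
)
# To actually send it: urllib.request.urlopen(request, timeout=60)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it leaves the machine, and the same helper should work for any registered tool name.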
---

## Architecture

```
main.py (Guardian)
│
│   reads .env / HF Secrets
│   initializes fundaments/* conditionally
│   injects validated services as dict
│
└──► app/app.py (Orchestrator, sandboxed)
     │
     │   unpacks fundaments ONCE, at startup, never stores globally
     │   starts hypercorn (async ASGI)
     │   routes: GET / | POST /api | /mcp (transport-dependent)
     │
     ├── app/mcp.py        ← FastMCP + transport handler (Streamable HTTP / SSE)
     ├── app/tools.py      ← Tool registry (key-gated)
     ├── app/providers.py  ← LLM + Search execution + fallback chain
     ├── app/models.py     ← Model limits, costs, capabilities
     ├── app/config.py     ← .pyfun parser (single source of truth)
     └── app/db_sync.py    ← Internal SQLite IPC (app/* state only)
                             ≠ fundaments/postgresql.py (Guardian-only)
```

**The sandbox is structural:**

```python
# app/app.py: fundaments unpacked ONCE, NEVER stored globally
from typing import Any, Dict


async def start_application(fundaments: Dict[str, Any]) -> None:
    config_service = fundaments["config"]
    db_service = fundaments["db"]                          # None if not configured
    encryption_service = fundaments["encryption"]          # None if keys missing
    access_control_service = fundaments["access_control"]
    ...
    # From here: app/* reads its own config from app/.pyfun only.
    # fundaments are never passed into other app/* modules.
```

`app/app.py` never calls `os.environ`. Never imports from `fundaments/`. Never reads `.env`.
This isn't documentation. It's enforced by the import structure.

### Why Quart + hypercorn?

**Quart** is async Flask, fully `async/await` native. FastMCP's handlers are async; mixing in sync Flask would require thread hacks. With Quart, `/mcp` hands off directly to FastMCP: no bridging, no blocking.

**hypercorn** is an ASGI server (vs. waitress/gunicorn, which are WSGI). WSGI servers handle one request per thread, which is wrong for long-lived MCP connections. hypercorn handles both Streamable HTTP and SSE natively, and runs without extra config on HuggingFace Spaces.
HTTP/2 support (`config.h2 = True`) is built-in, which is relevant for Streamable HTTP performance at scale.

The `/mcp` route in `app.py` remains the natural interception point regardless of transport: auth checks, rate limiting, and logging can all be added there before the request reaches FastMCP.

---

## Two Databases, One Architecture

```
┌─────────────────────────────────────────────────────────────┐
│  Guardian Layer (fundaments/*)                              │
│                                                             │
│  postgresql.py   → Cloud DB (e.g. Neon, Supabase)           │
│                    asyncpg pool, SSL enforced               │
│                                                             │
│  user_handler.py → SQLite (users + sessions tables)         │
│                    PBKDF2-SHA256 password hashing           │
│                    Session validation incl. IP + UserAgent  │
│                    Account lockout after 5 failed attempts  │
│                                                             │
└──────────────────────┬──────────────────────────────────────┘
                       │ inject as fundaments dict
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  App Layer (app/*)                                          │
│                                                             │
│  db_sync.py      → SQLite (hub_state + tool_cache tables)   │
│                    aiosqlite (async, non-blocking)          │
│                    NEVER touches users/sessions tables      │
│                    Relocated to /tmp/ on HF Spaces auto     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

**Table ownership (hard rule):**

| Table | Owner | Access |
| :--- | :--- | :--- |
| `users` | `fundaments/user_handler.py` | Guardian only |
| `sessions` | `fundaments/user_handler.py` | Guardian only |
| `hub_state` | `app/db_sync.py` | app/* only |
| `tool_cache` | `app/db_sync.py` | app/* only |
| `hub_results` | PostgreSQL / Guardian | via `persist_result` tool |

---

## Tools

Tools register at startup, but only if the required API key exists. No key, no tool. Server always starts.

| ENV Secret | Tool | Notes |
| :--- | :--- | :--- |
| `ANTHROPIC_API_KEY` | `llm_complete` | Claude Haiku / Sonnet / Opus |
| `GEMINI_API_KEY` | `llm_complete` | Gemini 2.0 / 2.5 / 3.x Flash & Pro |
| `OPENROUTER_API_KEY` | `llm_complete` | 100+ models via OpenRouter |
| `HF_TOKEN` | `llm_complete` | HuggingFace Inference API |
| `BRAVE_API_KEY` | `web_search` | Independent web index |
| `TAVILY_API_KEY` | `web_search` | AI-optimized search with synthesized answers |
| `DATABASE_URL` | `cloud DB` | e.g. Neon, Supabase |
| `DATABASE_URL` | `db_query`, `persist_result` | SQLite read + PostgreSQL write |
| *(always)* | `list_active_tools` | Shows key names only, never values |
| *(always)* | `health_check` | Status + uptime + active transport |
| *(always)* | `get_model_info` | Limits, costs, capabilities per model |

For all key names see [`app/.pyfun`](app/.pyfun).

**Tools are configured in `.pyfun`, including system prompts:**

```ini
[TOOL.code_review]
active = "true"
description = "Review code for bugs, security issues and improvements"
provider_type = "llm"
default_provider = "anthropic"
timeout_sec = "60"
system_prompt = "You are an expert code reviewer. Analyze the given code for bugs, security issues, and improvements. Be specific and concise."
[TOOL.code_review_END]
```

Current built-in tools: `llm_complete`, `code_review`, `summarize`, `translate`, `web_search`, `db_query`

Future hooks (commented, ready): `image_gen`, `code_exec`, `shellmaster_2.0`, Discord, GitHub webhooks

---

## LLM Fallback Chain

All LLM providers share one `llm_complete` tool. If a provider fails, the hub walks the fallback chain from `.pyfun`:

```
e.g.
anthropic → gemini → openrouter → huggingface
```

```ini
[LLM_PROVIDER.anthropic]
fallback_to = "gemini"
[LLM_PROVIDER.anthropic_END]

[LLM_PROVIDER.gemini]
fallback_to = "openrouter"
[LLM_PROVIDER.gemini_END]
```

Same pattern applies to search providers (`brave → tavily`).

---

## Quick Start

### HuggingFace Spaces (recommended)

1. Fork / duplicate this Space
2. Go to **Settings → Variables and secrets**
3. Add the API keys you have (any subset works)
4. Space starts automatically; only tools with valid keys register

[→ Live Demo Space](https://huggingface.co/spaces/codey-lab/Multi-LLM-API-Gateway) (no LLM keys set)

### Local / Docker

```bash
git clone https://github.com/VolkanSah/Multi-LLM-API-Gateway
cd Multi-LLM-API-Gateway
cp example-mcp___.env .env
# fill in your keys
pip install -r requirements.txt
python main.py
```

Minimum required ENV vars (everything else is optional):

```env
PYFUNDAMENTS_DEBUG=""
LOG_LEVEL="INFO"
LOG_TO_TMP=""
ENABLE_PUBLIC_LOGS="true"
HF_TOKEN=""
HUB_SPACE_URL=""
```

Transport is configured in `app/.pyfun [HUB]`, not via ENV.

---

## Connect an MCP Client

### Streamable HTTP (default, MCP spec 2025-11-25)

```json
{
  "mcpServers": {
    "universal-mcp-hub": {
      "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp"
    }
  }
}
```

### Streamable HTTP: Private Space (with HF token)

```json
{
  "mcpServers": {
    "universal-mcp-hub": {
      "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp",
      "headers": {
        "Authorization": "Bearer hf_..."
      }
    }
  }
}
```

### SSE legacy fallback (set `HUB_TRANSPORT = "sse"` in `.pyfun`)

```json
{
  "mcpServers": {
    "universal-mcp-hub": {
      "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp"
    }
  }
}
```

> Same URL (`/mcp`) for both transports; the protocol is negotiated automatically.
> SSE fallback is for older clients that don't support Streamable HTTP yet.
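
The client configurations above differ only in the optional `Authorization` header, so they can be generated from one small helper. This is a minimal sketch, assuming a hypothetical `mcp_client_config` function and a placeholder Space URL. The JSON layout mirrors the examples above; where a given client stores this configuration depends on that client.

```python
import json
from typing import Any, Dict, Optional


def mcp_client_config(
    space_url: str,
    server_name: str = "universal-mcp-hub",
    hf_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Return an `mcpServers` entry pointing at the hub's /mcp endpoint."""
    entry: Dict[str, Any] = {"url": space_url.rstrip("/") + "/mcp"}
    if hf_token:
        # Private Spaces: send the HF token as a Bearer header, never in the URL.
        entry["headers"] = {"Authorization": f"Bearer {hf_token}"}
    return {"mcpServers": {server_name: entry}}


public_cfg = mcp_client_config("https://YOUR_USERNAME-universal-mcp-hub.hf.space")
private_cfg = mcp_client_config(
    "https://YOUR_USERNAME-universal-mcp-hub.hf.space",
    hf_token="hf_example",  # placeholder, not a real token
)
print(json.dumps(private_cfg, indent=2))
```

Because the URL is the same for both transports, switching the hub between Streamable HTTP and SSE should not require regenerating the client configuration.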
---

## Desktop Client
###### (experimental, ~80% AI generated)

A full PySide6 desktop client is included in `DESKTOP_CLIENT/hub.py`. It communicates via the REST `/api` endpoint, with no MCP protocol overhead. Ideal for private or non-public Spaces.

```bash
pip install PySide6 httpx
# optional file handling:
pip install Pillow PyPDF2 pandas openpyxl
python DESKTOP_CLIENT/hub.py
```

**Features:**

- Multi-chat with persistent history
- Tool / Provider / Model selector loaded live from your Hub
- File attachments: images, PDF, CSV, Excel, ZIP, source code
- Connect tab with health check + auto-load
- Settings: HF Token + Hub URL saved locally, never sent anywhere except your own Hub
- Full request/response log with timestamps
- Runs on Windows, Linux, macOS

[→ Desktop Client docs](DESKTOP_CLIENT/README.md)

---

## CMS & Custom Clients

| Client | Interface used | Notes |
| :--- | :--- | :--- |
| [Desktop Client](DESKTOP_CLIENT/hub.py) | REST `/api` | PySide6, local |
| [WP AI Hub](https://github.com/VolkanSah/WP-AI-HUB/) | REST `/api` | WordPress plugin |
| TYPO3 (soon) | REST `/api` | - |
| Claude Desktop | MCP `/mcp` | Streamable HTTP |
| Cursor / Windsurf | MCP `/mcp` | Streamable HTTP |

---

## Configuration (.pyfun)

`app/.pyfun` is the single source of truth for all app behavior. Three tiers:

```
LAZY:       [HUB] + one [LLM_PROVIDER.*]          → works
NORMAL:     + [SEARCH_PROVIDER.*] + [MODELS.*]    → works better
PRODUCTIVE: + [TOOLS] + [HUB_LIMITS] + [DB_SYNC]  → full power
```

Key settings in `[HUB]`:

```ini
[HUB]
HUB_TRANSPORT = "streamable-http"   # streamable-http | sse
HUB_STATELESS = "true"              # true = HF Spaces safe, no session state
HUB_PORT = "7860"
[HUB_END]
```

Adding a new LLM provider takes two steps:

```ini
# 1. app/.pyfun
[LLM_PROVIDER.mistral]
active = "true"
base_url = "https://api.mistral.ai/v1"
env_key = "MISTRAL_API_KEY"
default_model = "mistral-large-latest"
models = "mistral-large-latest, mistral-small-latest"
fallback_to = ""
[LLM_PROVIDER.mistral_END]
```

```python
# 2. app/providers.py: uncomment the dummy
_PROVIDER_CLASSES = {
    # ... existing providers ...
    "mistral": MistralProvider,   # ← uncomment to activate
}
```

---

## Dependencies

```
# PyFundaments Core (always required)
asyncpg          - async PostgreSQL pool (Guardian/cloud DB)
python-dotenv    - .env loading
passlib          - PBKDF2 password hashing in user_handler.py
cryptography     - encryption layer in fundaments/

# MCP Hub
mcp              - MCP protocol + FastMCP (Streamable HTTP + SSE)
httpx            - async HTTP for all provider API calls
quart            - async Flask (ASGI), needed for MCP + hypercorn
hypercorn        - ASGI server: Streamable HTTP + SSE, HF Spaces native
requests         - sync HTTP for tool workers

# Optional (uncomment in requirements.txt as needed)
# aiofiles         - async file ops (ML pipelines, file uploads)
# discord.py       - Discord bot integration (planned)
# PyNaCl           - Discord signature verification
# psycopg2-binary  - alternative PostgreSQL driver
```

> **Note:** The package is `mcp` (not `fastmcp`); `FastMCP` is imported from `mcp.server.fastmcp`.
> Streamable HTTP support requires `mcp >= 1.6.0`.

---

## Security Design

- API keys live in HF Secrets / `.env`, never in `.pyfun`, never in code
- `list_active_tools` returns key **names** only, never values
- `db_query` is SELECT-only, enforced at application level (not just docs)
- `app/*` has zero import access to `fundaments/` internals
- Direct execution of `app/app.py` is blocked by design: warning + null-fundaments fallback
- `fundaments/` is initialized conditionally: missing services degrade gracefully, never crash
- Streamable HTTP uses standard Bearer headers, with no token-in-URL (unlike SSE)

> PyFundaments is not perfect.
> But it's more secure than most of what runs in production today.

[→ Full Security Policy](SECURITY.md)

---

## Foundation

Built on [PyFundaments](PyFundaments.md), a security-first Python boilerplate:

- `config_handler.py`: env loading with validation
- `postgresql.py`: async DB pool (Guardian-only)
- `encryption.py`: key-based encryption layer
- `access_control.py`: role/permission management
- `user_handler.py`: user lifecycle management
- `security.py`: unified security manager composing the above

None accessible from `app/*`. Injected as a validated dict by `main.py`.

[→ PyFundaments Function Overview](PyFundaments%20–%20Function%20Overview.md)
[→ Module Docs](docs/app/)
[→ Source Repo](https://github.com/VolkanSah/Multi-LLM-API-Gateway)

---

## Related Projects

- [Customs LLMs for free: Build Your Own LLM Service](https://github.com/VolkanSah/SmolLM2-customs/)
- [WP AI Hub (WordPress Client)](https://github.com/VolkanSah/WP-AI-HUB/)
- [ShellMaster (2023 precursor)](https://github.com/VolkanSah/ChatGPT-ShellMaster)

---

## History

[ShellMaster](https://github.com/VolkanSah/ChatGPT-ShellMaster) (2023, MIT) was the precursor: a browser-accessible shell for ChatGPT with session memory, built before MCP was a concept. Universal MCP Hub is its natural evolution: same idea, proper architecture, dual interface.

---

## License

Dual-licensed:

- [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- [Ethical Security Operations License v1.1 (ESOL)](ESOL), mandatory and non-severable

By using this software you agree to all ethical constraints defined in ESOL v1.1.

---

*Architecture, security decisions, and PyFundaments by Volkan Kücükbudak.*
*Built with Claude (Anthropic) as a typing assistant for docs (and the occasional bug).*

> crafted with passion. Just wanted to understand how it works, don't actually need it, have a CLI 😄