Spaces:

codey-lab
/

Multi-LLM-API-Gateway

Running

App Files Files Community

Multi-LLM-API-Gateway / README.md

Alibrown

Update README.md

dd53022 verified about 4 hours ago

preview code

raw

history blame contribute delete

18.1 kB

	---
	title: Multi-LLM API Gateway
	emoji: 🛡️
	colorFrom: indigo
	colorTo: red
	sdk: docker
	pinned: false
	license: apache-2.0
	short_description: 'Secure Multi-LLM Gateway — (Streamable HTTP / SSE)'
	---

	# Multi-LLM API Gateway

	— or Universal MCP Hub (Sandboxed)
	— or secure AI wrapper with dual interface: REST + MCP

	aka: a clean, secure starting point for your own projects.
	Pick the description that fits your use case. They're all correct.

	> A production-grade the-thing that actually thinks about security.
	> Built on [PyFundaments](PyFundaments.md) — running on simpleCity.

	```
	No key → no tool → no crash → no exposed secrets
	```

	> [!WARNING]
	> Most MCP servers are prompts dressed up as servers. This one has a real architecture.

	---

	> [!IMPORTANT]
	> This project is under active development — always use the latest release from [Codey Lab](https://github.com/Codey-LAB/Multi-LLM-API-Gateway) (more stable builds land here first).
	> This repo ([DEV](https://github.com/VolkanSah/Multi-LLM-API-Gateway)) is where the chaos happens. 🔬 A ⭐ on the repos will be cool 😙

	---

	## Why this exists

	The AI ecosystem is full of servers with hardcoded keys, `os.environ` scattered everywhere, zero sandboxing. One misconfigured fork and your API keys are gone.

	This is exactly the kind of negligence (and worse — outright fraud) that [Wall of Shames](https://github.com/Wall-of-Shames) documents: fake "AI tools" exploiting non-technical users — API wrappers dressed up as custom models, Telegram payment funnels, bought stars. If you build on open source, you should know this exists.

	This hub is the antidote:

	- Structural sandboxing — `app/*` can never touch `fundaments/` or `.env`. Not by convention. By design.
	- Guardian pattern — `main.py` is the only process that reads secrets. It injects validated services as a dict. `app/*` never sees the raw environment.
	- Graceful degradation — No key? Tool doesn't register. Server still starts. No crash, no error, no empty `None` floating around.
	- Single source of truth — All tool/provider/model config lives in `app/.pyfun`. Adding a provider = edit one file. No code changes.

	---

	## Two Interfaces — One Server

	This hub exposes two completely independent interfaces on the same hypercorn instance:

	```
	POST /api → REST interface — for custom clients, desktop apps, CMS plugins
	GET+POST /mcp → MCP interface — for Claude Desktop, Cursor, Windsurf, any MCP client
	GET / → Health check — uptime, status
	```

	They share the same tool registry, provider config, and fallback chain. Adding a tool once makes it available on both interfaces automatically.

	### REST API (`/api`)

	Simple JSON POST — no protocol overhead, works with any HTTP client:

	```json
	POST /api
	{"tool": "llm_complete", "params": {"prompt": "Hello", "provider": "anthropic"}}
	```

	Used by: Desktop Client (`DESKTOP_CLIENT/hub.py`), WordPress plugin, any custom integration.

	### MCP Interface (`/mcp`)

	Full MCP protocol — tool discovery, structured calls, streaming responses.

	Primary transport: Streamable HTTP (MCP spec 2025-11-25)
	Fallback transport: SSE (legacy, configurable via `.pyfun`)

	Configured via `HUB_TRANSPORT` in `app/.pyfun [HUB]`:

	```ini
	HUB_TRANSPORT = "streamable-http" # default — MCP spec 2025-11-25
	# HUB_TRANSPORT = "sse" # legacy fallback for older clients
	```

	Used by: Claude Desktop, Cursor, Windsurf, any MCP-compatible client.

	---

	## Architecture

	```
	main.py (Guardian)
	│
	│ reads .env / HF Secrets
	│ initializes fundaments/* conditionally
	│ injects validated services as dict
	│
	└──► app/app.py (Orchestrator, sandboxed)
	│
	│ unpacks fundaments ONCE, at startup, never stores globally
	│ starts hypercorn (async ASGI)
	│ routes: GET / \| POST /api \| /mcp (transport-dependent)
	│
	├── app/mcp.py ← FastMCP + transport handler (Streamable HTTP / SSE)
	├── app/tools.py ← Tool registry (key-gated)
	├── app/providers.py ← LLM + Search execution + fallback chain
	├── app/models.py ← Model limits, costs, capabilities
	├── app/config.py ← .pyfun parser (single source of truth)
	└── app/db_sync.py ← Internal SQLite IPC (app/* state only)
	≠ fundaments/postgresql.py (Guardian-only)
	```

	The sandbox is structural:

	```python
	# app/app.py — fundaments unpacked ONCE, NEVER stored globally
	async def start_application(fundaments: Dict[str, Any]) -> None:
	config_service = fundaments["config"]
	db_service = fundaments["db"] # None if not configured
	encryption_service = fundaments["encryption"] # None if keys missing
	access_control_service = fundaments["access_control"]
	...
	# From here: app/* reads its own config from app/.pyfun only.
	# fundaments are never passed into other app/* modules.
	```

	`app/app.py` never calls `os.environ`. Never imports from `fundaments/`. Never reads `.env`.
	This isn't documentation. It's enforced by the import structure.

	### Why Quart + hypercorn?

	Quart is async Flask — fully `async/await` native. FastMCP's handlers are async; mixing sync Flask would require thread hacks. With Quart, `/mcp` hands off directly to FastMCP — no bridging, no blocking.

	hypercorn is an ASGI server (vs. waitress/gunicorn which are WSGI). WSGI servers handle one request per thread — wrong for long-lived MCP connections. hypercorn handles both Streamable HTTP and SSE natively, and runs without extra config on HuggingFace Spaces. HTTP/2 support (`config.h2 = True`) is built-in — relevant for Streamable HTTP performance at scale.

	The `/mcp` route in `app.py` remains the natural interception point regardless of transport — auth checks, rate limiting, and logging can all be added there before the request reaches FastMCP.

	---

	## Two Databases — One Architecture


	```
	┌─────────────────────────────────────────────────────────────┐
	│ Guardian Layer (fundaments/*) │
	│ │
	│ postgresql.py → Cloud DB (e.g. Neon, Supabase) │
	│ asyncpg pool, SSL enforced │
	│ │
	│ user_handler.py → SQLite (users + sessions tables) │
	│ PBKDF2-SHA256 password hashing │
	│ Session validation incl. IP + UserAgent │
	│ Account lockout after 5 failed attempts │
	│ │
	└──────────────────────┬──────────────────────────────────────┘
	│ inject as fundaments dict
	▼
	┌─────────────────────────────────────────────────────────────┐
	│ App Layer (app/*) │
	│ │
	│ db_sync.py → SQLite (hub_state + tool_cache tables) │
	│ aiosqlite (async, non-blocking) │
	│ NEVER touches users/sessions tables │
	│ Relocated to /tmp/ on HF Spaces auto │
	│ │
	└─────────────────────────────────────────────────────────────┘
	```

	Table ownership — hard rule:

	\| Table \| Owner \| Access \|
	\| :--- \| :--- \| :--- \|
	\| `users` \| `fundaments/user_handler.py` \| Guardian only \|
	\| `sessions` \| `fundaments/user_handler.py` \| Guardian only \|
	\| `hub_state` \| `app/db_sync.py` \| app/* only \|
	\| `tool_cache` \| `app/db_sync.py` \| app/* only \|
	\| `hub_results` \| PostgreSQL / Guardian \| via `persist_result` tool \|

	---

	## Tools

	Tools register at startup — only if the required API key exists. No key, no tool. Server always starts.

	\| ENV Secret \| Tool \| Notes \|
	\| :--- \| :--- \| :--- \|
	\| `ANTHROPIC_API_KEY` \| `llm_complete` \| Claude Haiku / Sonnet / Opus \|
	\| `GEMINI_API_KEY` \| `llm_complete` \| Gemini 2.0 / 2.5 / 3.x Flash & Pro \|
	\| `OPENROUTER_API_KEY` \| `llm_complete` \| 100+ models via OpenRouter \|
	\| `HF_TOKEN` \| `llm_complete` \| HuggingFace Inference API \|
	\| `BRAVE_API_KEY` \| `web_search` \| Independent web index \|
	\| `TAVILY_API_KEY` \| `web_search` \| AI-optimized search with synthesized answers \|
	\| `DATABASE_URL` \| `cloud DB` \| e.g. Neon, Supabase \|
	\| `DATABASE_URL` \| `db_query`, `persist_result` \| SQLite read + PostgreSQL write \|
	\| (always) \| `list_active_tools` \| Shows key names only — never values \|
	\| (always) \| `health_check` \| Status + uptime + active transport \|
	\| (always) \| `get_model_info` \| Limits, costs, capabilities per model \|

	For all key names see [`app/.pyfun`](app/.pyfun).

	Tools are configured in `.pyfun` — including system prompts:

	```ini
	[TOOL.code_review]
	active = "true"
	description = "Review code for bugs, security issues and improvements"
	provider_type = "llm"
	default_provider = "anthropic"
	timeout_sec = "60"
	system_prompt = "You are an expert code reviewer. Analyze the given code for bugs, security issues, and improvements. Be specific and concise."
	[TOOL.code_review_END]
	```

	Current built-in tools: `llm_complete`, `code_review`, `summarize`, `translate`, `web_search`, `db_query`
	Future hooks (commented, ready): `image_gen`, `code_exec`, `shellmaster_2.0`, Discord, GitHub webhooks

	---

	## LLM Fallback Chain

	All LLM providers share one `llm_complete` tool. If a provider fails, the hub walks the fallback chain from `.pyfun`:

	```
	e.g. anthropic → gemini → openrouter → huggingface
	```

	```ini
	[LLM_PROVIDER.anthropic]
	fallback_to = "gemini"
	[LLM_PROVIDER.anthropic_END]

	[LLM_PROVIDER.gemini]
	fallback_to = "openrouter"
	[LLM_PROVIDER.gemini_END]
	```

	Same pattern applies to search providers (`brave → tavily`).

	---

	## Quick Start

	### HuggingFace Spaces (recommended)

	1. Fork / duplicate this Space
	2. Go to Settings → Variables and secrets
	3. Add the API keys you have (any subset works)
	4. Space starts automatically — only tools with valid keys register

	[→ Live Demo Space](https://huggingface.co/spaces/codey-lab/Multi-LLM-API-Gateway) (no LLM keys set)

	### Local / Docker

	```bash
	git clone https://github.com/VolkanSah/Multi-LLM-API-Gateway
	cd Multi-LLM-API-Gateway
	cp example-mcp___.env .env
	# fill in your keys
	pip install -r requirements.txt
	python main.py
	```

	Minimum required ENV vars (everything else is optional):

	```env
	PYFUNDAMENTS_DEBUG=""
	LOG_LEVEL="INFO"
	LOG_TO_TMP=""
	ENABLE_PUBLIC_LOGS="true"
	HF_TOKEN=""
	HUB_SPACE_URL=""
	```

	Transport is configured in `app/.pyfun [HUB]` — not via ENV.

	---

	## Connect an MCP Client

	### Streamable HTTP (default — MCP spec 2025-11-25)

	```json
	{
	"mcpServers": {
	"universal-mcp-hub": {
	"url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp"
	}
	}
	}
	```

	### Streamable HTTP — Private Space (with HF token)

	```json
	{
	"mcpServers": {
	"universal-mcp-hub": {
	"url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp",
	"headers": {
	"Authorization": "Bearer hf_..."
	}
	}
	}
	}
	```

	### SSE legacy fallback (set `HUB_TRANSPORT = "sse"` in `.pyfun`)

	```json
	{
	"mcpServers": {
	"universal-mcp-hub": {
	"url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp"
	}
	}
	}
	```

	> Same URL (`/mcp`) for both transports — the protocol is negotiated automatically.
	> SSE fallback is for older clients that don't support Streamable HTTP yet.

	---

	## Desktop Client
	###### (experimental — ~80% AI generated)

	A full PySide6 desktop client is included in `DESKTOP_CLIENT/hub.py`.
	Communicates via the REST `/api` endpoint — no MCP protocol overhead.
	Ideal for private or non-public Spaces.

	```bash
	pip install PySide6 httpx
	# optional file handling:
	pip install Pillow PyPDF2 pandas openpyxl
	python DESKTOP_CLIENT/hub.py
	```

	Features:
	- Multi-chat with persistent history
	- Tool / Provider / Model selector loaded live from your Hub
	- File attachments: images, PDF, CSV, Excel, ZIP, source code
	- Connect tab with health check + auto-load
	- Settings: HF Token + Hub URL saved locally, never sent anywhere except your own Hub
	- Full request/response log with timestamps
	- Runs on Windows, Linux, macOS

	[→ Desktop Client docs](DESKTOP_CLIENT/README.md)

	---

	## CMS & Custom Clients

	\| Client \| Interface used \| Notes \|
	\| :--- \| :--- \| :--- \|
	\| [Desktop Client](DESKTOP_CLIENT/hub.py) \| REST `/api` \| PySide6, local \|
	\| [WP AI Hub](https://github.com/VolkanSah/WP-AI-HUB/) \| REST `/api` \| WordPress plugin \|
	\| TYPO3 (soon) \| REST `/api` \| — \|
	\| Claude Desktop \| MCP `/mcp` \| Streamable HTTP \|
	\| Cursor / Windsurf \| MCP `/mcp` \| Streamable HTTP \|

	---

	## Configuration (.pyfun)

	`app/.pyfun` is the single source of truth for all app behavior. Three tiers:

	```
	LAZY: [HUB] + one [LLM_PROVIDER.*] → works
	NORMAL: + [SEARCH_PROVIDER.] + [MODELS.] → works better
	PRODUCTIVE: + [TOOLS] + [HUB_LIMITS] + [DB_SYNC] → full power
	```

	Key settings in `[HUB]`:

	```ini
	[HUB]
	HUB_TRANSPORT = "streamable-http" # streamable-http \| sse
	HUB_STATELESS = "true" # true = HF Spaces safe, no session state
	HUB_PORT = "7860"
	[HUB_END]
	```

	Adding a new LLM provider — two steps:

	```ini
	# 1. app/.pyfun
	[LLM_PROVIDER.mistral]
	active = "true"
	base_url = "https://api.mistral.ai/v1"
	env_key = "MISTRAL_API_KEY"
	default_model = "mistral-large-latest"
	models = "mistral-large-latest, mistral-small-latest"
	fallback_to = ""
	[LLM_PROVIDER.mistral_END]
	```

	```python
	# 2. app/providers.py — uncomment the dummy
	_PROVIDER_CLASSES = {
	...
	"mistral": MistralProvider, # ← uncomment to activate
	}
	```

	---

	## Dependencies

	```
	# PyFundaments Core (always required)
	asyncpg — async PostgreSQL pool (Guardian/cloud DB)
	python-dotenv — .env loading
	passlib — PBKDF2 password hashing in user_handler.py
	cryptography — encryption layer in fundaments/

	# MCP Hub
	mcp — MCP protocol + FastMCP (Streamable HTTP + SSE)
	httpx — async HTTP for all provider API calls
	quart — async Flask (ASGI) — needed for MCP + hypercorn
	hypercorn — ASGI server — Streamable HTTP + SSE, HF Spaces native
	requests — sync HTTP for tool workers

	# Optional (uncomment in requirements.txt as needed)
	# aiofiles — async file ops (ML pipelines, file uploads)
	# discord.py — Discord bot integration (planned)
	# PyNaCl — Discord signature verification
	# psycopg2-binary — alternative PostgreSQL driver
	```

	> Note: The package is `mcp` (not `fastmcp`) — `FastMCP` is imported from `mcp.server.fastmcp`.
	> Streamable HTTP support requires `mcp >= 1.6.0`.

	---

	## Security Design

	- API keys live in HF Secrets / `.env` — never in `.pyfun`, never in code
	- `list_active_tools` returns key names only — never values
	- `db_query` is SELECT-only, enforced at application level (not just docs)
	- `app/*` has zero import access to `fundaments/` internals
	- Direct execution of `app/app.py` blocked by design — warning + null-fundaments fallback
	- `fundaments/` initialized conditionally — missing services degrade gracefully, never crash
	- Streamable HTTP uses standard Bearer headers — no token-in-URL (unlike SSE)

	> PyFundaments is not perfect. But it's more secure than most of what runs in production today.

	[→ Full Security Policy](SECURITY.md)

	---

	## Foundation

	Built on [PyFundaments](PyFundaments.md) — a security-first Python boilerplate:

	- `config_handler.py` — env loading with validation
	- `postgresql.py` — async DB pool (Guardian-only)
	- `encryption.py` — key-based encryption layer
	- `access_control.py` — role/permission management
	- `user_handler.py` — user lifecycle management
	- `security.py` — unified security manager composing the above

	None accessible from `app/*`. Injected as a validated dict by `main.py`.

	[→ PyFundaments Function Overview](PyFundaments%20–%20Function%20Overview.md)
	[→ Module Docs](docs/app/)
	[→ Source Repo](https://github.com/VolkanSah/Multi-LLM-API-Gateway)

	---

	## Related Projects

	- [Customs LLMs for free — Build Your Own LLM Service](https://github.com/VolkanSah/SmolLM2-customs/)
	- [WP AI Hub (WordPress Client)](https://github.com/VolkanSah/WP-AI-HUB/)
	- [ShellMaster (2023 precursor)](https://github.com/VolkanSah/ChatGPT-ShellMaster)

	---

	## History

	[ShellMaster](https://github.com/VolkanSah/ChatGPT-ShellMaster) (2023, MIT) was the precursor — browser-accessible shell for ChatGPT with session memory, built before MCP was a concept. Universal MCP Hub is its natural evolution: same idea, proper architecture, dual interface.

	---

	## License

	Dual-licensed:

	- [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
	- [Ethical Security Operations License v1.1 (ESOL)](ESOL) — mandatory, non-severable

	By using this software you agree to all ethical constraints defined in ESOL v1.1.

	---

	Architecture, security decisions, and PyFundaments by Volkan Kücükbudak.
	Built with Claude (Anthropic) as a typing assistant for docs (and the occasional bug).

	> crafted with passion — just wanted to understand how it works, don't actually need it, have a CLI 😄