| # z.ai provider + MCP proxy (implemented) |
|
|
| This document describes the z.ai integration that is implemented on the `feat/zai-passthrough-mcp` branch: what was added, how it works internally, and how to validate it. |
|
|
| Related deep dives: |
| - [`docs/zai/provider.md`](provider.md) |
| - [`docs/zai/mcp.md`](mcp.md) |
| - [`docs/zai/vision-mcp.md`](vision-mcp.md) |
| - [`docs/proxy/auth.md`](../proxy/auth.md) |
| - [`docs/proxy/accounts.md`](../proxy/accounts.md) |
|
|
| ## Scope (current) |
| - z.ai is integrated as an **optional upstream** for **Anthropic/Claude protocol only** (`/v1/messages`, `/v1/messages/count_tokens`). |
| - OpenAI and Gemini protocol handlers are unchanged and continue to use the existing Google-backed pool. |
| - z.ai MCP (Search + Reader) is exposed via local proxy endpoints (reverse proxy) and injects the z.ai API key upstream. |
| - Vision MCP is exposed via a **built-in MCP server** (local endpoint) and uses the stored z.ai API key to call the z.ai vision API. |
|
|
| ## Configuration |
| All settings are persisted in the existing data directory (same place as Google accounts and `gui_config.json`). |
|
|
| ### Proxy auth |
| - `proxy.auth_mode` (`off` | `strict` | `all_except_health` | `auto`) |
| - `off`: no auth required |
| - `strict`: auth required for all routes |
| - `all_except_health`: auth required for all routes except `GET /healthz` |
| - `auto`: if `allow_lan_access=true` -> `all_except_health`, else `off` |
| - `proxy.api_key`: required when auth is enabled |
|
|
| Implementation: |
| - Backend enum: [`src-tauri/src/proxy/config.rs`](../../src-tauri/src/proxy/config.rs) (`ProxyAuthMode`) |
| - Effective policy resolver: [`src-tauri/src/proxy/security.rs`](../../src-tauri/src/proxy/security.rs) |
| - Middleware enforcement: [`src-tauri/src/proxy/middleware/auth.rs`](../../src-tauri/src/proxy/middleware/auth.rs) |
|
|
| ### z.ai provider |
| Config lives under `proxy.zai` (`src-tauri/src/proxy/config.rs`): |
| - `enabled: bool` |
| - `base_url: string` (default `https://api.z.ai/api/anthropic`) |
| - `api_key: string` |
| - `dispatch_mode: off | exclusive | pooled | fallback` |
| - `off`: never use z.ai |
| - `exclusive`: all Claude protocol requests go to z.ai |
| - `pooled`: z.ai is treated as **one additional slot** in the shared pool (no priority, no strict guarantee) |
| - `fallback`: z.ai is used only when the Google pool has 0 accounts |
| - `models`: defaults used when the incoming Anthropic request uses `claude-*` model ids |
| - `opus` default `glm-4.7` |
| - `sonnet` default `glm-4.7` |
| - `haiku` default `glm-4.5-air` |
| - `model_mapping`: optional exact-match overrides (`{ "<incoming_model>": "<glm-model-id>" }`) |
| - When a key matches the incoming `model` string, it is replaced with the mapped z.ai model id before forwarding upstream. |
| - `mcp` toggles: |
| - `enabled` |
| - `web_search_enabled` |
| - `web_reader_enabled` |
| - `vision_enabled` |
|
|
| Runtime hot update: |
| - `save_config` hot-updates `auth`, `upstream_proxy`, `model mappings`, and `z.ai` without restart. |
| - `src-tauri/src/commands/mod.rs` calls `axum_server.update_security(...)` and `axum_server.update_zai(...)`. |
|
|
| ## Request routing |
|
|
| ### `/v1/messages` (Anthropic messages) |
| Handler: `src-tauri/src/proxy/handlers/claude.rs` (`handle_messages`) |
|
|
| Flow: |
| 1. The handler receives `HeaderMap` + raw JSON `Value`. |
| 2. It decides whether to use z.ai or the existing Google flow: |
| - If z.ai is disabled -> use Google flow. |
| - If `dispatch_mode=exclusive` -> use z.ai. |
| - If `dispatch_mode=fallback` -> use z.ai only if Google pool size is 0. |
| - If `dispatch_mode=pooled` -> use round-robin across `(google_accounts + 1)` slots; slot `0` is z.ai, others are Google. |
| 3. If z.ai is selected: |
| - The raw JSON is forwarded to z.ai as-is (streaming is supported by byte passthrough). |
| - The request `model` may be rewritten: |
| - if `proxy.zai.model_mapping` contains an exact match, that mapping wins |
| - `glm-*` stays unchanged |
| - `claude-*` becomes one of `proxy.zai.models.{opus,sonnet,haiku}` based on name match |
| 4. Otherwise: |
| - The existing Claude→Gemini transform and Google-backed execution path runs as before. |
|
|
| ### `/v1/messages/count_tokens` |
| Handler: `src-tauri/src/proxy/handlers/claude.rs` (`handle_count_tokens`) |
| - If z.ai is enabled (mode != off), this request is forwarded to z.ai. |
| - Otherwise it returns the existing placeholder `{input_tokens: 0, output_tokens: 0}`. |
| |
| ## Upstream forwarding details (z.ai Anthropic) |
| Provider: `src-tauri/src/proxy/providers/zai_anthropic.rs` |
|
|
| Security / header handling: |
| - The local proxy API key must **never** be forwarded upstream. |
| - Only a conservative set of incoming headers is forwarded (e.g. `content-type`, `accept`, `anthropic-version`, `user-agent`). |
| - z.ai auth is injected: |
| - If the client used `x-api-key`, it is replaced with z.ai key. |
| - If the client used `Authorization`, it is replaced with `Bearer <zai_key>`. |
| - If neither is present, `x-api-key: <zai_key>` is used. |
| - Responses are streamed back to the client without parsing SSE. |
|
|
| Networking: |
| - Respects the global upstream proxy config (`proxy.upstream_proxy`) for outbound HTTP calls. |
|
|
| ## MCP reverse proxy (Search + Reader) |
| Handlers: `src-tauri/src/proxy/handlers/mcp.rs` |
| Routes: `src-tauri/src/proxy/server.rs` |
|
|
| Local endpoints: |
| - `/mcp/web_search_prime/mcp` → `https://api.z.ai/api/mcp/web_search_prime/mcp` |
| - `/mcp/web_reader/mcp` → `https://api.z.ai/api/mcp/web_reader/mcp` |
|
|
| Behavior: |
| - Controlled by `proxy.zai.mcp.*` flags: |
| - If `mcp.enabled=false` -> endpoints return 404. |
| - If per-server flag is false -> returns 404 for that endpoint. |
| - z.ai key is injected upstream as `Authorization: Bearer <zai_key>`. |
| - Response body is streamed back to the client. |
|
|
| Note: |
| - These endpoints are still subject to the proxy’s auth middleware depending on `proxy.auth_mode`. |
|
|
| ## Vision MCP (built-in server) |
| Handlers: |
| - [`src-tauri/src/proxy/handlers/mcp.rs`](../../src-tauri/src/proxy/handlers/mcp.rs) (`handle_zai_mcp_server`) |
| - [`src-tauri/src/proxy/zai_vision_tools.rs`](../../src-tauri/src/proxy/zai_vision_tools.rs) (tool registry + z.ai vision API client) |
|
|
| Local endpoint: |
| - `/mcp/zai-mcp-server/mcp` |
|
|
| Behavior: |
| - Controlled by `proxy.zai.mcp.enabled` and `proxy.zai.mcp.vision_enabled`. |
| - If `mcp.enabled=false` -> returns 404. |
| - If `vision_enabled=false` -> returns 404. |
| - No z.ai key is required from MCP clients: |
| - the proxy injects the stored `proxy.zai.api_key` when calling the z.ai vision API. |
| - Implements a minimal Streamable HTTP MCP flow: |
| - `POST /mcp` supports `initialize`, `tools/list`, `tools/call` |
| - `GET /mcp` returns an SSE stream with keep-alive events for an initialized session |
| - `DELETE /mcp` terminates a session |
|
|
| Upstream calls: |
| - z.ai vision endpoint: `https://api.z.ai/api/paas/v4/chat/completions` |
| - Uses `Authorization: Bearer <zai_key>` |
| - Default model: `glm-4.6v` (hardcoded for now) |
|
|
| Tool input and limits: |
| - Images: `.png`, `.jpg`, `.jpeg` up to 5 MB (local files are encoded as `data:<mime>;base64,...`). |
| - Videos: `.mp4`, `.mov`, `.m4v` up to 8 MB. |
| - Supported tools: |
| - `ui_to_artifact` |
| - `extract_text_from_screenshot` |
| - `diagnose_error_screenshot` |
| - `understand_technical_diagram` |
| - `analyze_data_visualization` |
| - `ui_diff_check` |
| - `analyze_image` |
| - `analyze_video` |
|
|
| ## UI |
| Page: `src/pages/ApiProxy.tsx` |
|
|
| Added controls: |
| - Authorization toggle + mode selector (`off/strict/all_except_health/auto`) |
| - z.ai block: |
| - enable toggle |
| - base_url |
| - dispatch mode |
| - api key input (stored locally) |
| - model mapping UI: |
| - fetch available model ids from the z.ai upstream (`GET <base_url>/v1/models`) |
| - configure default `opus/sonnet/haiku` mapping |
| - configure optional exact-match overrides |
| - MCP toggles + display of local MCP endpoints |
| |
| Translations: |
| - `src/locales/en.json` |
| - `src/locales/zh.json` |
| |
| ## Validation checklist |
| Build: |
| - Frontend: `npm run build` |
| - Backend: `cd src-tauri && cargo build` |
| |
| Manual (example): |
| 1) Enable proxy auth (strict or all-except-health) and note `proxy.api_key`. |
| 2) Enable z.ai and set: |
| - `dispatch_mode=exclusive` |
| - `api_key=<your_z.ai.key>` |
| 3) Start proxy and call: |
| - `GET http://127.0.0.1:<port>/healthz` (should work without auth in all-except-health; always works in off) |
| - `POST http://127.0.0.1:<port>/v1/messages` with `Authorization: Bearer <proxy.api_key>` and a normal Anthropic request body. |
| 4) Enable MCP Search and call local `/mcp/web_search_prime/mcp` via an MCP client (the proxy injects z.ai auth upstream). |
| 5) Enable Vision MCP and verify the tool list: |
| - `POST http://127.0.0.1:<port>/mcp/zai-mcp-server/mcp` with a JSON-RPC `initialize` |
| - then `POST ...` with `tools/list` using the returned `Mcp-Session-Id` header. |
|
|
| ## Known limitations / follow-ups |
| - Vision MCP currently implements the core methods needed for tool calls but is not yet a full feature-complete MCP server (prompts/resources, resumability, streaming tool output). |
| - z.ai usage/budget (monitor endpoints) is not implemented yet. |
| - Claude model list endpoint remains a static stub (`/v1/models/claude`) and is not yet provider-aware. |
|
|