# NLProxy Server Module Reference

This document covers the FastAPI server implementation in the `server/` package.

## Purpose

The server module exposes HTTP APIs, handles dependency initialization, and coordinates request lifecycle management for NLProxy.

## Files and Responsibilities

### `server/config.py`

#### Purpose

Defines runtime settings using Pydantic.

#### Key Features

- Uses `ConfigDict(env_prefix="NLPROXY_", case_sensitive=False)`.
- Supports boolean parsing for env values such as `true`, `yes`, `on`, and `1`.
- Validates provider names against `LLMProvider`.

#### Configurable Values

- Server: `host`, `port`, `workers`, `log_level`
- Redis: `redis_url`, `redis_max_connections`, `redis_socket_timeout`
- Cache: `enable_semantic_cache`, `cache_similarity_threshold`, `cache_default_ttl`, `cache_embedding_dim`
- Compression: `default_aggressiveness`, `max_compression_timeout`, `compression_max_retries`
- LLM: `default_llm_provider`, `default_llm_model`, `llm_request_timeout`, `llm_max_retries`, `enable_llm_fallback`
- Safety: `min_confidence_threshold`, `max_regeneration_attempts`, `enable_auto_correction`, `privacy_mode_default`
- Observability: `enable_metrics`, `metrics_path`, `enable_tracing`, `trace_sample_rate`
- Verification: `enable_nli_verification`, `enable_perplexity_check`, `enable_semantic_drift`
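
The settings behavior above can be approximated with the standard library. This is a minimal sketch, not the Pydantic implementation. It assumes a hypothetical `LLMProvider` enum with `openai` and `anthropic` members, uses placeholder default values, and also accepts the falsy strings `false`, `no`, `off`, and `0` alongside the documented truthy ones. It illustrates the `NLPROXY_` prefix, case-insensitive lookup, boolean parsing, and provider validation.

```python
import os
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Mapping, Optional

ENV_PREFIX = "NLPROXY_"
TRUTHY = {"true", "yes", "on", "1"}
FALSY = {"false", "no", "off", "0"}  # assumed counterpart to the documented truthy values


class LLMProvider(str, Enum):
    """Hypothetical stand-in for NLProxy's LLMProvider."""
    OPENAI = "openai"
    ANTHROPIC = "anthropic"


@dataclass
class Settings:
    # Illustrative subset of fields; the defaults are placeholders.
    port: int = 8000
    enable_semantic_cache: bool = False
    cache_similarity_threshold: float = 0.9
    default_llm_provider: LLMProvider = LLMProvider.OPENAI


def parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"invalid boolean value: {raw!r}")


def convert(raw: str, target: Any) -> Any:
    if target is bool:
        return parse_bool(raw)
    if target in (int, float):
        return target(raw)
    if isinstance(target, type) and issubclass(target, Enum):
        # Raises ValueError when the name is not a valid provider.
        return target(raw.strip().lower())
    return raw


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if environ is None else environ
    # Case-insensitive: normalize every variable name to upper case.
    upper = {key.upper(): value for key, value in env.items()}
    overrides = {}
    for f in fields(Settings):
        raw = upper.get(ENV_PREFIX + f.name.upper())
        if raw is not None:
            overrides[f.name] = convert(raw, f.type)
    return Settings(**overrides)
```

Passing an explicit mapping to `load_settings()` keeps the loader testable without touching the real process environment.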

### `server/main.py`

#### Purpose

Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.

#### Key Behavior

- Uses an `asynccontextmanager` lifespan handler.
- Calls `startup()` before the app begins serving requests and `shutdown()` when it exits.
- Registers the `/docs`, `/redoc`, and `/openapi.json` endpoints.
- Instruments Prometheus metrics and exposes them under `settings.metrics_path` when `settings.enable_metrics` is true.
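
Here is a minimal sketch of the lifespan pattern. It assumes `startup()` and `shutdown()` are coroutines; the stand-ins below only record their call order. In `server/main.py` the function would be passed as `FastAPI(lifespan=lifespan)`. Here a plain `async with` simulates a server run so the example needs only the standard library.

```python
import asyncio
from contextlib import asynccontextmanager

events: list = []  # records call order for the demonstration


async def startup() -> None:
    # Stand-in for server.dependencies.startup(): build shared services here.
    events.append("startup")


async def shutdown() -> None:
    # Stand-in for server.dependencies.shutdown(): close clients and pools here.
    events.append("shutdown")


@asynccontextmanager
async def lifespan(app):
    await startup()  # completes before the app serves any request
    try:
        yield  # the app serves traffic while the generator is suspended here
    finally:
        await shutdown()  # runs on exit, including after an error


async def simulate_server_run() -> None:
    # FastAPI drives this itself when created as FastAPI(lifespan=lifespan).
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(simulate_server_run())
```

The `try`/`finally` around `yield` ensures that shutdown cleanup runs even if the serving phase ends with an exception.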

### `server/dependencies.py`

#### Purpose

Initializes shared application dependencies and holds module-scoped singletons.

#### Global Objects

- `compression_service`
- `post_verifier`
- `response_corrector`
- `llm_orchestrator`
- `firewall`
- `semantic_cache`

#### Startup Flow

- Instantiates `CompressionService` with cache and model settings.
- Creates `PostLLMVerifier` and, optionally, hooks NLI refinement into compression.
- Creates `ResponseCorrector` and attaches it to the compression service.
- Creates `LLMOrchestrator` with fallback providers.
- Validates that the default provider has credentials configured.
- Instantiates `PromptFirewall` with the default regex rules and an optional semantic configuration.
- Initializes `SemanticLLMCache` when enabled.

#### Shutdown Flow

- Calls `llm_orchestrator.close()`.
- Closes the Redis connections used by the cache.
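
The singleton lifecycle can be sketched as module-level globals that start as `None`, are populated by `startup()`, and are checked before use. This is one way to implement the chat endpoint's readiness check. The `CompressionService` and `LLMOrchestrator` classes here are hypothetical stubs, and `ServiceNotReady` and `require_orchestrator()` are illustrative names, not NLProxy's real code.

```python
import asyncio
from typing import Optional


class CompressionService:
    """Hypothetical stub for the real compression service."""


class LLMOrchestrator:
    """Hypothetical stub that only tracks whether close() was awaited."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class ServiceNotReady(RuntimeError):
    """Raised when a dependency is requested before startup() has run."""


# Module-scoped singletons: None until startup() populates them.
compression_service: Optional[CompressionService] = None
llm_orchestrator: Optional[LLMOrchestrator] = None


async def startup() -> None:
    global compression_service, llm_orchestrator
    compression_service = CompressionService()
    llm_orchestrator = LLMOrchestrator()


async def shutdown() -> None:
    global compression_service, llm_orchestrator
    if llm_orchestrator is not None:
        await llm_orchestrator.close()
    compression_service = None
    llm_orchestrator = None


def require_orchestrator() -> LLMOrchestrator:
    # A request handler can call this to enforce readiness before doing work.
    if llm_orchestrator is None:
        raise ServiceNotReady("llm_orchestrator is not initialized")
    return llm_orchestrator
```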

### `server/middleware.py`

#### Purpose

Installs standard HTTP middleware.

#### Middleware

- `CORSMiddleware`
  - `allow_origins=["*"]`
  - `allow_methods=["*"]`
  - `allow_headers=["*"]`
- `GZipMiddleware`
  - Compresses responses of at least 1000 bytes.

#### Request ID

- Adds an `X-Request-ID` header to every response.
- Stores the request ID in `request.state`.
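
The following is a minimal sketch of request-ID handling as a pure ASGI middleware, using only the standard library. It assumes each ID is a freshly generated `uuid4` hex string; the real middleware may reuse an incoming header or use another format. The sketch relies on Starlette backing `request.state` with `scope["state"]` and appends the header to the `http.response.start` message. `demo_app` and the hand-built scope exist only to exercise the class.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Pure ASGI middleware that tags each HTTP request with a generated ID."""

    header_name = b"x-request-id"  # ASGI header names are lowercase bytes

    def __init__(self, app) -> None:
        self.app = app

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex
        # Starlette exposes scope["state"] as request.state.
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_request_id(message) -> None:
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((self.header_name, request_id.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_request_id)


async def demo_app(scope, receive, send) -> None:
    # Echoes the request ID in the body to show it is visible downstream.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": scope["state"]["request_id"].encode()})


async def run_demo() -> list:
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/health", "headers": []}
    await RequestIDMiddleware(demo_app)(scope, receive, send)
    return sent


messages = asyncio.run(run_demo())
```

With Starlette or FastAPI, a class like this could be registered via `app.add_middleware(RequestIDMiddleware)`.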

### `server/apis/chat.py`

#### Purpose

Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.

#### Endpoints

- `POST /v1/chat/completions`
  - Accepts `ChatCompletionRequest`.
  - Returns `ChatCompletionResponse`.

#### Request Flow

1. Validate service readiness.
2. Run the firewall check on the user prompt.
3. Compress the prompt with `CompressionService.compress_batch_async()`.
4. Apply shield and safety validation.
5. Call `LLMOrchestrator.generate()` or its streaming equivalent.
6. Apply `ResponseCorrector` and `PostLLMVerifier`.
7. Optionally auto-correct low-confidence outputs.
8. Return a structured response with NLProxy metadata.
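
The request flow can be sketched as a single coroutine. This is a simplified illustration under stated assumptions. Every stage is an injected callable, and the names `is_ready`, `firewall_allows`, `compress`, `generate`, `correct`, and `verify` are hypothetical. The shield step (4) is omitted. Confidence is modeled as a float compared against a threshold. The default timeouts and limits are illustrative, not NLProxy's values.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


class ServiceNotReady(Exception): ...
class PromptBlocked(Exception): ...


@dataclass
class ChatResult:
    content: str
    confidence: float
    regenerations: int


async def handle_chat(
    prompt: str,
    *,
    is_ready: Callable[[], bool],
    firewall_allows: Callable[[str], bool],
    compress: Callable[[str], Awaitable[str]],
    generate: Callable[[str], Awaitable[str]],
    correct: Callable[[str], str],
    verify: Callable[[str, str], float],
    auto_correct: bool = True,
    min_confidence: float = 0.7,
    max_regenerations: int = 2,
    compression_timeout: float = 5.0,
    llm_timeout: float = 30.0,
) -> ChatResult:
    # 1. Readiness check.
    if not is_ready():
        raise ServiceNotReady()
    # 2. Firewall check on the user prompt.
    if not firewall_allows(prompt):
        raise PromptBlocked(prompt)
    # 3. Compression under a timeout (asyncio.TimeoutError on expiry).
    compressed = await asyncio.wait_for(compress(prompt), timeout=compression_timeout)

    async def answer_once() -> str:
        # 5-6. Generate under a timeout, then correct the raw output.
        raw = await asyncio.wait_for(generate(compressed), timeout=llm_timeout)
        return correct(raw)

    answer = await answer_once()
    confidence = verify(prompt, answer)
    regenerations = 0
    # 7. Optional auto-correction: regenerate while confidence stays too low.
    while auto_correct and confidence < min_confidence and regenerations < max_regenerations:
        regenerations += 1
        answer = await answer_once()
        confidence = verify(prompt, answer)
    return ChatResult(answer, confidence, regenerations)
```

Injecting the stages keeps the sketch testable. A timeout in either `asyncio.wait_for` call propagates as `asyncio.TimeoutError`, which the endpoint would translate into a `504`.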

#### Error Handling

- `403` for blocked prompts.
- `400` for invalid user input.
- `504` for compression or LLM timeouts.
- `502` for provider errors.
- `409` when security confidence thresholds are not met.
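
One way to keep these status codes consistent is a lookup table from exception type to HTTP status, with a `500` fallback. The exception classes below are hypothetical; only the status codes come from the list above. Both `asyncio.TimeoutError` and `TimeoutError` map to `504` because they are distinct classes before Python 3.11.

```python
import asyncio
from typing import Dict, Type


# Hypothetical exception types; the real module may use different classes.
class PromptBlockedError(Exception): ...
class InvalidInputError(ValueError): ...
class ProviderError(Exception): ...
class LowConfidenceError(Exception): ...


STATUS_BY_ERROR: Dict[Type[BaseException], int] = {
    PromptBlockedError: 403,
    InvalidInputError: 400,
    asyncio.TimeoutError: 504,
    TimeoutError: 504,
    ProviderError: 502,
    LowConfidenceError: 409,
}


def status_for(exc: BaseException) -> int:
    # Walk the MRO so subclasses inherit their parent's status code.
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return 500
```

In the endpoint, the result would typically feed `fastapi.HTTPException(status_code=status_for(exc), detail=str(exc))`.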

### `server/apis/health.py`

#### Purpose

Exposes service diagnostics and health status.

#### Endpoints

- `GET /health`
  - Returns aggregate status of compression, LLM providers, and semantic cache.
- `GET /metrics/app`
  - Returns basic request and cache metrics.
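
This section does not specify the aggregation rule for `GET /health`, so the sketch below uses an assumed policy for illustration only. The service is `unhealthy` if a critical component (compression or LLM providers) is down, `degraded` if only the semantic cache is down, and `healthy` otherwise. The component keys and status strings are hypothetical.

```python
from typing import Dict, FrozenSet

# Assumed set of components whose failure makes the whole service unusable.
CRITICAL: FrozenSet[str] = frozenset({"compression", "llm_providers"})


def aggregate_health(components: Dict[str, bool]) -> Dict[str, object]:
    down = {name for name, ok in components.items() if not ok}
    if down & CRITICAL:
        status = "unhealthy"
    elif down:
        status = "degraded"  # only non-critical parts (e.g. the cache) failed
    else:
        status = "healthy"
    return {
        "status": status,
        "components": {name: ("ok" if ok else "down") for name, ok in components.items()},
    }
```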

### `server/apis/errors.py`

#### Purpose

Defines global exception handlers for HTTP and internal errors.

#### Behavior

- Converts `HTTPException` into a `JSONResponse` with a structured error payload.
- Returns HTTP 500 for unhandled exceptions.
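
Here is a minimal sketch of the handler logic, independent of FastAPI. The payload shape (`error.status`, `error.type`, `error.message`, `error.request_id`) is an assumption. `FakeHTTPException` stands in for `fastapi.HTTPException`, which carries `status_code` and `detail` attributes. Unexpected exceptions get a generic 500 body so that internal details do not leak to clients.

```python
import json
from http import HTTPStatus
from typing import Any, Dict, Optional, Tuple


class FakeHTTPException(Exception):
    """Stand-in for fastapi.HTTPException, which carries status_code and detail."""

    def __init__(self, status_code: int, detail: Optional[str] = None) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def error_payload(status_code: int, message: Optional[str], request_id: Optional[str]) -> Dict[str, Any]:
    phrase = HTTPStatus(status_code).phrase  # e.g. "Forbidden" for 403
    return {
        "error": {
            "status": status_code,
            "type": phrase,
            "message": message if message is not None else phrase,
            "request_id": request_id,
        }
    }


def to_error_response(exc: Exception, request_id: Optional[str] = None) -> Tuple[int, Dict[str, Any]]:
    status_code = getattr(exc, "status_code", None)
    if isinstance(status_code, int):
        detail = getattr(exc, "detail", None)
        message = None if detail is None else str(detail)
        return status_code, error_payload(status_code, message, request_id)
    # Unexpected errors: keep exception details out of the client response.
    return 500, error_payload(500, None, request_id)
```

In FastAPI, this logic would sit inside handlers registered with `app.add_exception_handler()`, each returning `JSONResponse(status_code=status, content=payload)`.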

### `server/apis/models.py`

#### Purpose

Placeholder for model discovery APIs.

## Schemas and Logging

### `server/schemas.py`

Defines Pydantic request and response models:

- `Message`
- `ChatCompletionRequest`
- `ChatCompletionResponse`
- `HealthResponse`
- `MetricsResponse`

#### Validation Notes

- `Message.content` must not be empty.
- `ChatCompletionRequest` supports the `provider`, `model`, `mode`, `aggressiveness`, `privacy_mode`, `auto_correct`, and `use_perplexity` fields.
- `privacy_mode` defaults to `settings.privacy_mode_default` if omitted.
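
These validation rules can be mirrored with standard-library dataclasses. This is a sketch, not the actual Pydantic models, which would use validators such as Pydantic v2's `field_validator`. Only a subset of fields is included. The `role` field and the `PRIVACY_MODE_DEFAULT` constant, which stands in for `settings.privacy_mode_default`, are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

PRIVACY_MODE_DEFAULT = False  # stand-in for settings.privacy_mode_default


@dataclass
class Message:
    role: str  # assumed OpenAI-style role field
    content: str

    def __post_init__(self) -> None:
        if not self.content:
            raise ValueError("Message.content must not be empty")


@dataclass
class ChatCompletionRequest:
    messages: List[Message]
    provider: Optional[str] = None
    model: Optional[str] = None
    privacy_mode: Optional[bool] = None

    def __post_init__(self) -> None:
        # Fall back to the configured default only when the caller omitted the field.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT
```

Using `None` as the sentinel lets the model distinguish "omitted" from an explicit `False`.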

### `server/logger.py`

#### Purpose

Provides FastAPI-specific logger setup.

#### Functions

- `setup_logging(level)`
- `get_request_logger(name)`
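
The two functions could be implemented with the standard `logging` module roughly as follows. This is a sketch: the log format, the use of stdout, and the `nlproxy.` logger-name prefix are assumptions.

```python
import logging
import sys
from typing import Union

LOG_FORMAT = "%(asctime)s %(levelname)s [%(name)s] %(message)s"


def setup_logging(level: Union[str, int] = "INFO") -> None:
    # Accept names like "debug" as well as numeric levels such as logging.DEBUG.
    if isinstance(level, str):
        level = level.upper()
    # force=True (Python 3.8+) replaces any handlers installed earlier.
    logging.basicConfig(level=level, format=LOG_FORMAT, stream=sys.stdout, force=True)


def get_request_logger(name: str) -> logging.Logger:
    # Hypothetical "nlproxy." prefix groups all server loggers under one parent.
    return logging.getLogger(f"nlproxy.{name}")
```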

## Deployment Considerations

- The server uses an async lifespan, so dependency initialization failures surface during startup, before the app serves any requests.
- Metrics are enabled only when `settings.enable_metrics` is true.
- The server expects its dependent modules and models to be available at startup.
- For production, run the server with `uvicorn` via `cli/runserver.py`.
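
The following sketches what `cli/runserver.py` might do. It assumes the application object is exposed as `app` in `server/main.py` and that the `host`, `port`, `workers`, and `log_level` settings are passed straight through. `uvicorn.run()` requires an import string rather than an app object when `workers` is greater than 1. `uvicorn` is imported inside `main()` so that the options builder can be tested without it installed.

```python
from typing import Any, Dict

APP_IMPORT_STRING = "server.main:app"  # assumed location of the FastAPI app object


def uvicorn_options(
    host: str = "0.0.0.0",
    port: int = 8000,
    workers: int = 1,
    log_level: str = "info",
) -> Dict[str, Any]:
    # Placeholder defaults; the real values would come from server.config settings.
    return {"host": host, "port": port, "workers": workers, "log_level": log_level.lower()}


def main() -> None:
    import uvicorn  # imported lazily so the options builder works without uvicorn

    # uvicorn needs an import string, not an app object, when workers > 1.
    uvicorn.run(APP_IMPORT_STRING, **uvicorn_options())
```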