# NLProxy Server Module Reference

This document covers the FastAPI server implementation in the `server/` package.

## Purpose

The server module exposes HTTP APIs, handles dependency initialization, and coordinates request lifecycle management for NLProxy.

## Files and Responsibilities

### `server/config.py`

#### Purpose

Defines runtime settings using Pydantic.

#### Key Features

- Uses `ConfigDict(env_prefix="NLPROXY_", case_sensitive=False)`.
- Supports boolean parsing for env values such as `true`, `yes`, `on`, and `1`.
- Validates provider names against `LLMProvider`.

#### Configurable Values

- Server: `host`, `port`, `workers`, `log_level`
- Redis: `redis_url`, `redis_max_connections`, `redis_socket_timeout`
- Cache: `enable_semantic_cache`, `cache_similarity_threshold`, `cache_default_ttl`, `cache_embedding_dim`
- Compression: `default_aggressiveness`, `max_compression_timeout`, `compression_max_retries`
- LLM: `default_llm_provider`, `default_llm_model`, `llm_request_timeout`, `llm_max_retries`, `enable_llm_fallback`
- Safety: `min_confidence_threshold`, `max_regeneration_attempts`, `enable_auto_correction`, `privacy_mode_default`
- Observability: `enable_metrics`, `metrics_path`, `enable_tracing`, `trace_sample_rate`
- Verification: `enable_nli_verification`, `enable_perplexity_check`, `enable_semantic_drift`

### `server/main.py`

#### Purpose

Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.

#### Key Behavior

- Uses `asynccontextmanager` for lifespan management.
- Calls `startup()` before the app becomes ready and `shutdown()` on exit.
- Registers `/docs`, `/redoc`, and `/openapi.json` endpoints.
- Conditionally instruments Prometheus metrics under `settings.metrics_path`.

### `server/dependencies.py`

#### Purpose

Initializes shared application dependencies and holds module-scoped singletons.

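
The combination of module-scoped singletons and an `asynccontextmanager` lifespan can be illustrated with a minimal, stdlib-only sketch. The `_Stub` class and the bodies of `startup()` and `shutdown()` are hypothetical stand-ins chosen for illustration; they are not NLProxy's actual implementation.

```python
import asyncio
from contextlib import asynccontextmanager

# Module-scoped singletons; they stay None until startup() has run.
compression_service = None
llm_orchestrator = None


class _Stub:
    """Hypothetical stand-in for a real service object."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    async def close(self):
        self.closed = True


async def startup():
    """Populate the module-level singletons once, before serving requests."""
    global compression_service, llm_orchestrator
    compression_service = _Stub("compression")
    llm_orchestrator = _Stub("llm")


async def shutdown():
    """Release resources held by the singletons."""
    if llm_orchestrator is not None:
        await llm_orchestrator.close()


@asynccontextmanager
async def lifespan(app):
    # Everything before the yield runs at startup, everything after at exit.
    await startup()
    try:
        yield
    finally:
        await shutdown()


async def demo():
    # app=None is enough here because this sketch never touches the app object.
    async with lifespan(app=None):
        assert compression_service is not None
        return llm_orchestrator


orchestrator = asyncio.run(demo())
print(orchestrator.closed)  # True: shutdown() ran when the context exited
```

With FastAPI, a context manager of this shape is passed as `FastAPI(lifespan=lifespan)`, so request handlers can rely on the singletons being set by the time the first request arrives.
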
#### Global Objects

- `compression_service`
- `post_verifier`
- `response_corrector`
- `llm_orchestrator`
- `firewall`
- `semantic_cache`

#### Startup Flow

- Instantiates `CompressionService` with cache and model settings.
- Creates `PostLLMVerifier` and optionally hooks NLI refinement into compression.
- Creates `ResponseCorrector` and attaches it to the compression service.
- Creates `LLMOrchestrator` with fallback providers.
- Validates the default provider has credentials configured.
- Instantiates `PromptFirewall` with default regex rules and optional semantic config.
- Initializes `SemanticLLMCache` when enabled.

#### Shutdown Flow

- Calls `llm_orchestrator.close()`.
- Closes Redis connections used by cache.

### `server/middleware.py`

#### Purpose

Installs standard HTTP middleware.

#### Middleware

- `CORSMiddleware`
  - `allow_origins=["*"]`
  - `allow_methods=["*"]`
  - `allow_headers=["*"]`
- `GZipMiddleware`
  - Compresses responses larger than 1000 bytes.

#### Request ID

- Adds `X-Request-ID` header to every response.
- Stores request ID in `request.state`.

### `server/apis/chat.py`

#### Purpose

Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.

#### Key Endpoints

- `POST /v1/chat/completions`
  - Accepts `ChatCompletionRequest`
  - Returns `ChatCompletionResponse`

#### Request Flow

1. Validate service readiness.
2. Perform firewall check on user prompt.
3. Compress prompt with `CompressionService.compress_batch_async()`.
4. Apply shield and safety validation.
5. Call `LLMOrchestrator.generate()` or streaming equivalent.
6. Apply `ResponseCorrector` and `PostLLMVerifier`.
7. Optionally auto-correct low-confidence outputs.
8. Return structured response with NLProxy metadata.

#### Error Handling

- `403` for blocked prompts.
- `400` for invalid user input.
- `504` for compression or LLM timeouts.
- `502` for provider errors.
- `409` when security confidence thresholds are not met.

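
The status codes listed above can be pictured as a lookup from failure type to HTTP status. The sketch below is only an illustration: the exception classes and `status_for` are hypothetical names, not identifiers taken from NLProxy, and the fallback of `500` follows the behavior this reference describes for unexpected errors.

```python
# Hypothetical exception types, one per failure category listed above.
class PromptBlockedError(Exception):
    """The firewall rejected the prompt."""


class InvalidInputError(Exception):
    """The request payload failed validation."""


class ProviderError(Exception):
    """The upstream LLM provider returned an error."""


class LowConfidenceError(Exception):
    """Security confidence thresholds were not met."""


# Ordered (exception type, HTTP status) pairs; the first match wins.
STATUS_BY_EXCEPTION = [
    (PromptBlockedError, 403),
    (InvalidInputError, 400),
    (TimeoutError, 504),  # built-in; stands in for compression or LLM timeouts
    (ProviderError, 502),
    (LowConfidenceError, 409),
]


def status_for(exc: Exception) -> int:
    """Return the HTTP status for an exception, defaulting to 500."""
    for exc_type, status in STATUS_BY_EXCEPTION:
        if isinstance(exc, exc_type):
            return status
    return 500


print(status_for(PromptBlockedError("blocked")))
print(status_for(ValueError("unexpected")))
```

Keeping the mapping in one table makes it easy to check that every documented failure category has exactly one status code.
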
### `server/apis/health.py`

#### Purpose

Exposes service diagnostics and health status.

#### Endpoints

- `GET /health`
  - Returns aggregate status of compression, LLM providers, and semantic cache.
- `GET /metrics/app`
  - Returns basic request and cache metrics.

### `server/apis/errors.py`

#### Purpose

Defines global exception handlers for HTTP and internal errors.

#### Behavior

- Converts `HTTPException` into `JSONResponse` with structured error payloads.
- Handles unexpected exceptions with HTTP 500.

### `server/apis/models.py`

#### Purpose

Placeholder for model discovery APIs.

## Schemas

### `server/schemas.py`

Defines Pydantic request and response models:

- `Message`
- `ChatCompletionRequest`
- `ChatCompletionResponse`
- `HealthResponse`
- `MetricsResponse`

#### Validation Notes

- `Message.content` must not be empty.
- `ChatCompletionRequest` supports `provider`, `model`, `mode`, `aggressiveness`, `privacy_mode`, `auto_correct`, and `use_perplexity`.
- `privacy_mode` defaults to `settings.privacy_mode_default` if omitted.

### `server/logger.py`

#### Purpose

Provides FastAPI-specific logger setup.

#### Functions

- `setup_logging(level)`
- `get_request_logger(name)`

## Deployment Considerations

- The server uses an async lifespan, so startup failures should surface before binding ports.
- Metrics are only enabled if `settings.enable_metrics` is true.
- The server expects dependent modules and models to be available at startup.
- For production, run the server with `uvicorn` via `cli/runserver.py`.
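
Deployment flags such as `enable_metrics` are read from `NLPROXY_`-prefixed environment variables. The following stdlib-only sketch shows how such a boolean could be interpreted, using the truthy strings this reference lists for `server/config.py`. `env_flag` is a hypothetical helper, not part of NLProxy or Pydantic, and treating every other value as false is an assumption of the sketch.

```python
import os

# Truthy strings named in the config section; anything else counts as
# false here, which is an assumption of this sketch.
TRUTHY = {"true", "yes", "on", "1"}
ENV_PREFIX = "NLPROXY_"


def env_flag(name, default=False, environ=None):
    """Read a boolean setting from a prefixed, case-insensitive env var."""
    env = os.environ if environ is None else environ
    wanted = (ENV_PREFIX + name).upper()
    for key, value in env.items():
        if key.upper() == wanted:
            return value.strip().lower() in TRUTHY
    return default


# A plain dict stands in for the process environment in this example.
example_env = {"NLPROXY_ENABLE_METRICS": "Yes"}
metrics_enabled = env_flag("enable_metrics", environ=example_env)
print(metrics_enabled)  # True
```

Passing the environment as a parameter keeps the helper easy to exercise without mutating `os.environ`.
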