# NLProxy Server Module Reference
This document covers the FastAPI server implementation in the `server/` package.
## Purpose
The server module exposes HTTP APIs, handles dependency initialization, and coordinates request lifecycle management for NLProxy.
## Files and Responsibilities
### `server/config.py`
#### Purpose
Defines runtime settings using Pydantic.
#### Key Features
- Uses `ConfigDict(env_prefix="NLPROXY_", case_sensitive=False)`.
- Supports boolean parsing for env values such as `true`, `yes`, `on`, and `1`.
- Validates provider names against `LLMProvider`.
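The boolean parsing described above can be illustrated with a minimal standard-library sketch. It is not the project's Pydantic configuration: `env_bool` and `TRUE_VALUES` are hypothetical names, and treating every other value as false and matching values case-insensitively are assumptions.

```python
import os

# Strings treated as true, taken from the list above.
# Assumption: any other value is treated as false.
TRUE_VALUES = {"true", "yes", "on", "1"}


def env_bool(name: str, default: bool = False, prefix: str = "NLPROXY_") -> bool:
    """Read a boolean setting from the environment (hypothetical helper)."""
    raw = os.environ.get(prefix + name.upper())
    if raw is None:
        return default  # variable not set: fall back to the default
    return raw.strip().lower() in TRUE_VALUES


os.environ["NLPROXY_ENABLE_METRICS"] = "Yes"
print(env_bool("enable_metrics"))  # prints True
```

With the `NLPROXY_` prefix, the `enable_metrics` setting is read from the `NLPROXY_ENABLE_METRICS` variable.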
#### Configurable Values
- Server: `host`, `port`, `workers`, `log_level`
- Redis: `redis_url`, `redis_max_connections`, `redis_socket_timeout`
- Cache: `enable_semantic_cache`, `cache_similarity_threshold`, `cache_default_ttl`, `cache_embedding_dim`
- Compression: `default_aggressiveness`, `max_compression_timeout`, `compression_max_retries`
- LLM: `default_llm_provider`, `default_llm_model`, `llm_request_timeout`, `llm_max_retries`, `enable_llm_fallback`
- Safety: `min_confidence_threshold`, `max_regeneration_attempts`, `enable_auto_correction`, `privacy_mode_default`
- Observability: `enable_metrics`, `metrics_path`, `enable_tracing`, `trace_sample_rate`
- Verification: `enable_nli_verification`, `enable_perplexity_check`, `enable_semantic_drift`
### `server/main.py`
#### Purpose
Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.
#### Key Behavior
- Uses `asynccontextmanager` for lifespan management.
- Calls `startup()` before the app becomes ready and `shutdown()` on exit.
- Registers `/docs`, `/redoc`, and `/openapi.json` endpoints.
- Conditionally instruments Prometheus metrics under `settings.metrics_path`.
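The lifespan pattern can be sketched with the standard library alone. The `startup()` and `shutdown()` bodies below are stand-ins rather than the project's implementations, and the `try`/`finally` guard is an assumption about how cleanup is protected. In a FastAPI application, a context manager of this shape is passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


async def startup():
    events.append("startup")   # stand-in for dependency initialization


async def shutdown():
    events.append("shutdown")  # stand-in for closing clients and connections


@asynccontextmanager
async def lifespan(app):
    await startup()
    try:
        yield  # the application serves requests while suspended here
    finally:
        await shutdown()  # runs even if serving raised an exception


async def demo():
    # Drive the context manager by hand, as the ASGI server would.
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(demo())
print(events)  # prints ['startup', 'serving', 'shutdown']
```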
### `server/dependencies.py`
#### Purpose
Initializes shared application dependencies and holds module-scoped singletons.
#### Global Objects
- `compression_service`
- `post_verifier`
- `response_corrector`
- `llm_orchestrator`
- `firewall`
- `semantic_cache`
#### Startup Flow
- Instantiates `CompressionService` with cache and model settings.
- Creates `PostLLMVerifier` and optionally hooks NLI refinement into compression.
- Creates `ResponseCorrector` and attaches it to the compression service.
- Creates `LLMOrchestrator` with fallback providers.
- Validates that the default provider has credentials configured.
- Instantiates `PromptFirewall` with default regex rules and optional semantic config.
- Initializes `SemanticLLMCache` when enabled.
#### Shutdown Flow
- Calls `llm_orchestrator.close()`.
- Closes the Redis connections used by the cache.
### `server/middleware.py`
#### Purpose
Installs standard HTTP middleware.
#### Middleware
- `CORSMiddleware`
  - `allow_origins=["*"]`
  - `allow_methods=["*"]`
  - `allow_headers=["*"]`
- `GZipMiddleware`
  - Compresses responses larger than 1000 bytes.
#### Request ID
- Adds `X-Request-ID` header to every response.
- Stores request ID in `request.state`.
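One way to get the request ID behavior described above is a pure ASGI middleware. The sketch below is not the project's code: the class name, the UUID-based ID, and the `request_id` key are assumptions. It relies on Starlette exposing `scope["state"]` as `request.state`.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Pure ASGI middleware sketch (hypothetical, not the project's class)."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex  # assumption: IDs are random UUIDs
        # Starlette surfaces this dict to handlers as request.state.
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            # Add the header when the response starts; pass the rest through.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)


async def hello_app(scope, receive, send):
    """Tiny ASGI app used only to exercise the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call_once():
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/health", "headers": []}
    await RequestIDMiddleware(hello_app)(scope, receive, send)
    return scope, sent
```

In a FastAPI application, a class like this would be registered with `app.add_middleware(RequestIDMiddleware)`.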
### `server/apis/chat.py`
#### Purpose
Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.
#### Key Endpoints
- `POST /v1/chat/completions`
  - Accepts `ChatCompletionRequest`
  - Returns `ChatCompletionResponse`
#### Request Flow
1. Validate service readiness.
2. Perform firewall check on user prompt.
3. Compress prompt with `CompressionService.compress_batch_async()`.
4. Apply shield and safety validation.
5. Call `LLMOrchestrator.generate()` or its streaming equivalent.
6. Apply `ResponseCorrector` and `PostLLMVerifier`.
7. Optionally auto-correct low-confidence outputs.
8. Return structured response with NLProxy metadata.
#### Error Handling
- `403` for blocked prompts.
- `400` for invalid user input.
- `504` for compression or LLM timeouts.
- `502` for provider errors.
- `409` when security confidence thresholds are not met.
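The status codes above amount to a lookup from failure type to HTTP status. A minimal sketch follows. The exception class names are hypothetical and do not come from the NLProxy code base; only the status codes are taken from this document.

```python
# Hypothetical exception types; the real module may raise different ones.
class PromptBlockedError(Exception):
    """The firewall rejected the prompt."""


class InvalidInputError(Exception):
    """The user input failed validation."""


class UpstreamTimeoutError(Exception):
    """Compression or the LLM call timed out."""


class ProviderError(Exception):
    """The LLM provider returned an error."""


class LowConfidenceError(Exception):
    """Security confidence thresholds were not met."""


STATUS_BY_ERROR = {
    PromptBlockedError: 403,
    InvalidInputError: 400,
    UpstreamTimeoutError: 504,
    ProviderError: 502,
    LowConfidenceError: 409,
}


def status_for(exc: Exception) -> int:
    """Return the HTTP status code for an exception, 500 if unmapped."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 500  # unexpected errors


print(status_for(UpstreamTimeoutError("LLM call timed out")))  # prints 504
```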
### `server/apis/health.py`
#### Purpose
Exposes service diagnostics and health status.
#### Endpoints
- `GET /health`
  - Returns aggregate status of compression, LLM providers, and semantic cache.
- `GET /metrics/app`
  - Returns basic request and cache metrics.
### `server/apis/errors.py`
#### Purpose
Defines global exception handlers for HTTP and internal errors.
#### Behavior
- Converts `HTTPException` into `JSONResponse` with structured error payloads.
- Handles unexpected exceptions with HTTP 500.
### `server/apis/models.py`
#### Purpose
Placeholder for model discovery APIs.
## Schemas
### `server/schemas.py`
Defines Pydantic request and response models:
- `Message`
- `ChatCompletionRequest`
- `ChatCompletionResponse`
- `HealthResponse`
- `MetricsResponse`
#### Validation Notes
- `Message.content` must not be empty.
- `ChatCompletionRequest` supports `provider`, `model`, `mode`, `aggressiveness`, `privacy_mode`, `auto_correct`, and `use_perplexity`.
- `privacy_mode` defaults to `settings.privacy_mode_default` if omitted.
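The two validation rules can be illustrated with plain dataclasses, which stand in for the Pydantic models here so the snippet has no third-party dependencies. The `role` and `messages` fields, the boolean type of `privacy_mode`, the `PRIVACY_MODE_DEFAULT` constant, and the treatment of whitespace-only content as empty are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stand-in for settings.privacy_mode_default (the value is an assumption).
PRIVACY_MODE_DEFAULT = False


@dataclass
class Message:
    role: str      # assumed field, following the common chat-message shape
    content: str

    def __post_init__(self):
        # Assumption: whitespace-only content also counts as empty.
        if not self.content.strip():
            raise ValueError("content must not be empty")


@dataclass
class ChatCompletionRequest:
    messages: List[Message]              # assumed field name
    privacy_mode: Optional[bool] = None  # None means "omitted by the client"

    def __post_init__(self):
        # Fall back to the configured default when the field is omitted.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT


request = ChatCompletionRequest(messages=[Message(role="user", content="Hello")])
print(request.privacy_mode)  # prints False
```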
### `server/logger.py`
#### Purpose
Provides FastAPI-specific logger setup.
#### Functions
- `setup_logging(level)`
- `get_request_logger(name)`
## Deployment Considerations
- The server uses an async lifespan, so startup failures should surface before binding ports.
- Metrics are only enabled if `settings.enable_metrics` is true.
- The server expects dependent modules and models to be available at startup.
- In production, start the server with `uvicorn` through `cli/runserver.py`.