# NLProxy Server Module Reference
This document covers the FastAPI server implementation in the `server/` package.
## Purpose
The server module exposes HTTP APIs, handles dependency initialization, and coordinates request lifecycle management for NLProxy.
## Files and Responsibilities
### `server/config.py`
#### Purpose
Defines runtime settings using Pydantic.
#### Key Features
- Uses `ConfigDict(env_prefix="NLPROXY_", case_sensitive=False)`.
- Supports boolean parsing for env values such as `true`, `yes`, `on`, and `1`.
- Validates provider names against `LLMProvider`.
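
The boolean parsing listed above can be illustrated with the standard library alone. This is a minimal sketch, not the module's code: `parse_bool_env` is a hypothetical helper and the set of false values is an assumption. Only the `NLPROXY_` prefix and the four truthy strings come from this document.

```python
import os
from typing import Mapping, Optional

ENV_PREFIX = "NLPROXY_"  # prefix named in this document
TRUE_VALUES = {"true", "yes", "on", "1"}  # truthy strings listed in this document
FALSE_VALUES = {"false", "no", "off", "0"}  # assumption: mirror of TRUE_VALUES


def parse_bool_env(
    field: str,
    default: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Read NLPROXY_<FIELD> and interpret its value as a boolean, ignoring case."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_PREFIX + field.upper())
    if raw is None:
        return default  # variable not set: fall back to the field default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"{ENV_PREFIX}{field.upper()}={raw!r} is not a boolean")
```

For example, with `NLPROXY_ENABLE_METRICS=Yes` in the environment, `parse_bool_env("enable_metrics")` evaluates to `True`.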
#### Configurable Values
- Server: `host`, `port`, `workers`, `log_level`
- Redis: `redis_url`, `redis_max_connections`, `redis_socket_timeout`
- Cache: `enable_semantic_cache`, `cache_similarity_threshold`, `cache_default_ttl`, `cache_embedding_dim`
- Compression: `default_aggressiveness`, `max_compression_timeout`, `compression_max_retries`
- LLM: `default_llm_provider`, `default_llm_model`, `llm_request_timeout`, `llm_max_retries`, `enable_llm_fallback`
- Safety: `min_confidence_threshold`, `max_regeneration_attempts`, `enable_auto_correction`, `privacy_mode_default`
- Observability: `enable_metrics`, `metrics_path`, `enable_tracing`, `trace_sample_rate`
- Verification: `enable_nli_verification`, `enable_perplexity_check`, `enable_semantic_drift`
### `server/main.py`
#### Purpose
Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.
#### Key Behavior
- Uses `asynccontextmanager` for lifespan management.
- Calls `startup()` before the app becomes ready and `shutdown()` on exit.
- Registers `/docs`, `/redoc`, and `/openapi.json` endpoints.
- Conditionally instruments Prometheus metrics under `settings.metrics_path`.
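
The lifespan pattern described above can be sketched without FastAPI installed. The bodies of `startup()` and `shutdown()` here are stand-ins, and treating them as coroutines is an assumption; the `events` list and `demo()` exist only to make the call order visible.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List

events: List[str] = []  # records call order for the demonstration


async def startup() -> None:
    events.append("startup")  # stand-in for dependency initialization


async def shutdown() -> None:
    events.append("shutdown")  # stand-in for resource cleanup


@asynccontextmanager
async def lifespan(app: object) -> AsyncIterator[None]:
    await startup()  # runs before the app is ready
    try:
        yield  # the application serves requests while suspended here
    finally:
        await shutdown()  # runs on exit, even if serving raised


async def demo() -> List[str]:
    async with lifespan(app=None):
        events.append("serving")
    return events
```

With FastAPI, a context manager of this shape is passed as `FastAPI(lifespan=lifespan)`.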
### `server/dependencies.py`
#### Purpose
Initializes shared application dependencies and holds module-scoped singletons.
#### Global Objects
- `compression_service`
- `post_verifier`
- `response_corrector`
- `llm_orchestrator`
- `firewall`
- `semantic_cache`
#### Startup Flow
- Instantiates `CompressionService` with cache and model settings.
- Creates `PostLLMVerifier` and optionally hooks NLI refinement into compression.
- Creates `ResponseCorrector` and attaches it to the compression service.
- Creates `LLMOrchestrator` with fallback providers.
- Validates the default provider has credentials configured.
- Instantiates `PromptFirewall` with default regex rules and optional semantic config.
- Initializes `SemanticLLMCache` when enabled.
#### Shutdown Flow
- Calls `llm_orchestrator.close()`.
- Closes the Redis connections used by the cache.
### `server/middleware.py`
#### Purpose
Installs standard HTTP middleware.
#### Middleware
- `CORSMiddleware`
  - `allow_origins=["*"]`
  - `allow_methods=["*"]`
  - `allow_headers=["*"]`
- `GZipMiddleware`
  - Compresses responses larger than 1000 bytes.
#### Request ID
- Adds an `X-Request-ID` header to every response.
- Stores the request ID in `request.state`.
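
One way to get this behavior is a pure ASGI middleware, sketched below with no framework imports. This is not necessarily how `server/middleware.py` does it: generating the ID with `uuid4`, the `request_id` attribute name, and the class name are all assumptions. The sketch relies on recent Starlette versions backing `request.state` with `scope["state"]`.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict

Scope = Dict[str, Any]
ASGIMessage = Dict[str, Any]
Receive = Callable[[], Awaitable[ASGIMessage]]
Send = Callable[[ASGIMessage], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class RequestIDMiddleware:
    """Tag each HTTP request with an ID and echo it in the response headers."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex  # assumption: a fresh ID per request
        # Handlers can then read request.state.request_id (hypothetical name).
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message: ASGIMessage) -> None:
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)
```

Because it only wraps `send`, this form does not buffer the response body, so it works for streaming responses as well.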
### `server/apis/chat.py`
#### Purpose
Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.
#### Key Endpoints
- `POST /v1/chat/completions`
  - Accepts `ChatCompletionRequest`
  - Returns `ChatCompletionResponse`
#### Request Flow
1. Validate service readiness.
2. Perform firewall check on user prompt.
3. Compress prompt with `CompressionService.compress_batch_async()`.
4. Apply shield and safety validation.
5. Call `LLMOrchestrator.generate()` or streaming equivalent.
6. Apply `ResponseCorrector` and `PostLLMVerifier`.
7. Optionally auto-correct low-confidence outputs.
8. Return structured response with NLProxy metadata.
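
The ordering of the steps above can be sketched as a small async pipeline. Everything here is a toy stand-in: `firewall_allows`, `compress`, `generate`, `handle_chat`, and `FlowResult` are hypothetical names, the firewall rule and the compression are placeholders, and steps 4, 6, and 7 are omitted for brevity. The sketch only shows that a blocked prompt short-circuits before compression and generation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlowResult:
    status: int
    body: str = ""
    steps: List[str] = field(default_factory=list)  # executed steps, for tracing


async def firewall_allows(prompt: str) -> bool:
    return "ignore previous instructions" not in prompt.lower()  # toy rule


async def compress(prompt: str) -> str:
    return " ".join(prompt.split())  # toy compression: collapse whitespace


async def generate(prompt: str) -> str:
    return f"echo: {prompt}"  # stand-in for the LLM call


async def handle_chat(prompt: str, ready: bool = True) -> FlowResult:
    if not ready:  # step 1; the status code for this case is not specified here
        raise RuntimeError("service not ready")
    steps = ["ready"]
    if not await firewall_allows(prompt):  # step 2
        return FlowResult(status=403, steps=steps + ["firewall:blocked"])
    steps.append("firewall:ok")
    compressed = await compress(prompt)  # step 3
    steps.append("compress")
    answer = await generate(compressed)  # step 5
    steps.append("generate")
    return FlowResult(status=200, body=answer, steps=steps)  # step 8
```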
#### Error Handling
- `403` for blocked prompts.
- `400` for invalid user input.
- `504` for compression or LLM timeouts.
- `502` for provider errors.
- `409` when security confidence thresholds are not met.
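
The status codes above can be expressed as a lookup from exception type to HTTP status. The exception class names below are hypothetical, since this document does not name the exceptions the module raises; only the status codes come from the list. The fallback of 500 for anything unrecognized matches the behavior described for `server/apis/errors.py`.

```python
import asyncio
from typing import List, Tuple, Type


class PromptBlockedError(Exception):
    """Hypothetical: the firewall rejected the prompt."""


class InvalidInputError(Exception):
    """Hypothetical: the request failed input validation."""


class ProviderError(Exception):
    """Hypothetical: the upstream LLM provider returned an error."""


class LowConfidenceError(Exception):
    """Hypothetical: a security confidence threshold was not met."""


# Ordered rules: the first matching entry wins.
STATUS_RULES: List[Tuple[Tuple[Type[BaseException], ...], int]] = [
    ((PromptBlockedError,), 403),
    ((InvalidInputError,), 400),
    # asyncio.TimeoutError is a distinct class before Python 3.11.
    ((TimeoutError, asyncio.TimeoutError), 504),
    ((ProviderError,), 502),
    ((LowConfidenceError,), 409),
]


def status_for(exc: BaseException) -> int:
    """Return the HTTP status for an exception, defaulting to 500."""
    for error_types, status in STATUS_RULES:
        if isinstance(exc, error_types):
            return status
    return 500
```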
### `server/apis/health.py`
#### Purpose
Exposes service diagnostics and health status.
#### Endpoints
- `GET /health`
  - Returns aggregate status of compression, LLM providers, and semantic cache.
- `GET /metrics/app`
  - Returns basic request and cache metrics.
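
A possible way to fold per-component checks into one aggregate status is shown below. The labels `healthy`, `degraded`, and `unhealthy` and the function name are assumptions; this document does not specify the values `HealthResponse` carries.

```python
from typing import Dict


def aggregate_status(components: Dict[str, bool]) -> str:
    """Collapse per-component health flags into a single label."""
    if all(components.values()):
        return "healthy"  # every component reported OK
    if any(components.values()):
        return "degraded"  # some, but not all, components are OK
    return "unhealthy"  # nothing is OK


status = aggregate_status(
    {"compression": True, "llm_providers": True, "semantic_cache": False}
)
```

A three-level label lets a disabled or failing semantic cache show up as degraded rather than taking the whole service out of rotation.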
### `server/apis/errors.py`
#### Purpose
Defines global exception handlers for HTTP and internal errors.
#### Behavior
- Converts `HTTPException` into `JSONResponse` with structured error payloads.
- Handles unexpected exceptions with HTTP 500.
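
The shape of a structured error payload might look like the following. The field names (`error`, `code`, `message`, `request_id`) and both function names are assumptions chosen for illustration; the actual payload layout is not given in this document.

```python
from typing import Any, Dict, Optional


def error_payload(
    status_code: int,
    detail: Any,
    request_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for an HTTP error (hypothetical layout)."""
    payload: Dict[str, Any] = {
        "error": {"code": status_code, "message": str(detail)}
    }
    if request_id is not None:
        payload["error"]["request_id"] = request_id
    return payload


def unexpected_error_payload(
    exc: BaseException,
    request_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Body for unhandled exceptions: a generic message, never the exception text."""
    return error_payload(500, "Internal server error", request_id)
```

Returning a generic message for the 500 case avoids leaking internal details to clients, while the exception itself can still be logged server-side.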
### `server/apis/models.py`
#### Purpose
Placeholder for model discovery APIs.
## Schemas
### `server/schemas.py`
Defines Pydantic request and response models:
- `Message`
- `ChatCompletionRequest`
- `ChatCompletionResponse`
- `HealthResponse`
- `MetricsResponse`
#### Validation Notes
- `Message.content` must not be empty.
- `ChatCompletionRequest` supports `provider`, `model`, `mode`, `aggressiveness`, `privacy_mode`, `auto_correct`, and `use_perplexity`.
- `privacy_mode` defaults to `settings.privacy_mode_default` if omitted.
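
The two validation rules above can be illustrated with plain dataclasses. This swaps in `dataclasses` for Pydantic to stay dependency-free, so it is not the module's implementation. The `role` and `messages` fields, the boolean type of `privacy_mode`, the treatment of whitespace-only content as empty, and the `PRIVACY_MODE_DEFAULT` constant are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

PRIVACY_MODE_DEFAULT = False  # stand-in for settings.privacy_mode_default


@dataclass
class Message:
    role: str  # assumption: not listed in this document
    content: str

    def __post_init__(self) -> None:
        # Assumption: whitespace-only content counts as empty.
        if not self.content.strip():
            raise ValueError("content must not be empty")


@dataclass
class ChatCompletionRequest:
    messages: List[Message]  # assumption: field name not given in this document
    privacy_mode: Optional[bool] = None

    def __post_init__(self) -> None:
        # Fall back to the configured default when the caller omits the field.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT
```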
### `server/logger.py`
#### Purpose
Provides FastAPI-specific logger setup.
#### Functions
- `setup_logging(level)`
- `get_request_logger(name)`
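
A minimal sketch of what these two functions could look like with the standard `logging` module follows. The function names and parameters come from the list above; the bodies, the log format, and the `nlproxy.request.` logger namespace are assumptions.

```python
import logging


def setup_logging(level: str = "INFO") -> None:
    """Configure the root logger; unknown level names fall back to INFO."""
    logging.basicConfig(
        level=getattr(logging, level.upper(), logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )


def get_request_logger(name: str) -> logging.Logger:
    """Return a logger under a shared namespace (hypothetical prefix)."""
    return logging.getLogger(f"nlproxy.request.{name}")
```

For example, `setup_logging("debug")` followed by `get_request_logger("chat").debug("received")` emits a record under the logger name `nlproxy.request.chat`.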
## Deployment Considerations
- The server uses an async lifespan, so startup failures surface before the server begins accepting requests.
- Metrics are only enabled if `settings.enable_metrics` is true.
- The server expects dependent modules and models to be available at startup.
- In production, run the server with `uvicorn` through `cli/runserver.py`.