# NLProxy Server Module Reference

This document covers the FastAPI server implementation in the `server/` package.

## Purpose

The server module exposes HTTP APIs, handles dependency initialization, and coordinates request lifecycle management for NLProxy.

## Files and Responsibilities

### `server/config.py`

#### Purpose

Defines runtime settings using Pydantic.

#### Key Features

- Uses `ConfigDict(env_prefix="NLPROXY_", case_sensitive=False)`.
- Supports boolean parsing for env values such as `true`, `yes`, `on`, and `1`.
- Validates provider names against `LLMProvider`.
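
The prefix and boolean handling listed above can be illustrated without Pydantic. The following is a minimal standard-library sketch, not the module's implementation: `read_bool_setting` is a hypothetical helper, and the set of falsy spellings is an assumption, since this reference only lists the truthy ones.

```python
import os

ENV_PREFIX = "NLPROXY_"

# Truthy spellings named in this reference.
TRUTHY = {"true", "yes", "on", "1"}
# Assumed falsy counterparts; not confirmed by the source.
FALSY = {"false", "no", "off", "0"}


def read_bool_setting(name: str, default: bool = False, environ=None) -> bool:
    """Read NLPROXY_<name> from the environment and parse it as a boolean.

    Variable names are matched case-insensitively, mirroring
    case_sensitive=False.
    """
    env = os.environ if environ is None else environ
    lowered = {key.lower(): value for key, value in env.items()}
    raw = lowered.get((ENV_PREFIX + name).lower())
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"{ENV_PREFIX}{name.upper()}={raw!r} is not a recognised boolean")
```

For example, `read_bool_setting("enable_metrics", environ={"NLPROXY_ENABLE_METRICS": "Yes"})` returns `True`.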

#### Configurable Values

- Server: `host`, `port`, `workers`, `log_level`
- Redis: `redis_url`, `redis_max_connections`, `redis_socket_timeout`
- Cache: `enable_semantic_cache`, `cache_similarity_threshold`, `cache_default_ttl`, `cache_embedding_dim`
- Compression: `default_aggressiveness`, `max_compression_timeout`, `compression_max_retries`
- LLM: `default_llm_provider`, `default_llm_model`, `llm_request_timeout`, `llm_max_retries`, `enable_llm_fallback`
- Safety: `min_confidence_threshold`, `max_regeneration_attempts`, `enable_auto_correction`, `privacy_mode_default`
- Observability: `enable_metrics`, `metrics_path`, `enable_tracing`, `trace_sample_rate`
- Verification: `enable_nli_verification`, `enable_perplexity_check`, `enable_semantic_drift`

### `server/main.py`

#### Purpose

Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.

#### Key Behavior

- Uses `asynccontextmanager` for lifespan management.
- Calls `startup()` before the app becomes ready and `shutdown()` on exit.
- Registers `/docs`, `/redoc`, and `/openapi.json` endpoints.
- Conditionally instruments Prometheus metrics under `settings.metrics_path`.
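
The lifespan pattern described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `server/main.py`: the bodies of `startup()` and `shutdown()` are placeholders, and `events` exists only to make the ordering visible. In FastAPI, a function of this shape is passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records lifecycle order, for illustration only


async def startup():
    # Placeholder: the real startup() initializes shared dependencies.
    events.append("startup")


async def shutdown():
    # Placeholder: the real shutdown() releases those dependencies.
    events.append("shutdown")


@asynccontextmanager
async def lifespan(app):
    await startup()
    try:
        yield  # the application serves requests while suspended here
    finally:
        await shutdown()  # runs on exit, including after an error


async def demo():
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(demo())
print(events)  # ['startup', 'serving', 'shutdown']
```

If `startup()` raises, the context manager never yields, so the application does not reach the serving phase.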

### `server/dependencies.py`

#### Purpose

Initializes shared application dependencies and holds module-scoped singletons.

#### Global Objects

- `compression_service`
- `post_verifier`
- `response_corrector`
- `llm_orchestrator`
- `firewall`
- `semantic_cache`

#### Startup Flow

- Instantiates `CompressionService` with cache and model settings.
- Creates `PostLLMVerifier` and optionally hooks NLI refinement into compression.
- Creates `ResponseCorrector` and attaches it to the compression service.
- Creates `LLMOrchestrator` with fallback providers.
- Validates that the default provider has credentials configured.
- Instantiates `PromptFirewall` with default regex rules and optional semantic config.
- Initializes `SemanticLLMCache` when enabled.

#### Shutdown Flow

- Calls `llm_orchestrator.close()`.
- Closes the Redis connections used by the semantic cache.

### `server/middleware.py`

#### Purpose

Installs standard HTTP middleware.

#### Middleware

- `CORSMiddleware`
  - `allow_origins=["*"]`
  - `allow_methods=["*"]`
  - `allow_headers=["*"]`
- `GZipMiddleware`
  - Compresses responses larger than 1000 bytes.

#### Request ID

- Adds an `X-Request-ID` header to every response.
- Stores the request ID in `request.state`.
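
The request ID behavior can be sketched as a framework-free ASGI middleware. This is an assumption-laden illustration rather than the code in `server/middleware.py`: generating the ID with `uuid4` is a guess, the class name `RequestIDMiddleware` is hypothetical, and the ID is kept in `scope["state"]` as a stand-in for `request.state`.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Pure ASGI middleware that tags each HTTP response with X-Request-ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex  # assumed ID format
        # Stand-in for request.state in this framework-free sketch.
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)


async def hello_app(scope, receive, send):
    """Tiny demo application used to exercise the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})
```

Wrapping the `send` callable is what lets the middleware add the header to every response, whichever handler produced it.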

### `server/apis/chat.py`

#### Purpose

Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.

#### Key Endpoints

- `POST /v1/chat/completions`
  - Accepts `ChatCompletionRequest`
  - Returns `ChatCompletionResponse`

#### Request Flow

1. Validate service readiness.
2. Run the firewall check on the user prompt.
3. Compress prompt with `CompressionService.compress_batch_async()`.
4. Apply shield and safety validation.
5. Call `LLMOrchestrator.generate()` or its streaming equivalent.
6. Apply `ResponseCorrector` and `PostLLMVerifier`.
7. Optionally auto-correct low-confidence outputs.
8. Return structured response with NLProxy metadata.

#### Error Handling

- `403` for blocked prompts.
- `400` for invalid user input.
- `504` for compression or LLM timeouts.
- `502` for provider errors.
- `409` when security confidence thresholds are not met.
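
One way to keep this mapping in a single place is a lookup table from exception type to status code. The exception class names below are hypothetical and do not come from the codebase; only the status codes are taken from the list above, with `500` as the fallback for anything unexpected.

```python
class PromptBlockedError(Exception):
    """Hypothetical: the firewall rejected the prompt."""


class InvalidInputError(Exception):
    """Hypothetical: the request failed input validation."""


class UpstreamTimeoutError(Exception):
    """Hypothetical: compression or the LLM call timed out."""


class ProviderError(Exception):
    """Hypothetical: the LLM provider returned an error."""


class LowConfidenceError(Exception):
    """Hypothetical: security confidence thresholds were not met."""


STATUS_BY_ERROR = {
    PromptBlockedError: 403,
    InvalidInputError: 400,
    UpstreamTimeoutError: 504,
    ProviderError: 502,
    LowConfidenceError: 409,
}


def status_for(exc: Exception) -> int:
    """Return the HTTP status for an exception, defaulting to 500."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 500
```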

### `server/apis/health.py`

#### Purpose

Exposes service diagnostics and health status.

#### Endpoints

- `GET /health`
  - Returns aggregate status of compression, LLM providers, and semantic cache.
- `GET /metrics/app`
  - Returns basic request and cache metrics.

### `server/apis/errors.py`

#### Purpose

Defines global exception handlers for HTTP and internal errors.

#### Behavior

- Converts `HTTPException` into `JSONResponse` with structured error payloads.
- Handles unexpected exceptions with HTTP 500.

### `server/apis/models.py`

#### Purpose

Placeholder for model discovery APIs.

## Schemas

### `server/schemas.py`

Defines Pydantic request and response models:

- `Message`
- `ChatCompletionRequest`
- `ChatCompletionResponse`
- `HealthResponse`
- `MetricsResponse`

#### Validation Notes

- `Message.content` must not be empty.
- `ChatCompletionRequest` supports `provider`, `model`, `mode`, `aggressiveness`, `privacy_mode`, `auto_correct`, and `use_perplexity`.
- `privacy_mode` defaults to `settings.privacy_mode_default` if omitted.
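
The two validation rules above can be mimicked with standard-library dataclasses. This is a sketch, not the Pydantic models in `server/schemas.py`: the `role` and `messages` fields, the `ChatRequestSketch` name, and the boolean type of `privacy_mode` are assumptions, and `PRIVACY_MODE_DEFAULT` stands in for `settings.privacy_mode_default`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-in for settings.privacy_mode_default; value and type are assumed.
PRIVACY_MODE_DEFAULT = False


@dataclass
class Message:
    role: str  # assumed field
    content: str

    def __post_init__(self) -> None:
        if not self.content:
            raise ValueError("content must not be empty")


@dataclass
class ChatRequestSketch:
    messages: List[Message] = field(default_factory=list)  # assumed field
    privacy_mode: Optional[bool] = None

    def __post_init__(self) -> None:
        # Fall back to the configured default when the client omits the field.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT
```

Resolving the default at validation time, rather than hard-coding it in the schema, lets one deployment-level setting change the behavior for every request that omits the field.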

### `server/logger.py`

#### Purpose

Provides FastAPI-specific logger setup.

#### Functions

- `setup_logging(level)`
- `get_request_logger(name)`

## Deployment Considerations

- Dependencies are initialized in the async lifespan, so a startup failure surfaces before the server begins accepting requests.
- Metrics are only enabled if `settings.enable_metrics` is true.
- The server expects dependent modules and models to be available at startup.
- For production deployments, run the server under `uvicorn` through `cli/runserver.py`.