# NLProxy Server Module Reference

This document covers the FastAPI server implementation in the `server/` package.

## Purpose

The server module exposes HTTP APIs, handles dependency initialization, and coordinates request lifecycle management for NLProxy.

## Files and Responsibilities

### `server/config.py`

#### Purpose

Defines runtime settings using Pydantic.

#### Key Features

- Uses `ConfigDict(env_prefix="NLPROXY_", case_sensitive=False)`.
- Supports boolean parsing for env values such as `true`, `yes`, `on`, and `1`.
- Validates provider names against `LLMProvider`.
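
The prefix and boolean rules above can be illustrated without Pydantic. The following is a minimal standard-library sketch, not the module's implementation: `SettingsSketch`, `load_settings`, and the default values are hypothetical, and unlike Pydantic it treats any unrecognized string as false instead of raising a validation error.

```python
import os
from dataclasses import dataclass

ENV_PREFIX = "NLPROXY_"
TRUTHY = {"true", "yes", "on", "1"}


def parse_bool(raw: str) -> bool:
    """Return True for the truthy spellings listed above, ignoring case."""
    return raw.strip().lower() in TRUTHY


@dataclass
class SettingsSketch:
    # Defaults are illustrative assumptions, not NLProxy's real defaults.
    port: int = 8000
    enable_metrics: bool = False


def load_settings(environ=None) -> SettingsSketch:
    """Build settings from NLPROXY_-prefixed variables, matching names case-insensitively."""
    env = os.environ if environ is None else environ
    prefix = ENV_PREFIX.lower()
    # Lower-case every key so NLPROXY_PORT and nlproxy_port are equivalent,
    # then strip the prefix to recover the field name.
    values = {
        key.lower()[len(prefix):]: value
        for key, value in env.items()
        if key.lower().startswith(prefix)
    }
    settings = SettingsSketch()
    if "port" in values:
        settings.port = int(values["port"])
    if "enable_metrics" in values:
        settings.enable_metrics = parse_bool(values["enable_metrics"])
    return settings
```

Passing a plain dictionary such as `{"NLPROXY_PORT": "9000"}` in place of the real environment keeps the sketch easy to exercise in isolation.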

#### Configurable Values

- Server: `host`, `port`, `workers`, `log_level`
- Redis: `redis_url`, `redis_max_connections`, `redis_socket_timeout`
- Cache: `enable_semantic_cache`, `cache_similarity_threshold`, `cache_default_ttl`, `cache_embedding_dim`
- Compression: `default_aggressiveness`, `max_compression_timeout`, `compression_max_retries`
- LLM: `default_llm_provider`, `default_llm_model`, `llm_request_timeout`, `llm_max_retries`, `enable_llm_fallback`
- Safety: `min_confidence_threshold`, `max_regeneration_attempts`, `enable_auto_correction`, `privacy_mode_default`
- Observability: `enable_metrics`, `metrics_path`, `enable_tracing`, `trace_sample_rate`
- Verification: `enable_nli_verification`, `enable_perplexity_check`, `enable_semantic_drift`

### `server/main.py`

#### Purpose

Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.

#### Key Behavior

- Uses `asynccontextmanager` for lifespan management.
- Calls `startup()` before the app becomes ready and `shutdown()` on exit.
- Registers `/docs`, `/redoc`, and `/openapi.json` endpoints.
- Conditionally instruments Prometheus metrics under `settings.metrics_path`.
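
The lifespan pattern described above can be sketched with the standard library alone. The `startup()` and `shutdown()` bodies here are stand-ins that only record their call order, and `simulate_server` is a hypothetical helper that plays the role of the running server; wrapping the `yield` in `try`/`finally` is a choice made for this sketch, not something the source confirms.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records the order in which lifecycle hooks run


async def startup():
    events.append("startup")   # stand-in for dependency initialization


async def shutdown():
    events.append("shutdown")  # stand-in for releasing resources


@asynccontextmanager
async def lifespan(app):
    await startup()
    try:
        yield                  # the application serves requests here
    finally:
        await shutdown()       # runs even if the serving phase raised


async def simulate_server():
    # With FastAPI, this context manager would be passed as FastAPI(lifespan=lifespan).
    async with lifespan(app=None):
        events.append("serving")
```

Running `asyncio.run(simulate_server())` leaves the three phases in `events` in the order they occurred.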

### `server/dependencies.py`

#### Purpose

Initializes shared application dependencies and holds module-scoped singletons.

#### Global Objects

- `compression_service`
- `post_verifier`
- `response_corrector`
- `llm_orchestrator`
- `firewall`
- `semantic_cache`

#### Startup Flow

- Instantiates `CompressionService` with cache and model settings.
- Creates `PostLLMVerifier` and optionally hooks NLI refinement into compression.
- Creates `ResponseCorrector` and attaches it to the compression service.
- Creates `LLMOrchestrator` with fallback providers.
- Validates the default provider has credentials configured.
- Instantiates `PromptFirewall` with default regex rules and optional semantic config.
- Initializes `SemanticLLMCache` when enabled.

#### Shutdown Flow

- Calls `llm_orchestrator.close()`.
- Closes Redis connections used by cache.

### `server/middleware.py`

#### Purpose

Installs standard HTTP middleware.

#### Middleware

- `CORSMiddleware`
  - `allow_origins=["*"]`
  - `allow_methods=["*"]`
  - `allow_headers=["*"]`
- `GZipMiddleware`
  - Compresses responses larger than 1000 bytes.

#### Request ID

- Adds `X-Request-ID` header to every response.
- Stores request ID in `request.state`.
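
One way to get both request ID behaviors is a pure ASGI middleware. The sketch below is an assumption about the approach, not the code in `server/middleware.py`: the class name `RequestIDMiddleware`, the `request_id` key, and the choice to always generate a fresh UUID are hypothetical. It relies on the fact that Starlette's `request.state` is backed by `scope["state"]`.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Attach a request ID to the ASGI scope and to the response headers."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket traffic pass through untouched.
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex
        # Starlette exposes scope["state"] as request.state.
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)


async def demo():
    """Drive the middleware with a tiny fake ASGI app and collect what it sends."""
    sent = []

    async def app(scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http"}
    await RequestIDMiddleware(app)(scope, receive, send)
    return scope, sent
```

In a FastAPI app, a class like this would typically be registered with `app.add_middleware(RequestIDMiddleware)`.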

### `server/apis/chat.py`

#### Purpose

Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.

#### Key Endpoints

- `POST /v1/chat/completions`
  - Accepts `ChatCompletionRequest`
  - Returns `ChatCompletionResponse`

#### Request Flow

1. Validate service readiness.
2. Perform firewall check on user prompt.
3. Compress prompt with `CompressionService.compress_batch_async()`.
4. Apply shield and safety validation.
5. Call `LLMOrchestrator.generate()` or streaming equivalent.
6. Apply `ResponseCorrector` and `PostLLMVerifier`.
7. Optionally auto-correct low-confidence outputs.
8. Return structured response with NLProxy metadata.
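
The ordering of these steps can be sketched as a single coroutine. Everything below is a stand-in: `firewall_check`, `compress`, `generate`, `verify`, `PromptBlocked`, and the shape of the returned metadata are hypothetical, and the stubs do trivial work so that only the control flow is shown.

```python
import asyncio


class PromptBlocked(Exception):
    """Hypothetical error raised when the firewall rejects a prompt."""


async def firewall_check(prompt):
    if "ignore previous instructions" in prompt.lower():
        raise PromptBlocked("prompt rejected by firewall")


async def compress(prompt):
    return " ".join(prompt.split())  # stub: only collapses whitespace


async def generate(prompt):
    return f"echo: {prompt}"  # stub for the LLM call


async def verify(answer):
    return 0.9 if answer else 0.0  # stub confidence score


async def handle_chat(prompt, *, ready=True, auto_correct=True, min_confidence=0.5):
    steps = []
    if not ready:                                     # 1. readiness
        raise RuntimeError("service not ready")
    steps.append("ready")
    await firewall_check(prompt)                      # 2. firewall
    steps.append("firewall")
    compressed = await compress(prompt)               # 3. compression
    steps.append("compress")
    steps.append("shield")                            # 4. shield/safety (no-op in this sketch)
    answer = await generate(compressed)               # 5. generation
    steps.append("generate")
    confidence = await verify(answer)                 # 6. correction and verification
    steps.append("verify")
    if auto_correct and confidence < min_confidence:  # 7. optional auto-correction
        answer = await generate(compressed)
        steps.append("auto_correct")
    # 8. structured response with metadata
    return {"content": answer, "nlproxy": {"confidence": confidence, "steps": steps}}
```

A blocked prompt raises before compression or generation runs, which is the short-circuit that the ordering of steps 2 and 3 implies.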

#### Error Handling

- `403` for blocked prompts.
- `400` for invalid user input.
- `504` for compression or LLM timeouts.
- `502` for provider errors.
- `409` when security confidence thresholds are not met.
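
The status codes above amount to a lookup from failure type to HTTP status. A minimal sketch of that lookup follows; the exception classes `PromptBlocked`, `ProviderError`, and `LowConfidence` are hypothetical names chosen for illustration, and the real handlers may be organized differently.

```python
class PromptBlocked(Exception):
    """Hypothetical: the firewall rejected the prompt."""


class ProviderError(Exception):
    """Hypothetical: the upstream LLM provider failed."""


class LowConfidence(Exception):
    """Hypothetical: confidence thresholds were not met."""


# Ordered pairs so the first matching type wins.
STATUS_BY_ERROR = (
    (PromptBlocked, 403),
    (ValueError, 400),      # stands in for invalid user input
    (TimeoutError, 504),    # compression or LLM timeout
    (ProviderError, 502),
    (LowConfidence, 409),
)


def status_for(exc: Exception) -> int:
    """Return the HTTP status for an exception, defaulting to 500."""
    for error_type, status in STATUS_BY_ERROR:
        if isinstance(exc, error_type):
            return status
    return 500
```

Anything not listed falls through to 500, matching the catch-all behavior described for `server/apis/errors.py`.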

### `server/apis/health.py`

#### Purpose

Exposes service diagnostics and health status.

#### Endpoints

- `GET /health`
  - Returns aggregate status of compression, LLM providers, and semantic cache.
- `GET /metrics/app`
  - Returns basic request and cache metrics.
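
As an illustration of aggregation, the sketch below folds per-component checks into one status. The status strings `"ok"` and `"degraded"`, the `failing` field, and the function name are assumptions; the actual `HealthResponse` fields are not specified here.

```python
def aggregate_health(components):
    """Combine per-component checks (name -> bool) into one status payload."""
    failing = sorted(name for name, ok in components.items() if not ok)
    return {
        "status": "ok" if not failing else "degraded",
        "components": dict(components),
        "failing": failing,
    }
```

For example, `aggregate_health({"compression": True, "llm": True, "semantic_cache": False})` reports `"degraded"` and names `semantic_cache` as the failing component.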

### `server/apis/errors.py`

#### Purpose

Defines global exception handlers for HTTP and internal errors.

#### Behavior

- Converts `HTTPException` into `JSONResponse` with structured error payloads.
- Handles unexpected exceptions with HTTP 500.

### `server/apis/models.py`

#### Purpose

Placeholder for model discovery APIs.

## Schemas

### `server/schemas.py`

Defines Pydantic request and response models:

- `Message`
- `ChatCompletionRequest`
- `ChatCompletionResponse`
- `HealthResponse`
- `MetricsResponse`

#### Validation Notes

- `Message.content` must not be empty.
- `ChatCompletionRequest` supports `provider`, `model`, `mode`, `aggressiveness`, `privacy_mode`, `auto_correct`, and `use_perplexity`.
- `privacy_mode` defaults to `settings.privacy_mode_default` if omitted.
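
The two validation rules can be mimicked with dataclasses to show the intended behavior. This is a standard-library sketch, not the Pydantic models: the `role` field, the class name `ChatRequestSketch`, the `PRIVACY_MODE_DEFAULT` constant, and treating whitespace-only content as empty are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stands in for settings.privacy_mode_default; the real default is not stated here.
PRIVACY_MODE_DEFAULT = False


@dataclass
class Message:
    role: str      # assumed field, by analogy with common chat APIs
    content: str

    def __post_init__(self):
        if not self.content.strip():
            raise ValueError("content must not be empty")


@dataclass
class ChatRequestSketch:
    messages: List[Message]
    privacy_mode: Optional[bool] = None

    def __post_init__(self):
        # Fall back to the configured default only when the field was omitted.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT
```

An explicit `privacy_mode=True` is preserved, while an omitted value picks up the configured default.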

## Logging

### `server/logger.py`

#### Purpose

Provides FastAPI-specific logger setup.

#### Functions

- `setup_logging(level)`
- `get_request_logger(name)`
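
The source lists only the function names, so the bodies below are a guess at a reasonable shape using the standard `logging` module. The log format, the `nlproxy.request.` logger prefix, and the fallback to `INFO` for unknown level names are assumptions.

```python
import logging


def setup_logging(level="INFO"):
    """Configure the root logger; unknown level names fall back to INFO."""
    numeric = getattr(logging, str(level).upper(), None)
    if not isinstance(numeric, int):
        numeric = logging.INFO
    logging.basicConfig(
        level=numeric,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace handlers installed by earlier configuration
    )


def get_request_logger(name):
    """Return a logger namespaced under a hypothetical request hierarchy."""
    return logging.getLogger(f"nlproxy.request.{name}")
```

Calling `setup_logging("debug")` once at startup and then `get_request_logger("chat")` inside a route keeps request logs under a single namespace that can be filtered as a group.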

## Deployment Considerations

- Dependencies are initialized inside the async lifespan, so a startup failure should surface before the server binds its port and accepts traffic.
- Metrics are only enabled if `settings.enable_metrics` is true.
- The server expects dependent modules and models to be available at startup.
- In production, launch the server with `uvicorn` through `cli/runserver.py`.