NLProxy / nlproxy /docs /server.md

Luiserb

first commit

2129c29 12 days ago

	# NLProxy Server Module Reference

	This document covers the FastAPI server implementation in the `server/` package.

	## Purpose

	The server module exposes the HTTP APIs, initializes shared dependencies, and manages the request lifecycle for NLProxy.

	## Files and Responsibilities

	### `server/config.py`

	#### Purpose

	Defines runtime settings using Pydantic.

	#### Key Features

	- Uses `ConfigDict(env_prefix="NLPROXY_", case_sensitive=False)`.
	- Supports boolean parsing for env values such as `true`, `yes`, `on`, and `1`.
	- Validates provider names against `LLMProvider`.
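
The prefix lookup and boolean parsing described above can be illustrated without Pydantic. This is a minimal standard-library sketch, not the module's implementation: `parse_bool`, `read_setting`, and the default values are hypothetical, and strings outside the listed true values are simply read as false here, which may differ from how Pydantic treats them.

```python
import os

# Strings read as true, taken from the list above. Anything else is false in this sketch.
_TRUE_VALUES = {"true", "yes", "on", "1"}

ENV_PREFIX = "NLPROXY_"


def parse_bool(raw: str) -> bool:
    """Interpret an environment string as a boolean, ignoring case and whitespace."""
    return raw.strip().lower() in _TRUE_VALUES


def read_setting(name: str, default: str, environ=None) -> str:
    """Look up NLPROXY_<NAME> case-insensitively, falling back to a default."""
    environ = os.environ if environ is None else environ
    wanted = (ENV_PREFIX + name).upper()
    for key, value in environ.items():
        if key.upper() == wanted:
            return value
    return default


# A fake environment keeps the example deterministic.
env = {"nlproxy_enable_metrics": "Yes", "NLPROXY_PORT": "8080"}
enable_metrics = parse_bool(read_setting("enable_metrics", "false", env))
port = int(read_setting("port", "8000", env))
```

Passing the environment as a parameter keeps the lookup easy to test; production code would read `os.environ` directly.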

	#### Configurable Values

	- Server: `host`, `port`, `workers`, `log_level`
	- Redis: `redis_url`, `redis_max_connections`, `redis_socket_timeout`
	- Cache: `enable_semantic_cache`, `cache_similarity_threshold`, `cache_default_ttl`, `cache_embedding_dim`
	- Compression: `default_aggressiveness`, `max_compression_timeout`, `compression_max_retries`
	- LLM: `default_llm_provider`, `default_llm_model`, `llm_request_timeout`, `llm_max_retries`, `enable_llm_fallback`
	- Safety: `min_confidence_threshold`, `max_regeneration_attempts`, `enable_auto_correction`, `privacy_mode_default`
	- Observability: `enable_metrics`, `metrics_path`, `enable_tracing`, `trace_sample_rate`
	- Verification: `enable_nli_verification`, `enable_perplexity_check`, `enable_semantic_drift`
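
Given the `NLPROXY_` prefix, each setting is presumably supplied through an environment variable named after the field. The sketch below derives those names for a few of the groups above. It assumes the fields carry no custom aliases, and `env_var_name` is a hypothetical helper.

```python
ENV_PREFIX = "NLPROXY_"

# A subset of the documented setting names, grouped as in the list above.
SETTING_GROUPS = {
    "Server": ["host", "port", "workers", "log_level"],
    "Redis": ["redis_url", "redis_max_connections", "redis_socket_timeout"],
    "Observability": ["enable_metrics", "metrics_path", "enable_tracing", "trace_sample_rate"],
}


def env_var_name(setting: str) -> str:
    """Build the environment variable name assumed for a setting field."""
    return ENV_PREFIX + setting.upper()


env_names = {
    group: [env_var_name(setting) for setting in settings]
    for group, settings in SETTING_GROUPS.items()
}
```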

	### `server/main.py`

	#### Purpose

	Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.

	#### Key Behavior

	- Uses `asynccontextmanager` for lifespan management.
	- Calls `startup()` before the app becomes ready and `shutdown()` on exit.
	- Registers `/docs`, `/redoc`, and `/openapi.json` endpoints.
	- Conditionally instruments Prometheus metrics under `settings.metrics_path`.
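
The lifespan pattern described above can be sketched with the standard library alone. The `startup()` and `shutdown()` bodies here are placeholders and are assumed to be coroutines. In a FastAPI app, a context manager like this would be passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records call order for the demonstration


async def startup() -> None:
    events.append("startup")


async def shutdown() -> None:
    events.append("shutdown")


@asynccontextmanager
async def lifespan(app):
    # Runs before the application starts serving.
    await startup()
    try:
        yield
    finally:
        # Runs on exit, even if serving ended with an error.
        await shutdown()


async def demo() -> None:
    # Stand-in for the server entering and leaving the lifespan.
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(demo())
```

Because `startup()` is awaited outside the `try` block, an exception raised during startup propagates immediately and `shutdown()` is not called in this sketch.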

	### `server/dependencies.py`

	#### Purpose

	Initializes shared application dependencies and holds module-scoped singletons.

	#### Global Objects

	- `compression_service`
	- `post_verifier`
	- `response_corrector`
	- `llm_orchestrator`
	- `firewall`
	- `semantic_cache`

	#### Startup Flow

	- Instantiates `CompressionService` with cache and model settings.
	- Creates `PostLLMVerifier` and optionally hooks NLI refinement into compression.
	- Creates `ResponseCorrector` and attaches it to the compression service.
	- Creates `LLMOrchestrator` with fallback providers.
	- Validates the default provider has credentials configured.
	- Instantiates `PromptFirewall` with default regex rules and optional semantic config.
	- Initializes `SemanticLLMCache` when enabled.

	#### Shutdown Flow

	- Calls `llm_orchestrator.close()`.
	- Closes Redis connections used by cache.
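
The module-scoped singleton pattern behind the startup and shutdown flows might look roughly like the following. `FakeOrchestrator` and `FakeCache` are stand-ins for `LLMOrchestrator` and `SemanticLLMCache`, and the `close()` method on the cache is an assumption.

```python
import asyncio
from typing import Optional


class FakeOrchestrator:
    """Stand-in for LLMOrchestrator; only tracks whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class FakeCache:
    """Stand-in for SemanticLLMCache and its Redis connections."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


# Module-scoped singletons, unset until startup() runs.
llm_orchestrator: Optional[FakeOrchestrator] = None
semantic_cache: Optional[FakeCache] = None


async def startup(enable_semantic_cache: bool = True) -> None:
    global llm_orchestrator, semantic_cache
    llm_orchestrator = FakeOrchestrator()
    semantic_cache = FakeCache() if enable_semantic_cache else None


async def shutdown() -> None:
    global llm_orchestrator, semantic_cache
    if llm_orchestrator is not None:
        await llm_orchestrator.close()
    if semantic_cache is not None:
        await semantic_cache.close()
    llm_orchestrator = None
    semantic_cache = None


async def demo():
    await startup(enable_semantic_cache=False)
    orchestrator = llm_orchestrator
    cache_created = semantic_cache is not None
    await shutdown()
    return orchestrator.closed, cache_created, llm_orchestrator


result = asyncio.run(demo())
```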

	### `server/middleware.py`

	#### Purpose

	Installs standard HTTP middleware.

	#### Middleware

	- `CORSMiddleware`
	  - `allow_origins=["*"]`
	  - `allow_methods=["*"]`
	  - `allow_headers=["*"]`
	- `GZipMiddleware`
	  - Compresses responses larger than 1000 bytes.

	#### Request ID

	- Adds `X-Request-ID` header to every response.
	- Stores request ID in `request.state`.
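
One way to get the request ID behavior described above is a pure-ASGI middleware, sketched below with only the standard library. The class name is hypothetical, the ID is always freshly generated (the real module may reuse an incoming header), and storing the value under `scope["state"]` relies on Starlette building `request.state` from that key.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Pure-ASGI middleware that tags each HTTP request and response with an ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex
        # Make the ID visible to handlers via the connection state.
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            # Append the header when the response starts.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)


async def hello_app(scope, receive, send):
    """Tiny ASGI app used only to exercise the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def demo():
    sent = []
    scope = {"type": "http"}

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await RequestIDMiddleware(hello_app)(scope, receive, send)
    return scope, sent


scope, sent = asyncio.run(demo())
```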

	### `server/apis/chat.py`

	#### Purpose

	Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.

	#### Key Endpoints

	- `POST /v1/chat/completions`
	  - Accepts `ChatCompletionRequest`
	  - Returns `ChatCompletionResponse`

	#### Request Flow

	1. Validate service readiness.
	2. Perform firewall check on user prompt.
	3. Compress prompt with `CompressionService.compress_batch_async()`.
	4. Apply shield and safety validation.
	5. Call `LLMOrchestrator.generate()` or streaming equivalent.
	6. Apply `ResponseCorrector` and `PostLLMVerifier`.
	7. Optionally auto-correct low-confidence outputs.
	8. Return structured response with NLProxy metadata.
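
The request flow above can be sketched as a small async pipeline. Every function here is a toy stand-in for the real service (`firewall_check`, `compress`, `generate`, and `verify` are hypothetical names), step 4 is omitted, and the metadata keys are illustrative only.

```python
import asyncio


class PromptBlocked(Exception):
    """Raised when the stand-in firewall rejects a prompt."""


class LowConfidence(Exception):
    """Raised when the stand-in verifier scores an answer too low."""


async def firewall_check(prompt: str) -> bool:
    # Toy rule: block one obviously hostile phrase.
    return "ignore previous instructions" not in prompt.lower()


async def compress(prompt: str) -> str:
    # Toy compression: collapse repeated whitespace.
    return " ".join(prompt.split())


async def generate(prompt: str) -> str:
    # Stand-in for the LLM call.
    return f"echo: {prompt}"


async def verify(answer: str) -> float:
    # Stand-in confidence score.
    return 0.9 if answer else 0.0


async def handle_chat(prompt: str, ready: bool = True, min_confidence: float = 0.5) -> dict:
    if not ready:                                   # step 1
        raise RuntimeError("service not ready")
    if not await firewall_check(prompt):            # step 2
        raise PromptBlocked(prompt)
    compressed = await compress(prompt)             # step 3
    answer = await generate(compressed)             # step 5
    confidence = await verify(answer)               # step 6
    if confidence < min_confidence:                 # step 7 would try auto-correction first
        raise LowConfidence(confidence)
    return {                                        # step 8
        "content": answer,
        "nlproxy": {
            "original_chars": len(prompt),
            "compressed_chars": len(compressed),
            "confidence": confidence,
        },
    }


result = asyncio.run(handle_chat("Hello    world  "))
```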

	#### Error Handling

	- `403` for blocked prompts.
	- `400` for invalid user input.
	- `504` for compression or LLM timeouts.
	- `502` for provider errors.
	- `409` when security confidence thresholds are not met.
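
A status mapping like the one above is often expressed as a lookup from exception type to code. The exception classes below are hypothetical names, not taken from the codebase. The fallback to `500` follows the behavior described for unexpected exceptions in `server/apis/errors.py`.

```python
class PromptBlockedError(Exception):
    """Prompt rejected by the firewall."""


class InvalidInputError(Exception):
    """Request failed input validation."""


class UpstreamTimeoutError(Exception):
    """Compression or LLM call timed out."""


class ProviderError(Exception):
    """LLM provider returned an error."""


class LowConfidenceError(Exception):
    """Security confidence threshold not met."""


STATUS_BY_ERROR = {
    PromptBlockedError: 403,
    InvalidInputError: 400,
    UpstreamTimeoutError: 504,
    ProviderError: 502,
    LowConfidenceError: 409,
}


def status_for(exc: Exception) -> int:
    """Return the documented HTTP status for an exception, or 500 if unknown."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 500
```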

	### `server/apis/health.py`

	#### Purpose

	Exposes service diagnostics and health status.

	#### Endpoints

	- `GET /health`
	  - Returns aggregate status of compression, LLM providers, and semantic cache.
	- `GET /metrics/app`
	  - Returns basic request and cache metrics.
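
An aggregate health status is typically computed by folding per-component checks into one label. The sketch below shows one possible rule. The status labels and the `aggregate_health` helper are assumptions, since the actual `HealthResponse` shape is not documented here.

```python
from typing import Dict


def aggregate_health(components: Dict[str, bool]) -> dict:
    """Fold per-component booleans into a single status label."""
    if all(components.values()):
        status = "healthy"
    elif any(components.values()):
        status = "degraded"
    else:
        status = "unhealthy"
    return {"status": status, "components": components}


report = aggregate_health(
    {"compression": True, "llm_providers": True, "semantic_cache": False}
)
```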

	### `server/apis/errors.py`

	#### Purpose

	Defines global exception handlers for HTTP and internal errors.

	#### Behavior

	- Converts `HTTPException` into `JSONResponse` with structured error payloads.
	- Handles unexpected exceptions with HTTP 500.
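
A structured error payload could take a shape like the one below. The field names (`error`, `code`, `message`, `request_id`) are illustrative assumptions rather than the module's actual schema.

```python
from typing import Optional


def error_payload(status_code: int, detail: str, request_id: Optional[str] = None) -> dict:
    """Build a JSON-serializable error body."""
    body = {"error": {"code": status_code, "message": detail}}
    if request_id is not None:
        body["error"]["request_id"] = request_id
    return body


def unexpected_error_payload(exc: Exception) -> dict:
    """Map any unexpected exception to a generic 500 body without leaking internals."""
    return error_payload(500, "Internal server error")


blocked = error_payload(403, "Prompt blocked by firewall", request_id="abc123")
crash = unexpected_error_payload(RuntimeError("database password is hunter2"))
```

Returning a generic message for unexpected exceptions keeps internal details out of client responses; the full exception would normally go to the server log instead.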

	### `server/apis/models.py`

	#### Purpose

	Placeholder for model discovery APIs.

	## Schemas

	### `server/schemas.py`

	Defines Pydantic request and response models:

	- `Message`
	- `ChatCompletionRequest`
	- `ChatCompletionResponse`
	- `HealthResponse`
	- `MetricsResponse`

	#### Validation Notes

	- `Message.content` must not be empty.
	- `ChatCompletionRequest` supports `provider`, `model`, `mode`, `aggressiveness`, `privacy_mode`, `auto_correct`, and `use_perplexity`.
	- `privacy_mode` defaults to `settings.privacy_mode_default` if omitted.
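
The two validation rules above can be illustrated with plain dataclasses. The real models are Pydantic classes; in this sketch the `role` field, the boolean type of `privacy_mode`, the whitespace-only check, and `PRIVACY_MODE_DEFAULT` (standing in for `settings.privacy_mode_default`) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

PRIVACY_MODE_DEFAULT = False  # stand-in for settings.privacy_mode_default


@dataclass
class Message:
    role: str
    content: str

    def __post_init__(self) -> None:
        # Reject empty content; whitespace-only is treated as empty here.
        if not self.content.strip():
            raise ValueError("content must not be empty")


@dataclass
class ChatRequest:
    messages: List[Message]
    privacy_mode: Optional[bool] = None

    def __post_init__(self) -> None:
        # Fall back to the configured default when the field is omitted.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT


request = ChatRequest(messages=[Message(role="user", content="hi")])
```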

	### `server/logger.py`

	#### Purpose

	Provides FastAPI-specific logger setup.

	#### Functions

	- `setup_logging(level)`
	- `get_request_logger(name)`
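
The two functions listed above might be implemented along these lines with the standard `logging` module. The log format and the `nlproxy.` logger namespace are assumptions. Only the function names come from this document.

```python
import logging


def setup_logging(level: str = "INFO") -> None:
    """Configure the root logger; unknown level names fall back to INFO."""
    numeric_level = getattr(logging, level.upper(), logging.INFO)
    logging.basicConfig(
        level=numeric_level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace any handlers configured earlier
    )


def get_request_logger(name: str) -> logging.Logger:
    """Return a logger under a shared namespace so levels can be tuned together."""
    return logging.getLogger(f"nlproxy.{name}")


setup_logging("debug")
chat_logger = get_request_logger("chat")
```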

	## Deployment Considerations

	- Dependencies are initialized in the async lifespan, so a startup failure should surface before the server begins accepting requests.
	- Metrics are only enabled if `settings.enable_metrics` is true.
	- The server expects dependent modules and models to be available at startup.
	- For production deployments, run the server with `uvicorn` through `cli/runserver.py`.