NLProxy Server Module Reference
This document covers the FastAPI server implementation in the server/ package.
Purpose
The server module exposes HTTP APIs, handles dependency initialization, and coordinates request lifecycle management for NLProxy.
Files and Responsibilities
server/config.py
Purpose
Defines runtime settings using Pydantic.
Key Features
- Uses ConfigDict(env_prefix="NLPROXY_", case_sensitive=False).
- Supports boolean parsing for env values such as true, yes, on, and 1.
- Validates provider names against LLMProvider.
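The environment handling described above can be sketched with the standard library alone. This is not the Pydantic machinery the module uses; it only illustrates the behaviour. The helper names parse_bool and read_setting are hypothetical, and the set of falsy strings is an assumption, since the source lists only the truthy ones.

```python
import os

# Truthy strings come from the list above; the falsy set is an assumption.
TRUTHY = {"true", "yes", "on", "1"}
FALSY = {"false", "no", "off", "0"}


def parse_bool(raw: str) -> bool:
    """Parse a boolean from an environment string, ignoring case and padding."""
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")


def read_setting(name: str, default: str, environ=None) -> str:
    """Look up NLPROXY_<NAME>, matching the variable name case-insensitively."""
    environ = os.environ if environ is None else environ
    wanted = f"NLPROXY_{name}".upper()
    for key, value in environ.items():
        if key.upper() == wanted:
            return value
    return default
```

For example, with NLPROXY_ENABLE_METRICS=yes exported, parse_bool(read_setting("enable_metrics", "false")) evaluates to True.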
Configurable Values
- Server: host, port, workers, log_level
- Redis: redis_url, redis_max_connections, redis_socket_timeout
- Cache: enable_semantic_cache, cache_similarity_threshold, cache_default_ttl, cache_embedding_dim
- Compression: default_aggressiveness, max_compression_timeout, compression_max_retries
- LLM: default_llm_provider, default_llm_model, llm_request_timeout, llm_max_retries, enable_llm_fallback
- Safety: min_confidence_threshold, max_regeneration_attempts, enable_auto_correction, privacy_mode_default
- Observability: enable_metrics, metrics_path, enable_tracing, trace_sample_rate
- Verification: enable_nli_verification, enable_perplexity_check, enable_semantic_drift
server/main.py
Purpose
Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.
Key Behavior
- Uses asynccontextmanager for lifespan management.
- Calls startup() before the app becomes ready and shutdown() on exit.
- Registers /docs, /redoc, and /openapi.json endpoints.
- Conditionally instruments Prometheus metrics under settings.metrics_path.
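The lifespan pattern named above can be illustrated without FastAPI. The sketch below assumes startup() and shutdown() are coroutines, which the source does not state, and uses an events list purely to make the call order visible.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records call order for the demonstration


async def startup():
    # Stand-in for dependency initialization.
    events.append("startup")


async def shutdown():
    # Stand-in for resource cleanup.
    events.append("shutdown")


@asynccontextmanager
async def lifespan(app):
    await startup()
    try:
        yield  # the application serves requests while suspended here
    finally:
        await shutdown()  # runs even if serving raised


async def demo():
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(demo())
```

With FastAPI, a function of this shape is passed as the lifespan argument to the FastAPI constructor. The try/finally ensures cleanup still runs when the serving phase ends with an error.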
server/dependencies.py
Purpose
Initializes shared application dependencies and holds module-scoped singletons.
Global Objects
- compression_service
- post_verifier
- response_corrector
- llm_orchestrator
- firewall
- semantic_cache
Startup Flow
- Instantiates CompressionService with cache and model settings.
- Creates PostLLMVerifier and optionally hooks NLI refinement into compression.
- Creates ResponseCorrector and attaches it to the compression service.
- Creates LLMOrchestrator with fallback providers.
- Validates the default provider has credentials configured.
- Instantiates PromptFirewall with default regex rules and optional semantic config.
- Initializes SemanticLLMCache when enabled.
Shutdown Flow
- Calls llm_orchestrator.close().
- Closes Redis connections used by cache.
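A module-scoped singleton with paired startup and shutdown hooks might look like the sketch below. StubOrchestrator is a hypothetical stand-in for LLMOrchestrator, get_orchestrator is an illustrative accessor that is not named in the source, and close() is assumed to be a coroutine.

```python
import asyncio
from typing import Optional


class StubOrchestrator:
    """Hypothetical stand-in for LLMOrchestrator; only close() is modelled."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


# Module-scoped singleton, unset until startup() runs.
llm_orchestrator: Optional[StubOrchestrator] = None


async def startup() -> None:
    global llm_orchestrator
    llm_orchestrator = StubOrchestrator()


async def shutdown() -> None:
    global llm_orchestrator
    if llm_orchestrator is not None:
        await llm_orchestrator.close()
        llm_orchestrator = None


def get_orchestrator() -> StubOrchestrator:
    """Return the singleton, failing loudly if startup() has not run."""
    if llm_orchestrator is None:
        raise RuntimeError("llm_orchestrator is not initialized")
    return llm_orchestrator
```

An accessor that raises before initialization gives request handlers a single place to detect that the service is not ready yet.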
server/middleware.py
Purpose
Installs standard HTTP middleware.
Middleware
- CORSMiddleware
  - allow_origins=["*"]
  - allow_methods=["*"]
  - allow_headers=["*"]
- GZipMiddleware
  - Compresses responses larger than 1000 bytes.
Request ID
- Adds X-Request-ID header to every response.
- Stores request ID in request.state.
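One way to implement the request ID behaviour is a small pure-ASGI middleware, sketched below under several assumptions: the ID is a freshly generated UUID (the source does not say whether an incoming X-Request-ID is reused), the class name RequestIDMiddleware is hypothetical, and the ID is stored under scope["state"], which Starlette is assumed to expose as request.state.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Pure-ASGI middleware that tags each HTTP response with X-Request-ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex
        # Assumption: Starlette builds request.state from scope["state"].
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode("ascii")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)
```

Wrapping send rather than editing the response object means the header is added to every response the inner app starts, whichever handler produced it.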
server/apis/chat.py
Purpose
Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.
Key Endpoints
- POST /v1/chat/completions
  - Accepts ChatCompletionRequest
  - Returns ChatCompletionResponse
Request Flow
- Validate service readiness.
- Perform firewall check on user prompt.
- Compress prompt with CompressionService.compress_batch_async().
- Apply shield and safety validation.
- Call LLMOrchestrator.generate() or streaming equivalent.
- Apply ResponseCorrector and PostLLMVerifier.
- Optionally auto-correct low-confidence outputs.
- Return structured response with NLProxy metadata.
Error Handling
- 403 for blocked prompts.
- 400 for invalid user input.
- 504 for compression or LLM timeouts.
- 502 for provider errors.
- 409 when security confidence thresholds are not met.
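The status codes above can be expressed as a lookup from failure type to HTTP status. The exception classes below are hypothetical names invented for this sketch, and the 500 fallback for unmapped errors is an assumption. Only the status codes themselves come from the list above.

```python
class PromptBlockedError(Exception):
    """Hypothetical: the firewall rejected the prompt."""


class InvalidInputError(Exception):
    """Hypothetical: the user input failed validation."""


class UpstreamTimeoutError(Exception):
    """Hypothetical: compression or the LLM call timed out."""


class ProviderError(Exception):
    """Hypothetical: the LLM provider returned an error."""


class LowConfidenceError(Exception):
    """Hypothetical: security confidence thresholds were not met."""


STATUS_BY_ERROR = {
    PromptBlockedError: 403,
    InvalidInputError: 400,
    UpstreamTimeoutError: 504,
    ProviderError: 502,
    LowConfidenceError: 409,
}


def status_for(exc: Exception) -> int:
    """Return the HTTP status for an exception, defaulting to 500."""
    for exc_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, exc_type):
            return status
    return 500
```

Keeping the mapping in one table makes it easy to audit against the documented codes.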
server/apis/health.py
Purpose
Exposes service diagnostics and health status.
Endpoints
- GET /health
  - Returns aggregate status of compression, LLM providers, and semantic cache.
- GET /metrics/app
  - Returns basic request and cache metrics.
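Aggregating component health into a single status could work roughly as follows. The status strings "healthy", "degraded", and "unhealthy" and the function name aggregate_status are assumptions for illustration. The source does not specify the values HealthResponse actually carries.

```python
from typing import Dict


def aggregate_status(components: Dict[str, bool]) -> str:
    """Collapse per-component health flags into one overall status string."""
    if components and all(components.values()):
        return "healthy"
    if any(components.values()):
        return "degraded"
    # No components reported, or every component is down.
    return "unhealthy"
```

A call such as aggregate_status({"compression": True, "llm": True, "semantic_cache": False}) returns "degraded" under this scheme.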
server/apis/errors.py
Purpose
Defines global exception handlers for HTTP and internal errors.
Behavior
- Converts HTTPException into JSONResponse with structured error payloads.
- Handles unexpected exceptions with HTTP 500.
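The two behaviours above can be sketched as one conversion function. HTTPError is a hypothetical stand-in for an HTTP exception, to_response is an illustrative name, and the payload shape is an assumption, since the source says only that payloads are structured.

```python
from typing import Optional, Tuple


class HTTPError(Exception):
    """Hypothetical stand-in for an HTTP exception with a status and detail."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def to_response(exc: Exception, request_id: Optional[str] = None) -> Tuple[int, dict]:
    """Return (status, JSON-serialisable body) for an exception."""
    if isinstance(exc, HTTPError):
        status, message = exc.status_code, exc.detail
    else:
        # Unexpected errors become a generic 500 so internals are not leaked.
        status, message = 500, "Internal server error"
    # Assumption: this payload shape is illustrative only.
    body = {"error": {"code": status, "message": message, "request_id": request_id}}
    return status, body
```

Returning a generic message for unexpected exceptions keeps stack details out of client-visible responses while the original exception can still be logged server-side.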
server/apis/models.py
Purpose
Placeholder for model discovery APIs.
Schemas
server/schemas.py
Defines Pydantic request and response models:
- Message
- ChatCompletionRequest
- ChatCompletionResponse
- HealthResponse
- MetricsResponse
Validation Notes
- Message.content must not be empty.
- ChatCompletionRequest supports provider, model, mode, aggressiveness, privacy_mode, auto_correct, and use_perplexity.
- privacy_mode defaults to settings.privacy_mode_default if omitted.
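The validation rules above can be illustrated with plain dataclasses standing in for the Pydantic models. The role field, the ChatRequest name, the PRIVACY_MODE_DEFAULT constant, and the choice to treat whitespace-only content as empty are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stand-in for settings.privacy_mode_default; the real value is configured.
PRIVACY_MODE_DEFAULT = False


@dataclass
class Message:
    role: str  # assumed field; the source documents only content
    content: str

    def __post_init__(self) -> None:
        # Assumption: whitespace-only content counts as empty.
        if not self.content.strip():
            raise ValueError("content must not be empty")


@dataclass
class ChatRequest:
    messages: List[Message]
    privacy_mode: Optional[bool] = None

    def __post_init__(self) -> None:
        # Fall back to the configured default only when the field is omitted.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT
```

Using None as the "omitted" marker lets an explicit privacy_mode=False from the client be distinguished from a missing value.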
server/logger.py
Purpose
Provides FastAPI-specific logger setup.
Functions
- setup_logging(level)
- get_request_logger(name)
Deployment Considerations
- The server uses an async lifespan, so startup failures should surface before binding ports.
- Metrics are only enabled if settings.enable_metrics is true.
- The server expects dependent modules and models to be available at startup.
- Use uvicorn via cli/runserver.py for production deployments.