NLProxy Server Module Reference

This document covers the FastAPI server implementation in the server/ package.

Purpose

The server module exposes NLProxy's HTTP APIs, initializes shared dependencies, and manages the request lifecycle.

Files and Responsibilities

server/config.py

Purpose

Defines runtime settings using Pydantic.

Key Features

  • Uses ConfigDict(env_prefix="NLPROXY_", case_sensitive=False).
  • Supports boolean parsing for env values such as true, yes, on, and 1.
  • Validates provider names against LLMProvider.

Configurable Values

  • Server: host, port, workers, log_level
  • Redis: redis_url, redis_max_connections, redis_socket_timeout
  • Cache: enable_semantic_cache, cache_similarity_threshold, cache_default_ttl, cache_embedding_dim
  • Compression: default_aggressiveness, max_compression_timeout, compression_max_retries
  • LLM: default_llm_provider, default_llm_model, llm_request_timeout, llm_max_retries, enable_llm_fallback
  • Safety: min_confidence_threshold, max_regeneration_attempts, enable_auto_correction, privacy_mode_default
  • Observability: enable_metrics, metrics_path, enable_tracing, trace_sample_rate
  • Verification: enable_nli_verification, enable_perplexity_check, enable_semantic_drift
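
The environment rules above (NLPROXY_ prefix, case-insensitive names, lenient booleans, provider validation) can be illustrated without Pydantic. This is a stdlib-only sketch, not the module's code: the LLMProvider members, the default values, and load_settings() itself are hypothetical, and the real module leaves this parsing to Pydantic.

```python
from enum import Enum

# Illustration only: the real config.py delegates this parsing to Pydantic.
ENV_PREFIX = "NLPROXY_"
TRUE_VALUES = {"true", "yes", "on", "1"}
FALSE_VALUES = {"false", "no", "off", "0"}


class LLMProvider(str, Enum):
    # Hypothetical members; the real enum's values are not listed in this doc.
    OPENAI = "openai"
    ANTHROPIC = "anthropic"


def parse_bool(raw: str) -> bool:
    """Accept true/yes/on/1 and false/no/off/0, ignoring case and whitespace."""
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def load_settings(environ: dict) -> dict:
    """Read NLPROXY_* variables case-insensitively into a typed settings dict."""
    # Hypothetical defaults for a few of the documented fields.
    settings = {
        "port": 8000,
        "enable_metrics": True,
        "enable_semantic_cache": False,
        "default_llm_provider": LLMProvider.OPENAI,
    }
    for key, raw in environ.items():
        if not key.upper().startswith(ENV_PREFIX):
            continue
        name = key[len(ENV_PREFIX):].lower()
        if name not in settings:
            continue  # unknown variables are ignored in this sketch
        current = settings[name]
        if isinstance(current, bool):  # check bool first: bool subclasses int
            settings[name] = parse_bool(raw)
        elif isinstance(current, int):
            settings[name] = int(raw)
        elif isinstance(current, LLMProvider):
            # Raises ValueError for names that are not LLMProvider values.
            settings[name] = LLMProvider(raw.strip().lower())
        else:
            settings[name] = raw
    return settings


settings = load_settings({
    "nlproxy_ENABLE_METRICS": "off",
    "NLPROXY_PORT": "9000",
    "NLPROXY_DEFAULT_LLM_PROVIDER": "Anthropic",
})
```

Here the lowercase prefix is still recognized, "off" becomes False, and the provider name is normalized before it is checked against the enum.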

server/main.py

Purpose

Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.

Key Behavior

  • Uses asynccontextmanager for lifespan management.
  • Calls startup() before the app becomes ready and shutdown() on exit.
  • Registers /docs, /redoc, and /openapi.json endpoints.
  • Conditionally instruments Prometheus metrics under settings.metrics_path.
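
The lifespan pattern reduces to a plain asynccontextmanager. In the real module the function would be passed as FastAPI(lifespan=lifespan). In this sketch a stand-in app argument and stub startup()/shutdown() coroutines, which are assumptions and not the real dependency functions, let the ordering run on its own.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records the order in which lifecycle hooks run


async def startup() -> None:
    # Stand-in for the real startup(): build services, open connections.
    events.append("startup")


async def shutdown() -> None:
    # Stand-in for the real shutdown(): close the orchestrator and Redis.
    events.append("shutdown")


@asynccontextmanager
async def lifespan(app):
    await startup()       # must finish before the app serves requests
    try:
        yield             # the application serves traffic here
    finally:
        await shutdown()  # runs on normal exit and when serving raises


async def simulate_server(app=None) -> None:
    # A server such as uvicorn enters the context at boot and exits it on shutdown.
    async with lifespan(app):
        events.append("serving")


asyncio.run(simulate_server())
```

The try/finally around the yield is the design point: shutdown() still runs if the serving phase ends with an exception.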

server/dependencies.py

Purpose

Initializes shared application dependencies and holds module-scoped singletons.

Global Objects

  • compression_service
  • post_verifier
  • response_corrector
  • llm_orchestrator
  • firewall
  • semantic_cache

Startup Flow

  • Instantiates CompressionService with cache and model settings.
  • Creates PostLLMVerifier and optionally hooks NLI refinement into compression.
  • Creates ResponseCorrector and attaches it to the compression service.
  • Creates LLMOrchestrator with fallback providers.
  • Validates that the default provider has credentials configured.
  • Instantiates PromptFirewall with default regex rules and optional semantic config.
  • Initializes SemanticLLMCache when enabled.

Shutdown Flow

  • Calls llm_orchestrator.close().
  • Closes the Redis connections used for caching.

server/middleware.py

Purpose

Installs standard HTTP middleware.

Middleware

  • CORSMiddleware
    • allow_origins=["*"]
    • allow_methods=["*"]
    • allow_headers=["*"]
  • GZipMiddleware
    • Compresses responses of 1000 bytes or more when the client accepts gzip encoding.

Request ID

  • Adds X-Request-ID header to every response.
  • Stores request ID in request.state.
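
A request-ID layer can be written as plain ASGI middleware. This sketch assumes a fresh uuid4 hex value per request, since the doc does not say whether an incoming X-Request-ID is reused. It stores the ID in scope["state"], which recent Starlette versions expose as request.state. The class name RequestIDMiddleware is hypothetical.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Hypothetical ASGI middleware: tags each HTTP request with an ID."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":  # pass lifespan/websocket scopes through
            await self.app(scope, receive, send)
            return

        request_id = uuid.uuid4().hex
        # Starlette builds request.state on top of scope["state"].
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)


async def _demo():
    """Drive the middleware with a tiny ASGI app and capture what it sends."""
    sent = []

    async def app(scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        body = scope["state"]["request_id"].encode()  # echo the ID from state
        await send({"type": "http.response.body", "body": body})

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await RequestIDMiddleware(app)({"type": "http"}, receive, send)
    return sent


messages = asyncio.run(_demo())
```

Wrapping send rather than building a response object keeps the middleware independent of any framework and lets it tag streaming responses too.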

server/apis/chat.py

Purpose

Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.

Key Endpoints

  • POST /v1/chat/completions
    • Accepts ChatCompletionRequest
    • Returns ChatCompletionResponse

Request Flow

  1. Validate service readiness.
  2. Perform a firewall check on the user prompt.
  3. Compress the prompt with CompressionService.compress_batch_async().
  4. Apply shield and safety validation.
  5. Call LLMOrchestrator.generate() or its streaming equivalent.
  6. Apply ResponseCorrector and PostLLMVerifier.
  7. Optionally auto-correct low-confidence outputs.
  8. Return a structured response with NLProxy metadata.
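
The request flow can be sketched as a single coroutine over stub services. StubServices, handle_chat(), BlockedPromptError, the toy firewall rule, and the regeneration loop are all assumptions. The doc does not say whether auto-correction regenerates or edits the output. The shield step is left out, and the sketch simply returns the last answer if confidence never reaches the threshold.

```python
import asyncio


class BlockedPromptError(Exception):
    """Hypothetical: raised when the firewall rejects a prompt."""


class StubServices:
    """Hypothetical stand-ins for the firewall, compressor, LLM and verifier."""

    ready = True

    def __init__(self, confidences):
        self._confidences = iter(confidences)  # scripted verifier scores
        self.generations = 0

    def firewall_blocks(self, prompt):
        return "ignore previous instructions" in prompt.lower()  # toy rule

    async def compress_batch(self, prompts):
        return [" ".join(p.split()) for p in prompts]  # toy: collapse whitespace

    async def generate(self, prompt):
        self.generations += 1
        return f"answer #{self.generations} to: {prompt} "

    def correct(self, text):
        return text.strip()

    def verify(self, prompt, answer):
        return next(self._confidences)


async def handle_chat(prompt, svc, *, auto_correct=True,
                      min_confidence=0.7, max_regenerations=2, timeout=30.0):
    if not svc.ready:                                       # 1. readiness
        raise RuntimeError("service not ready")
    if svc.firewall_blocks(prompt):                         # 2. firewall
        raise BlockedPromptError(prompt)
    compressed = (await asyncio.wait_for(                   # 3. compression
        svc.compress_batch([prompt]), timeout))[0]
    # 4. Shield/safety validation is omitted from this sketch.
    attempts = 0
    while True:
        raw = await asyncio.wait_for(svc.generate(compressed), timeout)  # 5. LLM
        answer = svc.correct(raw)                           # 6. correction
        confidence = svc.verify(compressed, answer)         #    and verification
        # 7. Regenerate low-confidence output, up to max_regenerations extra tries.
        if (not auto_correct or confidence >= min_confidence
                or attempts >= max_regenerations):
            break
        attempts += 1
    return {                                                # 8. response + metadata
        "content": answer,
        "nlproxy": {"compressed_prompt": compressed,
                    "confidence": confidence,
                    "regenerations": attempts},
    }
```

For example, asyncio.run(handle_chat("What   is  NLProxy?", StubServices([0.4, 0.9]))) rejects the first answer (confidence 0.4), regenerates once, and returns the second. Wrapping each slow stage in asyncio.wait_for is what makes the 504 timeout path possible.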

Error Handling

  • 403 for blocked prompts.
  • 400 for invalid user input.
  • 504 for compression or LLM timeouts.
  • 502 for provider errors.
  • 409 when security confidence thresholds are not met.
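
The status-code list above amounts to a lookup from error type to HTTP status. The exception class names below are hypothetical because the doc does not name them. Walking the exception's MRO lets subclasses inherit their parent's status, and anything unmapped falls back to 500.

```python
import asyncio
from http import HTTPStatus


# Hypothetical exception types; the real module's names are not given here.
class BlockedPromptError(Exception): ...
class InvalidInputError(ValueError): ...
class ProviderError(Exception): ...
class ConfidenceThresholdError(Exception): ...


STATUS_BY_ERROR = {
    BlockedPromptError: HTTPStatus.FORBIDDEN,          # 403
    InvalidInputError: HTTPStatus.BAD_REQUEST,         # 400
    asyncio.TimeoutError: HTTPStatus.GATEWAY_TIMEOUT,  # 504 (raised by wait_for)
    ProviderError: HTTPStatus.BAD_GATEWAY,             # 502
    ConfidenceThresholdError: HTTPStatus.CONFLICT,     # 409
}


def status_for(exc: BaseException) -> HTTPStatus:
    """Return the status of the most specific mapped class, else 500."""
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return HTTPStatus.INTERNAL_SERVER_ERROR
```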

server/apis/health.py

Purpose

Exposes service diagnostics and health status.

Endpoints

  • GET /health
    • Returns aggregate status of compression, LLM providers, and semantic cache.
  • GET /metrics/app
    • Returns basic request and cache metrics.
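
The doc does not say how the aggregate status is derived. One plausible rule, shown here as an assumption, is to report healthy when every component check passes, unhealthy when none pass, and degraded otherwise. The status strings, component names, and payload shape are hypothetical.

```python
from typing import Dict


def aggregate_status(components: Dict[str, bool]) -> str:
    """Collapse per-component checks into one status string (hypothetical rule)."""
    if not components:
        return "unhealthy"
    up = sum(components.values())  # True counts as 1
    if up == len(components):
        return "healthy"
    if up == 0:
        return "unhealthy"
    return "degraded"


def health_payload(components: Dict[str, bool]) -> dict:
    return {
        "status": aggregate_status(components),
        "components": {name: ("up" if ok else "down")
                       for name, ok in components.items()},
    }
```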

server/apis/errors.py

Purpose

Defines global exception handlers for HTTP and internal errors.

Behavior

  • Converts HTTPException into JSONResponse with structured error payloads.
  • Handles unexpected exceptions with HTTP 500.
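
A structured error body can be built from the status code alone. In FastAPI, a handler registered with @app.exception_handler(HTTPException) would return JSONResponse(status_code=..., content=error_payload(...)). The payload shape and helper names below are hypothetical, not the module's actual format.

```python
from http import HTTPStatus
from typing import Optional


def error_payload(status_code: int, detail: str,
                  request_id: Optional[str] = None) -> dict:
    """Build a structured error body (hypothetical shape)."""
    status = HTTPStatus(status_code)  # ValueError for non-standard codes
    body = {"error": {"status": status.value, "type": status.phrase,
                      "detail": detail}}
    if request_id is not None:
        body["error"]["request_id"] = request_id  # ties the error to X-Request-ID
    return body


def unhandled_error_payload(request_id: Optional[str] = None) -> dict:
    # Use a generic message for 500s instead of echoing the exception text,
    # which may expose internals.
    return error_payload(500, "Internal server error", request_id)
```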

server/apis/models.py

Purpose

Placeholder for model discovery APIs.

Schemas

server/schemas.py

Defines Pydantic request and response models:

  • Message
  • ChatCompletionRequest
  • ChatCompletionResponse
  • HealthResponse
  • MetricsResponse

Validation Notes

  • Message.content must not be empty.
  • ChatCompletionRequest supports provider, model, mode, aggressiveness, privacy_mode, auto_correct, and use_perplexity.
  • privacy_mode defaults to settings.privacy_mode_default if omitted.
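
The validation notes can be mirrored with stdlib dataclasses so the sketch runs without dependencies. The real models use Pydantic validators. The messages and role fields, all field types, and PRIVACY_MODE_DEFAULT (standing in for settings.privacy_mode_default) are assumptions. The doc lists only the field names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for settings.privacy_mode_default.
PRIVACY_MODE_DEFAULT = False


@dataclass
class Message:
    role: str      # assumed field; the doc only constrains content
    content: str

    def __post_init__(self):
        # Assumption: whitespace-only content also counts as empty.
        if not self.content or not self.content.strip():
            raise ValueError("Message.content must not be empty")


@dataclass
class ChatCompletionRequest:
    messages: List[Message]  # assumed field
    # Types and None defaults below are assumptions; the doc lists names only.
    provider: Optional[str] = None
    model: Optional[str] = None
    mode: Optional[str] = None
    aggressiveness: Optional[float] = None
    privacy_mode: Optional[bool] = None
    auto_correct: Optional[bool] = None
    use_perplexity: Optional[bool] = None

    def __post_init__(self):
        if self.privacy_mode is None:  # omitted: fall back to the server default
            self.privacy_mode = PRIVACY_MODE_DEFAULT
```

Using None as the "omitted" sentinel distinguishes a client that sends privacy_mode=False from one that sends nothing.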

server/logger.py

Purpose

Provides FastAPI-specific logger setup.

Functions

  • setup_logging(level)
  • get_request_logger(name)
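
A minimal stdlib version of these two functions might configure one namespace logger and hand out children beneath it. The "nlproxy" namespace, the format string, and the return values are assumptions. The doc gives only the function names and parameters.

```python
import logging
import sys

LOGGER_NAMESPACE = "nlproxy"  # hypothetical root name for the server's loggers


def setup_logging(level: str = "INFO") -> logging.Logger:
    """Configure the namespace logger; safe to call more than once."""
    logger = logging.getLogger(LOGGER_NAMESPACE)
    logger.setLevel(level.upper())  # setLevel accepts level names like "DEBUG"
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


def get_request_logger(name: str) -> logging.Logger:
    """Return a child logger that inherits the namespace's level and handlers."""
    return logging.getLogger(f"{LOGGER_NAMESPACE}.{name}")
```

Because child loggers default to NOTSET, changing the level in setup_logging() takes effect for every logger returned by get_request_logger().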

Deployment Considerations

  • The server uses an async lifespan, so startup failures surface before it begins accepting requests.
  • Metrics are only enabled if settings.enable_metrics is true.
  • The server expects its dependency modules and models to be available at startup.
  • In production, launch uvicorn through cli/runserver.py.