Server: host, port, workers, log_level
Redis: redis_url, redis_max_connections, redis_socket_timeout
Cache: enable_semantic_cache, cache_similarity_threshold, cache_default_ttl, cache_embedding_dim
Compression: default_aggressiveness, max_compression_timeout, compression_max_retries
LLM: default_llm_provider, default_llm_model, llm_request_timeout, llm_max_retries, enable_llm_fallback
Safety: min_confidence_threshold, max_regeneration_attempts, enable_auto_correction, privacy_mode_default
Observability: enable_metrics, metrics_path, enable_tracing, trace_sample_rate
Verification: enable_nli_verification, enable_perplexity_check, enable_semantic_drift
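
A minimal sketch of how a few of these settings could be loaded with environment overrides. The field names come from the list above; every default value, the `NLPROXY_` prefix, and the `load_settings` helper are assumptions made for illustration, and the project may use a different settings mechanism.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    # Field names follow the documented list; all defaults are placeholders.
    host: str = "0.0.0.0"
    port: int = 8000
    workers: int = 1
    log_level: str = "info"
    redis_url: str = "redis://localhost:6379/0"
    enable_semantic_cache: bool = True
    cache_similarity_threshold: float = 0.9
    llm_request_timeout: float = 30.0
    enable_metrics: bool = True
    metrics_path: str = "/metrics"


def _parse(raw: str, target: type):
    """Convert an environment string to the field's declared type."""
    if target is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return target(raw)


def load_settings(env=None, prefix: str = "NLPROXY_") -> Settings:
    """Build Settings; PREFIX + upper-cased field name overrides a default."""
    env = os.environ if env is None else env
    overrides = {}
    for f in fields(Settings):
        key = prefix + f.name.upper()
        if key in env:
            overrides[f.name] = _parse(env[key], f.type)
    return Settings(**overrides)
```

Passing a plain dict as `env` keeps the loader easy to exercise without touching the real process environment.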

`server/main.py`

Purpose

Creates the FastAPI application and registers middleware, routes, exception handlers, and metrics instrumentation.

Key Behavior

Uses asynccontextmanager for lifespan management.
Calls startup() before the app becomes ready and shutdown() on exit.
Registers /docs, /redoc, and /openapi.json endpoints.
Conditionally instruments Prometheus metrics under settings.metrics_path.
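
The lifespan pattern described above can be sketched with the standard library alone. The `startup()` and `shutdown()` bodies here are placeholders, and the `try`/`finally` is an assumption about how cleanup is guaranteed; in a FastAPI app the same context manager would be passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records the order of lifecycle steps for the demo


async def startup():
    events.append("startup")   # placeholder for dependency initialization


async def shutdown():
    events.append("shutdown")  # placeholder for resource cleanup


@asynccontextmanager
async def lifespan(app):
    # Code before `yield` runs before the app is ready to serve.
    await startup()
    try:
        yield
    finally:
        # Code after `yield` runs on exit, even if serving raised.
        await shutdown()


async def demo():
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(demo())
```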

`server/dependencies.py`

Purpose

Initializes shared application dependencies and holds module-scoped singletons.

Global Objects

compression_service
post_verifier
response_corrector
llm_orchestrator
firewall
semantic_cache

Startup Flow

Instantiates CompressionService with cache and model settings.
Creates PostLLMVerifier and optionally hooks NLI refinement into compression.
Creates ResponseCorrector and attaches it to the compression service.
Creates LLMOrchestrator with fallback providers.
Validates the default provider has credentials configured.
Instantiates PromptFirewall with default regex rules and optional semantic config.
Initializes SemanticLLMCache when enabled.

Shutdown Flow

Calls llm_orchestrator.close().
Closes the Redis connections used by the semantic cache.

`server/middleware.py`

Purpose

Installs standard HTTP middleware.

Middleware

CORSMiddleware
- allow_origins=["*"]
- allow_methods=["*"]
- allow_headers=["*"]
GZipMiddleware
- Compresses responses larger than 1000 bytes.

Request ID

Adds X-Request-ID header to every response.
Stores request ID in request.state.
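
One way the request ID behavior could be implemented is as a pure ASGI middleware, sketched below without third-party imports. The class name, the reuse of a client-supplied ID, and the UUID format are assumptions; the source only states that the header is added and the ID is stored in request.state.

```python
import asyncio
import uuid


class RequestIDMiddleware:
    """Pure-ASGI middleware that tags each HTTP request and response."""

    def __init__(self, app, header_name: str = "X-Request-ID"):
        self.app = app
        # ASGI header names are lower-cased byte strings.
        self.header = header_name.lower().encode("latin-1")

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Assumption: reuse an incoming ID if present, otherwise mint one.
        incoming = dict(scope.get("headers") or [])
        request_id = incoming.get(self.header, b"").decode("latin-1")
        request_id = request_id or uuid.uuid4().hex

        # Starlette builds request.state from scope["state"].
        scope.setdefault("state", {})["request_id"] = request_id

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers") or [])
                headers.append((self.header, request_id.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_id)
```

With FastAPI or Starlette, such a class would typically be registered through `app.add_middleware(RequestIDMiddleware)`.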

`server/apis/chat.py`

Purpose

Handles chat completion requests with prompt compression, firewall analysis, LLM generation, and post-processing.

Key Endpoints

POST /v1/chat/completions
- Accepts ChatCompletionRequest
- Returns ChatCompletionResponse

Request Flow

Validate service readiness.
Perform firewall check on user prompt.
Compress prompt with CompressionService.compress_batch_async().
Apply shield and safety validation.
Call LLMOrchestrator.generate() or streaming equivalent.
Apply ResponseCorrector and PostLLMVerifier.
Optionally auto-correct low-confidence outputs.
Return structured response with NLProxy metadata.

Error Handling

403 for blocked prompts.
400 for invalid user input.
504 for compression or LLM timeouts.
502 for provider errors.
409 when security confidence thresholds are not met.
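
The status codes above could be kept in one lookup table, as in this sketch. The exception class names are invented for illustration; only the status codes come from the list, and the 500 fallback assumes unexpected errors are left to the global handler.

```python
class PromptBlockedError(Exception):
    """Firewall rejected the prompt."""


class InvalidInputError(Exception):
    """The request failed validation."""


class UpstreamTimeoutError(Exception):
    """Compression or the LLM call timed out."""


class ProviderError(Exception):
    """The LLM provider returned an error."""


class LowConfidenceError(Exception):
    """Confidence thresholds were not met."""


STATUS_BY_ERROR = {
    PromptBlockedError: 403,
    InvalidInputError: 400,
    UpstreamTimeoutError: 504,
    ProviderError: 502,
    LowConfidenceError: 409,
}


def status_for(exc: Exception) -> int:
    """Return the HTTP status for a known error, or 500 otherwise."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 500
```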

`server/apis/health.py`

Purpose

Exposes service diagnostics and health status.

Endpoints

GET /health
- Returns aggregate status of compression, LLM providers, and semantic cache.
GET /metrics/app
- Returns basic request and cache metrics.
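
A sketch of how an aggregate status might be derived from per-component checks. The status labels ("healthy", "degraded", "unhealthy") and the convention that `None` marks a disabled component are assumptions; the source does not specify the response format.

```python
def aggregate_status(components: dict) -> dict:
    """components maps name -> True (up), False (down), or None (disabled)."""
    # Disabled components do not count for or against overall health.
    active = {name: up for name, up in components.items() if up is not None}
    if all(active.values()):
        status = "healthy"
    elif any(active.values()):
        status = "degraded"
    else:
        status = "unhealthy"
    return {"status": status, "components": components}
```

For example, a reachable compression service and LLM provider with the semantic cache disabled would report "healthy" under these rules.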

`server/apis/errors.py`

Purpose

Defines global exception handlers for HTTP and internal errors.

Behavior

Converts HTTPException into JSONResponse with structured error payloads.
Handles unexpected exceptions with HTTP 500.
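
The handler behavior can be illustrated with a framework-free function. `HTTPError` is a stand-in for FastAPI's HTTPException (which carries `status_code` and `detail`), and the payload shape shown here is an assumption, since the source does not define the structured error format.

```python
class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException (status_code plus detail)."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def to_error_response(exc: Exception, request_id: str = "") -> tuple:
    """Return (status_code, json_body) for any exception."""
    if isinstance(exc, HTTPError):
        status, message, kind = exc.status_code, exc.detail, "http_error"
    else:
        # Unexpected failures map to 500 without leaking internal details.
        status, message, kind = 500, "Internal server error", "internal_error"
    body = {"error": {"type": kind, "message": message, "status": status}}
    if request_id:
        body["error"]["request_id"] = request_id
    return status, body
```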

`server/apis/models.py`

Purpose

Placeholder for model discovery APIs.

Schemas

`server/schemas.py`

Defines Pydantic request and response models:

Message
ChatCompletionRequest
ChatCompletionResponse
HealthResponse
MetricsResponse

Validation Notes

Message.content must not be empty.
ChatCompletionRequest supports provider, model, mode, aggressiveness, privacy_mode, auto_correct, and use_perplexity.
privacy_mode defaults to settings.privacy_mode_default if omitted.
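
The two validation rules can be sketched with dataclasses standing in for the Pydantic models, so the example runs without third-party packages. The `role` and `messages` fields, the boolean type of `privacy_mode`, and the treatment of whitespace-only content as empty are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stands in for settings.privacy_mode_default; the value is a placeholder.
PRIVACY_MODE_DEFAULT = False


@dataclass
class Message:
    role: str
    content: str

    def __post_init__(self):
        # Reject empty content (whitespace-only is treated as empty here).
        if not self.content or not self.content.strip():
            raise ValueError("Message.content must not be empty")


@dataclass
class ChatCompletionRequest:
    messages: List[Message]
    privacy_mode: Optional[bool] = None

    def __post_init__(self):
        # Fall back to the configured default when the field is omitted.
        if self.privacy_mode is None:
            self.privacy_mode = PRIVACY_MODE_DEFAULT
```

In Pydantic itself, the same rules would normally be expressed with a field validator and a default factory rather than `__post_init__`.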

`server/logger.py`

Purpose

Provides FastAPI-specific logger setup.

Functions

setup_logging(level)
get_request_logger(name)
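
A possible shape for these two functions using the standard `logging` module. Only the function names and parameters come from the list above; the log format, the `nlproxy.request.` logger namespace, and the fallback to INFO for unknown levels are assumptions.

```python
import logging


def setup_logging(level: str = "INFO") -> None:
    """Configure the root logger; unknown level names fall back to INFO."""
    numeric = getattr(logging, level.upper(), logging.INFO)
    logging.basicConfig(
        level=numeric,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,  # replace handlers installed by earlier configuration
    )


def get_request_logger(name: str) -> logging.Logger:
    """Return a child logger under a shared (hypothetical) namespace."""
    return logging.getLogger(f"nlproxy.request.{name}")
```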

Deployment Considerations

The server uses an async lifespan, so startup failures should surface before binding ports.
Metrics are only enabled if settings.enable_metrics is true.
The server expects dependent modules and models to be available at startup.
For production, start the server with uvicorn through cli/runserver.py.