# NLProxy Utilities Module Reference

This document covers the shared utility modules in `utils/`.

## Purpose

Utility modules centralize the constants, pricing data, and logging configuration used across the NLProxy codebase.

## Files

### `utils/constants.py`

#### Purpose

Provides canonical pricing data, aggressiveness presets, semantic firewall defaults, and other system constants.

#### Pricing

- `MODEL_PRICING` contains per-1,000-token pricing for supported models.
- Pricing can be overridden using environment variables of the form:
  - `NLPROXY_PRICE_{MODEL_NAME}_INPUT`
  - `NLPROXY_PRICE_{MODEL_NAME}_OUTPUT`
- `_normalize_pricing_env_name()` sanitizes the model name into a valid environment variable segment.
- `PROVIDER_PRICING` is the runtime pricing table after environment overrides are applied.

#### Aggressiveness Presets

- `AGGRESSIVENESS_MAP`
  - `legal`: `0.25`
  - `finance`: `0.30`
  - `code`: `0.45`
  - `general`: `0.40`

#### Semantic Firewall Defaults

- `SEMANTIC_FIREWALL_CONFIG` controls optional semantic attack detection.
- The default device preference is `cpu`.
- The default similarity threshold is `0.85`.

#### Semantic Stopwords

- `SEMANTIC_STOPWORDS` contains curated low-value phrases used during prompt reconstruction.
- It includes English and Spanish tokens for bilingual prompt handling.
#### System Defaults

- `DEFAULT_AGGRESSIVENESS`: `0.2`
- `DEFAULT_CONFIDENCE_THRESHOLD`: `0.6`
- `DEFAULT_MAX_TOKENS`: `512`
- `DEFAULT_TIMEOUT_SECONDS`: `30.0`
- `DEFAULT_BATCH_SIZE`: `32`
- `DEFAULT_EMBEDDING_DIM`: `384`
- `DEFAULT_SIMILARITY_THRESHOLD`: `0.92`

#### API Constants

- `API_VERSION`: `v1`
- `CHAT_ENDPOINT`: `/v1/chat/completions`
- `HEALTH_ENDPOINT`: `/health`
- `METRICS_ENDPOINT`: `/metrics`
- `DOCS_ENDPOINT`: `/docs`

#### Security Constants

- `PLACEHOLDER_PREFIX`: `__PROT_`
- `MAX_PROMPT_LENGTH`: `100_000`
- `MAX_RESPONSE_LENGTH`: `50_000`
- `ALLOWED_ROLES`: `{"system", "user", "assistant"}`

#### Utility Functions

- `get_pricing(model_name: str) -> Dict[str, float]`
  - Returns pricing data for the given model name.
  - Falls back to the `default` pricing entry if the model is unknown.

### `utils/logger.py`

#### Purpose

Standardizes logging configuration and request-context logging across the project.

#### Key Components

- `ContextFilter`
  - Provides thread-local context propagation for `request_id`, `user_id`, and custom fields.
  - Methods: `set_context()`, `get_context()`, `clear_context()`.
- `JSONFormatter`
  - Emits structured JSON logs for production observability.
  - Each entry includes the timestamp, level, module, function, line number, and any additional context.
- `PrettyFormatter`
  - Emits ANSI-colored, human-readable logs for development.
- `setup_logging(level, format_type, log_dir, max_bytes, backup_count, disable_existing)`
  - Configures the root logger and optional rotating file output.
  - Detects the environment via `NLPROXY_ENV`.
- `get_request_logger(name)`
  - Returns a `LoggerAdapter` bound to the current context.

#### Performance Considerations

- The log format is selected based on the environment to balance human readability against machine parsing.
- File rotation prevents unbounded disk growth.
- Noise from third-party libraries is suppressed to keep runtime logs stable.

#### Edge Cases

- `setup_logging()` is idempotent: calls after the first initialization are no-ops.
- The context filter gracefully handles missing context data.
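The context propagation and structured output described for `utils/logger.py` can be approximated with the standard `logging` and `threading` modules. This is a sketch, not the module's actual code. Several details are assumptions: the context methods are exposed as class methods over one shared `threading.local()`, missing IDs default to `"-"`, and the JSON field names and timestamp format are illustrative.

```python
import json
import logging
import threading
from typing import Any, Dict


class ContextFilter(logging.Filter):
    """Attach thread-local request context (request_id, user_id, extras) to every record."""

    _local = threading.local()  # each thread sees its own `context` attribute

    @classmethod
    def set_context(cls, **fields: Any) -> None:
        context = cls.get_context()
        context.update(fields)  # merge with whatever this thread already set
        cls._local.context = context

    @classmethod
    def get_context(cls) -> Dict[str, Any]:
        # A thread that never set context gets an empty dict instead of an error.
        return dict(getattr(cls._local, "context", {}))

    @classmethod
    def clear_context(cls) -> None:
        cls._local.context = {}

    def filter(self, record: logging.LogRecord) -> bool:
        context = self.get_context()
        record.request_id = context.get("request_id", "-")
        record.user_id = context.get("user_id", "-")
        record.context = context
        return True  # never drop records; only enrich them


class JSONFormatter(logging.Formatter):
    """Render one JSON object per record for log aggregation."""

    def format(self, record: logging.LogRecord) -> str:
        payload: Dict[str, Any] = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # Merge context fields without letting them overwrite the core keys.
        for key, value in getattr(record, "context", {}).items():
            payload.setdefault(key, value)
        return json.dumps(payload, default=str)


def get_request_logger(name: str) -> logging.LoggerAdapter:
    """Bind a snapshot of the current thread's context to a named logger."""
    return logging.LoggerAdapter(logging.getLogger(name), ContextFilter.get_context())
```

Attaching both pieces to one handler (`handler.addFilter(ContextFilter())` and `handler.setFormatter(JSONFormatter())`) yields one JSON line per record carrying the current `request_id`. Returning copies from `get_context()` keeps callers from mutating the stored context by accident.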
## Implementation Notes

- Utility modules are intentionally low-dependency and safe to import during startup initialization.
- Logging format selection is automatic unless overridden explicitly.
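Automatic format selection and an idempotent `setup_logging()` might look roughly like the following sketch. Several pieces are assumptions rather than confirmed behavior. `NLPROXY_ENV` values `production` or `prod` are assumed to select JSON output, with anything else selecting pretty output. A hypothetical `_MiniJSONFormatter` stands in for the real `JSONFormatter`, and the log file name `nlproxy.log` and the size defaults are illustrative. The sketch also omits `disable_existing` and returns a boolean only so the no-op path is observable.

```python
import json
import logging
import os
from logging.handlers import RotatingFileHandler
from pathlib import Path
from typing import Mapping, Optional

_configured = False  # module-level guard that makes setup_logging() idempotent


def select_format(format_type: Optional[str] = None,
                  environ: Mapping[str, str] = os.environ) -> str:
    """An explicit format_type wins; otherwise infer the format from NLPROXY_ENV."""
    if format_type:
        return format_type
    env = environ.get("NLPROXY_ENV", "development").strip().lower()
    # Assumed mapping: production-like values get JSON, everything else pretty output.
    return "json" if env in {"production", "prod"} else "pretty"


class _MiniJSONFormatter(logging.Formatter):
    """Hypothetical stand-in for JSONFormatter so this sketch runs on its own."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({"level": record.levelname, "logger": record.name,
                           "message": record.getMessage()})


def setup_logging(level: str = "INFO", format_type: Optional[str] = None,
                  log_dir: Optional[str] = None, max_bytes: int = 10 * 1024 * 1024,
                  backup_count: int = 5) -> bool:
    """Configure the root logger once. Returns False on repeat calls, which do nothing."""
    global _configured
    if _configured:
        return False

    if select_format(format_type) == "json":
        formatter: logging.Formatter = _MiniJSONFormatter()
    else:
        formatter = logging.Formatter("%(asctime)s %(levelname)-8s %(name)s: %(message)s")

    root = logging.getLogger()
    root.setLevel(level.upper())
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    root.addHandler(console)

    if log_dir:
        # Size-based rotation caps disk usage at roughly max_bytes * (backup_count + 1).
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        file_handler = RotatingFileHandler(Path(log_dir) / "nlproxy.log", maxBytes=max_bytes,
                                           backupCount=backup_count, encoding="utf-8")
        file_handler.setFormatter(formatter)
        root.addHandler(file_handler)

    _configured = True
    return True
```

Passing `format_type` explicitly bypasses environment detection, so the automatic choice applies only when the caller leaves it unset.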