# NLProxy Utilities Module Reference
This document covers shared utility modules in `utils/`.
## Purpose
Utility modules centralize constants, pricing data, and logging configuration used across the NLProxy codebase.
## Files
### `utils/constants.py`
#### Purpose
Provides canonical pricing data, aggressiveness presets, semantic firewall defaults, and other system constants.
#### Pricing
- `MODEL_PRICING` contains per-1,000-token pricing for supported models.
- Pricing can be overridden using environment variables of the form:
  - `NLPROXY_PRICE_{MODEL_NAME}_INPUT`
  - `NLPROXY_PRICE_{MODEL_NAME}_OUTPUT`
- `_normalize_pricing_env_name()` sanitizes the model name into a valid env var segment.
- `PROVIDER_PRICING` is the runtime pricing table after env overrides.
#### Aggressiveness Presets
- `AGGRESSIVENESS_MAP`
  - `legal`: `0.25`
  - `finance`: `0.30`
  - `code`: `0.45`
  - `general`: `0.40`
#### Semantic Firewall Defaults
- `SEMANTIC_FIREWALL_CONFIG` controls optional semantic attack detection.
- Default device preference is `cpu`.
- Similarity threshold is `0.85` by default.
#### Semantic Stopwords
- `SEMANTIC_STOPWORDS` contains curated low-value phrases used during prompt reconstruction.
- Includes English and Spanish tokens for bilingual prompt handling.
#### System Defaults
- `DEFAULT_AGGRESSIVENESS`: `0.2`
- `DEFAULT_CONFIDENCE_THRESHOLD`: `0.6`
- `DEFAULT_MAX_TOKENS`: `512`
- `DEFAULT_TIMEOUT_SECONDS`: `30.0`
- `DEFAULT_BATCH_SIZE`: `32`
- `DEFAULT_EMBEDDING_DIM`: `384`
- `DEFAULT_SIMILARITY_THRESHOLD`: `0.92`
#### API Constants
- `API_VERSION`: `v1`
- `CHAT_ENDPOINT`: `/v1/chat/completions`
- `HEALTH_ENDPOINT`: `/health`
- `METRICS_ENDPOINT`: `/metrics`
- `DOCS_ENDPOINT`: `/docs`
#### Security Constants
- `PLACEHOLDER_PREFIX`: `__PROT_`
- `MAX_PROMPT_LENGTH`: `100_000`
- `MAX_RESPONSE_LENGTH`: `50_000`
- `ALLOWED_ROLES`: `{"system", "user", "assistant"}`
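
As an illustration of how such limits are typically enforced, the sketch below validates a single chat message against `ALLOWED_ROLES` and `MAX_PROMPT_LENGTH`. The helper `validate_message` is hypothetical, and measuring length in characters (rather than bytes or tokens) is an assumption; this document does not state the unit.

```python
from typing import Mapping

# Values copied from the list above.
MAX_PROMPT_LENGTH = 100_000
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_message(message: Mapping[str, str]) -> None:
    """Hypothetical check: raise ValueError for an unknown role or oversized content."""
    role = message.get("role")
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    content = message.get("content", "")
    # Assumption: the limit is counted in characters.
    if len(content) > MAX_PROMPT_LENGTH:
        raise ValueError(f"content exceeds {MAX_PROMPT_LENGTH} characters")
```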
#### Utility Functions
- `get_pricing(model_name: str) -> Dict[str, float]`
  - Returns pricing data for a given model name.
  - Falls back to `default` pricing if unknown.
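
The fallback behavior can be sketched as a dictionary lookup with a `default` entry. The table contents below are made up, and `estimate_cost` is a hypothetical helper added only to show how per-1,000-token prices translate into a request cost.

```python
from typing import Dict

# Hypothetical table; prices are per 1,000 tokens and the values are invented.
PROVIDER_PRICING: Dict[str, Dict[str, float]] = {
    "example-model": {"input": 0.5, "output": 1.5},
    "default": {"input": 1.0, "output": 2.0},
}


def get_pricing(model_name: str) -> Dict[str, float]:
    """Return the entry for `model_name`, or the `default` entry if it is unknown."""
    return PROVIDER_PRICING.get(model_name, PROVIDER_PRICING["default"])


def estimate_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: convert token counts to a cost using per-1,000-token prices."""
    prices = get_pricing(model_name)
    return (input_tokens / 1000) * prices["input"] + (output_tokens / 1000) * prices["output"]
```

With this table, `estimate_cost("example-model", 2000, 1000)` evaluates to `2.5`, and any unrecognized model name is billed at the `default` rates.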
### `utils/logger.py`
#### Purpose
Standardizes logging configuration and request-context logging across the project.
#### Key Components
- `ContextFilter`
  - Thread-local context propagation for `request_id`, `user_id`, and custom fields.
  - Methods: `set_context()`, `get_context()`, `clear_context()`.
- `JSONFormatter`
  - Emits structured JSON logs for production observability.
  - Includes timestamp, level, module, function, line, and additional context.
- `PrettyFormatter`
  - Emits ANSI-colored human-readable logs for development.
- `setup_logging(level, format_type, log_dir, max_bytes, backup_count, disable_existing)`
  - Configures root logger and optional rotating file output.
  - Detects environment via `NLPROXY_ENV`.
- `get_request_logger(name)`
  - Returns a `LoggerAdapter` bound to current context.
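
To make the interplay between these components concrete, here is a minimal sketch built only on the standard `logging` module. The class and function names mirror the list above, but the bodies are assumptions: storing context in a class-level `threading.local`, attaching it to records as a `context` attribute, the JSON key names, and snapshotting the context when the adapter is created are all illustrative choices, not the project's confirmed implementation.

```python
import json
import logging
import threading
from datetime import datetime, timezone
from typing import Any, Dict


class ContextFilter(logging.Filter):
    """Copy the calling thread's context fields onto each log record."""

    _local = threading.local()

    @classmethod
    def get_context(cls) -> Dict[str, Any]:
        # Each thread lazily gets its own dictionary.
        if not hasattr(cls._local, "context"):
            cls._local.context = {}
        return cls._local.context

    @classmethod
    def set_context(cls, **fields: Any) -> None:
        cls.get_context().update(fields)

    @classmethod
    def clear_context(cls) -> None:
        cls._local.context = {}

    def filter(self, record: logging.LogRecord) -> bool:
        record.context = dict(self.get_context())
        return True  # never drop records, only annotate them


class JSONFormatter(logging.Formatter):
    """Serialize a record, plus any attached context, as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
            "message": record.getMessage(),
        }
        # Records that never passed through ContextFilter have no `context`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, default=str)


def get_request_logger(name: str) -> logging.LoggerAdapter:
    """Return an adapter carrying a snapshot of the current thread's context."""
    return logging.LoggerAdapter(logging.getLogger(name), dict(ContextFilter.get_context()))
```

In a setup like this, attaching the filter to a handler rather than to a single logger applies it to every record that handler emits, including records propagated from child loggers.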
#### Performance Considerations
- Logging format is selected based on environment: human-readable output for development, machine-parseable JSON for production.
- File rotation prevents unbounded disk growth.
- Log output from noisy third-party libraries is suppressed to keep runtime logs stable and readable.
#### Edge Cases
- `setup_logging()` is idempotent: calls after the first initialization are no-ops.
- Context filter gracefully handles missing context data.
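
One common way to obtain the idempotent behavior described above is a module-level guard flag. The sketch below assumes that approach and uses a reduced signature (it omits `format_type` and `disable_existing`). The format strings, the default sizes, the `nlproxy.log` file name, the `production` value for `NLPROXY_ENV`, and the boolean return value are all illustrative assumptions.

```python
import logging
import os
from logging.handlers import RotatingFileHandler
from typing import Optional

_configured = False  # guard flag: later calls become no-ops


def setup_logging(
    level: str = "INFO",
    log_dir: Optional[str] = None,
    max_bytes: int = 10_000_000,
    backup_count: int = 5,
) -> bool:
    """Configure the root logger once; return True only if this call did the work."""
    global _configured
    if _configured:
        return False

    root = logging.getLogger()
    root.setLevel(level)

    # Assumed convention: NLPROXY_ENV=production selects the machine-oriented format.
    env = os.environ.get("NLPROXY_ENV", "development").lower()
    fmt = (
        "%(asctime)s %(levelname)s %(name)s %(message)s"
        if env == "production"
        else "%(levelname)-8s %(name)s: %(message)s"
    )

    console = logging.StreamHandler()
    console.setFormatter(logging.Formatter(fmt))
    root.addHandler(console)

    if log_dir is not None:
        os.makedirs(log_dir, exist_ok=True)
        # Rotation caps disk usage at roughly max_bytes * (backup_count + 1).
        rotating = RotatingFileHandler(
            os.path.join(log_dir, "nlproxy.log"),  # file name is an assumption
            maxBytes=max_bytes,
            backupCount=backup_count,
        )
        rotating.setFormatter(logging.Formatter(fmt))
        root.addHandler(rotating)

    _configured = True
    return True
```

Because the guard short-circuits before any handler is added, calling the function from several entry points does not produce duplicate log lines; the trade-off is that arguments passed to later calls are silently ignored.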
## Implementation Notes
- Utility modules are intentionally low-dependency and safe to import during application startup.
- Logging format selection is automatic unless overridden explicitly.