# NLProxy Cache Module Reference
This reference documents `cache/semantic_cache.py`.
## Purpose
`SemanticLLMCache` provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores response metadata and embedding vectors, enabling retrieval of semantically similar prior prompts rather than regenerating responses.
## Key Class
### `SemanticLLMCache`
#### Responsibilities
- Normalize and store embedding vectors in RedisVL.
- Search cached vectors based on cosine similarity.
- Enforce TTL-based expiration and domain isolation.
- Maintain hit/miss statistics.
#### Constructor
```python
SemanticLLMCache(
    redis_url: str = "redis://localhost:6379",
    similarity_threshold: float = 0.92,
    default_ttl: int = 3600,
    dimension: int = 384,
    index_name: str = "prompt_cache",
    prefix: str = "cache:",
    max_connections: int = 50,
    socket_timeout: float = 5.0,
)
```
#### Important Methods
- `_normalize(embedding: np.ndarray) -> List[float]`
  - Converts raw embeddings into L2-normalized Python lists.
  - Complexity: O(d).
- `store(query_embedding, response_text, metadata, domain)`
  - Stores a cached entry in a RedisVL vector index.
  - Writes both vector and metadata fields.
- `search(query_embedding, domain=None) -> Optional[Dict[str, Any]]`
  - Performs vector similarity search with threshold filtering.
  - Complexity: O(N · d) for a flat scan over N entries of dimension d; the actual cost depends on the index algorithm RedisVL is configured with.
- `clear(domain: Optional[str] = None)`
  - Deletes cached entries globally or within a domain.
- `get_stats() -> Dict[str, int]`
  - Returns hit/miss counters.
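The normalization and thresholded search described above can be illustrated with a small, dependency-free sketch. This is not the module's implementation: it uses plain Python lists instead of `numpy` and an in-memory list instead of a RedisVL index, and the names `normalize`, `flat_search`, and the entry keys `"vector"`, `"response"`, and `"domain"` are assumptions made for the example.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def normalize(embedding: Sequence[float]) -> List[float]:
    """Return an L2-normalized copy of the embedding in O(d) time."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        # A zero vector has no direction; return it unchanged rather than divide by zero.
        return [float(x) for x in embedding]
    return [x / norm for x in embedding]


def flat_search(
    query: Sequence[float],
    entries: Sequence[Dict[str, Any]],
    threshold: float = 0.92,
    domain: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Scan every entry (O(N * d)) and return the best match at or above the threshold."""
    q = normalize(query)
    best, best_score = None, threshold
    for entry in entries:
        if domain is not None and entry["domain"] != domain:
            continue  # domain isolation: ignore entries from other domains
        # Stored vectors are unit length, so the dot product is the cosine similarity.
        score = sum(a * b for a, b in zip(q, entry["vector"]))
        if score >= best_score:
            best, best_score = entry, score
    return best


# Hypothetical cached entries; vectors are normalized before being stored.
entries = [
    {"vector": normalize([1.0, 0.0]), "response": "A", "domain": "billing"},
    {"vector": normalize([0.0, 1.0]), "response": "B", "domain": "support"},
]
```

With these entries, a query close to the first vector returns entry `"A"`, while the same query restricted to `domain="support"` returns `None`, because the only entry in that domain falls below the threshold.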
## Dependencies
- `redis` / `redis-py`
- `redisvl` for vector search index management
- `numpy`
## Performance Characteristics
- Embedding normalization is linear in embedding dimension.
- Search cost scales with number of indexed entries and vector size.
- RedisVL reduces query latency compared to raw key scans, but the module remains CPU-bound for large indexes.
## Scalability Considerations
- The Redis connection pool defaults to 50 connections and is configurable via `max_connections`.
- `socket_timeout` (default 5.0 seconds) bounds how long a socket operation can block, so network faults fail fast.
- For high-volume deployments, Redis clustering or an approximate nearest neighbor store is recommended.
## Operational Guidelines
- Ensure `dimension` matches the embedding model output size.
- Configure `similarity_threshold` carefully; values near `1.0` reduce false positives but also lower hit rate.
- Monitor hit/miss ratios and eviction trends.
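As a rough illustration of monitoring the hit/miss ratio, the helper below derives a hit rate from a counters dictionary of the kind `get_stats()` returns. The key names `"hits"` and `"misses"` are assumptions; the source only states that the method returns hit/miss counters, so check the actual keys before relying on this.

```python
from typing import Dict


def hit_ratio(stats: Dict[str, int]) -> float:
    """Fraction of lookups served from the cache; 0.0 if there were no lookups."""
    hits = stats.get("hits", 0)      # assumed key name
    misses = stats.get("misses", 0)  # assumed key name
    total = hits + misses
    return hits / total if total else 0.0
```

A ratio that drops after raising `similarity_threshold` is the expected trade-off: fewer false positives, but fewer cache hits as well.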
## Edge Cases
- The cache treats a missing Redis connection as a hard failure during initialization.
- A vector index with a mismatched schema or an incompatible dimension will fail to create.
- Expired entries are not necessarily removed the instant their TTL elapses; they can persist until they are next read or a cleanup operation runs.
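The lazy-expiration behavior in the last point can be pictured with a toy in-memory store. This is only an analogy under stated assumptions, not how the module or Redis implements expiry: `LazyTTLStore` and its methods are hypothetical names, and the injectable `clock` exists only to make the behavior easy to demonstrate.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyTTLStore:
    """Toy store that drops expired entries only on read or explicit cleanup."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        # Record the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired entry is removed lazily, on read
            return None
        return value

    def cleanup(self) -> int:
        """Remove all expired entries and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._data)
```

In this sketch an entry whose TTL has elapsed still counts toward `len(store)` until `get` or `cleanup` touches it, which mirrors why memory usage can lag behind logical expiry.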