
aurekai/semantic-cache-bench

Semantic caching benchmarks and performance suite for Aurekai. Validates cache consistency, hit rates, and query latency across different model architectures and corpus sizes.

Overview

Semantic caching is a core optimization in Aurekai that reuses cached results for semantically similar queries, without requiring exact string matches. This repository hosts:

  • Benchmark Datasets: Query corpora with semantic similarity annotations
  • Evaluation Scripts: Performance measurement and validation tools
  • Results: Baseline metrics across different models and cache configurations
  • Methodology: Detailed documentation of benchmark setup and evaluation protocols

Quick Start

# Download benchmark suite
git clone https://huggingface.co/aurekai/semantic-cache-bench
cd semantic-cache-bench

# Run quick benchmark
akai semantic-cache:bench \
  --dataset queries-10k.jsonl \
  --model qwen3-8b \
  --cache-size 1GB \
  --output results.json

# Compare results
akai semantic-cache:compare \
  --baseline baseline-results.json \
  --current results.json

Benchmark Datasets

queries-1k (Minimal Validation)

  • Size: 1,024 queries
  • Purpose: Quick validation of cache functionality
  • Format: JSONL with semantic similarity pairs
  • Runtime: ~5 minutes on GPU

Schema:

{
  "id": "q_001_234",
  "query": "What are the benefits of renewable energy?",
  "semantic_variations": [
    "Advantages of wind and solar power",
    "Why should we invest in renewables?"
  ],
  "dissimilar_queries": [
    "How do fossil fuels work?"
  ],
  "expected_cache_hit": true,
  "similarity_threshold": 0.87
}
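
Records in this schema can be loaded and field-checked with a short script. The loader below is illustrative, not part of the benchmark suite; the field names come from the schema above:

```python
import json

REQUIRED_FIELDS = {"id", "query", "semantic_variations",
                   "dissimilar_queries", "expected_cache_hit",
                   "similarity_threshold"}

def load_queries(path):
    """Yield schema-checked records from a benchmark JSONL file."""
    with open(path) as f:
        for line_no, line in enumerate(f, 1):
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            yield record
```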

queries-10k (Standard Benchmark)

  • Size: 10,240 queries
  • Purpose: Standard performance baseline
  • Corpus: Diverse knowledge domains and query patterns
  • Expected cache hit rate: 68-72%
  • Runtime: ~45 minutes on GPU

queries-100k (Comprehensive)

  • Size: 102,400 queries
  • Purpose: Large-scale cache behavior validation
  • Corpus: Realistic production query distribution
  • Expected cache hit rate: 72-76%
  • Runtime: ~8 hours on high-end GPU
  • Disk space: 15 GB decompressed

Metrics & Evaluation

Cache Performance

| Metric          | Qwen3-8B | LLaMA3-8B | Meaning                         |
|-----------------|----------|-----------|---------------------------------|
| Hit Rate        | 71.2%    | 69.8%     | % of queries found in cache     |
| False Positives | 0.3%     | 0.4%      | Incorrect cache matches         |
| False Negatives | 2.1%     | 2.4%      | Missed cache opportunities      |
| Recall @ 0.90   | 94.2%    | 92.8%     | True positives at high threshold|
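
Given per-query outcomes, the rates in this table can be computed as follows. The `(was_hit, should_hit)` labeling is an illustrative simplification of the evaluation output, not the suite's internal format:

```python
def cache_metrics(outcomes):
    """Compute hit/error rates from (was_hit, should_hit) pairs per query.

    was_hit:    the cache returned a result for this query
    should_hit: ground truth says a semantically equivalent query was cached
    """
    n = len(outcomes)
    hits = sum(1 for h, _ in outcomes if h)
    fp = sum(1 for h, s in outcomes if h and not s)   # incorrect cache matches
    fn = sum(1 for h, s in outcomes if not h and s)   # missed opportunities
    return {"hit_rate": hits / n,
            "false_positive_rate": fp / n,
            "false_negative_rate": fn / n}
```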

Latency Improvement

Cache Miss:    125 ms (full inference)
Cache Hit:     2 ms (embedding lookup + cache retrieval)
Speedup:       62.5x

Average (71% hit rate): 125*0.29 + 2*0.71 ≈ 38 ms
Effective speedup:      3.3x vs. no cache
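
The expected-latency arithmetic above generalizes to any hit rate; the 2 ms / 125 ms defaults below are the measured figures from this section:

```python
def effective_latency(hit_rate, hit_ms=2.0, miss_ms=125.0):
    """Expected per-query latency for a given cache hit rate."""
    return miss_ms * (1 - hit_rate) + hit_ms * hit_rate

avg = effective_latency(0.71)      # ~37.7 ms, reported above as ~38 ms
speedup = 125.0 / avg              # ~3.3x vs. no cache
```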

Memory Efficiency

  • Embedding cache size: ~800 MB for 100K queries
  • Memory per cached embedding: ~8 KB
  • Compression ratio: 1.4x with optional zstd compression
  • Peak memory during benchmark: 4 GB (with batch size 32)

Running Benchmarks

Standard Evaluation

# Benchmark specific model
akai semantic-cache:bench \
  --model qwen3-8b \
  --dataset queries-10k.jsonl \
  --batch-size 32 \
  --cache-size 2GB \
  --output results.json

# With logging
akai semantic-cache:bench \
  --model qwen3-8b \
  --dataset queries-10k.jsonl \
  --cache-size 2GB \
  --verbose \
  --log-interval 100 \
  --output results.json

Comparison Between Models

# Run on multiple models
for model in qwen3-8b llama3-8b; do
  akai semantic-cache:bench \
    --model $model \
    --dataset queries-10k.jsonl \
    --output results-$model.json
done

# Compare results
akai semantic-cache:compare \
  --results results-qwen3-8b.json results-llama3-8b.json \
  --report comparison-report.md

Validation with Custom Threshold

# Test different similarity thresholds
akai semantic-cache:threshold-sweep \
  --model qwen3-8b \
  --dataset queries-10k.jsonl \
  --thresholds "0.80,0.85,0.90,0.95" \
  --output threshold-sweep.json

Benchmark Results

Latest Results (Aurekai v0.8.0-alpha.1)

Hardware: NVIDIA H100 80GB, AMD EPYC 9654
Date: 2026-05-02

| Model     | Dataset | Hit Rate | Recall @ 0.90 | Latency (hit) | Latency (miss) |
|-----------|---------|----------|---------------|---------------|----------------|
| Qwen3-8B  | 1K      | 72.3%    | 94.1%         | 1.8 ms        | 124 ms         |
| Qwen3-8B  | 10K     | 71.2%    | 93.8%         | 1.9 ms        | 126 ms         |
| LLaMA3-8B | 1K      | 70.1%    | 92.4%         | 2.1 ms        | 127 ms         |
| LLaMA3-8B | 10K     | 69.8%    | 92.1%         | 2.2 ms        | 129 ms         |

See results/ for detailed breakdowns by domain and query type.

Implementation Notes

Cache Configuration

{
  "semantic_cache": {
    "enabled": true,
    "similarity_threshold": 0.88,
    "max_cache_size": "2GB",
    "eviction_policy": "lru",
    "embedding_model": "qwen3-8b",
    "batch_size": 32,
    "use_mmap": true
  }
}
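
A small helper for interpreting the human-readable max_cache_size value. This parser is illustrative only, not part of the Aurekai API:

```python
def parse_size(value):
    """Convert a human-readable size such as "2GB" to bytes (binary units)."""
    units = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
    v = value.strip().upper()
    for suffix, mult in units.items():
        if v.endswith(suffix):
            return int(float(v[:-2]) * mult)
    return int(v)  # bare byte count
```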

Threshold Selection

  • Aggressive (0.80): High hit rate (75%+), more false positives
  • Balanced (0.88): Recommended default, 71% hit rate, minimal false positives
  • Conservative (0.95): Very few false positives, lower hit rate
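
The lookup rule these thresholds govern can be sketched in a few lines. This uses plain-Python cosine similarity over a flat list; the real cache uses the configured embedding model and an indexed store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def lookup(cache, query_emb, threshold=0.88):
    """Return the best cached value clearing the threshold, or None (a miss).

    cache: list of (embedding, value) pairs.
    """
    best, best_sim = None, threshold
    for emb, value in cache:
        sim = cosine(query_emb, emb)
        if sim >= best_sim:
            best, best_sim = value, sim
    return best
```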

Methodology

Query Similarity Annotation

Each benchmark dataset includes human-validated semantic similarity annotations:

  1. Query pairs sampled from corpus
  2. Annotators rate similarity (0-1)
  3. Disagreements resolved with third annotator
  4. Inter-rater reliability: Krippendorff's α = 0.89

Cache Consistency Validation

All cache results validated against ground truth:

# For each cached result:
1. Verify embedding matches original query
2. Re-rank all cached results for current query  
3. Confirm top match was indeed in cache
4. Validate latency was significantly improved
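
The four steps above can be sketched as follows; `embed` and `rerank` are hypothetical callables standing in for the cache internals, and the entry layout is illustrative:

```python
def validate_entry(query, cached, embed, rerank, tol=1e-6):
    """Sketch of the four-step consistency check for one cached result."""
    # 1. Verify the stored embedding still matches a fresh embedding
    fresh = embed(query)
    if max(abs(a - b) for a, b in zip(fresh, cached["embedding"])) > tol:
        return False
    # 2-3. Re-rank the cached candidates and confirm the stored answer wins
    if rerank(query, cached["candidates"]) != cached["answer"]:
        return False
    # 4. Confirm the hit path was actually faster than a full miss
    return cached["hit_ms"] < cached["miss_ms"]
```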

Contributing Results

To contribute benchmark results:

  1. Run benchmark suite on your hardware
  2. Include system specs (GPU, CPU, memory, disk)
  3. Report all metrics from evaluation output
  4. Submit results via PR with hardware metadata

Result file format:

{
  "metadata": {
    "hardware": "NVIDIA H100, 512GB RAM",
    "date": "2026-05-02",
    "aurekai_version": "0.8.0-alpha.1"
  },
  "results": [
    {
      "model": "qwen3-8b",
      "dataset": "queries-10k",
      "hit_rate": 0.712,
      "recall_at_0_90": 0.938
    }
  ]
}

Tools & Scripts

  • akai semantic-cache:bench: Run full benchmark suite
  • akai semantic-cache:compare: Compare benchmark results
  • akai semantic-cache:threshold-sweep: Test different thresholds
  • benchmark_to_csv.py: Export results to CSV format
  • visualize_results.py: Generate performance plots

Citation

If you reference these benchmarks in research:

@dataset{aurekai_semantic_cache_bench_2026,
  title={Aurekai Semantic Cache Benchmarks},
  author={Aurekai Community},
  year={2026},
  url={https://huggingface.co/aurekai/semantic-cache-bench}
}

License

Licensed under the Aurekai Open Source License. See main repository for details.
