# E5-base Code Search v9-200k (LoRA fine-tuned)
A fine-tuned code-search embedding model based on intfloat/e5-base-v2 (110M parameters, 768 dimensions), trained with call-graph false-negative filtering on 200K balanced pairs across 9 programming languages. Built for cqs: code intelligence and RAG for AI agents.
## Production Eval (v3.v2 fixture, 2026-05-01)
The headline results below are from cqs's production fixture: 218 queries (109 test + 109 dev) curated from real agent telemetry and LLM-generated retrieval cases on the cqs codebase itself. This is the eval that drives default-model decisions.
| split | metric | BGE-large (1024-dim) | v9-200k bare (768-dim) | v9-200k + LLM summaries |
|---|---|---|---|---|
| test | R@1 | 43.1% | 45.9% | 39.4% |
| test | R@5 | 69.7% | 70.6% | 69.7% |
| test | R@20 | 83.5% | 80.7% | 80.7% |
| dev | R@1 | 45.9% | 46.8% | 45.0% |
| dev | R@5 | 77.1% | 68.8% | 67.9% |
| dev | R@20 | 86.2% | 81.7% | 86.2% |
v9-200k essentially ties BGE-large on test R@5 and edges it on R@1 across both splits, at 1/3 the parameter count and with 25% smaller embeddings (768 vs 1024 dims). The cost is on dev R@5/R@20 (~5–8 pp behind), where BGE-large's broader pre-training base helps on out-of-distribution queries. For latency-sensitive or memory-constrained workloads, v9-200k is the right choice.
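The embedding-size saving compounds at index scale. A back-of-envelope sketch (hypothetical chunk count and plain float32 storage, not cqs's actual index format):

```python
# Rough index memory for dense float32 embeddings at two dimensionalities.
def index_bytes(n_chunks: int, dim: int, bytes_per_float: int = 4) -> int:
    """Raw vector storage, ignoring metadata and index overhead."""
    return n_chunks * dim * bytes_per_float

n = 100_000  # hypothetical chunk count for a mid-sized repo
mib_1024 = index_bytes(n, 1024) / 2**20  # BGE-large
mib_768 = index_bytes(n, 768) / 2**20    # v9-200k
# 768-dim vectors need exactly 75% of the 1024-dim storage.
ratio = index_bytes(n, 768) / index_bytes(n, 1024)
```

At 100K chunks that is roughly 293 MiB vs 391 MiB of raw vectors, before any metadata or ANN-index overhead.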
Run v9-200k bare: skip cqs's `--llm-summaries` enrichment pass for this model. Adding LLM-generated summaries to chunks (and re-running the embedding pass over the summary-augmented text) hurts test R@1 by ~6 pp and is a wash on R@5 across both splits. The call-graph-trained dense channel already captures the signal summaries would add; injecting summary text dilutes the model's strongest top-1 signal. The only metric that materially benefits is dev R@20 (+4.5 pp). For BGE-large the same enrichment pass is a small net win; the v9-200k training distribution is the difference.
Decision (2026-05-01): cqs keeps BGE-large as default for the dev R@5 hedge, but v9-200k ships as a first-class opt-in preset. Set `CQS_EMBEDDING_MODEL=v9-200k` or run `cqs slot create v9 --model v9-200k` to use it. Don't pass `--llm-summaries` on the index command unless you're specifically optimizing for dev R@20.
## Historical results (296q synthetic fixture)
These are from an earlier synthetic eval (296q across 7 languages, enriched chunks). They show the model's strength on cleanly-curated code-search pairs, where the call-graph training signal is most visible:
| Eval | Metric | This Model | BGE-large (335M) | BGE-large FT (335M) |
|---|---|---|---|---|
| Fixture (296q, 7 languages, enriched) | R@1 | 90.5% | 90.9% | 91.6% |
| Fixture | MRR | 0.948 | 0.949 | 0.952 |
| Raw code embedding (55q, no enrichment) | R@1 | 70.9% | 61.8% | 66.2% |
| CoIR 9-task (19 subtasks) | Overall | 52.7 | 55.7 | 57.5 |
| CoIR CodeSearchNet (6 languages) | NDCG@10 | 0.615 | 0.721 | 0.779 |
Note (2026-05-01): an earlier evaluation against v3.v2 (2026-04-25) reported v9-200k as ~30 pp behind BGE-large and led to a "retired" verdict. That gap turned out to be ~95% fixture-side artifact: cqs's eval matcher required a strict (file, name, line_start) match to score a gold chunk, and routine code edits between the fixture's pin date and the rerun shifted ~38% of gold-chunk line numbers, making them invisible to the matcher even when search returned them. After loosening the matcher to (file, name) (cqs PR #1284), the numbers in the Production Eval table above are what holds. Lesson: when a benchmark drops 25 pp overnight, suspect the harness before the model.
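The matcher change is easy to picture in code. A minimal sketch, assuming chunks are plain dicts with `file`/`name`/`line_start` keys (illustrative names, not cqs's actual internals):

```python
# Strict matching requires the exact line number recorded at fixture-pin time,
# so a gold chunk stops matching as soon as unrelated edits shift it.
def is_hit_strict(retrieved: dict, gold: dict) -> bool:
    return (retrieved["file"], retrieved["name"], retrieved["line_start"]) == \
           (gold["file"], gold["name"], gold["line_start"])

# Loosened matching (the PR #1284 behavior described above) keys on identity
# only, so line drift no longer hides correctly retrieved chunks.
def is_hit_loose(retrieved: dict, gold: dict) -> bool:
    return (retrieved["file"], retrieved["name"]) == (gold["file"], gold["name"])

gold = {"file": "auth.py", "name": "validate_email", "line_start": 41}
hit = {"file": "auth.py", "name": "validate_email", "line_start": 57}  # code drifted
# is_hit_strict(hit, gold) -> False; is_hit_loose(hit, gold) -> True
```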
## Training Details
- Base Model: intfloat/e5-base-v2 (110M params, 768 dimensions)
- Data: 200K balanced pairs (22,222 per language × 9 languages) from cqs-indexed Stack repos
- Key Technique: Call-graph false-negative filtering, which excludes structurally related functions from contrastive negatives (zero API cost, SQLite lookup)
- Loss: CachedGISTEmbedLoss (guide: intfloat/e5-base-v2) + MatryoshkaLoss (768/384/192/128 dims)
- LoRA: rank 16, alpha 32 (targets: query, key, value, dense)
- Epochs: 1 (more epochs degrade enrichment compatibility)
- Dataset: jamie8johnson/cqs-code-search-200k
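The false-negative filtering step can be sketched as follows. This is a hedged illustration, assuming a SQLite table `edges(caller, callee)` and one-hop relatedness; cqs's actual schema and traversal depth are not documented here:

```python
import sqlite3

# Toy call graph: parse_config -> read_file -> open_path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (caller TEXT, callee TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("parse_config", "read_file"), ("read_file", "open_path")])

def related(fn: str) -> set:
    """Direct callers and callees of fn (one hop in the call graph)."""
    rows = conn.execute(
        "SELECT callee FROM edges WHERE caller = ? "
        "UNION SELECT caller FROM edges WHERE callee = ?", (fn, fn))
    return {r[0] for r in rows}

def filter_negatives(anchor: str, candidates: list) -> list:
    """Drop call-graph neighbors of the anchor from its contrastive negatives."""
    banned = related(anchor) | {anchor}
    return [c for c in candidates if c not in banned]

negs = filter_negatives("read_file", ["open_path", "parse_config", "send_email"])
# Only "send_email" survives: the other two share call-graph edges with the anchor.
```

The point of the filter is that functions connected in the call graph are often semantically related, so treating them as negatives would teach the model to push apart pairs that belong together.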
## Supported Languages
Go, Java, JavaScript, PHP, Python, Ruby, Rust, TypeScript, C++
## Usage
### With cqs
```shell
export CQS_EMBEDDING_MODEL=v9-200k
cqs index --force

# or, for slot-based comparisons:
cqs slot create v9 --model v9-200k
cqs index --slot v9 --force
```
### With sentence-transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jamie8johnson/e5-base-v2-code-search")

# E5 models expect the "query: " / "passage: " prefixes.
query_emb = model.encode("query: find functions that validate email addresses")
code_emb = model.encode("passage: def validate_email(addr): ...")
```
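Because training included MatryoshkaLoss at 768/384/192/128 dims, embeddings can be truncated to a smaller trained prefix and re-normalized before cosine scoring. A minimal sketch with random stand-in vectors (real usage would pass `model.encode` outputs instead):

```python
import numpy as np

def truncate_and_norm(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    head = emb[:dim]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(0)
q, p = rng.normal(size=768), rng.normal(size=768)  # stand-ins for embeddings

# Score at 384 dims: half the storage, a trained Matryoshka operating point.
q384, p384 = truncate_and_norm(q, 384), truncate_and_norm(p, 384)
score = float(q384 @ p384)  # cosine similarity of the unit-normalized prefixes
```

Which of the trained dims (768/384/192/128) is the best quality/cost trade-off is workload-dependent and worth measuring on your own fixture.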
## License
Apache 2.0 (same as base model).