E5-base Code Search v9-200k (LoRA fine-tuned)

A fine-tuned code search embedding model based on intfloat/e5-base-v2 (110M parameters, 768 dimensions). Trained with call-graph false-negative filtering on 200K balanced pairs across 9 programming languages. Built for cqs: code intelligence and RAG for AI agents.

Production Eval (v3.v2 fixture, 2026-05-01)

The headline results below are from cqs's production fixture: 218 queries (109 test + 109 dev) curated from real agent telemetry and LLM-generated retrieval cases on the cqs codebase itself. This is the eval that drives default-model decisions.

| split | metric | BGE-large (1024-dim) | v9-200k bare (768-dim) | v9-200k + LLM summaries |
|-------|--------|----------------------|------------------------|-------------------------|
| test  | R@1    | 43.1%                | 45.9%                  | 39.4%                   |
| test  | R@5    | 69.7%                | 70.6%                  | 69.7%                   |
| test  | R@20   | 83.5%                | 80.7%                  | 80.7%                   |
| dev   | R@1    | 45.9%                | 46.8%                  | 45.0%                   |
| dev   | R@5    | 77.1%                | 68.8%                  | 67.9%                   |
| dev   | R@20   | 86.2%                | 81.7%                  | 86.2%                   |
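
For reference, R@k is read here as standard top-k recall: a query counts as a hit if any of its gold chunks appears in the top k results, and the score is the mean over queries. The cqs harness itself isn't shown, so this is an assumed but conventional definition:

# Assumed, conventional R@k: fraction of queries with at least one gold
# chunk in the top-k retrieved results.
def recall_at_k(ranked: list[list[str]], gold: list[set[str]], k: int) -> float:
    hits = sum(1 for res, g in zip(ranked, gold) if any(r in g for r in res[:k]))
    return hits / len(ranked)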

v9-200k essentially ties BGE-large on test R@5 and edges it on R@1 across both splits, at 1/3 the parameter count and with 25% smaller embeddings (768 vs 1024 dims). The cost is on dev R@5/R@20 (~5–8 pp behind), where BGE-large's broader pre-training base helps on out-of-distribution queries. For latency-sensitive or memory-constrained workloads, v9-200k is the right choice.

Run v9-200k bare: skip cqs's --llm-summaries enrichment pass for this model. Adding LLM-generated summaries to chunks (and re-running the embedding pass over the summary-augmented text) hurts test R@1 by ~6 pp and is a wash on R@5 across both splits. The call-graph-trained dense channel already captures the signal summaries would add; injecting summary text dilutes the model's strongest top-1 signal. The only metric that materially benefits is dev R@20 (+4.5 pp). For BGE-large the same enrichment pass is a small net win; the v9-200k training distribution is the difference.

Decision (2026-05-01): cqs keeps BGE-large as default for the dev R@5 hedge, but v9-200k is shipped as a first-class opt-in preset. Set CQS_EMBEDDING_MODEL=v9-200k or run cqs slot create v9 --model v9-200k to use it. Don't pass --llm-summaries on the index command unless you're specifically optimizing for dev R@20.

Historical results (296q synthetic fixture)

These are from an earlier synthetic eval (296q across 7 languages, enriched chunks). They show the model's strength on cleanly curated code-search pairs, where the call-graph training signal is most visible:

| Eval | Metric | This Model | BGE-large (335M) | BGE-large FT (335M) |
|------|--------|------------|------------------|---------------------|
| Fixture (296q, 7 languages, enriched)    | R@1     | 90.5% | 90.9% | 91.6% |
| Fixture                                  | MRR     | 0.948 | 0.949 | 0.952 |
| Raw code embedding (55q, no enrichment)  | R@1     | 70.9% | 61.8% | 66.2% |
| CoIR 9-task (19 subtasks)                | Overall | 52.7  | 55.7  | 57.5  |
| CoIR CodeSearchNet (6 languages)         | NDCG@10 | 0.615 | 0.721 | 0.779 |

Note (2026-05-01): an earlier evaluation against v3.v2 (2026-04-25) reported v9-200k as ~30 pp behind BGE-large and led to a "retired" verdict. That gap turned out to be ~95% fixture-side artifact: cqs's eval matcher required a strict (file, name, line_start) match to score a gold chunk, and routine code edits between the fixture's pin date and the rerun shifted ~38% of gold-chunk line numbers, making them invisible to the matcher even when search returned them. After loosening the matcher to (file, name) (cqs PR #1284), the numbers in the Production Eval table above are the ones that hold. Lesson: when a benchmark drops 25 pp overnight, suspect the harness before the model.
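
To make the matcher change concrete, here is a minimal sketch of the strict vs. loosened gold-chunk matching, with illustrative names and types (the real logic lives in cqs's eval harness, PR #1284):

from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    file: str
    name: str
    line_start: int

def is_gold_hit_strict(result: Chunk, gold: Chunk) -> bool:
    # Pre-#1284: any drift in line_start (e.g. from unrelated edits higher
    # up in the file) turns a correct retrieval into a silent miss.
    return (result.file, result.name, result.line_start) == \
           (gold.file, gold.name, gold.line_start)

def is_gold_hit(result: Chunk, gold: Chunk) -> bool:
    # Post-#1284: match on stable identity only.
    return (result.file, result.name) == (gold.file, gold.name)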

Training Details

  • Base Model: intfloat/e5-base-v2 (110M params, 768 dimensions)
  • Data: 200K balanced pairs (22,222 per language × 9 languages) from cqs-indexed Stack repos
  • Key Technique: Call-graph false-negative filtering, which excludes structurally related functions from contrastive negatives (zero API cost, a SQLite lookup); see the first sketch after this list
  • Loss: CachedGISTEmbedLoss (guide: intfloat/e5-base-v2) + MatryoshkaLoss (768/384/192/128 dims); see the second sketch after this list
  • LoRA: rank 16, alpha 32 (targets: query, key, value, dense)
  • Epochs: 1 (additional epochs degrade enrichment compatibility)
  • Dataset: jamie8johnson/cqs-code-search-200k
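
The filtering step, sketched. This assumes an illustrative SQLite schema call_edges(caller_id, callee_id) built at index time; cqs's actual table and column names may differ:

import sqlite3

def related_ids(conn: sqlite3.Connection, chunk_id: int) -> set[int]:
    # One hop in either direction: callers and callees of chunk_id.
    rows = conn.execute(
        "SELECT callee_id FROM call_edges WHERE caller_id = ? "
        "UNION SELECT caller_id FROM call_edges WHERE callee_id = ?",
        (chunk_id, chunk_id),
    )
    return {row[0] for row in rows}

def filter_negatives(conn: sqlite3.Connection, positive_id: int,
                     candidate_ids: list[int]) -> list[int]:
    # Exclude structurally related functions from the negative pool: a
    # caller or callee of the positive often answers the same query, so
    # treating it as a contrastive negative injects label noise.
    related = related_ids(conn, positive_id)
    return [c for c in candidate_ids if c not in related]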
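
And the loss/LoRA configuration, sketched with sentence-transformers and peft. The hyperparameters are the ones listed above; the library versions (sentence-transformers >= 3.1 for add_adapter), dataset column layout, and output directory are assumptions:

from datasets import load_dataset
from peft import LoraConfig, TaskType
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedGISTEmbedLoss, MatryoshkaLoss

model = SentenceTransformer("intfloat/e5-base-v2")
guide = SentenceTransformer("intfloat/e5-base-v2")  # guide model for GIST

# LoRA: rank 16, alpha 32, on the attention and dense projections.
model.add_adapter(LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    r=16,
    lora_alpha=32,
    target_modules=["query", "key", "value", "dense"],
))

# Assumes (anchor, positive) text columns, post call-graph filtering.
train_ds = load_dataset("jamie8johnson/cqs-code-search-200k", split="train")

# In-batch contrastive loss with guide-model false-negative masking,
# wrapped so the 384/192/128-dim truncations of the 768-dim embedding
# are trained jointly (Matryoshka).
loss = MatryoshkaLoss(
    model,
    CachedGISTEmbedLoss(model, guide),
    matryoshka_dims=[768, 384, 192, 128],
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=SentenceTransformerTrainingArguments(output_dir="v9-200k", num_train_epochs=1),
    train_dataset=train_ds,
    loss=loss,
)
trainer.train()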

Supported Languages

Go, Java, JavaScript, PHP, Python, Ruby, Rust, TypeScript, C++

Usage

With cqs

export CQS_EMBEDDING_MODEL=v9-200k
cqs index --force
# or, for slot-based comparisons:
cqs slot create v9 --model v9-200k
cqs index --slot v9 --force

With sentence-transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jamie8johnson/e5-base-v2-code-search")
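# e5-family models expect "query: " / "passage: " prefixes at encode time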
query_emb = model.encode("query: find functions that validate email addresses")
code_emb = model.encode("passage: def validate_email(addr): ...")
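
To score a match, compare the two embeddings with cosine similarity. model.similarity is available in sentence-transformers >= 3.0; on older versions, util.cos_sim does the same:

scores = model.similarity(query_emb, code_emb)  # cosine by default
print(scores)  # higher = better match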

License

Apache 2.0 (same as base model).
