NightOwl-CodeEmbedding 🦉

NightOwl-CodeEmbedding is a compact 768-dimensional dense embedding model specialized for code retrieval, code-edit retrieval, and technical question answering.

The model is fine-tuned from Shuu12121/NightOwl, a ModernBERT-based code model. It uses CLS pooling with cosine similarity and does not require query: / passage: style prefixes.
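As a minimal illustration of what CLS pooling combined with cosine similarity means, the sketch below uses toy 3-dimensional vectors in place of real model outputs. The helper names `cls_pool` and `cosine_similarity` and all numbers are hypothetical; sentence-transformers performs these steps internally, so this is only a conceptual sketch.

```python
import math


def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sequence embedding."""
    return token_embeddings[0]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy per-token embeddings (hypothetical values; the real model emits 768 dimensions).
query_tokens = [[1.0, 0.0, 0.0], [0.2, 0.9, 0.1]]
doc_tokens = [[2.0, 0.0, 0.0], [0.5, 0.5, 0.5]]

query_vec = cls_pool(query_tokens)
doc_vec = cls_pool(doc_tokens)
score = cosine_similarity(query_vec, doc_vec)
print(score)
```

Only the first token's vector contributes to the embedding here; the remaining token vectors are ignored by the pooling step.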

Highlights

  • Compact (150.8M parameters) yet competitive on CoIR-style code retrieval benchmarks
  • Covers eight programming languages, including Rust and TypeScript in addition to the six CodeSearchNet languages
  • Handles a wide range of code retrieval scenarios: NL-to-code search, code-to-code retrieval, code-edit retrieval, and technical QA
  • Trained with hard negatives mined by Qwen/Qwen3-Embedding-0.6B (15 hard negatives per anchor)
  • Decontaminated against CodeSearchNet test splits and the CodeEditSearchRetrieval benchmark (see Data Decontamination)
  • Drop-in compatible with sentence-transformers and released under the Apache-2.0 license

Supported Languages

The training data covers the six CodeSearchNet languages plus two additional languages:

  • Go, Java, JavaScript, PHP, Python, Ruby (CodeSearchNet languages)
  • Rust, TypeScript (additional)

Performance on languages outside this set is untested and may vary.

Usage

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Shuu12121/NightOwl-CodeEmbedding")

queries = ["Python function that sorts a list in descending order"]
documents = [
    "def sort_desc(values): return sorted(values, reverse=True)",
    "def average(values): return sum(values) / len(values)",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Cosine similarity (embeddings are normalized internally by similarity())
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
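To turn the score matrix above into a ranked result list, one possible approach is sketched here. The helper `rank_documents` is hypothetical, and the example scores are made-up stand-ins for `scores[0].tolist()` so that the snippet runs without downloading the model.

```python
def rank_documents(score_row, documents, top_k=5):
    """Pair each document with its score and return the top_k best matches."""
    ranked = sorted(zip(documents, score_row), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


documents = [
    "def sort_desc(values): return sorted(values, reverse=True)",
    "def average(values): return sum(values) / len(values)",
]

# Hypothetical scores for one query; in practice use scores[0].tolist().
example_scores = [0.81, 0.23]

top = rank_documents(example_scores, documents, top_k=1)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```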

Model Details

| Property | Value |
| --- | --- |
| Base model | Shuu12121/NightOwl |
| Architecture | ModernBERT |
| Parameters | 150,779,136 |
| Embedding dimension | 768 |
| Pooling | CLS pooling |
| Maximum sequence length | 1,024 tokens |
| Similarity | Cosine similarity |
| Query/document prefixes | Not required |
| Weight dtype | FP32 |
| Weight memory | 575 MiB |
| License | Apache-2.0 |
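The weight memory figure follows directly from the parameter count and the FP32 dtype. A quick check, assuming 4 bytes per parameter and counting weights only (no activations or framework overhead):

```python
PARAMETERS = 150_779_136
BYTES_PER_FP32 = 4  # 32-bit floats

weight_bytes = PARAMETERS * BYTES_PER_FP32
weight_mib = weight_bytes / (1024 ** 2)

print(f"{weight_bytes:,} bytes = {weight_mib:.1f} MiB")  # 603,116,544 bytes = 575.2 MiB
```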

MTEB Results

The model was evaluated with MTEB on code-related retrieval and technical QA tasks.

Evaluation setup:

  • Model revision: c7c8a57b9539297e192d5cf39b9aecf1fb376edd
  • MTEB version: 2.15.1
  • Metric: NDCG@10
  • Hardware: NVIDIA GeForce RTX 5090
  • Batch size: 64

For tasks with multiple subsets, the reported score is the macro average over the subsets.

| Task | Split | NDCG@10 |
| --- | --- | --- |
| AppsRetrieval | test | 0.39177 |
| COIRCodeSearchNetRetrieval | test | 0.84264 |
| CodeEditSearchRetrieval | train¹ | 0.74808 |
| CodeFeedbackMT | test | 0.76690 |
| CodeFeedbackST | test | 0.85207 |
| CodeSearchNetCCRetrieval | test | 0.91805 |
| CodeSearchNetRetrieval | test | 0.89239 |
| CodeTransOceanContest | test | 0.75953 |
| CodeTransOceanDL | test | 0.36057 |
| CosQA | test | 0.42810 |
| StackOverflowQA | test | 0.86608 |
| SyntheticText2SQL | test | 0.68266 |
| Macro average, all 12 tasks | | 0.70907 |
| CoIR macro average, 10 tasks | | 0.68684 |

¹ CodeEditSearchRetrieval does not provide a standard test split in MTEB, so the official train split is used for evaluation. These examples were not used for fine-tuning. See Data Decontamination for details.
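The two summary rows can be reproduced from the per-task scores. The sketch below assumes that the 10-task CoIR average excludes CodeEditSearchRetrieval and CodeSearchNetRetrieval; the card does not state this explicitly, but that choice reproduces the reported figure.

```python
scores = {
    "AppsRetrieval": 0.39177,
    "COIRCodeSearchNetRetrieval": 0.84264,
    "CodeEditSearchRetrieval": 0.74808,
    "CodeFeedbackMT": 0.76690,
    "CodeFeedbackST": 0.85207,
    "CodeSearchNetCCRetrieval": 0.91805,
    "CodeSearchNetRetrieval": 0.89239,
    "CodeTransOceanContest": 0.75953,
    "CodeTransOceanDL": 0.36057,
    "CosQA": 0.42810,
    "StackOverflowQA": 0.86608,
    "SyntheticText2SQL": 0.68266,
}

# Assumption: these two tasks are the ones left out of the CoIR average.
NON_COIR = {"CodeEditSearchRetrieval", "CodeSearchNetRetrieval"}

macro_all = sum(scores.values()) / len(scores)
coir_scores = [value for task, value in scores.items() if task not in NON_COIR]
macro_coir = sum(coir_scores) / len(coir_scores)

print(f"all 12 tasks: {macro_all:.5f}")
print(f"CoIR 10 tasks: {macro_coir:.5f}")
```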

Because the benchmark suite consists of in-domain code retrieval tasks related to the model's training distribution, these results should not be interpreted as strictly zero-shot performance.

Training

The model was trained with CachedMultipleNegativesRankingLoss using bidirectional query-to-document and document-to-query objectives.
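To make the bidirectional objective concrete, here is a pure-Python sketch of a symmetric in-batch contrastive loss with hard negatives. It is not the sentence-transformers implementation and omits the gradient caching that gives the loss its name; the function names, the toy vectors, and the `scale` value of 20.0 are assumptions for illustration only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits, target):
    """Log-probability of the target index under a softmax over logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return logits[target] - log_sum


def bidirectional_loss(anchors, positives, hard_negatives, scale=20.0):
    """anchors[i] matches positives[i]; hard_negatives[i] lists negatives for anchor i.

    All vectors are assumed to be unit-normalized, so dot product equals cosine.
    """
    n = len(anchors)
    # Query-to-document: every positive and hard negative in the batch is a candidate.
    candidates = positives + [neg for negs in hard_negatives for neg in negs]
    q2d = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * dot(anchor, cand) for cand in candidates]
        q2d -= log_softmax_at(logits, i)
    # Document-to-query: each positive must pick its own anchor among all anchors.
    d2q = 0.0
    for i, positive in enumerate(positives):
        logits = [scale * dot(positive, anchor) for anchor in anchors]
        d2q -= log_softmax_at(logits, i)
    return (q2d + d2q) / (2 * n)


anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
negatives = [[[0.0, 0.0, 1.0]], [[0.0, 0.0, 1.0]]]

aligned = bidirectional_loss(anchors, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], negatives)
swapped = bidirectional_loss(anchors, [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]], negatives)
print(aligned, swapped)
```

The loss is near zero when each anchor is closest to its own positive and grows when the pairs are mismatched, in both retrieval directions.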

| Property | Value |
| --- | --- |
| Training samples | 2,534,400 |
| Positives per anchor | 1 |
| Negatives per anchor | 15 |
| Loss | CachedMultipleNegativesRankingLoss |
| Objective | Bidirectional retrieval training |
| Hard-negative mining model | Qwen/Qwen3-Embedding-0.6B |
| Epochs | 1 |
| Learning rate | 6e-5 |
| Batch size | 1024 |
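These figures imply a specific number of optimizer steps. The arithmetic below assumes that each training sample is one anchor with its positive and 15 negatives, and that the batch size counts anchors with no gradient accumulation; neither detail is stated in the card.

```python
TRAINING_SAMPLES = 2_534_400
BATCH_SIZE = 1024
TEXTS_PER_SAMPLE = 1 + 1 + 15  # anchor + positive + hard negatives

steps_per_epoch, remainder = divmod(TRAINING_SAMPLES, BATCH_SIZE)
texts_per_batch = BATCH_SIZE * TEXTS_PER_SAMPLE

print(steps_per_epoch, remainder, texts_per_batch)  # 2475 0 17408
```

Under these assumptions the single epoch is 2,475 steps, and each step encodes 17,408 texts.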

Training Data

The training data is a mixture of:

  1. Public code-retrieval datasets covering the following task families (the ten CoIR tasks plus CodeSearchNetRetrieval): AppsRetrieval, COIRCodeSearchNetRetrieval, CodeFeedbackMT, CodeFeedbackST, CodeSearchNetCCRetrieval, CodeSearchNetRetrieval, CodeTransOceanContest, CodeTransOceanDL, CosQA, StackOverflowQA, and SyntheticText2SQL.
  2. Custom code-comment pair data consisting of code snippets paired with natural-language description comments across the eight supported languages (the six CodeSearchNet languages plus Rust and TypeScript).
  3. Code-edit data derived from commitpackft, pairing edit intents with code changes.

All datasets were constructed as hard-negative retrieval datasets: for each anchor, one positive and fifteen hard negatives were used. Hard negatives were mined with Qwen/Qwen3-Embedding-0.6B, which retrieves semantically similar but non-matching candidates, producing negatives that are more difficult than random negatives. The mining model is used only during dataset construction and is not required at inference time.
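The general idea of hard-negative mining can be sketched as a top-k similarity search that skips the known positive. The card does not describe the actual mining procedure (score margins, false-negative filtering, and so on), so the function `mine_hard_negatives` and the toy vectors below are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(query_vec, candidates, positive_id, num_negatives=15):
    """Return ids of the candidates most similar to the query, excluding the positive.

    candidates maps an id to an embedding from the mining model.
    """
    scored = [
        (dot(query_vec, vec), cand_id)
        for cand_id, vec in candidates.items()
        if cand_id != positive_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [cand_id for _, cand_id in scored[:num_negatives]]


# Toy 2-dimensional embeddings (hypothetical values).
candidates = {
    "pos": [1.0, 0.0],
    "near": [0.9, 0.1],
    "mid": [0.5, 0.5],
    "far": [0.0, 1.0],
}
hard = mine_hard_negatives([1.0, 0.0], candidates, positive_id="pos", num_negatives=2)
print(hard)  # ['near', 'mid']
```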

This setup is intended to improve discrimination between code snippets, programming questions, edit examples, and technically similar retrieval candidates.

Data Decontamination

To reduce benchmark contamination, the following overlaps were removed from the training data before training:

  • Overlaps between the custom code-comment pair data and the CodeSearchNet test split
  • Overlaps between the commitpackft-derived code-edit data and the CodeEditSearchRetrieval benchmark evaluation data
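One simple way to implement this kind of overlap removal is exact matching on whitespace-normalized text. The matching criterion actually used is not stated in the card, so the `fingerprint` and `decontaminate` helpers below are an assumed, minimal variant.

```python
import hashlib


def fingerprint(code):
    """Hash the text after collapsing all runs of whitespace to single spaces."""
    normalized = " ".join(code.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def decontaminate(train_examples, benchmark_examples):
    """Drop training examples whose fingerprint appears in the benchmark set."""
    blocked = {fingerprint(example) for example in benchmark_examples}
    return [example for example in train_examples if fingerprint(example) not in blocked]


train = ["def f(x):\n    return x", "def g(y): return y"]
benchmark = ["def f(x):  return x"]

clean = decontaminate(train, benchmark)
print(clean)  # ['def g(y): return y']
```

Exact matching of this kind misses near-duplicates such as renamed variables; catching those would need a fuzzier criterion.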

For CodeEditSearchRetrieval, note that MTEB labels the evaluation split "train". This is only the official split name available for the task; the evaluated examples were not included in this model's fine-tuning data. The reported score should therefore be read as in-domain generalization on held-out benchmark examples, not as training-set performance. Given the in-domain training distribution, it is also not strictly zero-shot performance.

Intended Use

This model is intended for code-related retrieval tasks such as:

  • Natural language to code search
  • Code-to-code retrieval and similar function search
  • Code-edit retrieval (matching edit intents to code changes)
  • Retrieval over programming Q&A and technical questions
  • Local semantic code search systems
  • RAG systems over codebases and developer documentation

Example use cases include indexing functions, snippets, programming solutions, StackOverflow-style answers, code review examples, and edit-related code examples.

Limitations

  • The model is specialized for code-related retrieval and may underperform general-purpose text embedding models on unrelated natural language tasks.
  • Inputs longer than 1,024 tokens are truncated.
  • Performance may vary by programming language, query style, and the granularity of indexed code chunks; languages outside the eight supported languages are untested.
  • The model uses dense single-vector embeddings. For very fine-grained matching, rerankers or late-interaction models may provide better precision.

Recommended Indexing Settings

Encode both queries and documents with normalized embeddings:

embeddings = model.encode(texts, normalize_embeddings=True)

With normalized embeddings, dot product is equivalent to cosine similarity.
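The equivalence is easy to verify numerically. The sketch below uses plain Python lists and hypothetical helper names rather than the model's actual outputs:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [1.0, 2.0]

cos_raw = cosine(a, b)
dot_normalized = dot(l2_normalize(a), l2_normalize(b))
print(math.isclose(cos_raw, dot_normalized))  # True
```

This is why an index that only supports inner-product search can still rank by cosine similarity once the stored vectors are normalized.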

For codebase search, indexing function-level or class-level chunks is usually recommended. Very long files may exceed the 1,024-token context limit and should be split into smaller semantic chunks.
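For Python sources, function-level and class-level chunks can be extracted with the standard-library `ast` module, as in the sketch below. The helper `function_level_chunks` is hypothetical, handles only top-level definitions, and does not check chunk length against the 1,024-token limit; other languages would need their own parser.

```python
import ast


def function_level_chunks(source):
    """Return (name, source_text) for each top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


SAMPLE = '''
import math

def area(radius):
    return math.pi * radius ** 2

class Circle:
    def __init__(self, radius):
        self.radius = radius
'''

chunks = function_level_chunks(SAMPLE)
for name, text in chunks:
    print(name, len(text.splitlines()))
```

Each chunk's text can then be passed to `model.encode` and stored alongside its name and file path.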

Citation

If you use this model, please cite it together with the base model and Sentence Transformers.

@misc{nightowl_codeembedding,
  title = {NightOwl-CodeEmbedding},
  author = {Shuu12121},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Shuu12121/NightOwl-CodeEmbedding}
}