---
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
- feature-extraction
- code-search
- knowledge-distillation
- modernbert
- apple-silicon
- mps
pipeline_tag: sentence-similarity
library_name: PyLate
license: apache-2.0
language:
- en
datasets:
- sentence-transformers/codesearchnet
base_model: lightonai/ColBERT-Zero
---
# ColBERT-Zero-6L-CodeSearch
A **6-layer ColBERT model** distilled from [ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) (22 layers) for code search, achieving **85% of the teacher's retrieval quality at 13x faster query speed**.
## Model Details
| Parameter | Value |
|-----------|-------|
| **Architecture** | ModernBERT (6 layers, 768 hidden, 12 heads) |
| **Base Model** | [lightonai/ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) |
| **Output Dimensionality** | 128 per-token embeddings |
| **Similarity Function** | MaxSim (late interaction) |
| **Parameters** | ~38M (vs ~100M teacher) |
| **Query Length** | 32 tokens |
| **Document Length** | 180 tokens |
| **License** | Apache 2.0 |
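MaxSim (late interaction) scores a query against a document by matching each query token embedding to its single best document token and summing those maxima. A minimal NumPy sketch of the scoring rule (illustrative only, not PyLate's implementation):

```python
import numpy as np

def maxsim(query_emb, doc_emb):
    # query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim)
    # Both are assumed L2-normalized, as ColBERT token embeddings are.
    sim = query_emb @ doc_emb.T        # token-level cosine similarities (q, d)
    return sim.max(axis=1).sum()       # best document match per query token, summed

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(10, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
score = maxsim(q, d)
```

Because each of the 4 query tokens contributes a cosine maximum of at most 1, the score is bounded by the number of query tokens.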
## Benchmark Results
Evaluated on 3 code search corpora (150 questions total) via [litembeddings](https://github.com/alexandernicholson/litembeddings):
| Corpus | Teacher MRR | Student MRR | % of Teacher | Student Query Speed |
|--------|------------|-------------|--------------|---------------------|
| jq (C) | 0.539 | 0.355 | 65.9% | ~7ms |
| Rails (Ruby) | 0.679 | 0.581 | 85.6% | ~3ms |
| FastAPI (Python) | 0.782 | 0.766 | **98.0%** | ~4ms |
| **Aggregate** | **0.667** | **0.568** | **85.1%** | **~5ms** |
The student model is approximately **13x faster** at query time than the teacher while retaining 85% of retrieval quality. Performance is particularly strong on Python code search (98% of teacher).
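MRR, the metric in the table above, averages the reciprocal rank of the first relevant result across queries. A short sketch (the ranks below are hypothetical, not taken from the benchmark):

```python
def mrr(ranks):
    # ranks: 1-based rank of the first relevant document for each query
    return sum(1.0 / r for r in ranks) / len(ranks)

# hypothetical run: relevant doc retrieved at ranks 1, 2, and 4 across three queries
example = mrr([1, 2, 4])  # (1 + 1/2 + 1/4) / 3
```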
## How the Student Was Built
### Architecture: Layer Pruning from Teacher
The student was created by selecting 6 layers from ColBERT-Zero's 22-layer ModernBERT backbone using a **skewed-late** strategy that preserves more upper layers (which encode retrieval-relevant semantics):
```
Teacher layers: [0, 1, 2, ..., 21] (22 total)
Student layers: [0, 8, 14, 17, 19, 21] (6 selected)
```
The student inherits:
- All embedding weights from the teacher
- The 768-to-128 ColBERT projection layer
- Selected transformer layers with full weight copying
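The select-and-copy step can be sketched as follows. `prune_layers` and the toy `nn.Linear` backbone are illustrative stand-ins, not the actual build script; with a real Hugging Face checkpoint you would index into the model's transformer block list (the attribute name varies by architecture):

```python
import torch.nn as nn

KEEP = [0, 8, 14, 17, 19, 21]  # skewed-late selection from the card

def prune_layers(teacher_layers: nn.ModuleList) -> nn.ModuleList:
    # Reuse the selected teacher blocks directly, weights and all.
    return nn.ModuleList(teacher_layers[i] for i in KEEP)

# toy stand-in for a 22-layer backbone
teacher_layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(22))
student_layers = prune_layers(teacher_layers)
```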
### Training: Knowledge Distillation
- **Dataset**: [CodeSearchNet](https://huggingface.co/datasets/sentence-transformers/codesearchnet) (10,000 comment-code pairs)
- **Teacher scoring**: ColBERT-Zero generates MaxSim relevance scores for each query against 1 positive + 3 random negative documents
- **Loss**: PyLate Distillation loss (KL divergence between teacher and student score distributions)
- **Optimizer**: AdamW, lr=5e-5, weight_decay=0.01, warmup_ratio=0.1
- **Training**: 1000 steps, batch_size=8, gradient_accumulation=4 (effective batch size 32)
- **Hardware**: Apple Silicon (M4 Max) via PyTorch MPS backend, ~17 minutes total
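The distillation objective described above (KL divergence between teacher and student score distributions over 1 positive + 3 negatives) can be sketched in PyTorch. This is an illustration of the objective, not PyLate's actual `Distillation` loss code:

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_scores, student_scores):
    # Softmax both score vectors over the candidate documents, then take
    # KL(teacher || student) so the student matches the teacher's distribution.
    t = F.log_softmax(teacher_scores, dim=-1)
    s = F.log_softmax(student_scores, dim=-1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean")

# one query, MaxSim scores for [positive, neg, neg, neg] (made-up values)
teacher = torch.tensor([[10.0, 2.0, 1.0, 0.5]])
student = torch.tensor([[8.0, 3.0, 1.0, 0.5]])
loss = distill_loss(teacher, student)
```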
### Hyperparameter Search
The optimal configuration was found through **30 autonomous experiments** sweeping learning rate, layer selection strategy, batch size, gradient accumulation, weight decay, warmup ratio, number of negatives, training steps, and embedding dimensions. Key findings:
- **Teacher initialization is critical**: Starting from ColBERT-Zero's weights (MRR 0.46) vs raw ModernBERT (MRR 0.08) — a 5.6x improvement
- **Skewed-late layer selection** outperforms evenly-spaced, last-6, and other strategies
- **Effective batch size 32** (bs=8, grad_accum=4) is optimal
- **Weight decay 0.01** provides regularization benefit
## Usage
### Installation
```bash
pip install pylate
```
### Encoding & Retrieval
```python
from pylate import indexes, models, retrieve
# Load model
model = models.ColBERT(model_name_or_path="ctrltokyo/ColBERT-Zero-6L-CodeSearch")
# Encode documents
doc_embeddings = model.encode(
    ["def hello():\n    print('Hello, World!')", "class UserAuth:\n    ..."],
    batch_size=32,
    is_query=False,
    show_progress_bar=True,
)
# Encode queries
query_embeddings = model.encode(
    ["function that prints a greeting"],
    batch_size=32,
    is_query=True,
    show_progress_bar=True,
)
# Score with MaxSim
from pylate.scores import colbert_scores
scores = colbert_scores(query_embeddings, doc_embeddings)
print(scores) # Higher = more relevant
```
### Reranking
```python
from pylate import rank, models
model = models.ColBERT(model_name_or_path="ctrltokyo/ColBERT-Zero-6L-CodeSearch")
queries = ["how to authenticate users"]
documents = [["def login(user, pwd): ...", "def sort_list(arr): ...", "class AuthMiddleware: ..."]]
documents_ids = [["doc1", "doc2", "doc3"]]
queries_embeddings = model.encode(queries, is_query=True)
documents_embeddings = model.encode(documents, is_query=False)
reranked = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
## GGUF / litembeddings
This model can be converted to GGUF format for use with [litembeddings](https://github.com/alexandernicholson/litembeddings) (SQLite-based embedding engine with SIMD-accelerated MaxSim):
```bash
# Convert to GGUF
python convert_hf_to_gguf.py ctrltokyo/ColBERT-Zero-6L-CodeSearch --outfile model-f16.gguf --outtype f16
# Extract projection
python -c "
from safetensors import safe_open
import numpy as np
f = safe_open('1_Dense/model.safetensors', framework='numpy')
f.get_tensor('linear.weight').astype(np.float32).tofile('model.projection')
"
```
Then in SQL:
```sql
SELECT lembed_model('codesearch', 'model-f16.gguf', '{"colbert_projection": "model.projection"}');
SELECT lembed_maxsim(
  lembed_tokens('search_query: how to sort a list'),
  lembed_tokens('search_document: def quicksort(arr): ...')
);
```
## Limitations
- **Weakest on C code search** (65.9% of teacher on jq corpus) — likely because CodeSearchNet training data is Python-heavy
- **Trained on 10k pairs only** — larger training sets or hard negative mining could improve quality further
- **English only** — inherits ColBERT-Zero's language capabilities
- **No asymmetric prompts** — unlike the teacher, this model does not use `search_query:`/`search_document:` prompts (uses `[Q]`/`[D]` prefixes instead)
## Citation
```bibtex
@misc{colbert-zero-6l-codesearch,
  title={ColBERT-Zero-6L-CodeSearch: A Distilled ColBERT Model for Code Search},
  author={Alexander Nicholson},
  year={2026},
  note={Distilled from ColBERT-Zero (Chaffin et al., 2026) using PyLate on Apple Silicon}
}
```
## Acknowledgments
- [ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) by LightOn AI — the teacher model
- [PyLate](https://github.com/lightonai/pylate) — ColBERT training framework
- [litembeddings](https://github.com/alexandernicholson/litembeddings) — SQLite embedding engine used for benchmarking
- Training and experimentation performed entirely on Apple Silicon (M4 Max) using PyTorch MPS backend