Add task-specific instructions via enum (T2C/C2C/C2T) with usage examples

24d2ad5 verified 13 days ago

5.67 kB

	---
	license: apache-2.0
	base_model: Qwen/Qwen3-Reranker-4B
	tags:
	- code-search
	- reranker
	- code-retrieval
	- peft
	- lora
	language:
	- en
	- code
	datasets:
	- hq-bench/coreb
	pipeline_tag: text-classification
	library_name: transformers
	---

	[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://hq-bench.github.io/coreb-page/)
	[![arXiv](https://img.shields.io/badge/arXiv-2605.04615-b31b1b.svg)](https://arxiv.org/abs/2605.04615)
	[![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/hq-bench/coreb)
	[![Code](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/hq-bench/coreb)

	# CoREB-Reranker

	CoREB-Reranker is a code reranker fine-tuned from [Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) via LoRA on a mixed reranker corpus. It is the only reranker we evaluate that achieves consistent gains across all three code search tasks (text-to-code, code-to-text, and code-to-code).

	## Highlights

	- Fine-tuned from Qwen3-Reranker-4B using LoRA (rank=16, alpha=16) on 3.1M training samples from a mixed corpus
	- Evaluated on CoREB v202603 (problem-disjoint from training set, no data leakage)
	- Achieves positive reranking delta on all three tasks, unlike all off-the-shelf rerankers tested

	## Reranking Results (nDCG@10 Delta %)

	Reranking delta on CoREB v202603, using C2LLM-7B as the first-stage retriever:

	\| Reranker \| Text-to-Code \| Code-to-Text \| Code-to-Code \|
	\|----------\|:---:\|:---:\|:---:\|
	\| Jina Reranker v2 \| -8.3 \| -22.4 \| -8.8 \|
	\| Jina Reranker v3 \| -2.2 \| -5.0 \| -0.1 \|
	\| Qwen3-Reranker-0.6B \| -0.6 \| -8.2 \| -2.3 \|
	\| Qwen3-Reranker-4B \| -0.1 \| -3.2 \| +3.3 \|
	\| CoREB-Reranker (ours) \| +1.1 \| +0.8 \| +5.1 \|

	## Training Details

	- Base model: [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B)
	- Method: LoRA (rank=16, alpha=16, dropout=0.05)
	- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
	- Training data: A mixed reranker corpus consisting of [CoREB v202602](https://huggingface.co/datasets/hq-bench/coreb), [CodeSearchNet](https://github.com/github/CodeSearchNet) (code-to-code, code-to-text, text-to-code), [APPS](https://github.com/hendrycks/apps), [CosQA](https://github.com/Jun-jie-Huang/CosQA), and [CodeFeedback](https://github.com/OpenCodeInterpreter/OpenCodeInterpreter) (single-turn and multi-turn). Each record is normalized into binary reranking examples (instruction, query, document, yes/no). Positives are duplicated twice; one easy negative and one hard negative are sampled per record.
	- Evaluation data: CoREB v202603 (problem-disjoint from CoREB v202602 training split; covers a different contest time window)
	- Training samples: ~3.1M binary reranking examples across text-to-code, code-to-text, and code-to-code tasks
	- Top-k retrieval for reranking: 128

	## Usage

	CoREB-Reranker follows the same usage pattern as Qwen3-Reranker. The instruction is task-specific — use the appropriate one for your retrieval task:

	```python
	from enum import Enum
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	class Task(Enum):
	TEXT_TO_CODE = "Given a natural language programming task, retrieve code that correctly solves or implements the task."
	CODE_TO_CODE = "Given a code snippet, retrieve code that is semantically equivalent or solves the same task."
	CODE_TO_TEXT = "Given a code snippet, retrieve the natural language description or problem statement that best matches the code."

	model_id = "hq-bench/coreb-code-reranker"
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, trust_remote_code=True)
	model.eval()

	PREFIX = '<\|im_start\|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<\|im_end\|>\n<\|im_start\|>user\n'
	SUFFIX = "<\|im_end\|>\n<\|im_start\|>assistant\n"
	yes_id = tokenizer.convert_tokens_to_ids("yes")
	no_id = tokenizer.convert_tokens_to_ids("no")

	def score(query: str, document: str, task: Task) -> float:
	prompt = f"{PREFIX}<Instruct>: {task.value}\n<Query>: {query}\n<Document>: {document}{SUFFIX}"
	inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)
	with torch.no_grad():
	logits = model(**inputs).logits[0, -1, :]
	return (logits[yes_id] - logits[no_id]).item()

	# Text-to-Code: natural language query -> code
	print(score(
	query="binary search implementation",
	document="def binary_search(arr, target):\n lo, hi = 0, len(arr) - 1\n ...",
	task=Task.TEXT_TO_CODE,
	))

	# Code-to-Code: code -> semantically equivalent code
	print(score(
	query="def binary_search(arr, target): ...",
	document="int binarySearch(int[] arr, int target) { ... }",
	task=Task.CODE_TO_CODE,
	))

	# Code-to-Text: code -> problem description
	print(score(
	query="def binary_search(arr, target): ...",
	document="Find the index of a target value in a sorted array using binary search.",
	task=Task.CODE_TO_TEXT,
	))
	```

	For batch reranking with the CoREB evaluation pipeline, see the [CoREB repository](https://github.com/hq-bench/coreb).

	## Citation

	```bibtex
	@article{xue2026coreb,
	title={Beyond Retrieval: A Multitask Benchmark and Reranker for Code Search},
	author={Xue, Siqiao and Liao, Zihan and Qin, Jin and Zhang, Ziyin and Mu, Yixiang and Zhou, Fan and Yu, Hang},
	journal={arXiv preprint arXiv:2605.04615},
	year={2026},
	url={https://arxiv.org/abs/2605.04615}
	}
	```