SWE-CARE-RM
This model is a custom reward model built on top of Qwen3-8B with:
- a merged LoRA adapter
- an additional projector head
- a scalar reward output in [0, 1]
The model is designed to score the quality of a review conditioned on:
- an issue / problem statement
- a code patch
- a candidate review
A higher score means the model considers the review better under the given issue and patch.
Model Architecture
The model consists of:
- base model: Qwen3-8B
- adaptation: LoRA
- reward head: a custom MLP projector
- final score: `sigmoid(projector(last_hidden_state[:, -1]))`
This repository contains the merged decoder weights together with projector.pth.
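The scoring head above can be sketched with toy tensors. Sizes here are illustrative only; the real model uses Qwen3-8B's hidden size and loads the projector weights from `projector.pth` (see the Quick Start):

```python
import torch
import torch.nn as nn

# Toy stand-in for the reward head: a small MLP projector maps the
# last-token hidden state to a scalar, and sigmoid bounds it to [0, 1].
hidden_size = 64  # illustrative; Qwen3-8B's hidden size is much larger

projector = nn.Sequential(
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, 1),
)

# Stand-in for decoder(...).hidden_states[-1]: (batch, seq_len, hidden)
last_hidden_state = torch.randn(2, 10, hidden_size)

# Project the final token's hidden state and squash to [0, 1].
score = torch.sigmoid(projector(last_hidden_state[:, -1])).squeeze(-1)
print(score.tolist())
```

The sigmoid is what bounds the reward to [0, 1]; without it the projector output would be an unbounded logit.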
Input Format
The model expects three text fields:
- issue
- patch
- review
During inference, the input is formatted as:
<issue>{issue}</issue><patch>{patch}</patch><review>{review}</review>
The score is computed from the last token hidden state.
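Assuming the review is closed with a `</review>` tag, the template can be built with a small helper (hypothetical, not part of the repository; the Quick Start below instead tags the issue and patch at tokenization time):

```python
# Hypothetical helper illustrating the documented tag format.
def build_prompt(issue: str, patch: str, review: str) -> str:
    return f"<issue>{issue}</issue><patch>{patch}</patch><review>{review}</review>"

prompt = build_prompt(
    "Crash on empty input list",
    "- return xs[0]\n+ return xs[0] if xs else None",
    "Handles the empty case, but consider documenting the None return.",
)
print(prompt)
```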
Quick Start
```python
from pathlib import Path
import json

import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "codefuse-ai/SWE-CARE-RM"
MAX_SEQ_LEN = 51200
MIN_REVIEW_LEN = 4096
TRUST_REMOTE_CODE = True

# Read the first sample from the bundled data file (assumes MODEL_DIR
# points at a local snapshot of the repository).
with open(f"{MODEL_DIR}/data_sample.jsonl", "r") as fr:
    for line in fr:
        json_data = json.loads(line)
        break

SAMPLE = {
    "issue": json_data["problem_statement"],
    "patch": json_data["patch_to_review"],
    "review": json_data["pos_review"][0],
}


class Projector(nn.Module):
    def __init__(self, arch, input_size, hidden_size, use_bf16):
        super().__init__()
        # arch looks like "mlp2x_relu"; the digit is the MLP depth.
        depth = int(arch[len("mlp"):arch.index("x_relu")])
        layers = [nn.Linear(input_size, hidden_size).bfloat16() if use_bf16
                  else nn.Linear(input_size, hidden_size)]
        for _ in range(1, depth):
            layers.append(nn.ReLU())
            layers.append(nn.Linear(hidden_size, 1).bfloat16() if use_bf16
                          else nn.Linear(hidden_size, 1))
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)


def resolve_dtype(dtype_name):
    if dtype_name in {"bf16", "bfloat16"}:
        return torch.bfloat16
    if dtype_name in {"fp16", "float16"}:
        return torch.float16
    return torch.float32


def infer_proj_arch(projector_state_dict):
    linear_weight_keys = [k for k in projector_state_dict
                          if k.startswith("model.") and k.endswith(".weight")]
    return f"mlp{len(linear_weight_keys)}x_relu"


def process_one(issue_ids, issue_masks, patch_ids, patch_masks,
                review_ids, review_masks, max_len, min_review_len):
    # Keep the full issue, the tail of the review (up to min_review_len
    # tokens), and as much of the patch as the remaining budget allows;
    # left-pad with zeros up to max_len.
    review_keep = min(min_review_len, len(review_ids))
    remain_for_patch = max(max_len - len(issue_ids) - review_keep, 0)
    patch_keep = min(len(patch_ids), remain_for_patch)
    ids_all = issue_ids + patch_ids[:patch_keep] + review_ids[-review_keep:]
    masks_all = issue_masks + patch_masks[:patch_keep] + review_masks[-review_keep:]
    if len(ids_all) < max_len:
        pad_len = max_len - len(ids_all)
        ids_all = [0] * pad_len + ids_all
        masks_all = [0] * pad_len + masks_all
    return ids_all[:max_len], masks_all[:max_len]


reward_config = {}
reward_config_path = Path(MODEL_DIR) / "reward_config.json"
if reward_config_path.exists():
    with open(reward_config_path, "r", encoding="utf-8") as f:
        reward_config = json.load(f)

projector_path = Path(MODEL_DIR) / "projector.pth"
projector_state_dict = torch.load(projector_path, map_location="cpu")

proj_arch = reward_config.get("proj_arch") or infer_proj_arch(projector_state_dict)
torch_dtype = resolve_dtype(reward_config.get("torch_dtype") or "bfloat16")
attn_implementation = reward_config.get("attn_implementation")

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_DIR, trust_remote_code=TRUST_REMOTE_CODE, padding_side="left"
)
model_kwargs = {"trust_remote_code": TRUST_REMOTE_CODE, "torch_dtype": torch_dtype}
if attn_implementation:
    model_kwargs["attn_implementation"] = attn_implementation
decoder = AutoModelForCausalLM.from_pretrained(MODEL_DIR, **model_kwargs)

projector = Projector(proj_arch, decoder.config.hidden_size,
                      decoder.config.hidden_size, torch_dtype == torch.bfloat16)
projector.load_state_dict(projector_state_dict)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
decoder.to(device).eval()
projector.to(device).eval()

issue_inputs = tokenizer(f"<issue>{SAMPLE['issue']}</issue>", padding=False,
                         truncation="longest_first")
patch_inputs = tokenizer(f"<patch>{SAMPLE['patch']}</patch>", padding=False,
                         truncation="longest_first")
review_inputs = tokenizer(SAMPLE["review"], padding=False, truncation="longest_first")

input_ids, attention_mask = process_one(
    issue_inputs["input_ids"],
    issue_inputs["attention_mask"],
    patch_inputs["input_ids"],
    patch_inputs["attention_mask"],
    review_inputs["input_ids"],
    review_inputs["attention_mask"],
    max_len=MAX_SEQ_LEN,
    min_review_len=MIN_REVIEW_LEN,
)
inputs = {
    "input_ids": torch.tensor([input_ids], dtype=torch.long, device=device),
    "attention_mask": torch.tensor([attention_mask], dtype=torch.long, device=device),
}

with torch.no_grad():
    hidden_state = decoder(**inputs, output_hidden_states=True).hidden_states[-1]
    reward = torch.sigmoid(projector(hidden_state).squeeze(-1)[:, -1]).item()
print(reward)
```
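The truncation budget used by `process_one` can be exercised standalone with dummy token ids (the helper below restates the same logic for the ids only):

```python
# Standalone restatement of the Quick Start truncation budget: keep the
# full issue, reserve the tail of the review (up to min_review_len
# tokens), give the patch the remaining budget, and left-pad with zeros.
def truncate(issue_ids, patch_ids, review_ids, max_len, min_review_len):
    review_keep = min(min_review_len, len(review_ids))
    remain_for_patch = max(max_len - len(issue_ids) - review_keep, 0)
    patch_keep = min(len(patch_ids), remain_for_patch)
    ids = issue_ids + patch_ids[:patch_keep] + review_ids[-review_keep:]
    return ([0] * max(max_len - len(ids), 0) + ids)[:max_len]

# 100 issue tokens (1s), 5000 patch tokens (2s), 200 review tokens (3s):
# the oversized patch is trimmed to fit; the review keeps 150 tokens.
ids = truncate([1] * 100, [2] * 5000, [3] * 200, max_len=1024, min_review_len=150)
print(len(ids), ids.count(2))
```

The review is taken from its tail rather than its head, so the final verdict of a long review survives truncation even when the patch consumes most of the budget.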
Output
The model outputs a single scalar reward score in [0, 1].
Typical interpretation:
- higher score: better review quality
- lower score: worse review quality
This score is best used for:
- ranking candidate reviews
- pairwise comparison
- reward modeling in downstream training or reranking
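A toy reranking sketch (scores are hard-coded here for illustration; in practice each would come from running the reward model on the same issue and patch with a different candidate review):

```python
# Pick the candidate review with the highest reward score.
candidates = [
    "Looks fine.",
    "Consider handling the empty-list case in parse().",
    "nit",
]
scores = [0.31, 0.87, 0.12]  # illustrative reward values in [0, 1]

best = max(zip(candidates, scores), key=lambda cs: cs[1])[0]
print(best)
```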
Intended Use
This model is intended for:
- code review quality scoring
- reward modeling for review generation
- reranking multiple candidate reviews for the same issue and patch
Limitations
- The score is relative, not an absolute guarantee of correctness.
- Long-input truncation may affect results.
- The model should not be used as the only signal for production-critical review decisions.
Citation
If you use this model, please cite SWE-CARE as appropriate.
```bibtex
@misc{guo2025codefusecrbenchcomprehensivenessawarebenchmarkendtoend,
  title={CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects},
  author={Hanyang Guo and Xunjin Zheng and Zihan Liao and Hang Yu and Peng DI and Ziyin Zhang and Hong-Ning Dai},
  year={2025},
  eprint={2509.14856},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2509.14856},
}
```