ConvMemory v2 Evidence Reranker

This is the ConvMemory v0.5.0 protected top-10 token-evidence reranker checkpoint. It is intended to be attached to the base ConvMemory LoCoMo/MPNet checkpoint: Purdy0228/ConvMemory-LoCoMo-MPNet.

Usage

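from convmemory import ConvMemory

# Load the base checkpoint, then attach the v2 evidence reranker to it.
model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
model.load_evidence_reranker("Purdy0228/ConvMemory-v2-Evidence-Reranker")

# `query` (the query text) and `memories` (the candidate memories) are supplied by the caller.
ranked = model.retrieve(
    query=query,
    memories=memories,
    evidence_reranker="v2",
    top_k=10,
)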

The repository layout is compatible with convmemory.EvidenceReranker.from_pretrained:

evidence_reranker_config.json
MANIFEST.json
cross_encoder/ (SentenceTransformers CrossEncoder checkpoint)

What It Does

ConvMemory v2 preserves the exact ConvMemory v1 top-10 candidate set and only reorders that protected prefix using token-level query/memory evidence. Because it never adds candidates, it cannot recover a gold memory that v1 failed to retrieve into the top 10.
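
The protected-prefix behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: `rerank_protected_prefix`, the `(id, text)` candidate shape, and the toy `token_overlap` scorer (a stand-in for the cross-encoder) are all hypothetical names introduced for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, str]  # (candidate id, memory text); hypothetical shape


def rerank_protected_prefix(
    query: str,
    v1_ranked: Sequence[Candidate],
    score_fn: Callable[[str, str], float],
    protect_k: int = 10,
) -> List[Candidate]:
    """Reorder only the first `protect_k` candidates; leave the tail untouched."""
    prefix = list(v1_ranked[:protect_k])
    tail = list(v1_ranked[protect_k:])
    # sorted() is stable, so candidates with equal scores keep their v1 order.
    reordered = sorted(prefix, key=lambda c: score_fn(query, c[1]), reverse=True)
    return reordered + tail


def token_overlap(query: str, text: str) -> float:
    """Toy scorer (assumption): fraction of query tokens that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / max(len(q), 1)


v1 = [
    ("m1", "we went hiking"),
    ("m2", "Anna adopted a cat named Milo"),
    ("m3", "cat food"),
]
# With protect_k=2, only m1 and m2 may swap; m3 stays in the tail.
reranked = rerank_protected_prefix(
    "what is the name of the cat", v1, token_overlap, protect_k=2
)
```

The output contains exactly the same candidates as the input, so recall at the protected cutoff cannot change; only the order inside the prefix does.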

Training

Source experiment: experiments/v361_top10_evidence_reranker.py
Seed: 7
Base model: cross-encoder/ms-marco-MiniLM-L-6-v2
Training target: gold-only listwise retrieval cross-entropy
Teacher weight: 0.0
Candidate pool: ConvMemory v1 top-10 from dense MPNet top-500
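
As a rough illustration of the training target listed above, a gold-only listwise cross-entropy over one query's candidate list can be written as the negative log-softmax of the gold candidate's score. This sketch assumes exactly one gold memory per list and uses the hypothetical name `listwise_gold_cross_entropy`; the actual experiment script may handle multiple golds or missing golds differently. With a teacher weight of 0.0, no distillation term would be added to this loss.

```python
import math
from typing import Sequence


def listwise_gold_cross_entropy(scores: Sequence[float], gold_index: int) -> float:
    """Return -log softmax(scores)[gold_index], using a stable log-sum-exp."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold_index]


# Ten candidates with identical scores: the loss equals log(10).
uniform_loss = listwise_gold_cross_entropy([0.0] * 10, gold_index=3)

# Raising the gold candidate's score lowers the loss.
confident_loss = listwise_gold_cross_entropy([0.0] * 9 + [5.0], gold_index=9)
```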

Headline Evaluation

Canonical v361 5-seed headline, reported in the ConvMemory repository:

ConvMemory v1 FULL MRR: 0.5824
ConvMemory v2 FULL MRR: 0.6560
Delta: +0.0734, paired bootstrap 95% CI [+0.0645, +0.0827]
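
For readers unfamiliar with the statistics above, the following sketch shows one common way to compute MRR and a paired percentile-bootstrap confidence interval on the per-query difference. It is an assumption-laden illustration, not the repository's evaluation code: it assumes resampling over queries and that a query whose gold memory is not retrieved contributes a reciprocal rank of 0. The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mean_reciprocal_rank(gold_ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over queries; a missing gold (None) counts as 0."""
    rr = [0.0 if r is None else 1.0 / r for r in gold_ranks]
    return sum(rr) / len(rr)


def paired_bootstrap_ci(
    rr_a: Sequence[float],
    rr_b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean delta, lower, upper) for b - a, resampling queries in pairs."""
    if len(rr_a) != len(rr_b):
        raise ValueError("paired inputs must have the same length")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(rr_a, rr_b)]
    n = len(diffs)
    means: List[float] = []
    for _ in range(n_resamples):
        # Draw n query indices with replacement; each draw keeps the pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return sum(diffs) / n, lower, upper


# Gold ranks of 1, 2, not retrieved, and 4 give MRR = (1 + 0.5 + 0 + 0.25) / 4.
example_mrr = mean_reciprocal_rank([1, 2, None, 4])
```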

The v364 load-bearing ablation retrained the same full-text arm in an ablation harness and obtained FULL MRR 0.6677. When the memory text was removed or perturbed, FULL MRR collapsed:

no memory text: 0.2966
random other-query text: 0.2506
shuffled memory text: 0.2731
scalar only: 0.5792

Anti-Leak Discipline

The public inference API rejects gold-defining or post-hoc fields such as: gold, gold_ids, is_current, is_latest, is_stale, stale, answer, answer_text, ce_score, mxbai_score, teacher_score, gpt_label, entity_id, and slot_id.

Inference inputs are query text, candidate id/text, optional candidate position or time metadata, and the protected v1 top-10 candidate set.
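
A guard of this kind could look like the sketch below. It is only an illustration of the idea, not the package's actual check: the function name `assert_no_leak_fields` and the dictionary-shaped candidate are assumptions, and the field list is copied from the examples above, which may not be exhaustive.

```python
from typing import Mapping

# Field names taken from the list above; the real API may reject more.
FORBIDDEN_FIELDS = frozenset({
    "gold", "gold_ids", "is_current", "is_latest", "is_stale", "stale",
    "answer", "answer_text", "ce_score", "mxbai_score", "teacher_score",
    "gpt_label", "entity_id", "slot_id",
})


def assert_no_leak_fields(candidate: Mapping[str, object]) -> None:
    """Raise ValueError if the candidate carries a gold-defining or post-hoc field."""
    leaked = sorted(FORBIDDEN_FIELDS & set(candidate))
    if leaked:
        raise ValueError(f"leak-prone fields are not allowed at inference: {leaked}")


# Allowed: id, text, and optional position/time metadata (hypothetical key names).
assert_no_leak_fields({"id": "m2", "text": "Anna adopted a cat", "position": 17})
```

Running the check on every candidate before scoring keeps evaluation-time labels from silently reaching the reranker.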

Limitations

Fine-tuned specifically on LoCoMo; validate or retrain before using it in other domains.
Preserves recall over the v1 top-10 but cannot improve it; it is not a replacement for candidate generation.
Not a full top-500 cross-encoder; it is a bounded precision stage after v1.
