ConvMemory v2 Evidence Reranker

This is the ConvMemory v0.5.0 protected top-10 token-evidence reranker checkpoint. It is intended to be attached to the base ConvMemory LoCoMo/MPNet checkpoint: Purdy0228/ConvMemory-LoCoMo-MPNet.

Usage

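from convmemory import ConvMemory

# Load the base checkpoint, then attach the v2 evidence reranker to it.
model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
model.load_evidence_reranker("Purdy0228/ConvMemory-v2-Evidence-Reranker")

# `query` (query text) and `memories` (candidate memories) are supplied by the caller.
ranked = model.retrieve(
    query=query,
    memories=memories,
    evidence_reranker="v2",
    top_k=10,
)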

The repository layout is compatible with convmemory.EvidenceReranker.from_pretrained:

  • evidence_reranker_config.json
  • MANIFEST.json
  • cross_encoder/ (SentenceTransformers CrossEncoder checkpoint)
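Before loading from a local copy of the repository, it can help to confirm that the three entries listed above are present. The following is a minimal sketch using only the standard library; the helper name `missing_entries` is hypothetical and is not part of the convmemory package.

```python
from pathlib import Path

# Entries named in the repository layout above.
EXPECTED_FILES = ("evidence_reranker_config.json", "MANIFEST.json")
EXPECTED_DIRS = ("cross_encoder",)


def missing_entries(repo_dir):
    """Return the expected entries that are absent from a local repo copy."""
    root = Path(repo_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty return value means the directory has the expected top-level layout; it says nothing about whether the file contents are valid.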

What It Does

ConvMemory v2 preserves the exact ConvMemory v1 top-10 candidate set and only reorders that protected prefix using token-level query/memory evidence. It cannot recover a gold memory that v1 failed to retrieve into the top 10.
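The protected-prefix behavior described above can be sketched as follows. This is an illustration only, not the package's implementation: the function name `rerank_protected_prefix` and the `scores` mapping are hypothetical stand-ins for whatever the evidence reranker produces internally.

```python
def rerank_protected_prefix(ranked_ids, scores, prefix_size=10):
    """Reorder only the first `prefix_size` ids by descending score.

    `scores` maps candidate id -> evidence score (hypothetical input).
    Ids beyond the prefix keep their original order, and no id enters
    or leaves the prefix.
    """
    prefix = ranked_ids[:prefix_size]
    tail = ranked_ids[prefix_size:]
    # sorted() is stable, so tied candidates keep their v1 order.
    reranked = sorted(prefix, key=lambda cid: scores[cid], reverse=True)
    return reranked + tail
```

Because the prefix membership never changes, a gold memory ranked 11th or lower by v1 stays outside the top 10 no matter how high its evidence score would be.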

Training

  • Source experiment: experiments/v361_top10_evidence_reranker.py
  • Seed: 7
  • Base model: cross-encoder/ms-marco-MiniLM-L-6-v2
  • Training target: gold-only listwise retrieval cross-entropy
  • Teacher weight: 0.0
  • Candidate pool: ConvMemory v1 top-10 from dense MPNet top-500
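For readers unfamiliar with listwise objectives, the sketch below shows one common form of a gold-only listwise cross-entropy over a single query's candidate list. It assumes the loss is the negative log of the softmax mass placed on the gold candidates; the exact formulation used in the source experiment is not stated here and may differ. With a teacher weight of 0.0, no distillation term would be added to a loss of this shape.

```python
import math


def gold_listwise_cross_entropy(scores, gold_indices):
    """Listwise cross-entropy for one query's candidate list.

    Assumption: loss = -log(softmax mass on the gold candidates).
    `scores` are raw reranker scores, `gold_indices` index into them.
    """
    if not gold_indices:
        raise ValueError("at least one gold candidate is required")
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    gold_mass = sum(exps[i] for i in gold_indices)
    return -math.log(gold_mass / sum(exps))
```

With ten equally scored candidates and one gold, the loss equals log(10); it falls toward zero as the gold candidate's score rises above the others.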

Headline Evaluation

Canonical v361 5-seed headline, reported in the ConvMemory repository:

  • ConvMemory v1 FULL MRR: 0.5824
  • ConvMemory v2 FULL MRR: 0.6560
  • Delta: +0.0734, paired bootstrap 95% CI [+0.0645, +0.0827]
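As a reference for how numbers of this kind are typically computed, here is a minimal sketch of MRR and a percentile paired bootstrap over per-query reciprocal ranks. The function names, the resample count, and the percentile method are assumptions for illustration; the repository's evaluation script, including how it aggregates the five seeds, may differ.

```python
import random


def mean_reciprocal_rank(gold_ranks):
    """MRR from 1-based gold ranks; None means the gold was not retrieved."""
    return sum(0.0 if r is None else 1.0 / r for r in gold_ranks) / len(gold_ranks)


def paired_bootstrap_ci(rr_a, rr_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(rr_b - rr_a), resampling queries with replacement."""
    if len(rr_a) != len(rr_b):
        raise ValueError("both systems must be scored on the same queries")
    # Pairing: each query contributes one difference, so query difficulty cancels.
    diffs = [b - a for a, b in zip(rr_a, rr_b)]
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

An interval that excludes zero, as the reported one does, indicates that the improvement is unlikely to be an artifact of which queries happened to be sampled.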

The v364 load-bearing ablation retrained the same full-text arm in an ablation harness and obtained a FULL MRR of 0.6677. Every arm that removed or corrupted the memory text scored well below it:

  • no memory text: 0.2966
  • random other-query text: 0.2506
  • shuffled memory text: 0.2731
  • scalar only: 0.5792

Anti-Leak Discipline

The public inference API rejects gold-defining or post-hoc fields such as: gold, gold_ids, is_current, is_latest, is_stale, stale, answer, answer_text, ce_score, mxbai_score, teacher_score, gpt_label, entity_id, and slot_id.

Inference inputs are query text, candidate id/text, optional candidate position or time metadata, and the protected v1 top-10 candidate set.
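A guard of this kind can be sketched in a few lines. The example below is not the package's own validator: the function name `check_no_leak_fields` and the dict-based candidate format are assumptions, and the field set contains only the names listed above, which the text presents as examples rather than an exhaustive list.

```python
# Field names taken from the list above; the real API may reject more.
FORBIDDEN_FIELDS = frozenset({
    "gold", "gold_ids", "is_current", "is_latest", "is_stale", "stale",
    "answer", "answer_text", "ce_score", "mxbai_score", "teacher_score",
    "gpt_label", "entity_id", "slot_id",
})


def check_no_leak_fields(candidates):
    """Raise ValueError if any candidate dict carries a forbidden field."""
    for index, candidate in enumerate(candidates):
        leaked = sorted(FORBIDDEN_FIELDS.intersection(candidate))
        if leaked:
            raise ValueError(f"candidate {index} has forbidden fields: {leaked}")
```

Rejecting such inputs outright, rather than silently dropping the fields, makes an accidental leak visible at call time instead of hiding it in an inflated metric.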

Limitations

  • LoCoMo-specific fine-tuning; validate or retrain before cross-domain use.
  • Preserves recall over the v1 top-10; it does not replace candidate generation.
  • Not a full top-500 cross-encoder. It is a bounded precision stage after v1.