---
library_name: sentence-transformers
pipeline_tag: text-ranking
tags:
- convmemory
- reranking
- conversational-memory
- cross-encoder
- evidence-reranker
license: mit
---

# ConvMemory v2 Evidence Reranker

This is the ConvMemory v0.5.0 protected top-10 token-evidence reranker checkpoint. It is intended to be attached to the base ConvMemory LoCoMo/MPNet checkpoint: `Purdy0228/ConvMemory-LoCoMo-MPNet`.

## Usage

```python
from convmemory import ConvMemory

model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
model.load_evidence_reranker("Purdy0228/ConvMemory-v2-Evidence-Reranker")

ranked = model.retrieve(
    query=query,
    memories=memories,
    evidence_reranker="v2",
    top_k=10,
)
```

The repository layout is compatible with `convmemory.EvidenceReranker.from_pretrained`:

- `evidence_reranker_config.json`
- `MANIFEST.json`
- `cross_encoder/` SentenceTransformers CrossEncoder checkpoint

## What It Does

ConvMemory v2 preserves the exact ConvMemory v1 top-10 candidate set and only reorders that protected prefix using token-level query/memory evidence. It cannot recover a gold memory that v1 failed to retrieve into top-10.

## Training

- Source experiment: `experiments/v361_top10_evidence_reranker.py`
- Seed: 7
- Base model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Training target: gold-only listwise retrieval cross-entropy
- Teacher weight: 0.0
- Candidate pool: ConvMemory v1 top-10 from dense MPNet top-500

## Headline Evaluation

Canonical v361 5-seed headline, reported in the ConvMemory repository:

- ConvMemory v1 FULL MRR: 0.5824
- ConvMemory v2 FULL MRR: 0.6560
- Delta: +0.0734, paired bootstrap 95% CI [+0.0645, +0.0827]

The v364 load-bearing ablation retrained the same full-text arm in an ablation harness and obtained FULL MRR 0.6677.
Text perturbations collapsed:

- no memory text: 0.2966
- random other-query text: 0.2506
- shuffled memory text: 0.2731
- scalar only: 0.5792

## Anti-Leak Discipline

The public inference API rejects gold-defining or post-hoc fields such as: `gold`, `gold_ids`, `is_current`, `is_latest`, `is_stale`, `stale`, `answer`, `answer_text`, `ce_score`, `mxbai_score`, `teacher_score`, `gpt_label`, `entity_id`, and `slot_id`.

Inference inputs are query text, candidate id/text, optional candidate position or time metadata, and the protected v1 top-10 candidate set.

## Limitations

- LoCoMo-specific fine-tuning; validate or retrain before using cross-domain.
- Recall-preserving over v1 top-10, not a replacement for candidate generation.
- Not a full top-500 cross-encoder. It is a bounded precision stage after v1.
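
To make the "bounded precision stage" and anti-leak constraints concrete, here is a minimal, self-contained sketch. It is not the `convmemory` implementation: the helper names `check_no_leak`, `rerank_protected_prefix`, and `overlap_score` are hypothetical, the candidate schema (a dict with `id` and `text`) is an assumption, and a toy word-overlap scorer stands in for the cross-encoder. Only the forbidden field names come from this card.

```python
from typing import Callable, List, Mapping

# Field names copied from the Anti-Leak Discipline section of this card.
FORBIDDEN_FIELDS = frozenset({
    "gold", "gold_ids", "is_current", "is_latest", "is_stale", "stale",
    "answer", "answer_text", "ce_score", "mxbai_score", "teacher_score",
    "gpt_label", "entity_id", "slot_id",
})

Candidate = Mapping[str, object]  # assumed schema: {"id": ..., "text": ...}


def check_no_leak(candidate: Candidate) -> None:
    """Raise ValueError if a candidate carries a gold-defining or post-hoc field."""
    leaked = sorted(FORBIDDEN_FIELDS & set(candidate))
    if leaked:
        raise ValueError(f"forbidden fields in candidate: {leaked}")


def rerank_protected_prefix(
    ranked: List[Candidate],
    score_fn: Callable[[Candidate], float],
    k: int = 10,
) -> List[Candidate]:
    """Reorder only the first k candidates by score; never touch the tail."""
    for cand in ranked:
        check_no_leak(cand)
    prefix, tail = list(ranked[:k]), list(ranked[k:])
    # sorted() is stable, so tied candidates keep their upstream (v1) order.
    prefix = sorted(prefix, key=score_fn, reverse=True)
    return prefix + tail


def overlap_score(query: str) -> Callable[[Candidate], float]:
    """Toy stand-in scorer (hypothetical): number of query words found in the text."""
    q_words = set(query.lower().split())
    return lambda cand: float(len(q_words & set(str(cand["text"]).lower().split())))


texts = [
    "bob likes tea",
    "alice likes hiking",
    "alice did move to berlin",
    "where did alice move last year",
    "carol plays chess",
]
upstream = [{"id": i, "text": t} for i, t in enumerate(texts)]

# k=3 here only to keep the demo short; the card's protected prefix is top-10.
reranked = rerank_protected_prefix(upstream, overlap_score("where did alice move"), k=3)
print([c["id"] for c in reranked])  # -> [2, 1, 0, 3, 4]
```

Candidate 3 has the highest overlap with the query but sits outside the protected prefix, so it is not promoted. This mirrors the limitation above: the stage can improve ordering within the prefix but cannot recover anything the upstream retriever ranked below it.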