---
library_name: sentence-transformers
pipeline_tag: text-ranking
tags:
- convmemory
- reranking
- conversational-memory
- cross-encoder
- evidence-reranker
license: mit
---

# ConvMemory v2 Evidence Reranker

This is the ConvMemory v0.5.0 protected top-10 token-evidence reranker checkpoint.
It is intended to be attached to the base ConvMemory LoCoMo/MPNet checkpoint:
`Purdy0228/ConvMemory-LoCoMo-MPNet`.

## Usage

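```python
from convmemory import ConvMemory

# Load the base v1 retriever, then attach this reranker checkpoint to it.
model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
model.load_evidence_reranker("Purdy0228/ConvMemory-v2-Evidence-Reranker")

# `query` (the query text) and `memories` (the candidate memories) are
# supplied by the caller and must be defined before this call.
ranked = model.retrieve(
    query=query,
    memories=memories,
    evidence_reranker="v2",  # reorder the protected v1 top-10 with this checkpoint
    top_k=10,
)
```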

The repository layout is compatible with `convmemory.EvidenceReranker.from_pretrained`
and contains:

- `evidence_reranker_config.json`
- `MANIFEST.json`
- `cross_encoder/`: SentenceTransformers CrossEncoder checkpoint
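
As a rough sanity check on a local copy of this repository, the listed entries can be verified before loading. This is a minimal sketch using only the standard library; the helper name `missing_layout_entries` is hypothetical and not part of `convmemory`.

```python
from pathlib import Path

# Entries listed in this model card. The helper below is an illustrative
# assumption, not a convmemory API.
EXPECTED_ENTRIES = ("evidence_reranker_config.json", "MANIFEST.json", "cross_encoder")


def missing_layout_entries(root):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    missing = []
    for name in EXPECTED_ENTRIES:
        path = root / name
        # `cross_encoder` must be a directory; the other two must be files.
        present = path.is_dir() if name == "cross_encoder" else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

An empty return value means all three entries were found; it does not validate their contents.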

## What It Does

ConvMemory v2 preserves the exact ConvMemory v1 top-10 candidate set and only
reorders that protected prefix using token-level query/memory evidence. Because
no candidate enters or leaves the prefix, v2 cannot recover a gold memory that
v1 failed to retrieve into the top 10.
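
The protected-prefix idea can be illustrated with a small sketch. It assumes a first-stage ranking given as a list of candidate ids (best first) and a scoring callable; `rerank_protected_prefix` and `score_fn` are hypothetical names for illustration, not the ConvMemory implementation.

```python
def rerank_protected_prefix(ranked_ids, score_fn, prefix_size=10):
    """Reorder only the first `prefix_size` ids by descending score.

    `ranked_ids` is the first-stage ordering (best first) and `score_fn`
    maps a candidate id to a relevance score. Both are illustrative
    assumptions.
    """
    prefix = ranked_ids[:prefix_size]
    tail = ranked_ids[prefix_size:]
    # sorted() is stable, so tied candidates keep their first-stage order.
    reordered = sorted(prefix, key=score_fn, reverse=True)
    # The tail is never scored or moved, so the prefix membership is unchanged.
    return reordered + tail
```

A candidate outside the prefix stays where it was even if it would score highly, which is why recall at 10 is bounded by the first stage.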

## Training

- Source experiment: `experiments/v361_top10_evidence_reranker.py`
- Seed: 7
- Base model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Training target: gold-only listwise retrieval cross-entropy
- Teacher weight: 0.0
- Candidate pool: ConvMemory v1 top-10 from dense MPNet top-500
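
To make the training target concrete, here is a minimal sketch of a listwise cross-entropy over one candidate list. It assumes a single gold candidate per list and a softmax over the reranker scores of the candidates; how the experiment script handles multiple gold memories or lists without a gold candidate is not stated here. With a teacher weight of 0.0 no teacher term would contribute, so the sketch contains only the gold term. The function name is hypothetical.

```python
import math


def listwise_gold_cross_entropy(scores, gold_index):
    """Negative log-softmax probability of the gold candidate in one list.

    `scores` are raw reranker scores for the candidates of one query and
    `gold_index` is the position of the gold candidate (an assumption:
    exactly one gold per list).
    """
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold_index]
```

With ten equal scores the loss is `log(10)`, and it decreases as the gold candidate's score rises relative to the others.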

## Headline Evaluation

Canonical v361 5-seed headline, reported in the ConvMemory repository:

- ConvMemory v1 FULL MRR: 0.5824
- ConvMemory v2 FULL MRR: 0.6560
- Delta: +0.0734, paired bootstrap 95% CI [+0.0645, +0.0827]
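
For readers unfamiliar with the two statistics above, the following sketch shows one common way to compute MRR and a paired percentile-bootstrap interval for a difference in MRR. It assumes queries are the resampling unit and uses per-query reciprocal ranks as input; the exact protocol behind the reported numbers is defined in the ConvMemory repository, and the function names here are hypothetical.

```python
import random


def mean_reciprocal_rank(gold_ranks):
    """`gold_ranks` holds the 1-based gold rank per query, or None if not retrieved."""
    return sum(0.0 if r is None else 1.0 / r for r in gold_ranks) / len(gold_ranks)


def paired_bootstrap_ci(rr_a, rr_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(rr_b - rr_a), resampling queries with replacement.

    `rr_a` and `rr_b` are per-query reciprocal ranks of two systems on the
    same queries (the pairing is what makes the bootstrap "paired").
    """
    assert len(rr_a) == len(rr_b)
    deltas = [b - a for a, b in zip(rr_a, rr_b)]
    rng = random.Random(seed)
    n = len(deltas)
    means = []
    for _ in range(n_resamples):
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

An interval that excludes zero, as the one reported above does, indicates that the improvement is unlikely to be a resampling artifact under these assumptions.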

The v364 load-bearing ablation retrained the same full-text arm in an ablation
harness and obtained FULL MRR 0.6677. Arms that removed or corrupted the memory
text, or dropped it in favor of scalar inputs only, scored lower:

- no memory text: 0.2966
- random other-query text: 0.2506
- shuffled memory text: 0.2731
- scalar only: 0.5792
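
The text perturbations can be pictured with the sketch below. It is a guess at what each arm does, based only on the arm names: "shuffled" is assumed to mean shuffling whitespace-separated tokens within a memory, and "random other-query text" is assumed to mean substituting text drawn from candidates of a different query. The scalar-only arm is not covered. The authoritative definitions are in the ablation harness; `perturb_memory_text` is a hypothetical helper.

```python
import random


def perturb_memory_text(text, mode, other_texts=(), rng=None):
    """Return the memory text seen by the reranker under one ablation arm.

    All arm semantics here are assumptions inferred from the arm names.
    """
    rng = rng or random.Random(0)
    if mode == "full":
        return text
    if mode == "no_memory_text":
        return ""
    if mode == "random_other_query_text":
        # Replace the text with one drawn from another query's candidates.
        return rng.choice(list(other_texts))
    if mode == "shuffled_memory_text":
        # Keep the same tokens but destroy their order.
        tokens = text.split()
        rng.shuffle(tokens)
        return " ".join(tokens)
    raise ValueError(f"unknown perturbation mode: {mode}")
```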

## Anti-Leak Discipline

The public inference API rejects gold-defining or post-hoc fields such as:
`gold`, `gold_ids`, `is_current`, `is_latest`, `is_stale`, `stale`, `answer`,
`answer_text`, `ce_score`, `mxbai_score`, `teacher_score`, `gpt_label`,
`entity_id`, and `slot_id`.

Inference inputs are query text, candidate id/text, optional candidate position
or time metadata, and the protected v1 top-10 candidate set.
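
A check of this kind can be sketched as follows. The field names are taken from the list above; the assumption that candidates are dictionaries, the helper name `assert_no_leak_fields`, and the choice of `ValueError` are illustrative and may differ from what the public API actually does.

```python
# Field names copied from the rejection list in this model card.
FORBIDDEN_FIELDS = frozenset({
    "gold", "gold_ids", "is_current", "is_latest", "is_stale", "stale",
    "answer", "answer_text", "ce_score", "mxbai_score", "teacher_score",
    "gpt_label", "entity_id", "slot_id",
})


def assert_no_leak_fields(candidate):
    """Raise ValueError if a candidate dict carries any forbidden field.

    Assumes candidates are plain dicts keyed by field name; this is a
    sketch, not the convmemory validation code.
    """
    leaked = sorted(FORBIDDEN_FIELDS & set(candidate))
    if leaked:
        raise ValueError(f"forbidden fields in candidate: {leaked}")
```

Running such a check on every candidate before scoring keeps gold labels and teacher scores from reaching the reranker by accident.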

## Limitations

- Fine-tuned on LoCoMo only; validate or retrain before using it in other domains.
- Preserves v1 top-10 recall but cannot improve it; it is not a replacement for
  candidate generation.
- Not a full top-500 cross-encoder. It is a bounded precision stage applied after v1.