---
library_name: sentence-transformers
pipeline_tag: text-ranking
tags:
- convmemory
- reranking
- conversational-memory
- cross-encoder
- evidence-reranker
license: mit
---
# ConvMemory v2 Evidence Reranker
This is the ConvMemory v0.5.0 protected top-10 token-evidence reranker checkpoint.
It is intended to be attached to the base ConvMemory LoCoMo/MPNet checkpoint:
`Purdy0228/ConvMemory-LoCoMo-MPNet`.
## Usage
```python
from convmemory import ConvMemory

# Load the base retriever, then attach the v2 evidence reranker to it.
model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
model.load_evidence_reranker("Purdy0228/ConvMemory-v2-Evidence-Reranker")

# `query` and `memories` are supplied by the caller.
ranked = model.retrieve(
    query=query,
    memories=memories,
    evidence_reranker="v2",
    top_k=10,
)
```
The repository layout is compatible with `convmemory.EvidenceReranker.from_pretrained`:
- `evidence_reranker_config.json`
- `MANIFEST.json`
- `cross_encoder/`: SentenceTransformers CrossEncoder checkpoint
## What It Does
ConvMemory v2 preserves the exact ConvMemory v1 top-10 candidate set and only
reorders that protected prefix using token-level query/memory evidence. It
therefore cannot recover a gold memory that v1 failed to retrieve into its top 10.
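The protected-prefix behaviour can be illustrated with a minimal sketch. The function name `rerank_protected_prefix` and the `score_fn` callback are hypothetical and not part of the `convmemory` API; in the real pipeline the scores would come from the cross-encoder.

```python
from typing import Callable, List, Sequence


def rerank_protected_prefix(
    ranked_ids: Sequence[str],
    score_fn: Callable[[str], float],
    prefix_size: int = 10,
) -> List[str]:
    """Reorder only the first `prefix_size` ids by descending score.

    Hypothetical helper: the set of ids in the prefix never changes,
    and everything after the prefix keeps its original order.
    """
    prefix = list(ranked_ids[:prefix_size])
    tail = list(ranked_ids[prefix_size:])
    # sorted() is stable, so candidates with equal scores keep their v1 order.
    reordered = sorted(prefix, key=score_fn, reverse=True)
    return reordered + tail


# Toy example with a protected prefix of 3 instead of 10.
v1_order = ["m3", "m7", "m1", "m9"]
toy_scores = {"m3": 0.1, "m7": 0.9, "m1": 0.4, "m9": 5.0}
v2_order = rerank_protected_prefix(v1_order, toy_scores.__getitem__, prefix_size=3)
# v2_order == ["m7", "m1", "m3", "m9"]
```

Note that `m9` has the highest toy score but stays outside the prefix, which is why a memory missed by v1 cannot be promoted.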
## Training
- Source experiment: `experiments/v361_top10_evidence_reranker.py`
- Seed: 7
- Base model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Training target: gold-only listwise retrieval cross-entropy
- Teacher weight: 0.0
- Candidate pool: ConvMemory v1 top-10 from dense MPNet top-500
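For readers unfamiliar with the training target, the sketch below shows one common formulation of a gold-only listwise cross-entropy over a single query's candidates: the negative log of the softmax probability mass placed on the gold candidates. This is an assumption for illustration; the exact loss in the source experiment is not reproduced here, and the function name is hypothetical. A teacher weight of 0.0 means no distillation term is added to it.

```python
import math
from typing import Sequence


def gold_listwise_cross_entropy(
    scores: Sequence[float], gold_mask: Sequence[bool]
) -> float:
    """Return -log(sum of softmax probabilities over gold candidates).

    Hypothetical illustration, not the experiment's actual code.
    """
    if len(scores) != len(gold_mask):
        raise ValueError("scores and gold_mask must have the same length")
    if not any(gold_mask):
        raise ValueError("at least one gold candidate is required")
    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    gold_mass = sum(e for e, is_gold in zip(exps, gold_mask) if is_gold)
    return -math.log(gold_mass / sum(exps))


# Ten candidates with equal scores and one gold: the loss is log(10).
uniform_loss = gold_listwise_cross_entropy([0.0] * 10, [True] + [False] * 9)
```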
## Headline Evaluation
Canonical v361 5-seed headline, reported in the ConvMemory repository:
- ConvMemory v1 FULL MRR: 0.5824
- ConvMemory v2 FULL MRR: 0.6560
- Delta: +0.0734, paired bootstrap 95% CI [+0.0645, +0.0827]
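As a rough guide to how such a delta and interval can be computed, here is a minimal sketch of MRR and a paired percentile bootstrap over per-query reciprocal ranks. The function names, the resampling unit (individual queries), and the number of resamples are assumptions; the repository's actual procedure, including how the 5 seeds are aggregated, may differ.

```python
import random
from typing import Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], gold_ids: Set[str]) -> float:
    """1 / rank of the first gold id, or 0.0 if no gold id is ranked."""
    for rank, cand_id in enumerate(ranked_ids, start=1):
        if cand_id in gold_ids:
            return 1.0 / rank
    return 0.0


def paired_bootstrap_ci(
    rr_a: Sequence[float],
    rr_b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean delta, lower, upper) for MRR(b) - MRR(a).

    Queries are resampled with replacement, keeping each query's
    (a, b) pair together, and percentiles of the resampled mean
    delta form the interval.
    """
    if len(rr_a) != len(rr_b) or not rr_a:
        raise ValueError("need two non-empty, equally long score lists")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(rr_a, rr_b)]
    n = len(diffs)
    point = sum(diffs) / n
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper


# Toy per-query reciprocal ranks for two systems on the same four queries.
toy_v1 = [0.5, 0.25, 1.0, 0.2]
toy_v2 = [1.0, 0.5, 1.0, 0.5]
delta, ci_low, ci_high = paired_bootstrap_ci(toy_v1, toy_v2)
```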
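The v364 load-bearing ablation retrained the same full-text arm in an ablation
harness and obtained FULL MRR 0.6677. Removing or corrupting the memory text
collapsed FULL MRR well below the v1 baseline, while the scalar-only arm stayed
close to v1 (0.5824):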
- no memory text: 0.2966
- random other-query text: 0.2506
- shuffled memory text: 0.2731
- scalar only: 0.5792
## Anti-Leak Discipline
The public inference API rejects gold-defining or post-hoc fields such as:
`gold`, `gold_ids`, `is_current`, `is_latest`, `is_stale`, `stale`, `answer`,
`answer_text`, `ce_score`, `mxbai_score`, `teacher_score`, `gpt_label`,
`entity_id`, and `slot_id`.
Inference inputs are query text, candidate id/text, optional candidate position
or time metadata, and the protected v1 top-10 candidate set.
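A guard of this kind can be sketched as a simple key check on each candidate record before scoring. The function name and the record keys `id` and `text` are hypothetical, and the field set below only copies the names listed above, which the card introduces with "such as" and so may not be exhaustive. The actual `convmemory` check may be implemented differently.

```python
from typing import Any, Iterable, Mapping

# Field names copied from the list above; possibly not the complete set.
FORBIDDEN_FIELDS = frozenset({
    "gold", "gold_ids", "is_current", "is_latest", "is_stale", "stale",
    "answer", "answer_text", "ce_score", "mxbai_score", "teacher_score",
    "gpt_label", "entity_id", "slot_id",
})


def assert_no_leak_fields(candidates: Iterable[Mapping[str, Any]]) -> None:
    """Raise ValueError if any candidate record carries a forbidden field."""
    for index, candidate in enumerate(candidates):
        leaked = FORBIDDEN_FIELDS.intersection(candidate.keys())
        if leaked:
            raise ValueError(
                f"candidate {index} carries forbidden field(s): {sorted(leaked)}"
            )


# A clean record passes silently; one with a `gold` key would raise.
assert_no_leak_fields([{"id": "m1", "text": "We moved to Lisbon in May."}])
```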
## Limitations
- LoCoMo-specific fine-tuning; validate or retrain before using cross-domain.
- Recall-preserving over the v1 top-10 (the candidate set is unchanged, only its order); not a replacement for candidate generation.
- Not a full top-500 cross-encoder. It is a bounded precision stage after v1.