---

library_name: sentence-transformers
pipeline_tag: text-ranking
tags:
- convmemory
- reranking
- conversational-memory
- cross-encoder
- evidence-reranker
license: mit
---


# ConvMemory v2 Evidence Reranker

This is the ConvMemory v0.5.0 protected top-10 token-evidence reranker checkpoint.
It is intended to be attached to the base ConvMemory LoCoMo/MPNet checkpoint:
`Purdy0228/ConvMemory-LoCoMo-MPNet`.

## Usage

```python
from convmemory import ConvMemory

# Load the base checkpoint, then attach the v2 evidence reranker to it.
model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
model.load_evidence_reranker("Purdy0228/ConvMemory-v2-Evidence-Reranker")

# `query` and `memories` are supplied by the caller.
ranked = model.retrieve(
    query=query,
    memories=memories,
    evidence_reranker="v2",
    top_k=10,
)
```

The repository layout is compatible with `convmemory.EvidenceReranker.from_pretrained`:

- `evidence_reranker_config.json`
- `MANIFEST.json`
- `cross_encoder/` SentenceTransformers CrossEncoder checkpoint
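
Before loading from a local copy, it can help to confirm that the three entries listed above are present. The following is a minimal sketch that uses only the standard library; `missing_layout_entries` is a hypothetical helper for illustration and is not part of the `convmemory` package.

```python
from pathlib import Path
from typing import List

# Entries named in the repository layout above.
EXPECTED_ENTRIES = (
    "evidence_reranker_config.json",
    "MANIFEST.json",
    "cross_encoder",
)


def missing_layout_entries(repo_dir: str) -> List[str]:
    """Return the expected entries that are absent from `repo_dir` (hypothetical helper)."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty return value means the local directory has the expected layout; it says nothing about whether the file contents are valid.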

## What It Does

ConvMemory v2 preserves the exact ConvMemory v1 top-10 candidate set and only
reorders that protected prefix using token-level query/memory evidence. It cannot
recover a gold memory that v1 failed to retrieve into top-10.
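
The protected-prefix behaviour can be illustrated with a short sketch. This is not the ConvMemory implementation: `rerank_protected_prefix` and `score_fn` are hypothetical names, and the scorer stands in for whatever token-evidence score the reranker produces. The point is only that the set of ids in the prefix is unchanged and everything after it keeps its v1 order.

```python
from typing import Callable, List, Sequence


def rerank_protected_prefix(
    ranked_ids: Sequence[str],
    score_fn: Callable[[str], float],
    prefix_size: int = 10,
) -> List[str]:
    """Reorder only the first `prefix_size` ids by descending score.

    Ids beyond the prefix are never scored or moved, so a candidate
    outside the prefix cannot enter it.
    """
    prefix = list(ranked_ids[:prefix_size])
    tail = list(ranked_ids[prefix_size:])
    # sorted() is stable, so tied scores keep their original v1 order.
    reordered = sorted(prefix, key=score_fn, reverse=True)
    return reordered + tail


# Toy example with a prefix of 3: "m9" has the highest score but sits
# outside the prefix, so it stays in last place.
v1_ranking = ["m3", "m7", "m1", "m9"]
toy_scores = {"m3": 0.1, "m7": 0.9, "m1": 0.5, "m9": 2.0}
reranked = rerank_protected_prefix(v1_ranking, toy_scores.__getitem__, prefix_size=3)
```

In the toy example the prefix `["m3", "m7", "m1"]` becomes `["m7", "m1", "m3"]`, which is why a gold memory that v1 ranks outside the prefix cannot be recovered by this stage.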

## Training

- Source experiment: `experiments/v361_top10_evidence_reranker.py`
- Seed: 7
- Base model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Training target: gold-only listwise retrieval cross-entropy
- Teacher weight: 0.0
- Candidate pool: ConvMemory v1 top-10 from dense MPNet top-500
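
As an illustration of the training target, a listwise cross-entropy over one candidate pool can be written as the negative log of the softmax mass assigned to the gold candidates. The sketch below is an assumption about the general form, not the code in the source experiment: the function name is hypothetical, it treats multiple gold candidates by summing their probabilities, and it assumes a teacher weight of 0.0 means no distillation term is added.

```python
import math
from typing import Sequence


def gold_listwise_cross_entropy(
    scores: Sequence[float],
    gold_mask: Sequence[bool],
) -> float:
    """Negative log of the softmax probability mass on gold candidates.

    `scores` are reranker logits for one query's candidate pool and
    `gold_mask[i]` marks whether candidate i is gold (hypothetical layout).
    """
    if len(scores) != len(gold_mask):
        raise ValueError("scores and gold_mask must have the same length")
    if not any(gold_mask):
        raise ValueError("at least one gold candidate is required")
    # Subtract the max logit for numerical stability before exponentiating.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    gold_mass = sum(e for e, is_gold in zip(exps, gold_mask) if is_gold)
    return -math.log(gold_mass / sum(exps))
```

With ten equally scored candidates and one gold, this loss equals `log(10)`; raising the gold candidate's score relative to the others lowers it.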

## Headline Evaluation

Canonical v361 5-seed headline, reported in the ConvMemory repository:

- ConvMemory v1 FULL MRR: 0.5824
- ConvMemory v2 FULL MRR: 0.6560
- Delta: +0.0734, paired bootstrap 95% CI [+0.0645, +0.0827]
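
For readers who want to run a comparison of this kind on their own rankings, the following is a minimal sketch of mean reciprocal rank and a percentile paired bootstrap over per-query reciprocal ranks. It is not the repository's evaluation code: both function names are hypothetical, the resampling unit is assumed to be the query, and what "FULL" denotes and how seeds are aggregated are defined in the ConvMemory repository, not here.

```python
import random
from typing import Optional, Sequence, Tuple


def mean_reciprocal_rank(gold_ranks: Sequence[Optional[int]]) -> float:
    """MRR from 1-based gold ranks; None (gold not retrieved) contributes 0."""
    return sum(0.0 if r is None else 1.0 / r for r in gold_ranks) / len(gold_ranks)


def paired_bootstrap_ci(
    rr_a: Sequence[float],
    rr_b: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI for mean(rr_b - rr_a), resampling queries in pairs."""
    if len(rr_a) != len(rr_b) or not rr_a:
        raise ValueError("inputs must be non-empty and aligned per query")
    rng = random.Random(seed)
    n = len(rr_a)
    deltas = []
    for _ in range(n_resamples):
        # Draw the same query indices for both systems to keep the pairing.
        idx = [rng.randrange(n) for _ in range(n)]
        deltas.append(sum(rr_b[i] - rr_a[i] for i in idx) / n)
    deltas.sort()
    low = deltas[int((alpha / 2) * n_resamples)]
    high = deltas[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high
```

Because both systems are scored on the same resampled queries, the interval reflects the per-query difference and not the spread of each system on its own.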

The v364 load-bearing ablation retrained the same full-text arm in an ablation
harness and obtained FULL MRR 0.6677. With the memory text removed or perturbed,
the same metric dropped to:

- no memory text: 0.2966
- random other-query text: 0.2506
- shuffled memory text: 0.2731
- scalar only: 0.5792

## Anti-Leak Discipline

The public inference API rejects gold-defining or post-hoc fields such as:
`gold`, `gold_ids`, `is_current`, `is_latest`, `is_stale`, `stale`, `answer`,
`answer_text`, `ce_score`, `mxbai_score`, `teacher_score`, `gpt_label`,
`entity_id`, and `slot_id`.

Inference inputs are query text, candidate id/text, optional candidate position
or time metadata, and the protected v1 top-10 candidate set.
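
A caller can apply the same discipline before data reaches the model. The sketch below mirrors the field list above in a standalone check; `assert_no_leak_fields` is a hypothetical helper, and the candidate-as-mapping layout is an assumption for illustration. It does not describe how the public API performs its own rejection.

```python
from typing import Any, Mapping

# Field names copied from the list above.
FORBIDDEN_FIELDS = frozenset({
    "gold", "gold_ids", "is_current", "is_latest", "is_stale", "stale",
    "answer", "answer_text", "ce_score", "mxbai_score", "teacher_score",
    "gpt_label", "entity_id", "slot_id",
})


def assert_no_leak_fields(candidate: Mapping[str, Any]) -> None:
    """Raise ValueError if the candidate carries any forbidden field."""
    leaked = sorted(FORBIDDEN_FIELDS.intersection(candidate))
    if leaked:
        raise ValueError(f"candidate contains forbidden fields: {leaked}")
```

Running this check on every candidate in an evaluation script catches the case where a dataset row, including its labels, is passed through unchanged.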

## Limitations

- LoCoMo-specific fine-tuning; validate or retrain before using cross-domain.
- Recall-preserving over v1 top-10, not a replacement for candidate generation.
- Not a full top-500 cross-encoder. It is a bounded precision stage after v1.