	---
	library_name: sentence-transformers
	pipeline_tag: text-ranking
	tags:
	- convmemory
	- reranking
	- conversational-memory
	- cross-encoder
	- evidence-reranker
	license: mit
	---

	# ConvMemory v2 Evidence Reranker

	This is the ConvMemory v0.5.0 evidence reranker checkpoint: a cross-encoder that
	reorders the protected top-10 candidates from ConvMemory v1 using token-level
	query/memory evidence. It is meant to be attached to the base ConvMemory
	LoCoMo/MPNet checkpoint, `Purdy0228/ConvMemory-LoCoMo-MPNet`.

	## Usage

	```python
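	from convmemory import ConvMemory

	# Load the base ConvMemory checkpoint, then attach this v2 evidence reranker.
	model = ConvMemory.from_pretrained("Purdy0228/ConvMemory-LoCoMo-MPNet")
	model.load_evidence_reranker("Purdy0228/ConvMemory-v2-Evidence-Reranker")

	# `query` (the question text) and `memories` (the conversation memories to
	# search) are supplied by the caller.
	ranked = model.retrieve(
	    query=query,
	    memories=memories,
	    evidence_reranker="v2",
	    top_k=10,
	)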
	```

	The repository layout is compatible with `convmemory.EvidenceReranker.from_pretrained`:

	- `evidence_reranker_config.json`
	- `MANIFEST.json`
	- `cross_encoder/`: a SentenceTransformers `CrossEncoder` checkpoint
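The layout above can be checked locally before loading. The sketch below is a hypothetical helper (`missing_layout_entries` is not part of `convmemory`); it assumes the repository has already been downloaded to a local directory and only checks that the three listed entries exist, not what they contain.

```python
from pathlib import Path

# Entries listed in this model card; only their presence is checked.
EXPECTED_FILES = ("evidence_reranker_config.json", "MANIFEST.json")
EXPECTED_DIRS = ("cross_encoder",)


def missing_layout_entries(repo_dir):
    """Return the expected layout entries that are absent from repo_dir."""
    root = Path(repo_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # Directories are reported with a trailing slash to match the card's listing.
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Because `cross_encoder/` is described as a SentenceTransformers CrossEncoder checkpoint, it should also be loadable on its own by passing that local directory to `sentence_transformers.CrossEncoder`, although ConvMemory's own loader may apply additional settings from `evidence_reranker_config.json`.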

	## What It Does

	ConvMemory v2 preserves the exact ConvMemory v1 top-10 candidate set and only
	reorders that protected prefix using token-level query/memory evidence. It
	therefore cannot recover a gold memory that v1 did not place in its top-10.
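The protected-prefix rule can be illustrated with a minimal sketch. It assumes the v1 ranking is a list of memory ids and that `score` stands in for the cross-encoder's query/memory score; both names are hypothetical and not the `convmemory` API.

```python
from typing import Callable, Hashable, Sequence


def rerank_protected_prefix(
    v1_ranking: Sequence[Hashable],
    score: Callable[[Hashable], float],
    k: int = 10,
) -> list:
    """Reorder only the top-k of a v1 ranking; items below k keep their v1 order."""
    prefix = list(v1_ranking[:k])
    # Higher score ranks first; sorted() is stable (also with reverse=True),
    # so tied candidates keep their v1 order.
    reordered = sorted(prefix, key=score, reverse=True)
    return reordered + list(v1_ranking[k:])
```

With a 12-item v1 ranking, a memory at v1 rank 12 stays at rank 12 however high `score` rates it, which is exactly the recall limitation stated above: the reranker can only permute what v1 already placed in its top-10.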

	## Training

	- Source experiment: `experiments/v361_top10_evidence_reranker.py`
	- Seed: 7
	- Base model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
	- Training target: gold-only listwise retrieval cross-entropy
	- Teacher weight: 0.0 (teacher term disabled)
	- Candidate pool: ConvMemory v1 top-10, drawn from a dense MPNet top-500 retrieval
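One plausible reading of "gold-only listwise retrieval cross-entropy" is a softmax over the reranker's scores for the 10 protected candidates, trained against a target spread uniformly over the gold memories, with no teacher term (teacher weight 0.0). The sketch below assumes that formulation; the actual loss in `experiments/v361_top10_evidence_reranker.py` may differ in details such as how multiple gold memories are weighted.

```python
import math
from typing import Sequence


def log_softmax(scores: Sequence[float]) -> list:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def gold_only_listwise_ce(scores: Sequence[float], is_gold: Sequence[bool]) -> float:
    """Cross-entropy between softmax(scores) and a uniform target over gold candidates."""
    gold_idx = [i for i, g in enumerate(is_gold) if g]
    if not gold_idx:
        # Without a gold candidate in the list the target is undefined.
        raise ValueError("candidate list has no gold memory")
    logp = log_softmax(scores)
    return -sum(logp[i] for i in gold_idx) / len(gold_idx)
```

With 10 equal scores and one gold candidate the loss is log(10); raising the gold candidate's score lowers it. Lists whose gold memory fell outside the v1 top-10 provide no target under this formulation.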

	## Headline Evaluation

	Canonical v361 headline results (5 seeds), as reported in the ConvMemory repository:

	- ConvMemory v1 FULL MRR: 0.5824
	- ConvMemory v2 FULL MRR: 0.6560
	- Delta: +0.0734, paired bootstrap 95% CI [+0.0645, +0.0827]
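For reference, MRR and a paired bootstrap interval of the kind quoted above can be computed along these lines. This is a generic sketch, not the repository's evaluation code: it assumes per-query reciprocal ranks for v1 and v2 on the same queries, and the resampling unit (queries) and percentile method are assumptions.

```python
import random
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranking: Sequence[Hashable], gold: Set[Hashable]) -> float:
    """1/rank of the first gold item (1-indexed), or 0.0 if no gold item is ranked."""
    for rank, item in enumerate(ranking, start=1):
        if item in gold:
            return 1.0 / rank
    return 0.0


def paired_bootstrap_ci(rr_a, rr_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Mean of (rr_b - rr_a) and a percentile CI, resampling queries with replacement."""
    diffs = [b - a for a, b in zip(rr_a, rr_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Each resample draws n paired differences, so v1 and v2 stay matched per query.
    means = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, (lo, hi)
```

MRR for a system is the mean of `reciprocal_rank` over all queries. Pairing matters here: resampling the per-query differences keeps the two systems evaluated on identical query sets in every resample.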

	The v364 load-bearing ablation retrained the same full-text arm in a separate
	ablation harness and reached FULL MRR 0.6677. Removing or perturbing the text
	inputs eliminated the gain over v1 (FULL MRR):

	- no memory text: 0.2966
	- random other-query text: 0.2506
	- shuffled memory text: 0.2731
	- scalar only: 0.5792

	## Anti-Leak Discipline

	The public inference API rejects gold-defining or post-hoc fields such as:
	`gold`, `gold_ids`, `is_current`, `is_latest`, `is_stale`, `stale`, `answer`,
	`answer_text`, `ce_score`, `mxbai_score`, `teacher_score`, `gpt_label`,
	`entity_id`, and `slot_id`.

	Inference inputs are query text, candidate id/text, optional candidate position
	or time metadata, and the protected v1 top-10 candidate set.
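A guard of this kind can be sketched as a field check on each candidate record. The function name and the dict-based candidate format (for example `id`, `text`, optional `position`) are assumptions for illustration, not the `convmemory` API; only the list of rejected fields comes from this card.

```python
from typing import Any, Mapping

# Field names copied from this card's list of rejected inference inputs.
FORBIDDEN_FIELDS = frozenset({
    "gold", "gold_ids", "is_current", "is_latest", "is_stale", "stale",
    "answer", "answer_text", "ce_score", "mxbai_score", "teacher_score",
    "gpt_label", "entity_id", "slot_id",
})


def check_no_leak_fields(candidate: Mapping[str, Any]) -> None:
    """Raise ValueError if a candidate carries any gold-defining or post-hoc field."""
    leaked = sorted(FORBIDDEN_FIELDS.intersection(candidate))
    if leaked:
        raise ValueError(f"forbidden inference fields: {', '.join(leaked)}")
```

Rejecting the record outright, rather than silently dropping the offending keys, makes an accidental leak visible at the call site instead of quietly changing what the evaluation measures.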

	## Limitations

	- Fine-tuned on LoCoMo only; validate or retrain before using it on other domains.
	- Recall-preserving over v1 top-10, not a replacement for candidate generation.
	- Not a full top-500 cross-encoder. It is a bounded precision stage after v1.