Spaces:

Felladrin
/

MiniSearch

Running

App Files Files Community

MiniSearch / docs /reranking.md

github-actions[bot]

Sync from https://github.com/felladrin/MiniSearch

d86da64 6 days ago

preview code

raw

history blame contribute delete

5.85 kB

Search Result Reranking

MiniSearch optionally reranks search results using a cross-encoder model running on a local llama-server instance. This secondary search stage reorders initial SearXNG results based on their semantic relevance to the user's query.

Architecture Overview

The reranking subsystem consists of three components:

Component	File	Responsibility
Service Manager	`server/rerankerService.ts`	llama-server lifecycle, health checks, reranking API calls
Ranking Logic	`server/rankSearchResults.ts`	Score-based filtering and result reordering
Server Hook	`server/rerankerServiceHook.ts`	Startup/shutdown coordination with Vite server

Service Lifecycle

Startup

The rerankerServiceHook starts the reranker during server initialization:

Downloads the model from HuggingFace if not present (Felladrin/gguf-jina-reranker-v1-tiny-en/jina-reranker-v1-tiny-en-Q8_0.gguf)
Spawns llama-server as a child process
Polls /health endpoint until status is ok
Performs a warmup rerank request (query: "test", documents: ["test document"]) to ensure the model is fully loaded
Sets isReady = true

llama-server Configuration

The reranker process is spawned with these arguments:

Argument	Value	Purpose
`--model`	`jina-reranker-v1-tiny-en-Q8_0.gguf`	Cross-encoder reranking model
`--ctx-size`	2048	Context window size
`--batch-size`	2048	Batch processing size
`--ubatch-size`	2048	Micro-batch size
`--flash-attn`	auto	Flash attention optimization
`--host`	127.0.0.1	Local-only binding
`--port`	8012	Service port
`--threads`	1	Single-threaded operation
`--parallel`	1	Single parallel request
`--reranking`	(flag)	Enable reranking mode
`--pooling`	rank	Rank pooling strategy

Automatic Restart

If the llama-server process exits unexpectedly:

isReady is set to false
A 5-second restart timeout is scheduled
startRerankerService() is called again automatically
Binary compatibility errors (SIGTRAP, SIGILL) are logged with architecture details

Shutdown

On server close, stopRerankerService() clears any pending restart timeout and kills the child process.

Health Monitoring

getRerankerStatus() performs a live health check by fetching /health from the llama-server. Returns false if:

isReady flag is false
Health endpoint is unreachable
Response status is not ok

The search endpoint checks reranker health before attempting ranking and falls back to unranked SearXNG results if unhealthy.

Reranking Process

Document Preparation

Search results are formatted as Markdown-style strings and truncated:

const doc = `[${title}](${url} "${snippet}")`.toLocaleLowerCase();
// Truncated to MAX_DOCUMENT_LENGTH (512 characters)

Both query and documents are lowercased and Unicode surrogates are sanitized before sending to the reranker.

Unicode Sanitization

sanitizeUnicodeSurrogates() validates Unicode surrogate pairs in input strings. Invalid surrogates are replaced with the Unicode replacement character (\ufffd). This prevents crashes when processing malformed UTF-8 from web search results.

Scoring and Filtering

The reranker returns relevance scores for each document. Results are filtered using a two-stage statistical approach:

Score Normalization: Scores are shifted to positive range by adding the absolute value of the minimum score
Standard Deviation Filter: Results below mean - kStandardDeviationFactor * standardDeviation are filtered out
- kStandardDeviationFactor = 0.3
Percentage Fallback: If fewer than 40% of results pass the standard deviation filter, a fallback threshold is applied:
- minPercentageFallback = 0.4 (40% of the highest normalized score)

Preserve Top Results Mode

When preserveTopResults = true, the ranking algorithm:

Keeps the original top result (first from SearXNG) at position 1
Filters remaining results by score
Takes up to 9 next-best results, sorted by reranker score
Appends remaining filtered results

This mode ensures the original top result is never displaced by the reranker, while still improving the ordering of subsequent results.

Integration with Search Pipeline

The search endpoint coordinates reranking in searchEndpointServerHook.ts:

1. fetchSearXNG(query, searchType, limit)
2. processSearchResults(rawResults)
3. Check getRerankerStatus()
   - If healthy: rankSearchResults(query, results)
   - If unhealthy: return unranked results (fallback)
4. Return structured JSON response

Reranking is applied only to text search results, not image results.

Model Details

Property	Value
Model	jina-reranker-v1-tiny-en
Format	GGUF (Q8_0 quantized)
HuggingFace Repo	Felladrin/gguf-jina-reranker-v1-tiny-en
Type	Cross-encoder reranker
Language	English
Storage	`server/models/Felladrin/gguf-jina-reranker-v1-tiny-en/`

Error Handling

Scenario	Behavior
Reranker not ready	Falls back to unranked SearXNG results
Reranking API error	`isReady` set to `false`, process killed, auto-restart scheduled
Empty documents array	Returns empty array without calling reranker
Unicode sanitization needed	Logs warning, continues with sanitized input
Binary architecture mismatch	Logs `SIGTRAP`/`SIGILL` error with architecture details