MiniSearch / docs /reranking.md
github-actions[bot]
Sync from https://github.com/felladrin/MiniSearch
d86da64

Search Result Reranking

MiniSearch optionally reranks search results using a cross-encoder model running on a local llama-server instance. This secondary search stage reorders initial SearXNG results based on their semantic relevance to the user's query.

Architecture Overview

The reranking subsystem consists of three components:

Component File Responsibility
Service Manager server/rerankerService.ts llama-server lifecycle, health checks, reranking API calls
Ranking Logic server/rankSearchResults.ts Score-based filtering and result reordering
Server Hook server/rerankerServiceHook.ts Startup/shutdown coordination with Vite server

Service Lifecycle

Startup

The rerankerServiceHook starts the reranker during server initialization:

  1. Downloads the model from HuggingFace if not present (Felladrin/gguf-jina-reranker-v1-tiny-en/jina-reranker-v1-tiny-en-Q8_0.gguf)
  2. Spawns llama-server as a child process
  3. Polls /health endpoint until status is ok
  4. Performs a warmup rerank request (query: "test", documents: ["test document"]) to ensure the model is fully loaded
  5. Sets isReady = true

llama-server Configuration

The reranker process is spawned with these arguments:

Argument Value Purpose
--model jina-reranker-v1-tiny-en-Q8_0.gguf Cross-encoder reranking model
--ctx-size 2048 Context window size
--batch-size 2048 Batch processing size
--ubatch-size 2048 Micro-batch size
--flash-attn auto Flash attention optimization
--host 127.0.0.1 Local-only binding
--port 8012 Service port
--threads 1 Single-threaded operation
--parallel 1 Single parallel request
--reranking (flag) Enable reranking mode
--pooling rank Rank pooling strategy

Automatic Restart

If the llama-server process exits unexpectedly:

  1. isReady is set to false
  2. A 5-second restart timeout is scheduled
  3. startRerankerService() is called again automatically
  4. Binary compatibility errors (SIGTRAP, SIGILL) are logged with architecture details

Shutdown

On server close, stopRerankerService() clears any pending restart timeout and kills the child process.

Health Monitoring

getRerankerStatus() performs a live health check by fetching /health from the llama-server. Returns false if:

  • isReady flag is false
  • Health endpoint is unreachable
  • Response status is not ok

The search endpoint checks reranker health before attempting ranking and falls back to unranked SearXNG results if unhealthy.

Reranking Process

Document Preparation

Search results are formatted as Markdown-style strings and truncated:

const doc = `[${title}](${url} "${snippet}")`.toLocaleLowerCase();
// Truncated to MAX_DOCUMENT_LENGTH (512 characters)

Both query and documents are lowercased and Unicode surrogates are sanitized before sending to the reranker.

Unicode Sanitization

sanitizeUnicodeSurrogates() validates Unicode surrogate pairs in input strings. Invalid surrogates are replaced with the Unicode replacement character (\ufffd). This prevents crashes when processing malformed UTF-8 from web search results.

Scoring and Filtering

The reranker returns relevance scores for each document. Results are filtered using a two-stage statistical approach:

  1. Score Normalization: Scores are shifted to positive range by adding the absolute value of the minimum score
  2. Standard Deviation Filter: Results below mean - kStandardDeviationFactor * standardDeviation are filtered out
    • kStandardDeviationFactor = 0.3
  3. Percentage Fallback: If fewer than 40% of results pass the standard deviation filter, a fallback threshold is applied:
    • minPercentageFallback = 0.4 (40% of the highest normalized score)

Preserve Top Results Mode

When preserveTopResults = true, the ranking algorithm:

  1. Keeps the original top result (first from SearXNG) at position 1
  2. Filters remaining results by score
  3. Takes up to 9 next-best results, sorted by reranker score
  4. Appends remaining filtered results

This mode ensures the original top result is never displaced by the reranker, while still improving the ordering of subsequent results.

Integration with Search Pipeline

The search endpoint coordinates reranking in searchEndpointServerHook.ts:

1. fetchSearXNG(query, searchType, limit)
2. processSearchResults(rawResults)
3. Check getRerankerStatus()
   - If healthy: rankSearchResults(query, results)
   - If unhealthy: return unranked results (fallback)
4. Return structured JSON response

Reranking is applied only to text search results, not image results.

Model Details

Property Value
Model jina-reranker-v1-tiny-en
Format GGUF (Q8_0 quantized)
HuggingFace Repo Felladrin/gguf-jina-reranker-v1-tiny-en
Type Cross-encoder reranker
Language English
Storage server/models/Felladrin/gguf-jina-reranker-v1-tiny-en/

Error Handling

Scenario Behavior
Reranker not ready Falls back to unranked SearXNG results
Reranking API error isReady set to false, process killed, auto-restart scheduled
Empty documents array Returns empty array without calling reranker
Unicode sanitization needed Logs warning, continues with sanitized input
Binary architecture mismatch Logs SIGTRAP/SIGILL error with architecture details

Related Topics

  • Search System: docs/overview.md - Search pipeline overview
  • Server Hooks: docs/overview.md#server-hook-system - Hook registration
  • Web Search Service: server/webSearchService.ts - SearXNG integration