Search Result Reranking
MiniSearch optionally reranks search results using a cross-encoder model running on a local llama-server instance. This secondary search stage reorders initial SearXNG results based on their semantic relevance to the user's query.
Architecture Overview
The reranking subsystem consists of three components:
| Component | File | Responsibility |
|---|---|---|
| Service Manager | `server/rerankerService.ts` | llama-server lifecycle, health checks, reranking API calls |
| Ranking Logic | `server/rankSearchResults.ts` | Score-based filtering and result reordering |
| Server Hook | `server/rerankerServiceHook.ts` | Startup/shutdown coordination with the Vite server |
Service Lifecycle
Startup
The rerankerServiceHook starts the reranker during server initialization:
- Downloads the model from HuggingFace if it is not already present (`Felladrin/gguf-jina-reranker-v1-tiny-en/jina-reranker-v1-tiny-en-Q8_0.gguf`)
- Spawns `llama-server` as a child process
- Polls the `/health` endpoint until the status is `ok`
- Performs a warmup rerank request (`query: "test"`, `documents: ["test document"]`) to ensure the model is fully loaded
- Sets `isReady = true`
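The readiness steps above (poll `/health`, then warm up) can be sketched as follows. This is a minimal sketch, not the code in `rerankerService.ts`: it assumes llama-server's `POST /rerank` endpoint accepting a `{ query, documents }` JSON body, and it injects a hypothetical `FetchLike` function instead of calling the global `fetch` directly so it can run without a server. The names `waitUntilHealthy`, `warmUp`, and `markReadyWhenWarm`, as well as the polling interval and attempt limit, are illustrative assumptions.

```typescript
// Hypothetical minimal fetch signature; the global fetch (Node 18+) satisfies it.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Polls GET /health until it reports { status: "ok" } or the attempts run out.
async function waitUntilHealthy(
  baseUrl: string,
  fetchFn: FetchLike,
  intervalMs = 500, // assumed polling interval
  maxAttempts = 120, // assumed upper bound on polls
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetchFn(`${baseUrl}/health`);
      const body = (await response.json()) as { status?: string };
      if (response.ok && body.status === "ok") return true;
    } catch {
      // Connection refused or non-JSON body: the server is still starting.
    }
    await sleep(intervalMs);
  }
  return false;
}

// Sends the warmup request described above so the model is loaded before real traffic.
async function warmUp(baseUrl: string, fetchFn: FetchLike): Promise<void> {
  const response = await fetchFn(`${baseUrl}/rerank`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: "test", documents: ["test document"] }),
  });
  if (!response.ok) throw new Error("Warmup rerank request failed");
}

// Resolves to true once the caller may set isReady = true.
async function markReadyWhenWarm(baseUrl: string, fetchFn: FetchLike, intervalMs = 500): Promise<boolean> {
  if (!(await waitUntilHealthy(baseUrl, fetchFn, intervalMs))) return false;
  await warmUp(baseUrl, fetchFn);
  return true;
}
```

In a running server, the global `fetch` could be passed as `fetchFn`, and the caller would set `isReady = true` only after `markReadyWhenWarm` resolves to `true`.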
llama-server Configuration
The reranker process is spawned with these arguments:
| Argument | Value | Purpose |
|---|---|---|
| `--model` | `jina-reranker-v1-tiny-en-Q8_0.gguf` | Cross-encoder reranking model |
| `--ctx-size` | `2048` | Context window size (tokens) |
| `--batch-size` | `2048` | Logical batch size (tokens) |
| `--ubatch-size` | `2048` | Physical micro-batch size (tokens) |
| `--flash-attn` | `auto` | Flash attention, enabled automatically when supported |
| `--host` | `127.0.0.1` | Local-only binding |
| `--port` | `8012` | Service port |
| `--threads` | `1` | Single-threaded operation |
| `--parallel` | `1` | One request slot (requests are processed one at a time) |
| `--reranking` | (flag) | Enable reranking mode |
| `--pooling` | `rank` | Rank pooling strategy |
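The argument table translates directly into an argument array. The `buildRerankerArgs` helper below is hypothetical, a sketch rather than the project's code; the model path and port are parameters only for illustration.

```typescript
// Hypothetical helper: assembles the llama-server arguments listed in the table.
function buildRerankerArgs(modelPath: string, port = 8012): string[] {
  return [
    "--model", modelPath,
    "--ctx-size", "2048",
    "--batch-size", "2048",
    "--ubatch-size", "2048",
    "--flash-attn", "auto",
    "--host", "127.0.0.1", // bind to loopback only
    "--port", String(port),
    "--threads", "1",
    "--parallel", "1",
    "--reranking", // boolean flag, takes no value
    "--pooling", "rank",
  ];
}
```

The array would be passed to Node's `child_process.spawn(llamaServerPath, args)`, where `llamaServerPath` stands for wherever the binary lives. A plausible reason for setting `--batch-size` and `--ubatch-size` equal to `--ctx-size` is that llama.cpp processes reranking inputs with non-causal attention, which requires each input to fit in a single physical batch; this is an inference, not something the source states.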
Automatic Restart
If the llama-server process exits unexpectedly:
- `isReady` is set to `false`
- A 5-second restart timeout is scheduled
- `startRerankerService()` is called again automatically when the timeout fires
- Binary compatibility errors (`SIGTRAP`, `SIGILL`) are logged with architecture details
Shutdown
On server close, stopRerankerService() clears any pending restart timeout and kills the child process.
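The restart and shutdown rules can be sketched together because both act on the same state. `RerankerLifecycle`, `handleUnexpectedExit`, and `stopReranker` are hypothetical names. The CPU architecture is passed in (in Node it would come from `process.arch`), and two details are assumptions: a `shuttingDown` flag so that an intentional kill does not trigger a restart, and a guard against scheduling two restarts at once.

```typescript
// Hypothetical state shape; the real service module may store this differently.
interface RerankerLifecycle {
  isReady: boolean;
  shuttingDown: boolean; // assumption: distinguishes intentional kills from crashes
  restartTimer: ReturnType<typeof setTimeout> | null;
  kill: (() => void) | null; // kills the running child process, if any
}

const RESTART_DELAY_MS = 5000;

// Called when the child process exits; returns the logged diagnostic, if any.
function handleUnexpectedExit(
  state: RerankerLifecycle,
  signal: string | null,
  restart: () => void,
  arch: string,
  delayMs = RESTART_DELAY_MS,
): string | null {
  state.isReady = false;
  state.kill = null;
  if (state.shuttingDown) return null; // exit caused by stopReranker: no restart
  let diagnostic: string | null = null;
  if (signal === "SIGTRAP" || signal === "SIGILL") {
    diagnostic = `llama-server exited with ${signal}; the binary may not match this CPU architecture (${arch})`;
    console.error(diagnostic);
  }
  if (state.restartTimer === null) {
    state.restartTimer = setTimeout(() => {
      state.restartTimer = null;
      restart();
    }, delayMs);
  }
  return diagnostic;
}

// Clears any pending restart, then kills the child process.
function stopReranker(state: RerankerLifecycle): void {
  state.shuttingDown = true;
  if (state.restartTimer !== null) {
    clearTimeout(state.restartTimer);
    state.restartTimer = null;
  }
  const kill = state.kill;
  state.kill = null;
  if (kill) kill();
}
```

Wired to a real process, this might look like `child.on("exit", (code, signal) => handleUnexpectedExit(state, signal, startRerankerService, process.arch))`; Node's `exit` event does pass `(code, signal)` to its listeners.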
Health Monitoring
`getRerankerStatus()` performs a live health check by fetching `/health` from the llama-server. It returns `false` if:
- the `isReady` flag is `false`,
- the health endpoint is unreachable, or
- the response status is not `ok`.
The search endpoint checks reranker health before attempting ranking and falls back to unranked SearXNG results if unhealthy.
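A sketch of the three failure conditions, with the fetch function injected through a hypothetical `HealthFetch` type so each case can be exercised without a server. It assumes `/health` answers with a JSON body containing a `status` field, as the `ok` check implies; treating a non-2xx HTTP status as unhealthy is an added assumption. The `Sketch` suffix marks the function as illustrative, not the project's `getRerankerStatus()`.

```typescript
// Hypothetical fetch signature for GET requests; the global fetch satisfies it.
type HealthFetch = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function getRerankerStatusSketch(
  isReady: boolean,
  baseUrl: string,
  fetchFn: HealthFetch,
): Promise<boolean> {
  if (!isReady) return false; // the isReady flag is false
  try {
    const response = await fetchFn(`${baseUrl}/health`);
    if (!response.ok) return false; // assumption: non-2xx responses count as unhealthy
    const body = (await response.json()) as { status?: string };
    return body.status === "ok"; // any other status counts as unhealthy
  } catch {
    return false; // the health endpoint is unreachable
  }
}
```

The search endpoint would then choose between reranked and unranked SearXNG results based on this boolean.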
Reranking Process
Document Preparation
Search results are formatted as Markdown-style strings and truncated:
```typescript
const doc = `[${title}](${url} "${snippet}")`.toLocaleLowerCase();
// Truncated to MAX_DOCUMENT_LENGTH (512 characters)
```
Both query and documents are lowercased and Unicode surrogates are sanitized before sending to the reranker.
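The formatting step can be isolated into a small function. `TextResult` and `toRerankerDocument` are hypothetical names, and the sketch assumes truncation simply keeps the first `MAX_DOCUMENT_LENGTH` characters of the lowercased string.

```typescript
// Hypothetical shape of a processed text search result.
interface TextResult {
  title: string;
  url: string;
  snippet: string;
}

const MAX_DOCUMENT_LENGTH = 512;

function toRerankerDocument({ title, url, snippet }: TextResult): string {
  const doc = `[${title}](${url} "${snippet}")`.toLocaleLowerCase();
  // Assumption: truncation keeps the first 512 UTF-16 code units.
  return doc.slice(0, MAX_DOCUMENT_LENGTH);
}
```

Because `String.prototype.slice` counts UTF-16 code units, a cut at 512 can split an emoji's surrogate pair, which is one way an unpaired surrogate could reach the reranker and would be a reason to sanitize after truncation.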
Unicode Sanitization
`sanitizeUnicodeSurrogates()` validates Unicode surrogate pairs in input strings. Unpaired (lone) surrogates are replaced with the Unicode replacement character (`\ufffd`). This prevents crashes when web search results contain malformed text that cannot be encoded as valid UTF-8.
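A minimal sketch of surrogate sanitization, assuming the behavior described above: well-formed high/low pairs are kept and every lone surrogate becomes `\ufffd`. The `Sketch` suffix marks it as an illustration rather than the project's implementation, and it omits the warning log.

```typescript
function sanitizeUnicodeSurrogatesSketch(input: string): string {
  let output = "";
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      const next = input.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        output += input[i] + input[i + 1]; // valid pair: keep both halves
        i++;
      } else {
        output += "\ufffd"; // high surrogate with no following low surrogate
      }
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      output += "\ufffd"; // low surrogate with no preceding high surrogate
    } else {
      output += input[i];
    }
  }
  return output;
}
```

On runtimes that implement ES2024, `String.prototype.toWellFormed()` performs the same replacement natively.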
Scoring and Filtering
The reranker returns a relevance score for each document. Scores are normalized, then results are filtered in two statistical stages:
- Score normalization: every score is shifted by adding the absolute value of the minimum score, so no score is negative.
- Standard deviation filter: results scoring below `mean - kStandardDeviationFactor * standardDeviation` are filtered out, with `kStandardDeviationFactor = 0.3`.
- Percentage fallback: if fewer than 40% of results pass the standard deviation filter, a fallback threshold is applied: `minPercentageFallback = 0.4` (40% of the highest normalized score).
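The scoring rules can be expressed as one function that returns the indices of the results to keep. `filterIndicesByScore` is a hypothetical name, and several details are assumptions rather than stated behavior: the standard deviation is the population (not sample) form, the fallback replaces the standard deviation filter rather than being combined with it, and the same 0.4 constant drives both the "fewer than 40%" check and the fallback threshold.

```typescript
const kStandardDeviationFactor = 0.3;
const minPercentageFallback = 0.4;

// Returns the indices of all values at or above the threshold, in order.
function indicesAtOrAbove(values: number[], threshold: number): number[] {
  const kept: number[] = [];
  values.forEach((value, index) => {
    if (value >= threshold) kept.push(index);
  });
  return kept;
}

function filterIndicesByScore(scores: number[]): number[] {
  if (scores.length === 0) return [];
  // 1. Normalization: shift every score by |min| so none is negative.
  const shift = Math.abs(Math.min(...scores));
  const normalized = scores.map((score) => score + shift);
  // 2. Standard deviation filter (population standard deviation assumed).
  const mean = normalized.reduce((sum, s) => sum + s, 0) / normalized.length;
  const variance = normalized.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / normalized.length;
  const passing = indicesAtOrAbove(normalized, mean - kStandardDeviationFactor * Math.sqrt(variance));
  // 3. Fallback when fewer than 40% pass: keep scores >= 40% of the highest normalized score.
  if (passing.length < minPercentageFallback * normalized.length) {
    return indicesAtOrAbove(normalized, minPercentageFallback * Math.max(...normalized));
  }
  return passing;
}
```

With scores `[-2, 0, 1, 3]`, the shift is 2, the threshold comes out near 1.96, and indices `1, 2, 3` survive. With `[0, 0, 0, 0, 5, 20]`, only two of six results pass the first filter (fewer than 2.4), so the fallback threshold of 8 applies and only index `5` remains.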
Preserve Top Results Mode
When preserveTopResults = true, the ranking algorithm:
- Keeps the original top result (first from SearXNG) at position 1
- Filters remaining results by score
- Takes up to 9 next-best results, sorted by reranker score
- Appends remaining filtered results
This mode ensures the original top result is never displaced by the reranker, while still improving the ordering of subsequent results.
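A generic sketch of the preserve-top ordering follows. It takes the reranker scores and the set of indices that passed filtering. The name `orderPreservingTop` is hypothetical, and the order of the leftover filtered results (kept in original SearXNG order here) is an assumption, since the description does not specify it.

```typescript
function orderPreservingTop<T>(
  results: T[],
  scores: number[],
  passing: Set<number>, // indices that survived score filtering
  nextBestCount = 9,
): T[] {
  if (results.length === 0) return [];
  // Position 1 always holds the original first SearXNG result.
  const rest: number[] = [];
  for (let i = 1; i < results.length; i++) {
    if (passing.has(i)) rest.push(i);
  }
  // Up to nextBestCount results, highest reranker score first.
  const nextBest = rest.slice().sort((a, b) => scores[b] - scores[a]).slice(0, nextBestCount);
  const chosen = new Set(nextBest);
  // Assumption: the remaining filtered results keep their original order.
  const remaining = rest.filter((i) => !chosen.has(i));
  return [0, ...nextBest, ...remaining].map((i) => results[i]);
}
```

For example, with results `a` through `e`, scores `[0.1, 0.2, 0.9, 0.5, 0.7]`, and `b` filtered out, a `nextBestCount` of 1 yields `a, c, d, e`: `a` stays first, `c` is the single next-best pick, and `d, e` follow in original order.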
Integration with Search Pipeline
The search endpoint coordinates reranking in searchEndpointServerHook.ts:
```
1. fetchSearXNG(query, searchType, limit)
2. processSearchResults(rawResults)
3. Check getRerankerStatus()
   - If healthy: rankSearchResults(query, results)
   - If unhealthy: return unranked results (fallback)
4. Return structured JSON response
```
Reranking is applied only to text search results, not image results.
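The pipeline, including the text-only rule, can be sketched with its four steps injected as dependencies. The `SearchDeps` interface, the `"text" | "images"` search types, and the `{ results }` response shape are assumptions for illustration; only the step names come from the description.

```typescript
// Hypothetical dependency interface; real signatures may differ.
interface SearchDeps<T> {
  fetchSearXNG(query: string, searchType: "text" | "images", limit: number): Promise<unknown[]>;
  processSearchResults(raw: unknown[]): T[];
  getRerankerStatus(): Promise<boolean>;
  rankSearchResults(query: string, results: T[]): Promise<T[]>;
}

async function handleSearch<T>(
  query: string,
  searchType: "text" | "images",
  limit: number,
  deps: SearchDeps<T>,
): Promise<string> {
  const results = deps.processSearchResults(await deps.fetchSearXNG(query, searchType, limit));
  // Only text results are reranked, and only when the reranker is healthy.
  const shouldRank = searchType === "text" && (await deps.getRerankerStatus());
  const finalResults = shouldRank ? await deps.rankSearchResults(query, results) : results;
  return JSON.stringify({ results: finalResults });
}
```

Injecting the steps keeps the fallback decision testable with stubs: when `getRerankerStatus` resolves to `false`, the processed results pass through unchanged.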
Model Details
| Property | Value |
|---|---|
| Model | jina-reranker-v1-tiny-en |
| Format | GGUF (Q8_0 quantized) |
| HuggingFace Repo | Felladrin/gguf-jina-reranker-v1-tiny-en |
| Type | Cross-encoder reranker |
| Language | English |
| Storage | server/models/Felladrin/gguf-jina-reranker-v1-tiny-en/ |
Error Handling
| Scenario | Behavior |
|---|---|
| Reranker not ready | Falls back to unranked SearXNG results |
| Reranking API error | isReady set to false, process killed, auto-restart scheduled |
| Empty documents array | Returns empty array without calling reranker |
| Unicode sanitization needed | Logs warning, continues with sanitized input |
| Binary architecture mismatch | Logs SIGTRAP/SIGILL error with architecture details |
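The "Reranking API error" and "Empty documents array" rows can be illustrated with a sketch of the reranking call. It assumes llama-server's Jina-style `POST /rerank` request body (`query`, `documents`) and response (`results` entries with `index` and `relevance_score`). `PostFetch`, `rerankScores`, and the `onFailure` callback (standing in for setting `isReady` to `false`, killing the process, and scheduling a restart) are hypothetical, as is rethrowing the error so the caller can fall back to unranked results.

```typescript
// Hypothetical fetch signature for the POST request; the global fetch satisfies it.
type PostFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface RerankResponse {
  results: { index: number; relevance_score: number }[];
}

// Returns one score per document, in the same order as `documents`.
async function rerankScores(
  baseUrl: string,
  query: string,
  documents: string[],
  fetchFn: PostFetch,
  onFailure: () => void,
): Promise<number[]> {
  if (documents.length === 0) return []; // no request for an empty list
  try {
    const response = await fetchFn(`${baseUrl}/rerank`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query, documents }),
    });
    if (!response.ok) throw new Error("Rerank request failed");
    const body = (await response.json()) as RerankResponse;
    const scores = new Array<number>(documents.length).fill(0);
    for (const result of body.results) scores[result.index] = result.relevance_score;
    return scores;
  } catch (error) {
    onFailure(); // e.g. mark not ready, kill the process, schedule a restart
    throw error;
  }
}
```

Mapping scores by `index` guards against a response that lists documents in a different order than the request.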
Related Topics
- Search System (`docs/overview.md`): search pipeline overview
- Server Hooks (`docs/overview.md#server-hook-system`): hook registration
- Web Search Service (`server/webSearchService.ts`): SearXNG integration