# Search Result Reranking

MiniSearch optionally reranks search results using a cross-encoder model running on a local `llama-server` instance. This secondary search stage reorders initial SearXNG results based on their semantic relevance to the user's query.

## Architecture Overview

The reranking subsystem consists of three components:

| Component | File | Responsibility |
|-----------|------|----------------|
| Service Manager | `server/rerankerService.ts` | llama-server lifecycle, health checks, reranking API calls |
| Ranking Logic | `server/rankSearchResults.ts` | Score-based filtering and result reordering |
| Server Hook | `server/rerankerServiceHook.ts` | Startup/shutdown coordination with Vite server |

## Service Lifecycle

### Startup

The `rerankerServiceHook` starts the reranker during server initialization:

1. Downloads the model from HuggingFace if not present (`Felladrin/gguf-jina-reranker-v1-tiny-en/jina-reranker-v1-tiny-en-Q8_0.gguf`)
2. Spawns `llama-server` as a child process
3. Polls `/health` endpoint until status is `ok`
4. Performs a warmup rerank request (`query: "test"`, `documents: ["test document"]`) to ensure the model is fully loaded
5. Sets `isReady = true`
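
Step 3 above can be sketched as a bounded polling loop. This is a minimal illustration, not the code in `server/rerankerService.ts`: `waitUntilHealthy`, `HealthCheck`, and the attempt and interval defaults are hypothetical, and the health check is injected so the sketch runs without a live `llama-server`.

```typescript
// Hypothetical: resolves to the status string reported by the /health endpoint.
type HealthCheck = () => Promise<string>;

// Polls until the status is "ok" or the attempt budget is exhausted.
async function waitUntilHealthy(
  checkHealth: HealthCheck,
  maxAttempts = 30, // assumed value, not taken from the source
  intervalMs = 500, // assumed value, not taken from the source
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if ((await checkHealth()) === "ok") return true;
    } catch {
      // The server is not accepting connections yet; keep polling.
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

In a real service the injected check would likely fetch `/health` on `127.0.0.1:8012` and read the `status` field of the JSON response. Bounding the number of attempts keeps a failed startup from hanging server initialization indefinitely.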

### llama-server Configuration

The reranker process is spawned with these arguments:

| Argument | Value | Purpose |
|----------|-------|---------|
| `--model` | `jina-reranker-v1-tiny-en-Q8_0.gguf` | Cross-encoder reranking model |
| `--ctx-size` | 2048 | Context window size |
| `--batch-size` | 2048 | Batch processing size |
| `--ubatch-size` | 2048 | Micro-batch size |
| `--flash-attn` | auto | Flash attention optimization |
| `--host` | 127.0.0.1 | Local-only binding |
| `--port` | 8012 | Service port |
| `--threads` | 1 | Single-threaded operation |
| `--parallel` | 1 | Single parallel request |
| `--reranking` | (flag) | Enable reranking mode |
| `--pooling` | rank | Rank pooling strategy |
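
The table above maps directly onto an argument array. The sketch below is illustrative only: `buildRerankerArgs` is a hypothetical helper name, and the way the real service assembles its arguments may differ.

```typescript
// Hypothetical helper: builds the llama-server argument list from the table above.
function buildRerankerArgs(modelPath: string): string[] {
  return [
    "--model", modelPath,
    "--ctx-size", "2048",
    "--batch-size", "2048",
    "--ubatch-size", "2048",
    "--flash-attn", "auto",
    "--host", "127.0.0.1", // local-only binding
    "--port", "8012",
    "--threads", "1",
    "--parallel", "1",
    "--reranking", // flag without a value
    "--pooling", "rank",
  ];
}
```

The resulting array could be passed as the second argument to `spawn` from `node:child_process`, for example `spawn("llama-server", buildRerankerArgs(modelPath))`, assuming the binary is on the `PATH`.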

### Automatic Restart

If the `llama-server` process exits unexpectedly:

1. `isReady` is set to `false`
2. A 5-second restart timeout is scheduled
3. `startRerankerService()` is called again when the timeout fires

Exits caused by binary compatibility errors (`SIGTRAP`, `SIGILL`) are additionally logged with architecture details.

### Shutdown

On server close, `stopRerankerService()` clears any pending restart timeout and kills the child process.
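
The restart and shutdown behavior can be sketched as a small controller. Everything here is an assumption made for illustration: `RestartController`, `Scheduler`, and the log message are hypothetical, the scheduler is injected so the logic can be exercised without real timers, and killing the child process is left out.

```typescript
type Cancel = () => void;
// Hypothetical abstraction over setTimeout/clearTimeout.
type Scheduler = (callback: () => void, delayMs: number) => Cancel;

class RestartController {
  isReady = false;
  private cancelPendingRestart: Cancel | null = null;
  private readonly start: () => void;
  private readonly schedule: Scheduler;
  private readonly restartDelayMs: number;

  constructor(start: () => void, schedule: Scheduler, restartDelayMs = 5000) {
    this.start = start;
    this.schedule = schedule;
    this.restartDelayMs = restartDelayMs;
  }

  markReady(): void {
    this.isReady = true;
  }

  // Called when the child process exits without stop() having been requested.
  onUnexpectedExit(signal: string | null): void {
    this.isReady = false;
    if (signal === "SIGTRAP" || signal === "SIGILL") {
      console.error(`llama-server exited with ${signal}; the binary may not match this CPU`);
    }
    this.clearPendingRestart(); // never keep two restarts queued
    this.cancelPendingRestart = this.schedule(() => {
      this.cancelPendingRestart = null;
      this.start();
    }, this.restartDelayMs);
  }

  // Shutdown: drop any pending restart so the service stays down.
  stop(): void {
    this.clearPendingRestart();
    this.isReady = false;
  }

  private clearPendingRestart(): void {
    if (this.cancelPendingRestart) this.cancelPendingRestart();
    this.cancelPendingRestart = null;
  }
}

// Scheduler backed by real timers, for use outside of tests.
const timerScheduler: Scheduler = (callback, delayMs) => {
  const handle = setTimeout(callback, delayMs);
  return () => clearTimeout(handle);
};
```

Clearing the pending timeout in `stop()` matters because a restart that fires after shutdown would otherwise spawn an orphaned `llama-server` process.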

## Health Monitoring

`getRerankerStatus()` performs a live health check by fetching `/health` from the llama-server. It returns `false` if any of the following is true:
- `isReady` flag is `false`
- Health endpoint is unreachable
- Response status is not `ok`

The search endpoint checks reranker health before attempting ranking and falls back to unranked SearXNG results if unhealthy.
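
The three failure conditions listed above reduce to a short guard. This is a sketch under assumptions: `checkRerankerHealth` and `HealthFetcher` are hypothetical names, and the HTTP call is injected rather than performed, so the example stays self-contained.

```typescript
// Hypothetical: resolves to the parsed JSON body of the /health response.
type HealthFetcher = () => Promise<{ status: string }>;

async function checkRerankerHealth(
  isReady: boolean,
  fetchHealth: HealthFetcher,
): Promise<boolean> {
  if (!isReady) return false; // the readiness flag is not set
  try {
    const health = await fetchHealth();
    return health.status === "ok"; // any other status counts as unhealthy
  } catch {
    return false; // the endpoint is unreachable
  }
}
```

Checking `isReady` first avoids a network round trip while the service is still starting or restarting.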

## Reranking Process

### Document Preparation

Search results are formatted as Markdown-style strings and truncated:

```typescript
const doc = `[${title}](${url} "${snippet}")`.toLocaleLowerCase();
// Truncated to MAX_DOCUMENT_LENGTH (512 characters)
```

Both the query and the documents are lowercased, and invalid Unicode surrogates are sanitized, before being sent to the reranker.

### Unicode Sanitization

`sanitizeUnicodeSurrogates()` validates Unicode surrogate pairs in input strings. Invalid (unpaired) surrogates are replaced with the Unicode replacement character (`\ufffd`). This prevents crashes when web search results contain malformed text, since a lone surrogate cannot be encoded as valid UTF-8.
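
A minimal sketch of this kind of sanitization follows. It is not the project's `sanitizeUnicodeSurrogates()` implementation; `replaceLoneSurrogates` is a hypothetical name, and the logic is simply one straightforward way to walk UTF-16 code units.

```typescript
// Replaces unpaired UTF-16 surrogates with U+FFFD and keeps valid pairs intact.
function replaceLoneSurrogates(input: string): string {
  let output = "";
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    const isHigh = code >= 0xd800 && code <= 0xdbff;
    const isLow = code >= 0xdc00 && code <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        output += input[i] + input[i + 1]; // valid pair: copy both code units
        i++; // skip the low surrogate that was just consumed
      } else {
        output += "\ufffd"; // high surrogate without a following low surrogate
      }
    } else if (isLow) {
      output += "\ufffd"; // low surrogate without a preceding high surrogate
    } else {
      output += input[i];
    }
  }
  return output;
}
```

Newer JavaScript runtimes also provide `String.prototype.toWellFormed()`, which performs the same replacement natively.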

### Scoring and Filtering

The reranker returns a relevance score for each document. Scores are normalized, then results are filtered statistically, with a fallback threshold:

1. **Score Normalization**: Scores are shifted into a non-negative range by adding the absolute value of the minimum score
2. **Standard Deviation Filter**: Results below `mean - kStandardDeviationFactor * standardDeviation` are filtered out
   - `kStandardDeviationFactor = 0.3`
3. **Percentage Fallback**: If fewer than 40% of results pass the standard deviation filter, a fallback threshold is applied:
   - `minPercentageFallback = 0.4` (40% of the highest normalized score)
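
The steps above can be sketched as follows. The constants come from this document, but the rest is an assumption: `filterByScore` and `ScoredResult` are hypothetical names, the standard deviation is computed over the whole population, the threshold is inclusive, and the fallback is read as replacing the threshold with 40% of the highest normalized score.

```typescript
interface ScoredResult {
  index: number; // position in the original result list
  score: number; // normalized reranker score
}

const kStandardDeviationFactor = 0.3;
const minPercentageFallback = 0.4;

function filterByScore(scores: number[]): ScoredResult[] {
  if (scores.length === 0) return [];

  // 1. Normalize: add the absolute value of the minimum score.
  const shift = Math.abs(Math.min(...scores));
  const normalized = scores.map((score, index) => ({ index, score: score + shift }));

  // 2. Standard deviation filter.
  const values = normalized.map((r) => r.score);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const threshold = mean - kStandardDeviationFactor * Math.sqrt(variance);
  let kept = normalized.filter((r) => r.score >= threshold);

  // 3. Percentage fallback (assumed reading): too few survivors, so filter
  //    against a fraction of the best score instead.
  if (kept.length < normalized.length * minPercentageFallback) {
    const fallbackThreshold = Math.max(...values) * minPercentageFallback;
    kept = normalized.filter((r) => r.score >= fallbackThreshold);
  }

  return kept.sort((a, b) => b.score - a.score);
}
```

For example, raw scores `[-2, 0, 2, 4]` normalize to `[0, 2, 4, 6]` with a mean of 3 and a standard deviation of about 2.24, giving a threshold of about 2.33. Only the last two results pass, and they are returned best-first.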

### Preserve Top Results Mode

When `preserveTopResults = true`, the ranking algorithm:

1. Keeps the original top result (first from SearXNG) at position 1
2. Filters remaining results by score
3. Takes up to 9 next-best results, sorted by reranker score
4. Appends remaining filtered results

This mode ensures the original top result is never displaced by the reranker, while still improving the ordering of subsequent results.
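
The four steps can be sketched as below. The names `preserveTopResult` and `RankedResult` are hypothetical, the score filter is passed in as a predicate, and the order of the appended remainder is an assumption: this sketch restores the original SearXNG order for those results, which the source does not specify.

```typescript
interface RankedResult {
  title: string;
  score: number; // reranker score
}

function preserveTopResult(
  results: RankedResult[], // in original SearXNG order
  passesFilter: (result: RankedResult) => boolean,
  nextBestCount = 9,
): RankedResult[] {
  if (results.length === 0) return [];
  const [top, ...rest] = results; // 1. the original top result stays first

  const ranked = rest
    .map((result, originalIndex) => ({ result, originalIndex }))
    .filter((entry) => passesFilter(entry.result)) // 2. filter by score
    .sort((a, b) => b.result.score - a.result.score);

  const nextBest = ranked.slice(0, nextBestCount); // 3. up to 9 next-best by score
  const remainder = ranked
    .slice(nextBestCount)
    .sort((a, b) => a.originalIndex - b.originalIndex); // 4. assumed: original order

  return [top, ...nextBest.map((e) => e.result), ...remainder.map((e) => e.result)];
}
```

Because the first result bypasses the filter entirely, it survives even when the reranker assigns it the lowest score.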

## Integration with Search Pipeline

The search endpoint coordinates reranking in `searchEndpointServerHook.ts`:

```
1. fetchSearXNG(query, searchType, limit)
2. processSearchResults(rawResults)
3. Check getRerankerStatus()
   - If healthy: rankSearchResults(query, results)
   - If unhealthy: return unranked results (fallback)
4. Return structured JSON response
```

Reranking is applied only to text search results, not image results.
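
The fallback flow can be sketched with injected dependencies. The function names `getRerankerStatus` and `rankSearchResults` appear in this document, but their signatures here are assumptions, as are `searchWithReranking`, `SearchDeps`, `SearchResult`, and the `"text" | "images"` search type values.

```typescript
type SearchType = "text" | "images"; // assumed values

interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Hypothetical dependency bundle; signatures are assumed for illustration.
interface SearchDeps {
  fetchResults: (query: string, searchType: SearchType) => Promise<SearchResult[]>;
  getRerankerStatus: () => Promise<boolean>;
  rankSearchResults: (query: string, results: SearchResult[]) => Promise<SearchResult[]>;
}

async function searchWithReranking(
  query: string,
  searchType: SearchType,
  deps: SearchDeps,
): Promise<SearchResult[]> {
  const results = await deps.fetchResults(query, searchType);

  // Only text results are reranked; image results pass through unchanged.
  if (searchType !== "text" || results.length === 0) return results;

  // An unhealthy reranker means the unranked results are returned as-is.
  const healthy = await deps.getRerankerStatus();
  return healthy ? deps.rankSearchResults(query, results) : results;
}
```

Treating the reranker as optional at this layer means a crashed or restarting `llama-server` degrades result ordering rather than failing the search request.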

## Model Details

| Property | Value |
|----------|-------|
| Model | jina-reranker-v1-tiny-en |
| Format | GGUF (Q8_0 quantized) |
| HuggingFace Repo | Felladrin/gguf-jina-reranker-v1-tiny-en |
| Type | Cross-encoder reranker |
| Language | English |
| Storage | `server/models/Felladrin/gguf-jina-reranker-v1-tiny-en/` |

## Error Handling

| Scenario | Behavior |
|----------|----------|
| Reranker not ready | Falls back to unranked SearXNG results |
| Reranking API error | `isReady` set to `false`, process killed, auto-restart scheduled |
| Empty documents array | Returns empty array without calling reranker |
| Unicode sanitization needed | Logs warning, continues with sanitized input |
| Binary architecture mismatch | Logs `SIGTRAP`/`SIGILL` error with architecture details |

## Related Topics

- **Search System**: `docs/overview.md` - Search pipeline overview
- **Server Hooks**: `docs/overview.md#server-hook-system` - Hook registration
- **Web Search Service**: `server/webSearchService.ts` - SearXNG integration