# Simple Production Anti-Bot Strategy
This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine.
## Principle
**Do not make the core plugin engine a fragile browser clone.**
Keep the BEX engine:
- portable
- buildable everywhere
- easy to embed in C++ apps
- deterministic where possible
- independent from experimental TLS impersonation crates
Then add simple challenge handling around it.
## Recommended Flow
```text
Plugin request
    ↓
BEX normal HTTP backend
    ↓
Success? ────────────────→ return data
    ↓ no
Challenge detected?
    ↓ yes
Return CHALLENGE_REQUIRED with URL/domain/reason
    ↓
C++ app decides fallback:
    - use cached cookies
    - ask user to import cookies
    - open system browser/WebView only when needed
    - use app-specific HTTP fetcher
    - use optional proxy service
```
## Why This Is Better
Perfect Chrome impersonation is not simple:
- TLS fingerprints (JA3/JA4) change with Chrome versions.
- HTTP/2 fingerprints change.
- Libraries using BoringSSL are harder to cross-compile.
- iOS and Android builds each need separate verification.
- One wrong cipher order or H2 setting can still get blocked.
- CAPTCHA/Turnstile still cannot be solved silently.
For an engine that must be used inside **many C++ apps**, the stable approach is:
- use portable Rust HTTP by default
- detect challenge pages reliably
- delegate rare hard anti-bot cases to the host app
## Challenge Detection
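A response should be treated as a likely anti-bot challenge when one of the status codes below coincides with at least one of the headers or body markers. A single signal on its own is weak evidence: `server: cloudflare` and `cf-ray`, for example, also appear on ordinary successful responses from Cloudflare-fronted sites.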
### Status codes
- `403`
- `429`
- `503`
### Headers
- `server: cloudflare`
- `cf-ray`
- `cf-chl-*`
- `x-datadome`
- `x-perimeterx`
- `akamai-*`
### Body markers
- `Just a moment...`
- `Checking your browser`
- `cf-browser-verification`
- `cf-chl-`
- `turnstile`
- `captcha`
- `datadome`
- `px-captcha`
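The signal lists above could be combined into a detector along the following lines. This is a minimal sketch, not the engine's actual code: `Provider`, `provider_from_header`, and `detect_challenge` are hypothetical names, and it assumes a slightly stricter rule than "any single signal", requiring a blocking status code plus a header or body marker, because headers such as `cf-ray` also appear on normal responses. False positives remain possible (for example a genuine origin `403` behind Cloudflare).

```rust
/// Anti-bot vendor guessed from the response (hypothetical type, not an existing BEX API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Provider {
    Cloudflare,
    DataDome,
    PerimeterX,
    Akamai,
    Unknown,
}

/// Lowercase body markers taken from the list above. `px-captcha` comes before
/// the generic `captcha` so the more specific vendor wins.
const BODY_MARKERS: &[(&str, Provider)] = &[
    ("just a moment...", Provider::Cloudflare),
    ("checking your browser", Provider::Cloudflare),
    ("cf-browser-verification", Provider::Cloudflare),
    ("cf-chl-", Provider::Cloudflare),
    ("turnstile", Provider::Cloudflare),
    ("datadome", Provider::DataDome),
    ("px-captcha", Provider::PerimeterX),
    ("captcha", Provider::Unknown),
];

/// Maps one response header to a vendor, comparing case-insensitively.
fn provider_from_header(name: &str, value: &str) -> Option<Provider> {
    let name = name.to_ascii_lowercase();
    let value = value.to_ascii_lowercase();
    if name == "cf-ray"
        || name.starts_with("cf-chl-")
        || (name == "server" && value.contains("cloudflare"))
    {
        Some(Provider::Cloudflare)
    } else if name == "x-datadome" {
        Some(Provider::DataDome)
    } else if name == "x-perimeterx" {
        Some(Provider::PerimeterX)
    } else if name.starts_with("akamai-") {
        Some(Provider::Akamai)
    } else {
        None
    }
}

/// Returns the suspected vendor when the response looks like a challenge page.
pub fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Option<Provider> {
    // Gate on the status code first; vendor headers alone are not enough.
    if !matches!(status, 403 | 429 | 503) {
        return None;
    }
    // Headers are cheap to check, so they are consulted before the body.
    let from_headers = headers
        .iter()
        .find_map(|(name, value)| provider_from_header(name, value));
    if from_headers.is_some() {
        return from_headers;
    }
    let body = body.to_ascii_lowercase();
    BODY_MARKERS
        .iter()
        .find(|(marker, _)| body.contains(*marker))
        .map(|(_, provider)| *provider)
}
```

In practice the body scan would probably be limited to the first few kilobytes of the response, since challenge pages put their markers near the top.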
## Engine-Level Behavior
The BEX engine should not try to solve every challenge itself.
Instead:
1. Detect a likely challenge.
2. Return a structured error:
```json
{
  "code": "CHALLENGE_REQUIRED",
  "url": "https://example.com/path",
  "final_url": "https://example.com/cdn-cgi/challenge-platform/...",
  "status": 403,
  "provider": "cloudflare",
  "domain": "example.com",
  "hint": "Host app should provide cookies or browser-backed fetch."
}
```
3. The host app can then retry with cookies or a browser-backed fetcher.
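On the engine side, the structured error from step 2 might be modeled roughly as below. This is a sketch under assumptions: the `ChallengeRequired` struct and its `to_json` method are hypothetical, the field names simply mirror the JSON payload, and serialization is hand-written here only to keep the example dependency-free (a real build would more likely use a JSON library).

```rust
/// Hypothetical engine-side mirror of the CHALLENGE_REQUIRED payload.
pub struct ChallengeRequired {
    pub url: String,
    pub final_url: String,
    pub status: u16,
    pub provider: String,
    pub domain: String,
    pub hint: String,
}

/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ChallengeRequired {
    /// Serializes the error in the same field order as the payload shown above.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"code\":\"CHALLENGE_REQUIRED\",\"url\":\"{}\",\"final_url\":\"{}\",\
             \"status\":{},\"provider\":\"{}\",\"domain\":\"{}\",\"hint\":\"{}\"}}",
            json_escape(&self.url),
            json_escape(&self.final_url),
            self.status,
            json_escape(&self.provider),
            json_escape(&self.domain),
            json_escape(&self.hint),
        )
    }
}
```

Passing the payload across the C ABI as a single JSON string keeps the boundary small: the host only needs a JSON parser, not a matching struct layout.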
## Simple Fallback Options
### Option A — User-provided cookies
The app allows the user to paste/export cookies for a domain.
Then plugins can send:
```http
Cookie: cf_clearance=...; session=...
```
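This is simple, cross-platform, and avoids hidden browser automation. One caveat: clearance cookies such as `cf_clearance` are typically bound to the client that solved the challenge, so the engine should send the same `User-Agent` the cookies were obtained with.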
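Building the `Cookie` header from user-supplied pairs could look like the sketch below. `cookie_header` is a hypothetical helper, not an existing BEX function; it assumes the pairs were already parsed from whatever the user pasted, and it drops any pair that could break out of the header line.

```rust
/// Joins (name, value) pairs into a `Cookie` header value.
/// Returns `None` when no usable pair remains.
pub fn cookie_header(cookies: &[(&str, &str)]) -> Option<String> {
    let mut parts: Vec<String> = Vec::new();
    for (name, value) in cookies {
        // Reject names that are empty or contain separators or whitespace.
        let bad_name = name.is_empty()
            || name.contains(|c: char| c == ';' || c == '=' || c.is_whitespace() || c.is_control());
        // Reject values that could terminate the pair or inject a new header line.
        let bad_value = value.contains(|c: char| c == ';' || c.is_control());
        if bad_name || bad_value {
            continue;
        }
        parts.push(format!("{}={}", name, value));
    }
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("; "))
    }
}
```

Silently skipping malformed pairs is a design choice for this sketch; an app might prefer to surface them to the user instead.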
### Option B — App-level browser session
The app opens a system browser/WebView **only when needed**.
After the challenge is solved, the app stores the resulting cookies in the BEX secret/KV store.
Future requests reuse those cookies and skip the WebView until they expire.
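The store-then-reuse step might be sketched as a small per-domain cookie jar. Everything here is an assumption for illustration: `DomainCookieStore` is a hypothetical in-memory stand-in for the BEX secret/KV store, it matches domains exactly (no subdomain or path rules), and it ignores expiry.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the BEX secret/KV store, keyed by lowercase domain.
#[derive(Default)]
pub struct DomainCookieStore {
    by_domain: HashMap<String, Vec<(String, String)>>,
}

impl DomainCookieStore {
    /// Sets or replaces one cookie for a domain.
    pub fn set(&mut self, domain: &str, name: &str, value: &str) {
        let jar = self.by_domain.entry(domain.to_ascii_lowercase()).or_default();
        // Drop any older cookie with the same name, then append the new one.
        jar.retain(|(existing, _)| existing.as_str() != name);
        jar.push((name.to_string(), value.to_string()));
    }

    /// Removes all cookies for a domain; returns whether anything was stored.
    pub fn clear(&mut self, domain: &str) -> bool {
        self.by_domain.remove(&domain.to_ascii_lowercase()).is_some()
    }

    /// Lists domains that currently have stored cookies, sorted for stable output.
    pub fn domains(&self) -> Vec<String> {
        let mut names: Vec<String> = self.by_domain.keys().cloned().collect();
        names.sort();
        names
    }

    /// Builds the `Cookie` header value for a domain, if any cookies are stored.
    pub fn header_for(&self, domain: &str) -> Option<String> {
        let jar = self.by_domain.get(&domain.to_ascii_lowercase())?;
        if jar.is_empty() {
            return None;
        }
        let pairs: Vec<String> = jar.iter().map(|(n, v)| format!("{}={}", n, v)).collect();
        Some(pairs.join("; "))
    }
}
```

With a store like this, the engine can check `header_for` before each request and only report a challenge to the host when no stored cookies exist or the stored ones stop working.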
### Option C — External fetcher callback
Expose an optional C ABI hook:
```c
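#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result type filled in by the host; its layout is defined elsewhere. */
typedef struct BexFetchResult BexFetchResult;

/* Assumed contract: return true when the request was handled and `out` was filled. */
typedef bool (*BexExternalFetch)(
    void* user_data,
    const char* method,
    const char* url,
    const uint8_t* body,
    size_t body_len,
    BexFetchResult* out
);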
```
Then the host app can provide:
- libcurl-impersonate
- platform-native HTTP stack
- browser-backed fetch
- company proxy
- Android/iOS native networking
The core engine stays simple.
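From the Rust side of the engine, calling such a hook could look like the sketch below. Several parts are assumptions, since the document does not define them: the `BexFetchResult` layout (status plus a borrowed body pointer and length), the rule that the body buffer stays valid until the callback is invoked again, and the `ExternalFetcher` wrapper itself. `example_host_fetch` is only a stand-in for what a C++ host would register.

```rust
use std::ffi::{c_void, CString};
use std::os::raw::c_char;

/// Hypothetical result layout; the real one is not specified in this document.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
    pub body: *const u8,
    pub body_len: usize,
}

/// Rust view of the `BexExternalFetch` C function pointer.
pub type BexExternalFetch = unsafe extern "C" fn(
    user_data: *mut c_void,
    method: *const c_char,
    url: *const c_char,
    body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool;

/// Holds the host-registered callback and its opaque context pointer.
pub struct ExternalFetcher {
    pub callback: BexExternalFetch,
    pub user_data: *mut c_void,
}

impl ExternalFetcher {
    /// Calls the host hook and copies the response body into engine-owned memory.
    pub fn fetch(&self, method: &str, url: &str, body: &[u8]) -> Option<(u16, Vec<u8>)> {
        // Strings with interior NUL bytes cannot cross the C boundary.
        let method = CString::new(method).ok()?;
        let url = CString::new(url).ok()?;
        let mut out = BexFetchResult { status: 0, body: std::ptr::null(), body_len: 0 };
        // SAFETY: all pointers are valid for the duration of the call.
        let handled = unsafe {
            (self.callback)(
                self.user_data,
                method.as_ptr(),
                url.as_ptr(),
                body.as_ptr(),
                body.len(),
                &mut out,
            )
        };
        if !handled {
            return None;
        }
        let bytes = if out.body.is_null() || out.body_len == 0 {
            Vec::new()
        } else {
            // SAFETY: assumed contract says the host keeps `body` valid until the next call.
            unsafe { std::slice::from_raw_parts(out.body, out.body_len).to_vec() }
        };
        Some((out.status, bytes))
    }
}

/// Stand-in host callback that always answers 200 with a fixed body.
pub unsafe extern "C" fn example_host_fetch(
    _user_data: *mut c_void,
    _method: *const c_char,
    _url: *const c_char,
    _body: *const u8,
    _body_len: usize,
    out: *mut BexFetchResult,
) -> bool {
    static BODY: &[u8] = b"ok";
    if out.is_null() {
        return false;
    }
    // SAFETY: `out` was checked for null and points to caller-owned storage.
    unsafe {
        (*out).status = 200;
        (*out).body = BODY.as_ptr();
        (*out).body_len = BODY.len();
    }
    true
}
```

Copying the body immediately keeps ownership simple: the engine never frees host memory, and the host never has to track engine lifetimes.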
### Option D — Optional proxy service
For apps that control their backend, route difficult sites through a server-side fetcher with proper browser fingerprinting.
The engine stays portable and does not embed fragile anti-bot logic.
## Plugin Guidance
Plugins should:
- set `Referer` correctly
- preserve cookies when provided
- avoid excessive retries
- return `PluginError::Forbidden` or `PluginError::RateLimited` for challenge pages
- prefer running JS cipher logic locally in the plugin over calling third-party helper APIs when possible
Plugins should not:
- hardcode fake TLS assumptions
- rely on one external decoder service forever
- endlessly retry CF challenge pages
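The retry guidance above could be encoded as a small decision function. This is a sketch with hypothetical names (`Outcome`, `NextStep`, `next_step`) and arbitrary backoff numbers chosen for illustration: challenges are reported immediately and never retried, while rate limits and transient errors get a bounded number of attempts with exponential backoff.

```rust
/// Classified result of one request attempt (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Success,
    Challenge,
    RateLimited,
    TransientError,
}

/// What the plugin should do next (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NextStep {
    Done,
    ReportChallenge,
    RetryAfterMs(u64),
    GiveUp,
}

/// Decides the next step after attempt number `attempt` (0-based).
pub fn next_step(outcome: Outcome, attempt: u32, max_attempts: u32) -> NextStep {
    match outcome {
        Outcome::Success => NextStep::Done,
        // Retrying a challenge page only burns requests; hand it to the host instead.
        Outcome::Challenge => NextStep::ReportChallenge,
        Outcome::RateLimited | Outcome::TransientError
            if attempt.saturating_add(1) < max_attempts =>
        {
            // Exponential backoff: 500 ms, 1 s, 2 s, 4 s, then capped at 8 s.
            let delay = 500u64 * (1u64 << attempt.min(4));
            NextStep::RetryAfterMs(delay.min(8_000))
        }
        // Attempt budget exhausted.
        _ => NextStep::GiveUp,
    }
}
```

For example, with `max_attempts = 3` a transient failure is retried after 500 ms and then 1 s, and the third failure gives up.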
## Recommended Near-Term Fixes
1. Add challenge detection in `HttpHostService`.
2. Map challenges to a structured error payload for C ABI.
3. Add cookie helper APIs:
- set domain cookies
- clear domain cookies
- list stored challenge domains
4. Add optional external fetch callback in C ABI.
5. Keep advanced TLS impersonation as an optional backend only.
## Final Recommendation
For production:
- Default: `reqwest + rustls` portable backend.
- Add: challenge detection and external fallback hook.
- Optional later: verified impersonation backend behind a feature flag.
This gives the best balance of:
- reliability
- cross-platform support
- maintainability
- app integration flexibility
- real-world anti-bot handling