| # Simple Production Anti-Bot Strategy |
|
|
| This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine. |
|
|
| ## Principle |
|
|
| **Do not make the core plugin engine a fragile browser clone.** |
|
|
| Keep the BEX engine: |
|
|
| - portable |
| - buildable everywhere |
| - easy to embed in C++ apps |
| - deterministic where possible |
| - independent from experimental TLS impersonation crates |
|
|
| Then add simple challenge handling around it. |
|
|
| ## Recommended Flow |
|
|
| ```text |
| Plugin request |
| ↓ |
| BEX normal HTTP backend |
| ↓ |
| Success? ────────────────→ return data |
| ↓ no |
| Challenge detected? |
| ↓ yes |
| Return CHALLENGE_REQUIRED with URL/domain/reason |
| ↓ |
| C++ app decides fallback: |
| - use cached cookies |
| - ask user to import cookies |
| - open system browser/WebView only when needed |
| - use app-specific HTTP fetcher |
| - use optional proxy service |
| ``` |
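The flow above can be sketched as a small fallback chain. This is a minimal illustration, not the BEX API: `FetchOutcome`, `Fallback`, and `fetch_with_fallbacks` are hypothetical names, and a real host app would implement its fallbacks in C++ behind the C ABI rather than as Rust closures.

```rust
/// Result of one fetch attempt. Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum FetchOutcome {
    Success(String),
    ChallengeRequired { url: String, domain: String, reason: String },
    Failed(String),
}

/// A fallback tries to fetch `url` some other way (cached cookies, WebView,
/// proxy, ...) and returns the response body on success.
type Fallback<'a> = &'a dyn Fn(&str) -> Option<String>;

/// Run the normal backend first; only on a challenge, try each fallback in order.
fn fetch_with_fallbacks(
    url: &str,
    engine: impl Fn(&str) -> FetchOutcome,
    fallbacks: &[Fallback<'_>],
) -> FetchOutcome {
    match engine(url) {
        FetchOutcome::ChallengeRequired { url, domain, reason } => {
            for fallback in fallbacks {
                if let Some(body) = fallback(url.as_str()) {
                    return FetchOutcome::Success(body);
                }
            }
            // No fallback worked: surface the challenge to the caller unchanged.
            FetchOutcome::ChallengeRequired { url, domain, reason }
        }
        // Success and ordinary failures pass straight through.
        other => other,
    }
}
```

Ordering the fallbacks from cheapest (cached cookies) to most intrusive (WebView) keeps the common path fast.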
|
|
| ## Why This Is Better |
|
|
| Perfect Chrome impersonation is not simple: |
|
|
| - TLS JA3/JA4 changes with Chrome versions. |
| - HTTP/2 fingerprints change. |
| - Libraries using BoringSSL are harder to cross-compile. |
| - Mobile/iOS/Android builds need separate proof. |
| - One wrong cipher order or H2 setting can still get blocked. |
| - CAPTCHA/Turnstile still cannot be solved silently. |
|
|
| For an engine that must be used inside **many C++ apps**, the stable approach is: |
|
|
| - use portable Rust HTTP by default |
| - detect challenge pages reliably |
| - delegate rare hard anti-bot cases to the host app |
|
|
| ## Challenge Detection |
|
|
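| Treat a response as a likely anti-bot challenge when it shows the signals below. No single signal is conclusive: Cloudflare adds `cf-ray` to every response it proxies, and a plain `403` can be an ordinary authorization failure. A blocking status code combined with a provider header or body marker is the strongest indication. |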
|
|
| ### Status codes |
|
|
| - `403` |
| - `429` |
| - `503` |
|
|
| ### Headers |
|
|
| - `server: cloudflare` |
| - `cf-ray` |
| - `cf-chl-*` |
| - `x-datadome` |
| - `x-perimeterx` |
| - `akamai-*` |
|
|
| ### Body markers |
|
|
| - `Just a moment...` |
| - `Checking your browser` |
| - `cf-browser-verification` |
| - `cf-chl-` |
| - `turnstile` |
| - `captcha` |
| - `datadome` |
| - `px-captcha` |
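The three signal lists can be combined into one detector. The sketch below is one possible policy, stated as an assumption: it requires a blocking status code plus at least one header or body signal, which is stricter than treating any single signal as a challenge. `Provider`, `detect_challenge`, and the helper names are hypothetical.

```rust
/// Anti-bot vendors named in the header and body lists.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Cloudflare,
    DataDome,
    PerimeterX,
    Akamai,
    Unknown,
}

/// Match the header signals; header names are compared case-insensitively.
fn provider_from_headers(headers: &[(&str, &str)]) -> Option<Provider> {
    for (name, value) in headers {
        let name = name.to_ascii_lowercase();
        let value = value.to_ascii_lowercase();
        if name == "cf-ray"
            || name.starts_with("cf-chl-")
            || (name == "server" && value.contains("cloudflare"))
        {
            return Some(Provider::Cloudflare);
        }
        if name.starts_with("x-datadome") {
            return Some(Provider::DataDome);
        }
        if name.starts_with("x-perimeterx") {
            return Some(Provider::PerimeterX);
        }
        if name.starts_with("akamai-") {
            return Some(Provider::Akamai);
        }
    }
    None
}

/// Match the body markers. Specific markers come before the generic "captcha".
fn provider_from_body(body: &str) -> Option<Provider> {
    const MARKERS: &[(&str, Provider)] = &[
        ("just a moment...", Provider::Cloudflare),
        ("checking your browser", Provider::Cloudflare),
        ("cf-browser-verification", Provider::Cloudflare),
        ("cf-chl-", Provider::Cloudflare),
        ("turnstile", Provider::Cloudflare),
        ("datadome", Provider::DataDome),
        ("px-captcha", Provider::PerimeterX),
        ("captcha", Provider::Unknown),
    ];
    let body = body.to_ascii_lowercase();
    MARKERS
        .iter()
        .find(|(marker, _)| body.contains(*marker))
        .map(|(_, provider)| *provider)
}

/// Assumed policy: blocking status AND (header signal OR body marker).
fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Option<Provider> {
    if !matches!(status, 403 | 429 | 503) {
        return None;
    }
    provider_from_headers(headers).or_else(|| provider_from_body(body))
}
```

Under this policy a `200` response that merely carries `cf-ray` is not flagged, and neither is a bare `403` with no vendor signal.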
|
|
| ## Engine-Level Behavior |
|
|
| The BEX engine should not try to solve every challenge itself. |
|
|
| Instead: |
|
|
| 1. Detect a likely challenge. |
| 2. Return a structured error: |
|
|
| ```json |
| { |
| "code": "CHALLENGE_REQUIRED", |
| "url": "https://example.com/path", |
| "final_url": "https://example.com/cdn-cgi/challenge-platform/...", |
| "status": 403, |
| "provider": "cloudflare", |
| "domain": "example.com", |
| "hint": "Host app should provide cookies or browser-backed fetch." |
| } |
| ``` |
|
|
| 3. The host app can then retry with cookies or a browser-backed fetcher. |
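On the engine side, the payload above could be produced from a plain struct. The sketch below uses only the standard library so it stays self-contained; a real engine would more likely use a JSON crate such as `serde_json`. The struct and function names are hypothetical; only the field names come from the JSON example.

```rust
/// Mirrors the fields of the CHALLENGE_REQUIRED payload.
struct ChallengeRequired {
    url: String,
    final_url: String,
    status: u16,
    provider: String,
    domain: String,
    hint: String,
}

/// Encode a string as a JSON string literal, escaping quotes, backslashes,
/// and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl ChallengeRequired {
    /// Serialize to a compact JSON object that can cross the C ABI as a string.
    fn to_json(&self) -> String {
        format!(
            "{{\"code\":\"CHALLENGE_REQUIRED\",\"url\":{},\"final_url\":{},\"status\":{},\"provider\":{},\"domain\":{},\"hint\":{}}}",
            json_string(&self.url),
            json_string(&self.final_url),
            self.status,
            json_string(&self.provider),
            json_string(&self.domain),
            json_string(&self.hint),
        )
    }
}
```

Keeping `code` as a fixed string lets the host app branch on it before parsing the remaining fields.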
|
|
| ## Simple Fallback Options |
|
|
| ### Option A — User-provided cookies |
|
|
| The app allows the user to paste/export cookies for a domain. |
|
|
| Then plugins can send: |
|
|
| ```http |
| Cookie: cf_clearance=...; session=... |
| ``` |
|
|
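| This is simple, cross-platform, and avoids hidden browser automation. Clearance cookies such as `cf_clearance` are usually tied to the client that earned them, so requests should send the same `User-Agent` as the browser the cookies came from. |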
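Because these cookies come from pasted user input, it helps to validate them before they reach a request header. The sketch below builds the `Cookie` header value from name/value pairs and drops any pair that could break or inject headers. The function names and the exact character rules are assumptions (a conservative subset of what cookie syntax allows), not part of BEX.

```rust
/// A cookie name must be non-empty, printable ASCII, and free of separators.
fn is_safe_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_graphic() && !matches!(c, ';' | '=' | ',' | '"'))
}

/// A cookie value may be empty but must not contain whitespace, control
/// characters, or characters that would end or corrupt the pair.
fn is_safe_value(value: &str) -> bool {
    value
        .chars()
        .all(|c| c.is_ascii_graphic() && !matches!(c, ';' | ',' | '"' | '\\'))
}

/// Build the value of a `Cookie` header, e.g. `cf_clearance=abc; session=xyz`.
/// Returns `None` when no pair survives validation.
fn cookie_header(cookies: &[(&str, &str)]) -> Option<String> {
    let pairs: Vec<String> = cookies
        .iter()
        .filter(|(name, value)| is_safe_name(name) && is_safe_value(value))
        .map(|(name, value)| format!("{name}={value}"))
        .collect();
    if pairs.is_empty() {
        None
    } else {
        Some(pairs.join("; "))
    }
}
```

Rejecting CR and LF in particular prevents a pasted value from smuggling extra headers into the request.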
|
|
| ### Option B — App-level browser session |
|
|
| The app opens a system browser/WebView **only when needed**. |
|
|
| After the challenge is solved, the app stores the resulting cookies in the BEX secret/KV store. |
|
|
| Future requests reuse those cookies and skip the WebView. |
|
|
| ### Option C — External fetcher callback |
|
|
| Expose an optional C ABI hook: |
|
|
| ```c |
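| #include <stdbool.h> |
| #include <stddef.h> |
| #include <stdint.h> |
|
| /* Result type filled in by the host; its fields are defined elsewhere. */ |
| typedef struct BexFetchResult BexFetchResult; |
|
| /* Host-provided fetch hook; expected to return true on success and fill `out`. */ |
| typedef bool (*BexExternalFetch)( |
| void* user_data, |
| const char* method, |
| const char* url, |
| const uint8_t* body, |
| size_t body_len, |
| BexFetchResult* out |
| ); |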
| ``` |
|
|
| Then the host app can provide: |
|
|
| - libcurl-impersonate |
| - platform-native HTTP stack |
| - browser-backed fetch |
| - company proxy |
| - Android/iOS native networking |
|
|
| The core engine stays simple. |
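On the Rust side, the engine could hold this hook as a function pointer and call it only after a challenge. The sketch below mirrors the C signature. The layout of `BexFetchResult` is not given above, so the single `status` field here is a placeholder assumption, and the response body and its ownership rules are left out entirely. `ExternalFetcher` is a hypothetical wrapper.

```rust
use std::ffi::{c_char, c_void, CString};

/// Placeholder layout (assumption): the real struct would also carry the
/// response body and define who frees it.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
}

/// Rust view of the C `BexExternalFetch` function pointer.
pub type BexExternalFetch = unsafe extern "C" fn(
    user_data: *mut c_void,
    method: *const c_char,
    url: *const c_char,
    body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool;

/// Callback plus the opaque pointer the host registered with it.
pub struct ExternalFetcher {
    pub callback: BexExternalFetch,
    pub user_data: *mut c_void,
}

impl ExternalFetcher {
    /// Call the host hook. Returns the HTTP status on success, or `None` if
    /// the host reports failure or a string contains an interior NUL byte.
    pub fn fetch(&self, method: &str, url: &str, body: &[u8]) -> Option<u16> {
        let method = CString::new(method).ok()?;
        let url = CString::new(url).ok()?;
        let mut out = BexFetchResult { status: 0 };
        // SAFETY: all pointers stay valid for the duration of the call; the
        // host must not retain them after returning.
        let ok = unsafe {
            (self.callback)(
                self.user_data,
                method.as_ptr(),
                url.as_ptr(),
                body.as_ptr(),
                body.len(),
                &mut out,
            )
        };
        if ok {
            Some(out.status)
        } else {
            None
        }
    }
}
```

Converting to `CString` at the boundary keeps NUL-terminated strings out of the rest of the engine.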
|
|
| ### Option D — Optional proxy service |
|
|
| For apps that control their backend, route difficult sites through a server-side fetcher with proper browser fingerprinting. |
|
|
| The engine stays portable and does not embed fragile anti-bot logic. |
|
|
| ## Plugin Guidance |
|
|
| Plugins should: |
|
|
| - set `Referer` correctly |
| - preserve cookies when provided |
| - avoid excessive retries |
| - return `PluginError::Forbidden` or `PluginError::RateLimited` for challenge pages |
| - prefer local JS ciphers over third-party helper APIs when possible |
|
|
| Plugins should not: |
|
|
| - hardcode fake TLS assumptions |
| - rely on one external decoder service forever |
| - endlessly retry CF challenge pages |
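The retry guidance can be made concrete with a small decision function. In this sketch a challenge is never retried, while rate limits and transient failures get a bounded exponential backoff. The type names, the limit of three attempts, and the 500 ms base delay are illustrative assumptions.

```rust
use std::time::Duration;

/// What a single attempt produced, as classified by the plugin.
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    Challenge,
    RateLimited,
    Transient,
}

/// What the plugin should do next.
#[derive(Debug, PartialEq)]
enum Next {
    Done,
    ReportChallenge,
    GiveUp,
    RetryAfter(Duration),
}

/// Assumed cap on attempts, counting the first one.
const MAX_ATTEMPTS: u32 = 3;

/// Decide the next step after attempt number `attempt` (1-based).
fn next_step(outcome: Outcome, attempt: u32) -> Next {
    match outcome {
        Outcome::Ok => Next::Done,
        // Retrying a challenge page only burns requests; report it at once.
        Outcome::Challenge => Next::ReportChallenge,
        Outcome::RateLimited | Outcome::Transient if attempt >= MAX_ATTEMPTS => Next::GiveUp,
        Outcome::RateLimited | Outcome::Transient => {
            // 500 ms, 1 s, 2 s, ... doubling with each attempt.
            let delay_ms = 500 * 2u64.pow(attempt.saturating_sub(1));
            Next::RetryAfter(Duration::from_millis(delay_ms))
        }
    }
}
```

With these constants a plugin makes at most three requests per operation and waits 500 ms, then 1 s, between them.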
|
|
| ## Recommended Near-Term Fixes |
|
|
| 1. Add challenge detection in `HttpHostService`. |
| 2. Map challenges to a structured error payload for C ABI. |
| 3. Add cookie helper APIs: |
| - set domain cookies |
| - clear domain cookies |
| - list stored challenge domains |
| 4. Add optional external fetch callback in C ABI. |
| 5. Keep advanced TLS impersonation as an optional backend only. |
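The cookie helper APIs in item 3 could sit on top of a per-domain map. The sketch below is an in-memory stand-in with hypothetical names; a real implementation would persist to the BEX secret/KV store and expose these operations through the C ABI.

```rust
use std::collections::BTreeMap;

/// In-memory cookie store keyed by lowercase domain, then by cookie name.
#[derive(Default)]
struct CookieStore {
    by_domain: BTreeMap<String, BTreeMap<String, String>>,
}

impl CookieStore {
    /// Set (or overwrite) cookies for a domain.
    fn set_domain_cookies(&mut self, domain: &str, cookies: &[(&str, &str)]) {
        let jar = self.by_domain.entry(domain.to_ascii_lowercase()).or_default();
        for (name, value) in cookies {
            jar.insert(name.to_string(), value.to_string());
        }
    }

    /// Remove every cookie for a domain. Returns true if anything was stored.
    fn clear_domain_cookies(&mut self, domain: &str) -> bool {
        self.by_domain.remove(&domain.to_ascii_lowercase()).is_some()
    }

    /// List the domains that currently have stored cookies, in sorted order.
    fn list_domains(&self) -> Vec<&str> {
        self.by_domain.keys().map(String::as_str).collect()
    }
}
```

Normalizing the domain to lowercase on every call means `Example.com` and `example.com` share one jar.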
|
|
| ## Final Recommendation |
|
|
| For production: |
|
|
| - Default: `reqwest + rustls` portable backend. |
| - Add: challenge detection and external fallback hook. |
| - Optional later: verified impersonation backend behind feature flag. |
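One way to keep the impersonation backend strictly optional is a Cargo feature that compiles it out by default. The sketch below assumes a feature named `impersonation`; the trait and type names are hypothetical, and the backends are stubs that only report their name.

```rust
/// Common interface for HTTP backends. Hypothetical, for illustration.
trait HttpBackend {
    fn name(&self) -> &'static str;
}

/// Default backend: always compiled in.
struct PortableBackend;

impl HttpBackend for PortableBackend {
    fn name(&self) -> &'static str {
        "portable"
    }
}

/// Only exists when the crate is built with `--features impersonation`.
#[cfg(feature = "impersonation")]
struct ImpersonationBackend;

#[cfg(feature = "impersonation")]
impl HttpBackend for ImpersonationBackend {
    fn name(&self) -> &'static str {
        "impersonation"
    }
}

/// Pick a backend. Without the feature, the portable backend is the only choice.
fn select_backend(prefer_impersonation: bool) -> Box<dyn HttpBackend> {
    #[cfg(feature = "impersonation")]
    {
        if prefer_impersonation {
            return Box::new(ImpersonationBackend);
        }
    }
    // Silences the unused-parameter warning when the feature is off.
    let _ = prefer_impersonation;
    Box::new(PortableBackend)
}
```

Builds that never enable the feature do not compile or link the impersonation code at all, which keeps the default build portable.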
|
|
| This gives the best balance of: |
|
|
| - reliability |
| - cross-platform support |
| - maintainability |
| - app integration flexibility |
| - real-world anti-bot handling |
|
|