
Simple Production Anti-Bot Strategy

This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine.

Principle

Do not make the core plugin engine a fragile browser clone.

Keep the BEX engine:

  • portable
  • buildable everywhere
  • easy to embed in C++ apps
  • deterministic where possible
  • independent of experimental TLS impersonation crates

Then add simple challenge handling around it.

Recommended Flow

Plugin request
   ↓
BEX normal HTTP backend
   ↓
Success? ────────────────→ return data
   ↓ no
Challenge detected?
   ↓ yes
Return CHALLENGE_REQUIRED with URL/domain/reason
   ↓
C++ app decides fallback:
   - use cached cookies
   - ask user to import cookies
   - open system browser/WebView only when needed
   - use app-specific HTTP fetcher
   - use optional proxy service
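The flow above can be sketched in Rust, the engine's language. This is a minimal illustration under assumptions: `Outcome`, `RawResponse` and `run_request` are hypothetical names rather than the BEX API, the normal HTTP backend is passed in as a closure, and the challenge check is a stub standing in for the rules under Challenge Detection.

```rust
/// What the host app receives when a challenge is detected (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Challenge {
    pub url: String,
    pub domain: String,
    pub reason: String,
}

/// Result of one plugin request after the normal backend has run.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Normal backend succeeded: hand the data back to the plugin.
    Data(Vec<u8>),
    /// Challenge detected: the C++ host decides which fallback to use.
    ChallengeRequired(Challenge),
    /// Any other failure (network error, 404, ...).
    Failed(String),
}

/// Simplified response from the normal HTTP backend.
pub struct RawResponse {
    pub status: u16,
    pub body: Vec<u8>,
}

/// Runs a request through the normal backend and classifies the result.
/// `is_challenge` returns a reason string when the response looks like a challenge.
pub fn run_request(
    url: &str,
    domain: &str,
    fetch: impl Fn(&str) -> Result<RawResponse, String>,
    is_challenge: impl Fn(&RawResponse) -> Option<String>,
) -> Outcome {
    let resp = match fetch(url) {
        Ok(r) => r,
        Err(e) => return Outcome::Failed(e),
    };
    // Success? -> return data
    if (200..300).contains(&resp.status) {
        return Outcome::Data(resp.body);
    }
    // Challenge detected? -> CHALLENGE_REQUIRED with URL/domain/reason
    match is_challenge(&resp) {
        Some(reason) => Outcome::ChallengeRequired(Challenge {
            url: url.to_string(),
            domain: domain.to_string(),
            reason,
        }),
        None => Outcome::Failed(format!("HTTP {}", resp.status)),
    }
}
```

The engine never picks a fallback itself here; it only reports `ChallengeRequired`, which keeps the browser, cookie and proxy decisions in the host app.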

Why This Is Better

Perfect Chrome impersonation is not simple:

  • The TLS (JA3/JA4) fingerprint changes with Chrome versions.
  • HTTP/2 fingerprints change.
  • Libraries using BoringSSL are harder to cross-compile.
  • Mobile (iOS/Android) builds need separate verification.
  • A single wrong cipher order or HTTP/2 setting can still get requests blocked.
  • CAPTCHA/Turnstile still cannot be solved silently.

For an engine that must be used inside many C++ apps, the stable approach is:

  • use portable Rust HTTP by default
  • detect challenge pages reliably
  • delegate rare hard anti-bot cases to the host app

Challenge Detection

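Treat a non-success response as a likely anti-bot challenge when its status code is one of those listed below and at least one header or body marker matches. Headers such as server: cloudflare and cf-ray are sent on every response proxied through that provider, successful ones included, so on their own they only identify the provider, not a challenge.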

Status codes

  • 403
  • 429
  • 503

Headers

  • server: cloudflare
  • cf-ray
  • cf-chl-*
  • x-datadome
  • x-perimeterx
  • akamai-*

Body markers

  • Just a moment...
  • Checking your browser
  • cf-browser-verification
  • cf-chl-
  • turnstile
  • captcha
  • datadome
  • px-captcha
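A sketch of these rules as one function, assuming headers arrive as (name, value) string pairs and the body is already decoded to text. It flags a response only when the status is 403, 429 or 503 and at least one header or body signal is present, since a bare status is ambiguous. The provider labels, the marker-to-provider mapping and the case-insensitive matching are choices made for this illustration.

```rust
/// Status codes that can indicate a challenge.
static CHALLENGE_STATUS: [u16; 3] = [403, 429, 503];

/// Body markers (lowercase) and the provider each one suggests.
/// Order matters: "px-captcha" must be checked before the generic "captcha".
static BODY_MARKERS: [(&str, &str); 8] = [
    ("just a moment...", "cloudflare"),
    ("checking your browser", "cloudflare"),
    ("cf-browser-verification", "cloudflare"),
    ("cf-chl-", "cloudflare"),
    ("turnstile", "cloudflare"),
    ("px-captcha", "perimeterx"),
    ("datadome", "datadome"),
    ("captcha", "unknown"),
];

/// Looks for provider headers; header names are compared case-insensitively.
fn provider_from_headers(headers: &[(&str, &str)]) -> Option<&'static str> {
    for (name, value) in headers {
        let name = name.to_ascii_lowercase();
        if name == "cf-ray"
            || name.starts_with("cf-chl-")
            || (name == "server" && value.to_ascii_lowercase().contains("cloudflare"))
        {
            return Some("cloudflare");
        }
        if name == "x-datadome" {
            return Some("datadome");
        }
        if name == "x-perimeterx" {
            return Some("perimeterx");
        }
        if name.starts_with("akamai-") {
            return Some("akamai");
        }
    }
    None
}

/// Returns the likely provider if a failed response looks like a challenge.
pub fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Option<&'static str> {
    if !CHALLENGE_STATUS.contains(&status) {
        return None;
    }
    let body = body.to_lowercase();
    let from_body = BODY_MARKERS
        .iter()
        .find(|(marker, _)| body.contains(*marker))
        .map(|(_, provider)| *provider);
    // Prefer the header-derived provider; fall back to what the body suggests.
    provider_from_headers(headers).or(from_body)
}
```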

Engine-Level Behavior

The BEX engine should not try to solve every challenge itself.

Instead:

  1. Detect likely challenge.
  2. Return structured error:
{
  "code": "CHALLENGE_REQUIRED",
  "url": "https://example.com/path",
  "final_url": "https://example.com/cdn-cgi/challenge-platform/...",
  "status": 403,
  "provider": "cloudflare",
  "domain": "example.com",
  "hint": "Host app should provide cookies or browser-backed fetch."
}
  3. Host app can then retry with cookies or a browser-backed fetcher.
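One way the engine could produce this payload without extra dependencies. `ChallengeRequired` and `to_json` are hypothetical names; in a real crate serde_json would normally do the serialization, and the hand-written escaping here covers only quotes, backslashes and control characters.

```rust
/// Hypothetical Rust mirror of the CHALLENGE_REQUIRED payload.
pub struct ChallengeRequired<'a> {
    pub url: &'a str,
    pub final_url: &'a str,
    pub status: u16,
    pub provider: &'a str,
    pub domain: &'a str,
    pub hint: &'a str,
}

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ChallengeRequired<'_> {
    /// Serializes to the JSON shape shown above, as a single line.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"code\":\"CHALLENGE_REQUIRED\",\"url\":\"{}\",\"final_url\":\"{}\",\"status\":{},\"provider\":\"{}\",\"domain\":\"{}\",\"hint\":\"{}\"}}",
            json_escape(self.url),
            json_escape(self.final_url),
            self.status,
            json_escape(self.provider),
            json_escape(self.domain),
            json_escape(self.hint),
        )
    }
}
```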

Simple Fallback Options

Option A: User-provided cookies

The app lets the user paste cookies for a domain, for example cookies exported from their own browser.

Then plugins can send:

Cookie: cf_clearance=...; session=...

This is simple, cross-platform, and avoids hidden browser automation.
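Pasted cookies usually need light normalization before they become a Cookie header. The helpers below are hypothetical: they accept either a bare `name=value; ...` string or one prefixed with `Cookie:`, drop malformed segments, and split each pair at the first `=` so values that contain `=` survive.

```rust
/// Parses a pasted cookie string into (name, value) pairs.
/// Segments without '=' or with an empty name are skipped.
pub fn parse_cookie_paste(input: &str) -> Vec<(String, String)> {
    let input = input.trim();
    // Accept both "Cookie: a=b; c=d" and bare "a=b; c=d" pastes.
    let input = input.strip_prefix("Cookie:").unwrap_or(input);
    input
        .split(';')
        .filter_map(|part| {
            // Split at the first '=' only; cookie values may contain '='.
            let (name, value) = part.split_once('=')?;
            let name = name.trim();
            if name.is_empty() {
                return None;
            }
            Some((name.to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Builds the Cookie header value a plugin request would send.
pub fn cookie_header(pairs: &[(String, String)]) -> String {
    pairs
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect::<Vec<_>>()
        .join("; ")
}
```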

Option B: App-level browser session

The app opens a system browser/WebView only when needed.

After the challenge is solved, the app stores the cookies in the BEX secret/KV store.

Future requests reuse those cookies and skip the WebView.
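A sketch of the store-then-reuse step, assuming a simple string key/value interface. `KvStore`, `MemoryKv` and the `cookies:{domain}` key layout are placeholders, since the BEX secret/KV API is not shown here. Clearance cookies such as cf_clearance are commonly tied to the User-Agent that solved the challenge, so a real integration would probably store that alongside the cookies; the sketch leaves it out.

```rust
use std::collections::HashMap;

/// Stand-in for the BEX secret/KV store (hypothetical interface).
pub trait KvStore {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: String);
}

/// In-memory implementation, for illustration only.
#[derive(Default)]
pub struct MemoryKv(HashMap<String, String>);

impl KvStore for MemoryKv {
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
    fn put(&mut self, key: &str, value: String) {
        self.0.insert(key.to_string(), value);
    }
}

/// Hypothetical key layout: one entry per domain.
fn cookie_key(domain: &str) -> String {
    format!("cookies:{domain}")
}

/// Called by the host after the user solves the challenge in a browser/WebView.
pub fn save_session(kv: &mut impl KvStore, domain: &str, cookie_header: &str) {
    kv.put(&cookie_key(domain), cookie_header.to_string());
}

/// Called before each request: Some(header) means the WebView can be skipped.
pub fn session_cookies(kv: &impl KvStore, domain: &str) -> Option<String> {
    kv.get(&cookie_key(domain))
}
```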

Option C: External fetcher callback

Expose an optional C ABI hook:

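#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Filled in by the host app; the struct layout is not specified in this document. */
typedef struct BexFetchResult BexFetchResult;

/* Optional host-provided fetcher. The bool return presumably signals whether `out` was filled. */
typedef bool (*BexExternalFetch)(
    void* user_data,
    const char* method,
    const char* url,
    const uint8_t* body,
    size_t body_len,
    BexFetchResult* out
);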

Then the host app can provide:

  • libcurl-impersonate
  • platform-native HTTP stack
  • browser-backed fetch
  • company proxy
  • Android/iOS native networking

The core engine stays simple.
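On the Rust side, the engine could wrap this hook as below. The fields of `BexFetchResult` (status, body pointer, body length) are an assumption, because the struct is not defined here, and so is the ownership rule: the host keeps the body buffer alive until the engine has copied it. `ExternalFetcher` is likewise a hypothetical name.

```rust
use std::ffi::{c_char, c_void, CString};

/// Hypothetical layout for BexFetchResult.
/// Assumption: `body` is owned by the host and stays valid until copied.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
    pub body: *const u8,
    pub body_len: usize,
}

/// Rust view of the C typedef `BexExternalFetch`.
pub type BexExternalFetch = unsafe extern "C" fn(
    user_data: *mut c_void,
    method: *const c_char,
    url: *const c_char,
    body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool;

/// Engine-side handle for the optional host fetcher.
pub struct ExternalFetcher {
    pub callback: BexExternalFetch,
    pub user_data: *mut c_void,
}

impl ExternalFetcher {
    /// Calls the host fetcher and copies the body into engine-owned memory.
    /// Returns None if method/url contain NUL bytes or the host reports failure.
    pub fn fetch(&self, method: &str, url: &str, body: &[u8]) -> Option<(u16, Vec<u8>)> {
        let method = CString::new(method).ok()?;
        let url = CString::new(url).ok()?;
        let mut out = BexFetchResult { status: 0, body: std::ptr::null(), body_len: 0 };
        // SAFETY: relies on the host honoring the C ABI contract described above.
        let ok = unsafe {
            (self.callback)(
                self.user_data,
                method.as_ptr(),
                url.as_ptr(),
                body.as_ptr(),
                body.len(),
                &mut out,
            )
        };
        if !ok {
            return None;
        }
        let data = if out.body.is_null() || out.body_len == 0 {
            Vec::new()
        } else {
            // SAFETY: the host guarantees `body` points to `body_len` readable bytes.
            let bytes = unsafe { std::slice::from_raw_parts(out.body, out.body_len) };
            bytes.to_vec()
        };
        Some((out.status, data))
    }
}
```

Copying the body immediately keeps the ownership rule simple: after `fetch` returns, the engine never touches host memory again.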

Option D: Optional proxy service

For apps that control their own backend, route difficult sites through a server-side fetcher that presents a realistic browser fingerprint.

The engine stays portable and does not embed fragile anti-bot logic.
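Routing can stay a plain lookup in the engine or host. In this sketch the list of difficult domains and the proxy endpoint are app-supplied inputs, `Route` and `route_for` are hypothetical names, and subdomains of a listed domain are routed through the proxy too.

```rust
/// Where a request should go (hypothetical names).
#[derive(Debug, PartialEq)]
pub enum Route {
    /// Use the engine's normal portable backend.
    Direct,
    /// Send through the app's server-side fetcher.
    Proxy { endpoint: String },
}

/// True if `host` equals `domain` or is a subdomain of it.
fn host_matches(host: &str, domain: &str) -> bool {
    host == domain || host.ends_with(&format!(".{domain}"))
}

/// Routes hosts on the app-controlled "difficult sites" list through the
/// proxy endpoint; everything else goes direct. Host matching is case-insensitive.
pub fn route_for(host: &str, difficult: &[&str], proxy_endpoint: &str) -> Route {
    let host = host.to_ascii_lowercase();
    if difficult.iter().any(|d| host_matches(&host, d)) {
        Route::Proxy { endpoint: proxy_endpoint.to_string() }
    } else {
        Route::Direct
    }
}
```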

Plugin Guidance

Plugins should:

  • set Referer correctly
  • preserve cookies when provided
  • avoid excessive retries
  • return PluginError::Forbidden or PluginError::RateLimited for challenge pages
  • prefer local JS ciphers over third-party helper APIs when possible

Plugins should not:

  • hardcode fake TLS assumptions
  • rely on one external decoder service forever
  • endlessly retry CF challenge pages
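The error mapping and retry limits can be made explicit. `PluginError` below is a local stand-in with only the two variants named above (the engine's real type may differ), and the policy (never retry a detected challenge, retry a plain 429 at most `max_retries` times with exponential backoff) is one reasonable reading of the guidance, not engine behavior.

```rust
use std::time::Duration;

/// Local stand-in for the engine's PluginError (only the two named variants).
#[derive(Debug, PartialEq)]
pub enum PluginError {
    Forbidden,
    RateLimited,
}

/// Error a plugin returns for a detected challenge page.
pub fn error_for_challenge(status: u16) -> PluginError {
    if status == 429 {
        PluginError::RateLimited
    } else {
        PluginError::Forbidden
    }
}

/// Retry policy sketch: challenge pages are never retried; a plain 429 is
/// retried up to `max_retries` times, waiting 1s, 2s, 4s, ... (capped at 64s).
pub fn retry_delay(status: u16, is_challenge: bool, attempt: u32, max_retries: u32) -> Option<Duration> {
    if is_challenge || status != 429 || attempt >= max_retries {
        return None;
    }
    Some(Duration::from_secs(1u64 << attempt.min(6)))
}
```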

Recommended Near-Term Fixes

  1. Add challenge detection in HttpHostService.
  2. Map challenges to a structured error payload for the C ABI.
  3. Add cookie helper APIs:
    • set domain cookies
    • clear domain cookies
    • list stored challenge domains
  4. Add an optional external fetch callback to the C ABI.
  5. Keep advanced TLS impersonation as an optional backend only.
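The cookie helpers in item 3 map naturally onto a small per-domain store. The method names follow the wording of that item but are hypothetical, and the store is in memory; a real one would presumably persist through the BEX secret/KV store. Domains are lowercased so lookups are case-insensitive.

```rust
use std::collections::BTreeMap;

/// In-memory sketch of the proposed cookie helper APIs.
#[derive(Default)]
pub struct DomainCookies {
    /// domain (lowercase) -> Cookie header value
    by_domain: BTreeMap<String, String>,
}

impl DomainCookies {
    /// "set domain cookies": replaces any cookies stored for the domain.
    pub fn set_domain_cookies(&mut self, domain: &str, cookie_header: &str) {
        self.by_domain.insert(domain.to_ascii_lowercase(), cookie_header.to_string());
    }

    /// "clear domain cookies": returns true if something was removed.
    pub fn clear_domain_cookies(&mut self, domain: &str) -> bool {
        self.by_domain.remove(&domain.to_ascii_lowercase()).is_some()
    }

    /// "list stored challenge domains", sorted because BTreeMap keys are ordered.
    pub fn list_challenge_domains(&self) -> Vec<&str> {
        self.by_domain.keys().map(String::as_str).collect()
    }

    /// Cookie header to attach to a request for `domain`, if any.
    pub fn cookie_header(&self, domain: &str) -> Option<&str> {
        self.by_domain.get(&domain.to_ascii_lowercase()).map(String::as_str)
    }
}
```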

Final Recommendation

For production:

  • Default: reqwest + rustls portable backend.
  • Add: challenge detection and external fallback hook.
  • Optional later: verified impersonation backend behind feature flag.

This gives the best balance of:

  • reliability
  • cross-platform support
  • maintainability
  • app integration flexibility
  • real-world anti-bot handling
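The last item of the recommendation, an impersonation backend behind a feature flag, means the default build never compiles or links it. A minimal sketch, assuming a Cargo feature named `impersonate` (hypothetical) declared in Cargo.toml: the enum variant exists only when the feature is on, and a request for impersonation falls back to the portable backend otherwise.

```rust
/// HTTP backends the engine can use (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HttpBackend {
    /// Default portable backend (reqwest + rustls in the real engine).
    Portable,
    /// Only compiled in with the assumed `impersonate` Cargo feature.
    #[cfg(feature = "impersonate")]
    Impersonation,
}

#[cfg(feature = "impersonate")]
fn impersonation_if_requested(want: bool) -> Option<HttpBackend> {
    if want { Some(HttpBackend::Impersonation) } else { None }
}

#[cfg(not(feature = "impersonate"))]
fn impersonation_if_requested(_want: bool) -> Option<HttpBackend> {
    // Feature not compiled in: the caller falls back to the portable backend.
    None
}

/// Picks the backend for a request; portable unless impersonation is both
/// requested and available in this build.
pub fn select_backend(want_impersonation: bool) -> HttpBackend {
    impersonation_if_requested(want_impersonation).unwrap_or(HttpBackend::Portable)
}
```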