pluginengine01 / ANTI-BOT-STRATEGY.md

krystv

docs: add simple production anti-bot strategy without complex TLS impersonation

806fb75 verified 7 days ago

Simple Production Anti-Bot Strategy

This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine.

Principle

Do not make the core plugin engine a fragile browser clone.

Keep the BEX engine:

portable
buildable everywhere
easy to embed in C++ apps
deterministic where possible
independent from experimental TLS impersonation crates

Then add simple challenge handling around it.

Recommended Flow

Plugin request
   ↓
BEX normal HTTP backend
   ↓
Success? ────────────────→ return data
   ↓ no
Challenge detected?
   ↓ yes
Return CHALLENGE_REQUIRED with URL/domain/reason
   ↓
C++ app decides fallback:
   - use cached cookies
   - ask user to import cookies
   - open system browser/WebView only when needed
   - use app-specific HTTP fetcher
   - use optional proxy service
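
The flow above can be sketched in Rust roughly as follows. This is only an illustration: `Outcome`, `run_flow`, and the shape of the fallback callbacks are assumed names, not part of the BEX API, and the real engine would return richer data than a `String`.

```rust
/// Result of one engine request (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The normal HTTP backend succeeded.
    Data(String),
    /// A challenge was detected; the host app decides what to do next.
    ChallengeRequired { url: String, reason: String },
}

/// Runs the engine fetch first, then tries host-provided fallbacks in order
/// (cached cookies, imported cookies, WebView, app fetcher, proxy, ...).
/// Each fallback returns `Some(data)` if it could fetch the URL.
pub fn run_flow(
    url: &str,
    engine_fetch: &dyn Fn(&str) -> Outcome,
    fallbacks: &[&dyn Fn(&str) -> Option<String>],
) -> Outcome {
    match engine_fetch(url) {
        Outcome::ChallengeRequired { url: challenged, reason } => {
            for fallback in fallbacks {
                if let Some(data) = fallback(challenged.as_str()) {
                    return Outcome::Data(data);
                }
            }
            // No fallback worked: surface the challenge unchanged.
            Outcome::ChallengeRequired { url: challenged, reason }
        }
        // Success passes straight through without touching any fallback.
        other => other,
    }
}
```

The point of the shape is that the engine never calls a fallback on the success path, so the common case stays as cheap and deterministic as a plain HTTP request.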

Why This Is Better

Perfect Chrome impersonation is not simple:

TLS JA3/JA4 fingerprints change between Chrome versions.
HTTP/2 fingerprints change.
Libraries using BoringSSL are harder to cross-compile.
Mobile/iOS/Android builds need separate proof.
One wrong cipher order or H2 setting can still get blocked.
CAPTCHA/Turnstile still cannot be solved silently.

For an engine that must be used inside many C++ apps, the stable approach is:

use portable Rust HTTP by default
detect challenge pages reliably
delegate rare hard anti-bot cases to the host app

Challenge Detection

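A response should be treated as a likely anti-bot challenge if any of the signals below match. Combine them where possible, because the server: cloudflare and cf-ray headers appear on every response served through Cloudflare, including successful ones, and a 403, 429, or 503 can also be an ordinary origin error.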

Status codes

403
429
503

Headers

server: cloudflare
cf-ray
cf-chl-*
x-datadome
x-perimeterx
akamai-*

Body markers

Just a moment...
Checking your browser
cf-browser-verification
cf-chl-
turnstile
captcha
datadome
px-captcha
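
A minimal sketch of a detector built from the signals listed above, assuming the caller already has the status code, header pairs, and body text. The `Provider` enum and `detect_challenge` function are hypothetical names. The sketch is deliberately stricter than "any of": it requires one of the listed status codes plus at least one header or body marker, so that an ordinary 200 response served through Cloudflare is not flagged.

```rust
/// Provider guessed from the markers (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Cloudflare,
    DataDome,
    PerimeterX,
    Akamai,
    Unknown,
}

/// True if any header name starts with `prefix` (case-insensitive).
fn has_header_prefix(headers: &[(&str, &str)], prefix: &str) -> bool {
    headers.iter().any(|(name, _)| name.to_ascii_lowercase().starts_with(prefix))
}

/// True if header `name` exists and its value contains `needle` (case-insensitive).
fn header_contains(headers: &[(&str, &str)], name: &str, needle: &str) -> bool {
    headers
        .iter()
        .any(|(n, v)| n.eq_ignore_ascii_case(name) && v.to_ascii_lowercase().contains(needle))
}

fn contains_any(haystack: &str, markers: &[&str]) -> bool {
    markers.iter().any(|m| haystack.contains(*m))
}

/// Returns `Some(provider)` when the response looks like a challenge.
pub fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Option<Provider> {
    // Assumption of this sketch: a listed status code is required.
    if !matches!(status, 403 | 429 | 503) {
        return None;
    }
    let body = body.to_ascii_lowercase();

    if has_header_prefix(headers, "x-datadome") || body.contains("datadome") {
        return Some(Provider::DataDome);
    }
    if has_header_prefix(headers, "x-perimeterx") || body.contains("px-captcha") {
        return Some(Provider::PerimeterX);
    }
    if has_header_prefix(headers, "akamai-") {
        return Some(Provider::Akamai);
    }
    let cf_header = has_header_prefix(headers, "cf-chl-")
        || has_header_prefix(headers, "cf-ray")
        || header_contains(headers, "server", "cloudflare");
    let cf_body = contains_any(
        &body,
        &["just a moment...", "checking your browser", "cf-browser-verification", "cf-chl-", "turnstile"],
    );
    if cf_header || cf_body {
        return Some(Provider::Cloudflare);
    }
    // Generic marker with no identifiable provider.
    if body.contains("captcha") {
        return Some(Provider::Unknown);
    }
    None
}
```

Provider-specific markers are checked before the generic `captcha` marker so that a PerimeterX page containing `px-captcha` is not reported as `Unknown`.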

Engine-Level Behavior

The BEX engine should not try to solve every challenge itself.

Instead:

Detect likely challenge.
Return structured error:

{
  "code": "CHALLENGE_REQUIRED",
  "url": "https://example.com/path",
  "final_url": "https://example.com/cdn-cgi/challenge-platform/...",
  "status": 403,
  "provider": "cloudflare",
  "domain": "example.com",
  "hint": "Host app should provide cookies or browser-backed fetch."
}

The host app can then retry with cookies or a browser-backed fetcher.
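
One way the engine side could build that payload, sketched with the standard library only. The `ChallengeRequired` struct and the hand-written `json_string` escaper are assumptions for illustration; a real implementation would more likely use a JSON library.

```rust
/// Mirrors the fields of the structured error shown above (hypothetical type).
pub struct ChallengeRequired {
    pub url: String,
    pub final_url: String,
    pub status: u16,
    pub provider: String,
    pub domain: String,
    pub hint: String,
}

/// Encodes `s` as a JSON string literal, escaping quotes, backslashes,
/// and control characters.
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl ChallengeRequired {
    /// Serializes the error as a single-line JSON object for the C ABI.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"code\":\"CHALLENGE_REQUIRED\",\"url\":{},\"final_url\":{},\"status\":{},\"provider\":{},\"domain\":{},\"hint\":{}}}",
            json_string(&self.url),
            json_string(&self.final_url),
            self.status,
            json_string(&self.provider),
            json_string(&self.domain),
            json_string(&self.hint),
        )
    }
}
```

Keeping `code` as a fixed string lets C++ hosts branch on it without parsing the rest of the object.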

Simple Fallback Options

Option A — User-provided cookies

The app allows the user to paste/export cookies for a domain.

Then plugins can send:

Cookie: cf_clearance=...; session=...

This is simple, cross-platform, and avoids hidden browser automation.
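
A small sketch of how pasted cookies might be turned into that header. The function names `parse_cookie_string` and `cookie_header` are hypothetical; the sketch assumes the user pastes a `name=value; name=value` string, optionally prefixed with `Cookie:`.

```rust
/// Parses a pasted cookie string into (name, value) pairs.
/// Segments without `=` or with an empty name are skipped.
pub fn parse_cookie_string(pasted: &str) -> Vec<(String, String)> {
    let trimmed = pasted.trim();
    // Accept input copied together with the header name.
    let trimmed = trimmed.strip_prefix("Cookie:").unwrap_or(trimmed);
    trimmed
        .split(';')
        .filter_map(|pair| {
            // Split at the first `=` only, so values may contain `=`.
            let (name, value) = pair.split_once('=')?;
            let name = name.trim();
            if name.is_empty() {
                return None;
            }
            Some((name.to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Builds the value of a `Cookie` request header, or `None` if there
/// are no cookies to send.
pub fn cookie_header(cookies: &[(String, String)]) -> Option<String> {
    if cookies.is_empty() {
        return None;
    }
    let parts: Vec<String> = cookies
        .iter()
        .map(|(name, value)| format!("{}={}", name, value))
        .collect();
    Some(parts.join("; "))
}
```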

Option B — App-level browser session

The app opens a system browser/WebView only when needed.

After the challenge is solved, the app stores the resulting cookies in the BEX secret/KV store.

Future requests reuse those cookies and avoid opening the WebView again.
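
The store-then-reuse step could look something like the sketch below. `DomainCookies` is a hypothetical in-memory stand-in for the BEX secret/KV store, and the matching rule (exact host or any subdomain of a stored domain) is an assumption of the sketch, not documented engine behavior.

```rust
use std::collections::BTreeMap;

/// Cookies grouped by lowercase domain (in-memory stand-in for the KV store).
#[derive(Default)]
pub struct DomainCookies {
    by_domain: BTreeMap<String, BTreeMap<String, String>>,
}

impl DomainCookies {
    /// Stores or overwrites one cookie for `domain`.
    pub fn set(&mut self, domain: &str, name: &str, value: &str) {
        self.by_domain
            .entry(domain.to_ascii_lowercase())
            .or_default()
            .insert(name.to_string(), value.to_string());
    }

    /// Removes all cookies for `domain`; returns true if any were stored.
    pub fn clear(&mut self, domain: &str) -> bool {
        self.by_domain.remove(&domain.to_ascii_lowercase()).is_some()
    }

    /// Lists the domains that currently have stored cookies.
    pub fn domains(&self) -> Vec<&str> {
        self.by_domain.keys().map(|d| d.as_str()).collect()
    }

    /// Builds a `Cookie` header value for `host`, using cookies stored for
    /// the host itself or for a parent domain. Names are sorted for
    /// deterministic output.
    pub fn header_for(&self, host: &str) -> Option<String> {
        let host = host.to_ascii_lowercase();
        let mut merged: BTreeMap<&str, &str> = BTreeMap::new();
        for (domain, cookies) in &self.by_domain {
            let suffix = format!(".{}", domain);
            if host == *domain || host.ends_with(suffix.as_str()) {
                for (name, value) in cookies {
                    merged.insert(name.as_str(), value.as_str());
                }
            }
        }
        if merged.is_empty() {
            return None;
        }
        let parts: Vec<String> = merged.iter().map(|(n, v)| format!("{}={}", n, v)).collect();
        Some(parts.join("; "))
    }
}
```

The leading dot in the suffix check keeps `notexample.com` from matching cookies stored for `example.com`.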

Option C — External fetcher callback

Expose an optional C ABI hook:

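#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result filled in by the host; its fields are not specified in this document. */
typedef struct BexFetchResult BexFetchResult;

/* Host-provided fetch hook; user_data is an opaque pointer owned by the host. */
typedef bool (*BexExternalFetch)(
    void* user_data,
    const char* method,
    const char* url,
    const uint8_t* body,
    size_t body_len,
    BexFetchResult* out
);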

Then the host app can provide:

libcurl-impersonate
platform-native HTTP stack
browser-backed fetch
company proxy
Android/iOS native networking

The core engine stays simple.
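
On the Rust side, the engine might wrap such a hook roughly as in the sketch below. The `BexFetchResult` layout here (a single `status` field) is purely hypothetical because the document does not define it, and `ExternalFetcher` is an assumed wrapper name. The sketch also assumes that a `true` return means the host populated `out`.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_void};

/// HYPOTHETICAL layout: the real fields of `BexFetchResult` are not specified.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
}

/// Rust view of the C callback type.
pub type BexExternalFetch = unsafe extern "C" fn(
    user_data: *mut c_void,
    method: *const c_char,
    url: *const c_char,
    body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool;

/// Holds a host-registered callback together with its opaque context.
pub struct ExternalFetcher {
    callback: BexExternalFetch,
    user_data: *mut c_void,
}

impl ExternalFetcher {
    /// # Safety
    /// `callback` must be safe to call with `user_data` for as long as
    /// this value exists.
    pub unsafe fn new(callback: BexExternalFetch, user_data: *mut c_void) -> Self {
        ExternalFetcher { callback, user_data }
    }

    /// Calls the host fetcher. Returns `None` if the host reports failure or
    /// if `method`/`url` contain an interior NUL byte.
    pub fn fetch(&self, method: &str, url: &str, body: &[u8]) -> Option<BexFetchResult> {
        let method = CString::new(method).ok()?;
        let url = CString::new(url).ok()?;
        let mut out = BexFetchResult { status: 0 };
        // The CStrings and `body` outlive the call, so the pointers stay valid.
        let ok = unsafe {
            (self.callback)(
                self.user_data,
                method.as_ptr(),
                url.as_ptr(),
                body.as_ptr(),
                body.len(),
                &mut out,
            )
        };
        if ok { Some(out) } else { None }
    }
}
```

Converting to `CString` inside the wrapper keeps NUL-termination concerns out of plugin code, and the host remains free to back the callback with any of the options listed above.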

Option D — Optional proxy service

For apps that control their backend, route difficult sites through a server-side fetcher with proper browser fingerprinting.

The engine stays portable and does not embed fragile anti-bot logic.

Plugin Guidance

Plugins should:

set Referer correctly
preserve cookies when provided
avoid excessive retries
return PluginError::Forbidden or PluginError::RateLimited for challenge pages
prefer local JS ciphers over third-party helper APIs when possible

Plugins should not:

hardcode fake TLS assumptions
rely on one external decoder service forever
endlessly retry Cloudflare challenge pages
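
A sketch of how a plugin could follow the retry guidance. The `PluginError` enum below is a stand-in that only reuses the `Forbidden` and `RateLimited` variant names mentioned above; the `Other` variant, the `challenge_error` mapping, and `fetch_with_retries` are assumptions made for illustration.

```rust
/// Stand-in for the engine's error type (only two variant names come from
/// the guidance above; `Other` is hypothetical).
#[derive(Debug, PartialEq)]
pub enum PluginError {
    Forbidden,
    RateLimited,
    Other(u16),
}

/// Maps the status of a detected challenge page to an error.
/// Assumption: 429 means rate limiting, everything else is "forbidden".
pub fn challenge_error(status: u16) -> PluginError {
    match status {
        429 => PluginError::RateLimited,
        _ => PluginError::Forbidden,
    }
}

/// Runs `attempt` at most `max_attempts` times (at least once).
/// Challenge-like errors are returned immediately, since retrying the same
/// request will not solve a challenge.
pub fn fetch_with_retries<T>(
    max_attempts: u32,
    mut attempt: impl FnMut(u32) -> Result<T, PluginError>,
) -> Result<T, PluginError> {
    let mut last = PluginError::Other(0);
    for n in 1..=max_attempts.max(1) {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(e @ (PluginError::Forbidden | PluginError::RateLimited)) => return Err(e),
            // Transient failure: remember it and try again if attempts remain.
            Err(e) => last = e,
        }
    }
    Err(last)
}
```

Returning challenge errors on the first attempt leaves the decision about cookies or a browser-backed fetch to the host app instead of burning requests against a page that will keep failing.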

Recommended Near-Term Fixes

Add challenge detection in HttpHostService.
Map challenges to a structured error payload for C ABI.
Add cookie helper APIs:
- set domain cookies
- clear domain cookies
- list stored challenge domains
Add optional external fetch callback in C ABI.
Keep advanced TLS impersonation as an optional backend only.

Final Recommendation

For production:

Default: reqwest + rustls portable backend.
Add: challenge detection and external fallback hook.
Optional later: verified impersonation backend behind feature flag.

This gives the best balance of:

reliability
cross-platform support
maintainability
app integration flexibility
real-world anti-bot handling