	# Simple Production Anti-Bot Strategy

	This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine.

	## Principle

	Do not make the core plugin engine a fragile browser clone.

	Keep the BEX engine:

	- portable
	- buildable everywhere
	- easy to embed in C++ apps
	- deterministic where possible
	- independent from experimental TLS impersonation crates

	Then add simple challenge handling around it.

	## Recommended Flow

	```text
	Plugin request
	↓
	BEX normal HTTP backend
	↓
	Success? ────────────────→ return data
	↓ no
	Challenge detected?
	↓ yes
	Return CHALLENGE_REQUIRED with URL/domain/reason
	↓
	C++ app decides fallback:
	- use cached cookies
	- ask user to import cookies
	- open system browser/WebView only when needed
	- use app-specific HTTP fetcher
	- use optional proxy service
	```

	## Why This Is Better

	Perfect Chrome impersonation is not simple:

	- TLS JA3/JA4 fingerprints change between Chrome versions.
	- HTTP/2 fingerprints (SETTINGS values, pseudo-header order) change as well.
	- Libraries that depend on BoringSSL are harder to cross-compile.
	- iOS and Android builds each need separate verification.
	- A single wrong cipher order or HTTP/2 setting can still get the request blocked.
	- Interactive CAPTCHA and Turnstile challenges cannot be solved silently in any case.

	For an engine that must be used inside many C++ apps, the stable approach is:

	- use portable Rust HTTP by default
	- detect challenge pages reliably
	- delegate rare hard anti-bot cases to the host app

	## Challenge Detection

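	Treat a response as a likely anti-bot challenge when one of the status codes below coincides with a provider header or a body marker. No single signal is enough on its own: `server: cloudflare` and `cf-ray` appear on every Cloudflare-proxied response, including successful ones, and a plain `403` or `429` may be an ordinary authorization or rate-limit error.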

	### Status codes

	- `403`
	- `429`
	- `503`

	### Headers

	- `server: cloudflare`
	- `cf-ray`
	- `cf-chl-*`
	- `x-datadome`
	- `x-perimeterx`
	- `akamai-*`

	### Body markers

	- `Just a moment...`
	- `Checking your browser`
	- `cf-browser-verification`
	- `cf-chl-`
	- `turnstile`
	- `captcha`
	- `datadome`
	- `px-captcha`
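
The signals above could be combined roughly as follows. This is a minimal sketch, not the engine's actual code: the `Provider` enum, the function names, and the rule "blocking status plus a header or body marker" are assumptions made for illustration, and the generic `captcha` marker is mapped to `Unknown` because it does not identify a vendor.

```rust
/// Anti-bot vendor guessed from the response (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Cloudflare,
    DataDome,
    PerimeterX,
    Akamai,
    Unknown,
}

/// Status codes listed in this document as challenge candidates.
fn is_blocking_status(status: u16) -> bool {
    matches!(status, 403 | 429 | 503)
}

/// Guess the provider from headers. Names are lowercased first because
/// HTTP header names are case-insensitive.
fn provider_from_headers(headers: &[(&str, &str)]) -> Option<Provider> {
    for (name, value) in headers {
        let name = name.to_ascii_lowercase();
        let value = value.to_ascii_lowercase();
        if name == "cf-ray"
            || name.starts_with("cf-chl-")
            || (name == "server" && value.contains("cloudflare"))
        {
            return Some(Provider::Cloudflare);
        }
        if name.starts_with("x-datadome") {
            return Some(Provider::DataDome);
        }
        if name.starts_with("x-perimeterx") {
            return Some(Provider::PerimeterX);
        }
        if name.starts_with("akamai-") {
            return Some(Provider::Akamai);
        }
    }
    None
}

/// Guess the provider from body markers (case-insensitive substring match).
fn provider_from_body(body: &str) -> Option<Provider> {
    let body = body.to_ascii_lowercase();
    const CLOUDFLARE: [&str; 5] = [
        "just a moment...",
        "checking your browser",
        "cf-browser-verification",
        "cf-chl-",
        "turnstile",
    ];
    if CLOUDFLARE.iter().any(|m| body.contains(m)) {
        return Some(Provider::Cloudflare);
    }
    // Check the specific marker before the generic "captcha" substring.
    if body.contains("px-captcha") {
        return Some(Provider::PerimeterX);
    }
    if body.contains("datadome") {
        return Some(Provider::DataDome);
    }
    if body.contains("captcha") {
        return Some(Provider::Unknown);
    }
    None
}

/// Returns the suspected provider, or `None` when the response does not
/// look like a challenge. Body markers are consulted before headers.
pub fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Option<Provider> {
    if !is_blocking_status(status) {
        return None;
    }
    provider_from_body(body).or_else(|| provider_from_headers(headers))
}
```

With this rule, a `200` response that merely carries `cf-ray` is not flagged, while a `403` with a `Just a moment...` body is. A `403` that passes through Cloudflare from the origin would still be flagged by the header check, so a stricter variant might require a body marker.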

	## Engine-Level Behavior

	The BEX engine should not try to solve every challenge itself.

	Instead:

	1. Detect a likely challenge.
	2. Return a structured error:

	```json
	{
	  "code": "CHALLENGE_REQUIRED",
	  "url": "https://example.com/path",
	  "final_url": "https://example.com/cdn-cgi/challenge-platform/...",
	  "status": 403,
	  "provider": "cloudflare",
	  "domain": "example.com",
	  "hint": "Host app should provide cookies or browser-backed fetch."
	}
	```

	3. The host app can then retry with cookies or a browser-backed fetcher.
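
On the Rust side, the payload could be modeled along these lines. This is a sketch under assumptions: the `ChallengeRequired` struct and its `to_json` method are hypothetical names, and the JSON is assembled by hand only to keep the example free of dependencies (a real build would more likely use a serialization crate).

```rust
/// Hypothetical payload type mirroring the JSON fields shown above.
pub struct ChallengeRequired {
    pub url: String,
    pub final_url: String,
    pub status: u16,
    pub provider: String,
    pub domain: String,
    pub hint: String,
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one `"key":"value"` member with the value escaped.
fn string_field(key: &str, value: &str) -> String {
    format!("\"{}\":\"{}\"", key, json_escape(value))
}

impl ChallengeRequired {
    /// Serialize to a compact JSON object with a fixed field order.
    pub fn to_json(&self) -> String {
        let parts = [
            string_field("code", "CHALLENGE_REQUIRED"),
            string_field("url", &self.url),
            string_field("final_url", &self.final_url),
            format!("\"status\":{}", self.status),
            string_field("provider", &self.provider),
            string_field("domain", &self.domain),
            string_field("hint", &self.hint),
        ];
        format!("{{{}}}", parts.join(","))
    }
}
```

A fixed field order keeps the output deterministic, which makes the payload easier to compare in tests on the C++ side.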

	## Simple Fallback Options

	### Option A — User-provided cookies

	The app lets the user export cookies from their own browser and paste or import them for a domain.

	Then plugins can send:

	```http
	Cookie: cf_clearance=...; session=...
	```

	This is simple, cross-platform, and avoids hidden browser automation.
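
As an illustration of how pasted cookies might be turned into that header, here is a small sketch. The function names are hypothetical, and the parser assumes the user pastes a plain `name=value; name2=value2` string, optionally prefixed with `Cookie:`; other export formats (for example Netscape cookie files) would need their own parser.

```rust
/// Parse a pasted `name=value; name2=value2` string into cookie pairs.
/// Entries without `=` or with an empty name are skipped.
pub fn parse_cookie_string(input: &str) -> Vec<(String, String)> {
    let input = input.trim();
    // Tolerate a pasted header line such as "Cookie: a=b".
    let input = input.strip_prefix("Cookie:").unwrap_or(input);
    input
        .split(';')
        .filter_map(|part| {
            // Split on the first '=' only; values may contain '='.
            let (name, value) = part.trim().split_once('=')?;
            let name = name.trim();
            if name.is_empty() {
                return None;
            }
            Some((name.to_string(), value.trim().to_string()))
        })
        .collect()
}

/// Build the value of a `Cookie` request header, or `None` if there is
/// nothing to send.
pub fn cookie_header(cookies: &[(String, String)]) -> Option<String> {
    if cookies.is_empty() {
        return None;
    }
    let pairs: Vec<String> = cookies
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect();
    Some(pairs.join("; "))
}
```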

	### Option B — App-level browser session

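	The app opens a system browser or WebView only when needed.

	After the challenge is solved, the app stores the resulting cookies in the BEX secret/KV store.

	Future requests reuse those cookies and skip the WebView until they expire or are rejected. Clearance cookies such as `cf_clearance` are typically tied to the User-Agent that solved the challenge, so replayed requests should send the same User-Agent.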

	### Option C — External fetcher callback

	Expose an optional C ABI hook:

	```c
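	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>

	/* Forward declaration; the layout is not specified in this document. */
	typedef struct BexFetchResult BexFetchResult;

	/* Return value presumably signals whether the host handled the request. */
	typedef bool (*BexExternalFetch)(
		void* user_data,      /* host context, passed back unchanged */
		const char* method,   /* HTTP method, NUL-terminated */
		const char* url,      /* request URL, NUL-terminated */
		const uint8_t* body,  /* request body bytes */
		size_t body_len,      /* length of body in bytes */
		BexFetchResult* out   /* filled in by the host */
	);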
	```

	Then the host app can provide:

	- libcurl-impersonate
	- platform-native HTTP stack
	- browser-backed fetch
	- company proxy
	- Android/iOS native networking

	The core engine stays simple.
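
On the engine side, the hook could be declared and invoked from Rust roughly as shown here. This is a sketch only: the `BexFetchResult` layout (a status and a body length) is invented for the example because the document does not define it, and `call_external_fetch` and `example_host_fetch` are hypothetical names. A real result type would also need a body buffer and clear ownership rules for it.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_void};

/// Hypothetical result layout; the real one is not defined in this document.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
    pub body_len: usize,
}

/// Rust view of the C callback type shown above.
pub type BexExternalFetch = unsafe extern "C" fn(
    user_data: *mut c_void,
    method: *const c_char,
    url: *const c_char,
    body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool;

/// Invoke the host hook if one is registered. Returns `None` when no hook
/// is set, when a string contains an interior NUL, or when the hook fails.
pub fn call_external_fetch(
    hook: Option<BexExternalFetch>,
    user_data: *mut c_void,
    method: &str,
    url: &str,
    body: &[u8],
) -> Option<BexFetchResult> {
    let hook = hook?;
    let method = CString::new(method).ok()?;
    let url = CString::new(url).ok()?;
    let mut out = BexFetchResult { status: 0, body_len: 0 };
    // SAFETY: all pointers stay valid for the duration of the call; the
    // host is assumed not to retain them after returning.
    let ok = unsafe {
        hook(user_data, method.as_ptr(), url.as_ptr(), body.as_ptr(), body.len(), &mut out)
    };
    if ok { Some(out) } else { None }
}

/// Stand-in for a host callback. It performs no network I/O: it reports
/// status 200 for GET and echoes the request body length, and fails otherwise.
pub unsafe extern "C" fn example_host_fetch(
    _user_data: *mut c_void,
    method: *const c_char,
    _url: *const c_char,
    _body: *const u8,
    body_len: usize,
    out: *mut BexFetchResult,
) -> bool {
    if method.is_null() || out.is_null() {
        return false;
    }
    unsafe {
        if CStr::from_ptr(method).to_bytes() != b"GET" {
            return false;
        }
        (*out).status = 200;
        (*out).body_len = body_len;
    }
    true
}
```

Keeping the hook behind an `Option` means an app that never registers a callback pays nothing and the default backend remains the only code path.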

	### Option D — Optional proxy service

	For apps that control their backend, route difficult sites through a server-side fetcher with proper browser fingerprinting.

	The engine stays portable and does not embed fragile anti-bot logic.

	## Plugin Guidance

	Plugins should:

	- set `Referer` correctly
	- preserve cookies when provided
	- avoid excessive retries
	- return `PluginError::Forbidden` or `PluginError::RateLimited` for challenge pages
	- prefer local JS ciphers over third-party helper APIs when possible

	Plugins should not:

	- hardcode fake TLS assumptions
	- rely on one external decoder service forever
	- endlessly retry CF challenge pages
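
The retry guidance above might look like this in a plugin. It is a sketch with hypothetical names (`Outcome`, `fetch_with_bounded_retries`); the idea it illustrates is that transient failures get a small, fixed number of attempts while a detected challenge stops the loop immediately, since retrying a challenge page without new cookies is unlikely to help.

```rust
/// Simplified result of one request attempt (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    /// Timeout, connection reset, and similar failures worth retrying.
    Transient,
    /// A detected anti-bot challenge; retrying unchanged is pointless.
    Challenge,
}

/// Run `attempt` at most `max_attempts` times. The closure receives the
/// 1-based attempt number. Returns the final outcome and how many
/// attempts were actually made.
pub fn fetch_with_bounded_retries<F>(max_attempts: u32, mut attempt: F) -> (Outcome, u32)
where
    F: FnMut(u32) -> Outcome,
{
    for n in 1..=max_attempts {
        match attempt(n) {
            Outcome::Transient => continue,
            // Success and Challenge both end the loop at once.
            done => return (done, n),
        }
    }
    (Outcome::Transient, max_attempts)
}
```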

	## Recommended Near-Term Fixes

	1. Add challenge detection in `HttpHostService`.
	2. Map challenges to a structured error payload for the C ABI.
	3. Add cookie helper APIs:
	   - set domain cookies
	   - clear domain cookies
	   - list stored challenge domains
	4. Add an optional external fetch callback in the C ABI.
	5. Keep advanced TLS impersonation as an optional backend only.
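
The cookie helper APIs in item 3 could take a shape like the following. This is a sketch, not the engine's API: `DomainCookieStore` and its method names are hypothetical, the store is in-memory only, and cookie attributes such as expiry and path are ignored. `BTreeMap` is used so that iteration order, and therefore the generated header, is deterministic.

```rust
use std::collections::BTreeMap;

/// In-memory cookie store keyed by lowercase domain (hypothetical type).
#[derive(Default)]
pub struct DomainCookieStore {
    by_domain: BTreeMap<String, BTreeMap<String, String>>,
}

impl DomainCookieStore {
    /// Add or overwrite cookies for a domain.
    pub fn set_domain_cookies(&mut self, domain: &str, cookies: &[(&str, &str)]) {
        let jar = self.by_domain.entry(domain.to_ascii_lowercase()).or_default();
        for (name, value) in cookies {
            jar.insert(name.to_string(), value.to_string());
        }
    }

    /// Remove all cookies for a domain. Returns true if any were stored.
    pub fn clear_domain_cookies(&mut self, domain: &str) -> bool {
        self.by_domain.remove(&domain.to_ascii_lowercase()).is_some()
    }

    /// List domains that currently have stored cookies, in sorted order.
    pub fn list_domains(&self) -> Vec<&str> {
        self.by_domain.keys().map(String::as_str).collect()
    }

    /// Build the `Cookie` header value for a domain, if it has cookies.
    pub fn cookie_header(&self, domain: &str) -> Option<String> {
        let jar = self.by_domain.get(&domain.to_ascii_lowercase())?;
        if jar.is_empty() {
            return None;
        }
        let pairs: Vec<String> = jar.iter().map(|(n, v)| format!("{n}={v}")).collect();
        Some(pairs.join("; "))
    }
}
```

Each method maps naturally onto one C ABI function, with the store itself living behind an opaque handle owned by the engine.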

	## Final Recommendation

	For production:

	- Default: `reqwest + rustls` portable backend.
	- Add: challenge detection and external fallback hook.
	- Optional later: verified impersonation backend behind a feature flag.

	This gives the best balance of:

	- reliability
	- cross-platform support
	- maintainability
	- app integration flexibility
	- real-world anti-bot handling