krystv committed
Commit 806fb75 · verified · 1 Parent(s): 538142a

docs: add simple production anti-bot strategy without complex TLS impersonation

  1. ANTI-BOT-STRATEGY.md +204 -0
ANTI-BOT-STRATEGY.md ADDED
# Simple Production Anti-Bot Strategy

This document replaces the overly complex idea of forcing perfect Chrome TLS impersonation inside the core engine.

## Principle

**Do not make the core plugin engine a fragile browser clone.**

Keep the BEX engine:

- portable
- buildable everywhere
- easy to embed in C++ apps
- deterministic where possible
- independent from experimental TLS impersonation crates

Then add simple challenge handling around it.

## Recommended Flow

```text
Plugin request
    ↓
BEX normal HTTP backend
    ↓
Success? ────────────────→ return data
    ↓ no
Challenge detected?
    ↓ yes
Return CHALLENGE_REQUIRED with URL/domain/reason
    ↓
C++ app decides fallback:
    - use cached cookies
    - ask user to import cookies
    - open system browser/WebView only when needed
    - use app-specific HTTP fetcher
    - use optional proxy service
```
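
The flow above can be sketched in Rust roughly as follows. This is a minimal illustration, not the engine's API: `FetchOutcome`, `Fallback`, `HostCapabilities`, `classify_response`, and `choose_fallback` are hypothetical names, and the priority order among fallbacks is an assumption (the flow lists the options without ranking them). In the real design the fallback decision belongs to the C++ host app; it is modelled here in Rust only to keep the example in one language.

```rust
/// Outcome of one request through the portable backend (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum FetchOutcome {
    /// Normal success: hand the body back to the plugin.
    Data(Vec<u8>),
    /// Likely anti-bot challenge: the host app must pick a fallback.
    ChallengeRequired { url: String, domain: String, reason: String },
    /// Any other failure, reported with its HTTP status.
    Failed(u16),
}

/// Fallbacks the host app may choose from, mirroring the flow above.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Fallback {
    CachedCookies,
    ImportCookies,
    BrowserSession,
    AppFetcher,
    ProxyService,
}

/// What the host app has available (assumed shape, for illustration only).
#[derive(Default)]
pub struct HostCapabilities {
    pub has_cached_cookies: bool,
    pub has_app_fetcher: bool,
    pub has_proxy: bool,
    pub can_open_browser: bool,
}

/// Extracts the host part of a URL. Userinfo and IPv6 literals are not handled.
pub fn domain_of(url: &str) -> &str {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    let host = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or(rest);
    host.rsplit_once(':').map(|(h, _)| h).unwrap_or(host)
}

/// Engine side: success first, then challenge, then plain failure.
/// `challenge_reason` is whatever a detector reported, or `None`.
pub fn classify_response(
    url: &str,
    status: u16,
    body: Vec<u8>,
    challenge_reason: Option<&str>,
) -> FetchOutcome {
    if (200..300).contains(&status) {
        return FetchOutcome::Data(body);
    }
    match challenge_reason {
        Some(reason) => FetchOutcome::ChallengeRequired {
            url: url.to_string(),
            domain: domain_of(url).to_string(),
            reason: reason.to_string(),
        },
        None => FetchOutcome::Failed(status),
    }
}

/// Host side: cheapest option first, user interaction last (assumed order).
pub fn choose_fallback(caps: &HostCapabilities) -> Fallback {
    if caps.has_cached_cookies {
        Fallback::CachedCookies
    } else if caps.has_app_fetcher {
        Fallback::AppFetcher
    } else if caps.has_proxy {
        Fallback::ProxyService
    } else if caps.can_open_browser {
        Fallback::BrowserSession
    } else {
        Fallback::ImportCookies
    }
}
```

A host with no cached cookies, no custom fetcher, and no proxy would end up at `Fallback::BrowserSession` if it can open a WebView, and at `Fallback::ImportCookies` otherwise.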

## Why This Is Better

Perfect Chrome impersonation is not simple:

- TLS fingerprints (JA3/JA4) change with Chrome versions.
- HTTP/2 fingerprints change.
- Libraries using BoringSSL are harder to cross-compile.
- Mobile (iOS/Android) builds each need separate verification.
- One wrong cipher order or H2 setting can still get blocked.
- CAPTCHA/Turnstile still cannot be solved silently.

For an engine that must be used inside **many C++ apps**, the stable approach is:

- use portable Rust HTTP by default
- detect challenge pages reliably
- delegate rare hard anti-bot cases to the host app

## Challenge Detection

A response should be treated as a likely anti-bot challenge when it matches the signals below. Note that `server: cloudflare` and `cf-ray` also appear on ordinary, successful responses from Cloudflare-proxied sites, so they are better used to identify the provider than as proof of a challenge on their own.

### Status codes

- `403`
- `429`
- `503`

### Headers

- `server: cloudflare`
- `cf-ray`
- `cf-chl-*`
- `x-datadome`
- `x-perimeterx`
- `akamai-*`

### Body markers

- `Just a moment...`
- `Checking your browser`
- `cf-browser-verification`
- `cf-chl-`
- `turnstile`
- `captcha`
- `datadome`
- `px-captcha`
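
A minimal Rust sketch of such a detector, using only the signals listed above. The names `Provider`, `scan_headers`, `body_has_marker`, and `detect_challenge` are hypothetical. The combination rule is also an assumption: this sketch requires a listed status code plus either a body marker or a provider-specific header, and uses `server: cloudflare` and `cf-ray` only to name the provider, because those two headers are sent on normal Cloudflare responses as well. A looser "any signal" rule is a one-line change.

```rust
/// Provider guessed from the response (names follow the lists above).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Provider {
    Cloudflare,
    DataDome,
    PerimeterX,
    Akamai,
    Unknown,
}

const CHALLENGE_STATUSES: [u16; 3] = [403, 429, 503];

/// Body markers from the list above, lowercased for case-insensitive search.
const BODY_MARKERS: [&str; 8] = [
    "just a moment...",
    "checking your browser",
    "cf-browser-verification",
    "cf-chl-",
    "turnstile",
    "captcha",
    "datadome",
    "px-captcha",
];

/// Returns the first provider hinted at by the headers, and whether any
/// header was a "strong" signal (treating these as strong is an assumption).
fn scan_headers(headers: &[(&str, &str)]) -> (Option<Provider>, bool) {
    let mut provider = None;
    let mut strong = false;
    for (name, value) in headers {
        let name = name.to_ascii_lowercase();
        let value = value.to_ascii_lowercase();
        let hit = if name.starts_with("cf-chl-") {
            Some((Provider::Cloudflare, true))
        } else if name == "cf-ray" || (name == "server" && value.contains("cloudflare")) {
            // Present on every Cloudflare-proxied response: weak signal only.
            Some((Provider::Cloudflare, false))
        } else if name == "x-datadome" {
            Some((Provider::DataDome, true))
        } else if name == "x-perimeterx" {
            Some((Provider::PerimeterX, true))
        } else if name.starts_with("akamai-") {
            Some((Provider::Akamai, true))
        } else {
            None
        };
        if let Some((p, s)) = hit {
            if provider.is_none() {
                provider = Some(p);
            }
            strong |= s;
        }
    }
    (provider, strong)
}

/// True if the body contains any of the listed markers (case-insensitive).
pub fn body_has_marker(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    BODY_MARKERS.iter().any(|m| lower.contains(*m))
}

/// Returns `Some(provider)` when the response looks like a challenge page.
pub fn detect_challenge(status: u16, headers: &[(&str, &str)], body: &str) -> Option<Provider> {
    if !CHALLENGE_STATUSES.contains(&status) {
        return None;
    }
    let (provider, strong_header) = scan_headers(headers);
    if strong_header || body_has_marker(body) {
        Some(provider.unwrap_or(Provider::Unknown))
    } else {
        None
    }
}
```

With this rule, a `403` JSON error from an API that merely sits behind Cloudflare is not reported as a challenge, while the same status with a `Just a moment...` body is.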

## Engine-Level Behavior

The BEX engine should not try to solve every challenge itself.

Instead:

1. Detect a likely challenge.
2. Return a structured error:

   ```json
   {
     "code": "CHALLENGE_REQUIRED",
     "url": "https://example.com/path",
     "final_url": "https://example.com/cdn-cgi/challenge-platform/...",
     "status": 403,
     "provider": "cloudflare",
     "domain": "example.com",
     "hint": "Host app should provide cookies or browser-backed fetch."
   }
   ```

3. The host app can then retry with cookies or a browser-backed fetcher.
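
One way the structured error could be represented on the Rust side is sketched below. The struct name `ChallengeRequired` and the hand-written `to_json` are assumptions for illustration; the field names mirror the JSON payload above. The sketch uses only the standard library, and a real implementation would more likely rely on a JSON library than on manual escaping.

```rust
/// Mirrors the fields of the CHALLENGE_REQUIRED payload (hypothetical type).
pub struct ChallengeRequired {
    pub url: String,
    pub final_url: String,
    pub status: u16,
    pub provider: String,
    pub domain: String,
    pub hint: String,
}

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ChallengeRequired {
    /// Serializes the error as a compact JSON object.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"code\":\"CHALLENGE_REQUIRED\",\"url\":\"{}\",\"final_url\":\"{}\",\"status\":{},\"provider\":\"{}\",\"domain\":\"{}\",\"hint\":\"{}\"}}",
            json_escape(&self.url),
            json_escape(&self.final_url),
            self.status,
            json_escape(&self.provider),
            json_escape(&self.domain),
            json_escape(&self.hint),
        )
    }
}
```

Keeping `code` as a fixed string lets the host app switch on it without parsing the human-readable `hint`.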

## Simple Fallback Options

### Option A: User-provided cookies

The app lets the user paste or import cookies exported for a domain.

Then plugins can send:

```http
Cookie: cf_clearance=...; session=...
```

This is simple, cross-platform, and avoids hidden browser automation.
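
As a rough illustration, turning user-provided cookies into that header might look like the Rust sketch below. `build_cookie_header` and `parse_cookie_string` are hypothetical helper names, and the input format (a pasted `name=value; name=value` string) is an assumption about what the app accepts. Values containing `;`, CR, or LF are rejected so that pasted text cannot inject extra cookies or headers.

```rust
/// True if the text cannot break out of a single cookie pair or header line.
fn is_safe(s: &str) -> bool {
    !s.contains(|c: char| c == ';' || c == '\r' || c == '\n')
}

/// Joins name/value pairs into a `Cookie` request header value.
/// Unsafe or unnamed pairs are skipped; returns `None` if nothing is left.
pub fn build_cookie_header(cookies: &[(&str, &str)]) -> Option<String> {
    let parts: Vec<String> = cookies
        .iter()
        .filter(|(n, v)| !n.is_empty() && is_safe(n) && is_safe(v))
        .map(|(n, v)| format!("{}={}", n, v))
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("; "))
    }
}

/// Parses a pasted "a=1; b=2" string into (name, value) pairs.
/// Segments without `=` or with an empty name are ignored.
pub fn parse_cookie_string(raw: &str) -> Vec<(String, String)> {
    raw.split(';')
        .filter_map(|part| {
            let (name, value) = part.trim().split_once('=')?;
            let name = name.trim();
            if name.is_empty() {
                None
            } else {
                Some((name.to_string(), value.trim().to_string()))
            }
        })
        .collect()
}
```

For example, `build_cookie_header(&[("cf_clearance", "abc"), ("session", "xyz")])` yields `cf_clearance=abc; session=xyz`.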

### Option B: App-level browser session

The app opens a system browser/WebView **only when needed**.

After the challenge is solved, the app stores the resulting cookies in the BEX secret/KV store.

Future requests reuse those cookies and skip the WebView.

### Option C: External fetcher callback

Expose an optional C ABI hook:

```c
#include <stdbool.h>  /* bool */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint8_t */

/* Result type filled in by the host; its fields are defined elsewhere. */
typedef struct BexFetchResult BexFetchResult;

typedef bool (*BexExternalFetch)(
    void* user_data,      /* opaque host context */
    const char* method,   /* HTTP method string */
    const char* url,      /* request URL */
    const uint8_t* body,  /* request body bytes */
    size_t body_len,      /* length of body in bytes */
    BexFetchResult* out   /* result written by the host */
);
```

Then the host app can provide:

- libcurl-impersonate
- platform-native HTTP stack
- browser-backed fetch
- company proxy
- Android/iOS native networking

The core engine stays simple.
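
On the engine side, calling such a hook from Rust could look roughly like this. The layout of `BexFetchResult` is not given in this document, so the three fields below are purely an assumption, as are the helper `call_external_fetch`, the stand-in callback `demo_host_fetch`, and the ownership rule (the host keeps the body buffer alive until the call returns and the engine copies it immediately).

```rust
use std::ffi::{c_void, CStr, CString};
use std::os::raw::c_char;

/// Assumed layout: the document names `BexFetchResult` but does not define it.
#[repr(C)]
pub struct BexFetchResult {
    pub status: u16,
    pub body: *const u8,
    pub body_len: usize,
}

/// Rust view of the `BexExternalFetch` typedef; `None` means no hook installed.
pub type BexExternalFetch = Option<
    unsafe extern "C" fn(
        user_data: *mut c_void,
        method: *const c_char,
        url: *const c_char,
        body: *const u8,
        body_len: usize,
        out: *mut BexFetchResult,
    ) -> bool,
>;

/// Calls the host hook and copies the response body into owned memory.
/// Returns `None` if no hook is set, the strings contain NUL, or the hook fails.
pub fn call_external_fetch(
    hook: BexExternalFetch,
    user_data: *mut c_void,
    method: &str,
    url: &str,
    body: &[u8],
) -> Option<(u16, Vec<u8>)> {
    let hook = hook?;
    let method = CString::new(method).ok()?;
    let url = CString::new(url).ok()?;
    let mut out = BexFetchResult { status: 0, body: std::ptr::null(), body_len: 0 };
    // SAFETY: all pointers are valid for the duration of the call.
    let ok = unsafe {
        hook(user_data, method.as_ptr(), url.as_ptr(), body.as_ptr(), body.len(), &mut out)
    };
    if !ok {
        return None;
    }
    let bytes = if out.body.is_null() || out.body_len == 0 {
        Vec::new()
    } else {
        // SAFETY: assumed contract, the host buffer holds `body_len` bytes.
        unsafe { std::slice::from_raw_parts(out.body, out.body_len) }.to_vec()
    };
    Some((out.status, bytes))
}

/// Stand-in for a host callback, written in Rust for the example only.
pub extern "C" fn demo_host_fetch(
    _user_data: *mut c_void,
    _method: *const c_char,
    url: *const c_char,
    _body: *const u8,
    _body_len: usize,
    out: *mut BexFetchResult,
) -> bool {
    static BODY: &[u8] = b"hello from host";
    if url.is_null() || out.is_null() {
        return false;
    }
    // SAFETY: the caller passes a NUL-terminated URL and a valid `out`.
    let url = unsafe { CStr::from_ptr(url) }.to_string_lossy();
    if !url.starts_with("https://") {
        return false;
    }
    unsafe {
        (*out).status = 200;
        (*out).body = BODY.as_ptr();
        (*out).body_len = BODY.len();
    }
    true
}
```

Modelling the hook as an `Option` of a function pointer keeps the "optional" part explicit: when the host registers nothing, the engine simply reports the challenge instead of calling out.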

### Option D: Optional proxy service

For apps that control their backend, route difficult sites through a server-side fetcher with proper browser fingerprinting.

The engine stays portable and does not embed fragile anti-bot logic.

## Plugin Guidance

Plugins should:

- set `Referer` correctly
- preserve cookies when provided
- avoid excessive retries
- return `PluginError::Forbidden` or `PluginError::RateLimited` for challenge pages
- prefer local JS ciphers over third-party helper APIs when possible

Plugins should not:

- hardcode fake TLS assumptions
- rely on one external decoder service forever
- endlessly retry Cloudflare challenge pages
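
The retry guidance can be illustrated with a small Rust sketch. Only the variant names `PluginError::Forbidden` and `PluginError::RateLimited` come from this document; the enum definition itself, the `Other` variant, the `Response` struct, and the functions `challenge_error` and `fetch_bounded` are assumptions. The idea shown is that a challenge page ends the loop immediately, while other failures are retried only up to a fixed limit.

```rust
/// Stand-in for the plugin error type; only two variant names are from the doc.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PluginError {
    Forbidden,
    RateLimited,
    /// Any other failure, carrying the last HTTP status seen (0 if none).
    Other(u16),
}

/// Minimal response shape for the example (assumed).
pub struct Response {
    pub status: u16,
    pub looks_like_challenge: bool,
    pub body: Vec<u8>,
}

/// Maps a challenge response to the error a plugin should return.
pub fn challenge_error(status: u16) -> PluginError {
    if status == 429 {
        PluginError::RateLimited
    } else {
        PluginError::Forbidden
    }
}

/// Sends a request at most `max_attempts` times.
/// `send` receives the 1-based attempt number and performs one request.
pub fn fetch_bounded<F>(max_attempts: u32, mut send: F) -> Result<Vec<u8>, PluginError>
where
    F: FnMut(u32) -> Response,
{
    let mut last = PluginError::Other(0);
    for attempt in 1..=max_attempts {
        let resp = send(attempt);
        if resp.looks_like_challenge {
            // Repeating the same request will not pass a challenge: stop now
            // and let the host app decide on a fallback.
            return Err(challenge_error(resp.status));
        }
        if (200..300).contains(&resp.status) {
            return Ok(resp.body);
        }
        last = PluginError::Other(resp.status);
    }
    Err(last)
}
```

A plugin that hits a challenge on the first attempt therefore makes exactly one request, which keeps it from worsening a rate limit or a block.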

## Recommended Near-Term Fixes

1. Add challenge detection in `HttpHostService`.
2. Map challenges to a structured error payload for the C ABI.
3. Add cookie helper APIs:
   - set domain cookies
   - clear domain cookies
   - list stored challenge domains
4. Add an optional external fetch callback in the C ABI.
5. Keep advanced TLS impersonation as an optional backend only.
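
The cookie helper APIs from item 3 could take a shape like the following Rust sketch. `ChallengeCookieStore` and its method names are hypothetical, and the in-memory map stands in for whatever persistent store the engine actually uses; expiry and path scoping are left out.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for per-domain challenge cookies (hypothetical type).
#[derive(Default)]
pub struct ChallengeCookieStore {
    by_domain: BTreeMap<String, BTreeMap<String, String>>,
}

/// Lowercases the domain and drops surrounding whitespace and a trailing dot.
fn normalize(domain: &str) -> String {
    domain.trim().trim_end_matches('.').to_ascii_lowercase()
}

impl ChallengeCookieStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds or overwrites cookies for a domain. An empty list is a no-op.
    pub fn set_domain_cookies(&mut self, domain: &str, cookies: &[(&str, &str)]) {
        if cookies.is_empty() {
            return;
        }
        let jar = self.by_domain.entry(normalize(domain)).or_default();
        for (name, value) in cookies {
            jar.insert(name.to_string(), value.to_string());
        }
    }

    /// Removes all cookies for a domain; returns whether anything was stored.
    pub fn clear_domain_cookies(&mut self, domain: &str) -> bool {
        self.by_domain.remove(&normalize(domain)).is_some()
    }

    /// Lists domains that currently have stored cookies, in sorted order.
    pub fn list_challenge_domains(&self) -> Vec<&str> {
        self.by_domain.keys().map(String::as_str).collect()
    }

    /// Builds the `Cookie` header value for a domain, if any cookies exist.
    pub fn cookie_header(&self, domain: &str) -> Option<String> {
        let jar = self.by_domain.get(&normalize(domain))?;
        if jar.is_empty() {
            return None;
        }
        let parts: Vec<String> = jar.iter().map(|(n, v)| format!("{}={}", n, v)).collect();
        Some(parts.join("; "))
    }
}
```

Each method maps naturally onto one exported C function, so the host app can manage cookies without knowing how they are stored.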

## Final Recommendation

For production:

- Default: portable `reqwest` + `rustls` backend.
- Add: challenge detection and an external fallback hook.
- Optional later: a verified impersonation backend behind a feature flag.

This gives the best balance of:

- reliability
- cross-platform support
- maintainability
- app integration flexibility
- real-world anti-bot handling