# Overview

This is a Cloudflare protection bypass service that provides web scraping and bot protection circumvention capabilities. The application acts as an API service that uses headless browser automation (via Puppeteer) to solve Cloudflare challenges, including Turnstile CAPTCHAs and WAF (Web Application Firewall) sessions. It exposes multiple endpoints for different bypass strategies and can operate with or without proxy support.

# User Preferences

Preferred communication style: Simple, everyday language.

# System Architecture

## Backend Architecture

**Framework**: Express.js REST API server
- **Problem**: Need to provide multiple browser automation endpoints as HTTP services
- **Solution**: Express server with JSON body parsing, CORS support, and configurable timeout handling
- **Rationale**: Express provides a lightweight, flexible framework for creating API endpoints with middleware support for authentication and request validation

**Request Handling Pattern**: Unified handler with mode-based routing
- **Problem**: Multiple similar endpoints with shared authentication and rate limiting logic
- **Solution**: Single `handleSolverRequest` function that routes based on `mode` parameter
- **Alternatives**: Separate route handlers per endpoint
- **Pros**: Centralized validation, authentication, and rate limiting; reduced code duplication
- **Cons**: Single handler must accommodate different parameter requirements per mode

**Global State Management**: Module-level globals for browser instance and rate limiting
- **Problem**: Need to share browser instance and track concurrent requests across all handlers
- **Solution**: Global variables (`global.browser`, `global.browserLength`, `global.browserLimit`)
- **Rationale**: Browser instance is expensive to create; rate limiting requires shared state
- **Cons**: Not horizontally scalable; state is lost on restart

## Browser Automation

**Browser Management**: Puppeteer with automatic reconnection
- **Problem**: Headless
browsers can crash or disconnect unpredictably
- **Solution**: Auto-reconnection logic in `createBrowser.js` with exponential backoff
- **Technology**: `puppeteer-real-browser` package for enhanced stealth capabilities
- **Features**: Turnstile-ready configuration, XVFB support for headless environments, real browser fingerprinting

**Context Isolation**: Browser contexts per request
- **Problem**: Need to isolate each request's cookies, sessions, and proxy settings
- **Solution**: Create a new browser context for each request with independent proxy configuration
- **Pros**: Complete isolation between concurrent requests; clean state per request
- **Cons**: Higher resource usage than tab-based isolation

**Timeout Management**: Per-request timeout with cleanup
- **Problem**: Browser automation can hang indefinitely on problematic sites
- **Solution**: Configurable timeout (`global.timeOut`) with context cleanup in all endpoints
- **Rationale**: Prevents resource leaks and ensures predictable response times

## API Endpoints & Modes

**Mode-Based Architecture**: Five distinct bypass strategies

1. **source**: Fetch page HTML after Cloudflare challenge resolution
2. **turnstile-min**: Solve Turnstile CAPTCHA using an injected fake page with the real site key
3. **turnstile-max**: Solve Turnstile on the actual target page
4. **waf-session**: Create an authenticated session with Cloudflare cookies and headers
5. **proxy-request**: Make authenticated requests using cookies/headers from waf-session (session reuse)

**Session Reuse Workflow** (proxy-request):
- First call `waf-session` to get cookies + headers
- Pass those to `proxy-request` via the `cookies` and `sessionHeaders` parameters
- Requests are made through the same browser (Chrome fingerprint), maintaining CF clearance

**Request Validation**: JSON Schema validation with AJV
- **Problem**: Need to validate complex nested request structures (proxy configs, cookies, headers)
- **Solution**: AJV with format validation for URIs and enums for modes/HTTP methods
- **Rationale**: Schema-based validation provides clear error messages and type safety

## Security & Rate Limiting

**Authentication**: Optional token-based authentication
- **Implementation**: `authToken` environment variable compared against request parameter
- **Design**: Simple bearer-style token authentication
- **Rationale**: Lightweight protection for self-hosted deployments

**Rate Limiting**: Browser instance concurrency limits
- **Problem**: Too many concurrent browser contexts can exhaust system resources
- **Solution**: `browserLimit` (default 20) enforced via `browserLength` counter
- **Mechanism**: Request rejected with 429 if limit exceeded
- **Cons**: In-memory counter doesn't work across multiple instances

**Proxy Authentication**: Built-in proxy credential handling
- **Problem**: Many proxies require username/password authentication
- **Solution**: Puppeteer's `page.authenticate()` for proxy credentials
- **Support**: Configurable per-request proxy with optional authentication

## Environment Configuration

**Configuration Strategy**: Environment variables with sensible defaults
- `PORT`: Server port (default: 3939)
- `authToken`: Optional API authentication
- `browserLimit`: Max concurrent browser contexts (default: 20)
- `timeOut`: Request timeout in milliseconds (default: 60000)
- `SKIP_LAUNCH`: Skip browser initialization for testing
- `NODE_ENV`:
Environment mode (development vs production)

# External Dependencies

## Core Dependencies

**puppeteer-real-browser** (v1.4.0)
- **Purpose**: Enhanced Puppeteer wrapper with anti-detection features
- **Key Features**: Turnstile challenge solving, real browser fingerprinting, XVFB support
- **Integration**: Global browser instance managed in `module/createBrowser.js`

**Express** (v4.21.0)
- **Purpose**: HTTP server framework
- **Integration**: Main application server with middleware stack

**body-parser** (v1.20.3)
- **Purpose**: Parse JSON and URL-encoded request bodies
- **Integration**: Middleware for POST request handling

**cors** (v2.8.5)
- **Purpose**: Enable Cross-Origin Resource Sharing
- **Integration**: Middleware to allow API access from web clients

## Validation & Schema

**ajv** (v8.17.1) + **ajv-formats** (v3.0.1)
- **Purpose**: JSON Schema validation with format extensions
- **Integration**: Request parameter validation in `module/reqValidate.js`
- **Schemas**: Validates URL formats, proxy configs, HTTP methods, cookie structures

## Testing

**jest** (v29.7.0)
- **Purpose**: Test framework
- **Configuration**: Tests located in `tests/` directory with verbose output

**supertest** (v7.0.0)
- **Purpose**: HTTP assertion library for API testing
- **Integration**: Testing Express endpoints

## External Services

**Cloudflare Turnstile API**
- **Integration**: Loaded via CDN script in fake page template
- **URL**: `https://challenges.cloudflare.com/turnstile/v0/api.js`
- **Purpose**: CAPTCHA challenge rendering and token generation

**httpbin.org**
- **Purpose**: Header detection service used in `wafSession.js` to extract Accept-Language header
- **Endpoint**: `https://httpbin.org/get`
- **Usage**: Validate browser fingerprinting fidelity
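
The concurrency limit described under Rate Limiting can be pictured as a small in-memory gate. The following is a minimal sketch, not the project's actual handler: `createLimiter`, `tryAcquire`, `release`, and `handleWithLimit` are hypothetical names, and the response body text is assumed. Only the `browserLimit` / `browserLength` idea, the default of 20, and the 429 status come from the notes above.

```javascript
// Hypothetical sketch of a browserLimit / browserLength style gate.
// Names and response shapes are illustrative assumptions.
function createLimiter(browserLimit = 20) {
  let browserLength = 0; // number of requests currently holding a slot

  return {
    // Take a slot if one is free; return false when the limit is reached.
    tryAcquire() {
      if (browserLength >= browserLimit) return false;
      browserLength++;
      return true;
    },
    // Give a slot back; never drops below zero.
    release() {
      if (browserLength > 0) browserLength--;
    },
    // Current number of occupied slots.
    get active() {
      return browserLength;
    },
  };
}

// Wraps an async task: answers 429 when no slot is free, and always
// releases the slot afterwards, even if the task throws.
async function handleWithLimit(limiter, task) {
  if (!limiter.tryAcquire()) {
    return { status: 429, body: { message: "Too Many Requests" } }; // assumed body
  }
  try {
    return { status: 200, body: await task() };
  } finally {
    limiter.release();
  }
}
```

Releasing the slot in `finally` matters here: if a slot were only released on success, a crashed or timed-out browser context would permanently reduce capacity. As the notes already point out, a counter like this lives in one process, so it does not coordinate across multiple instances.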