cf-solver / replit.md
Samuel

Overview

This is a Cloudflare protection bypass service for web scraping and bot protection circumvention. It runs as an API service that uses headless browser automation (via Puppeteer) to solve Cloudflare challenges, including Turnstile CAPTCHAs and WAF (Web Application Firewall) sessions. It exposes multiple endpoints for different bypass strategies and can operate with or without a proxy.

User Preferences

Preferred communication style: Simple, everyday language.

System Architecture

Backend Architecture

Framework: Express.js REST API server

  • Problem: Need to provide multiple browser automation endpoints as HTTP services
  • Solution: Express server with JSON body parsing, CORS support, and configurable timeout handling
  • Rationale: Express provides a lightweight, flexible framework for creating API endpoints with middleware support for authentication and request validation

Request Handling Pattern: Unified handler with mode-based routing

  • Problem: Multiple similar endpoints with shared authentication and rate limiting logic
  • Solution: Single handleSolverRequest function that routes based on mode parameter
  • Alternatives: Separate route handlers per endpoint
  • Pros: Centralized validation, authentication, and rate limiting; reduced code duplication
  • Cons: Single handler must accommodate different parameter requirements per mode
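
The unified handler can be sketched roughly as below. Only the handleSolverRequest name and the mode names come from this document; the modeHandlers table, the handler bodies, and the response shape are assumptions made for illustration.

```javascript
// Hypothetical per-mode handlers; real ones would drive the browser.
const modeHandlers = new Map([
  ['source', async (params) => ({ mode: 'source', url: params.url })],
  ['waf-session', async (params) => ({ mode: 'waf-session', url: params.url })],
]);

// Shared checks (auth, validation, rate limiting) would run here once,
// before the request is dispatched on its `mode` parameter.
async function handleSolverRequest(params) {
  const handler = modeHandlers.get(params.mode);
  if (!handler) {
    return { code: 400, message: `Unknown mode: ${params.mode}` };
  }
  const result = await handler(params);
  return { code: 200, ...result };
}
```

Adding a mode then means adding one entry to the table, which is the main benefit over separate route handlers.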

Global State Management: Properties on Node's global object for the browser instance and rate limiting

  • Problem: Need to share browser instance and track concurrent requests across all handlers
  • Solution: Global variables (global.browser, global.browserLength, global.browserLimit)
  • Rationale: Browser instance is expensive to create; rate limiting requires shared state
  • Cons: Not horizontally scalable; state is lost on restart

Browser Automation

Browser Management: Puppeteer with automatic reconnection

  • Problem: Headless browsers can crash or disconnect unpredictably
  • Solution: Auto-reconnection logic in createBrowser.js with exponential backoff
  • Technology: puppeteer-real-browser package for enhanced stealth capabilities
  • Features: Turnstile-ready configuration, XVFB support for headless environments, real browser fingerprinting
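
A minimal sketch of reconnecting with exponential backoff. The function names, the base delay, the cap, and the attempt limit are all assumptions; the actual values in createBrowser.js may differ.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
// baseMs and maxMs are illustrative values, not taken from createBrowser.js.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `connect` until it succeeds or `maxAttempts` is used up.
async function connectWithBackoff(connect, maxAttempts = 5, sleep = defaultSleep) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // No point waiting after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting.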

Context Isolation: Browser contexts per request

  • Problem: Need to isolate each request's cookies, sessions, and proxy settings
  • Solution: Create new browser context for each request with independent proxy configuration
  • Pros: Complete isolation between concurrent requests; clean state per request
  • Cons: Higher resource usage than tab-based isolation
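
Per-request isolation might look like the sketch below. It assumes `browser` exposes createBrowserContext() with a proxyServer option, as recent Puppeteer versions do; the withIsolatedContext name and the `{ host, port }` proxy shape are hypothetical.

```javascript
// Run `task` in a fresh browser context and always close it afterwards.
// `browser` is assumed to be a Puppeteer Browser (or compatible object).
async function withIsolatedContext(browser, proxy, task) {
  const options = proxy
    ? { proxyServer: `http://${proxy.host}:${proxy.port}` }
    : {};
  const context = await browser.createBrowserContext(options);
  try {
    const page = await context.newPage();
    return await task(page);
  } finally {
    // Closing the context discards its cookies, sessions, and proxy settings.
    await context.close();
  }
}
```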

Timeout Management: Per-request timeout with cleanup

  • Problem: Browser automation can hang indefinitely on problematic sites
  • Solution: Configurable timeout (global.timeOut) with context cleanup in all endpoints
  • Rationale: Prevents resource leaks and ensures predictable response times
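
One common way to get a timeout with guaranteed cleanup is Promise.race plus a finally block. This is a sketch under that assumption, not the project's actual code; withTimeout and the error message are made-up names.

```javascript
// Reject if `work` has not settled within `ms`; run `cleanup` either way.
async function withTimeout(work, ms, cleanup) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Timeout Error')), ms);
  });
  try {
    return await Promise.race([work(), timeout]);
  } finally {
    clearTimeout(timer); // stop the timer so it cannot fire later
    await cleanup();     // e.g. close the browser context
  }
}
```

In this service, `ms` would come from global.timeOut and `cleanup` would close the request's browser context.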

API Endpoints & Modes

Mode-Based Architecture: Five distinct bypass strategies

  1. source: Fetch page HTML after Cloudflare challenge resolution
  2. turnstile-min: Solve Turnstile CAPTCHA using an injected fake page with the real site key
  3. turnstile-max: Solve Turnstile on actual target page
  4. waf-session: Create authenticated session with Cloudflare cookies and headers
  5. proxy-request: Make authenticated requests using cookies/headers from waf-session (session reuse)

Session Reuse Workflow (proxy-request):

  • First call waf-session to get cookies + headers
  • Pass those to proxy-request via cookies and sessionHeaders parameters
  • Requests are made through the same browser (Chrome fingerprint), maintaining CF clearance
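
The hand-off between the two calls could be sketched like this. The cookies and sessionHeaders parameter names are from the workflow above; the shape of the waf-session result (`cookies`, `headers`) and the `url` and `method` field names are assumptions.

```javascript
// Build a proxy-request body from a waf-session result.
// `session.cookies` and `session.headers` are an assumed response shape.
function buildProxyRequest(session, targetUrl, method = 'GET') {
  return {
    mode: 'proxy-request',
    url: targetUrl,
    method,
    cookies: session.cookies,
    sessionHeaders: session.headers,
  };
}
```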

Request Validation: JSON Schema validation with AJV

  • Problem: Need to validate complex nested request structures (proxy configs, cookies, headers)
  • Solution: AJV with format validation for URIs and enums for modes/HTTP methods
  • Rationale: Schema-based validation provides clear error messages and type safety
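
A schema along these lines would cover the cases mentioned. It is only an illustration: the `url`, `method`, and `proxy` field names and the list of HTTP methods are assumptions, not the contents of module/reqValidate.js.

```javascript
// Illustrative request schema; property names are assumed.
const requestSchema = {
  type: 'object',
  properties: {
    url: { type: 'string', format: 'uri' },
    mode: {
      type: 'string',
      enum: ['source', 'turnstile-min', 'turnstile-max', 'waf-session', 'proxy-request'],
    },
    method: { type: 'string', enum: ['GET', 'POST', 'PUT', 'DELETE', 'PATCH'] },
    proxy: {
      type: 'object',
      properties: {
        host: { type: 'string' },
        port: { type: 'integer' },
        username: { type: 'string' },
        password: { type: 'string' },
      },
      required: ['host', 'port'],
      additionalProperties: false,
    },
  },
  required: ['url', 'mode'],
};

// With ajv and ajv-formats installed, the schema would be compiled once:
//   const Ajv = require('ajv');
//   const addFormats = require('ajv-formats');
//   const ajv = new Ajv();
//   addFormats(ajv);                       // enables format: 'uri'
//   const validate = ajv.compile(requestSchema);
//   validate(body) returns a boolean; details are in validate.errors.
```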

Security & Rate Limiting

Authentication: Optional token-based authentication

  • Implementation: authToken environment variable compared against request parameter
  • Design: Simple bearer-style token authentication
  • Rationale: Lightweight protection for self-hosted deployments
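
Because the token is optional, the check has two cases. A minimal sketch, assuming the token is compared with plain equality and that an unset authToken turns authentication off; isAuthorized is a hypothetical name.

```javascript
// Returns true when the request may proceed.
// No configured token means authentication is switched off.
function isAuthorized(configuredToken, requestToken) {
  if (!configuredToken) return true;
  return requestToken === configuredToken;
}
```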

Rate Limiting: Browser context concurrency limits

  • Problem: Too many concurrent browser contexts can exhaust system resources
  • Solution: browserLimit (default 20) enforced via browserLength counter
  • Mechanism: Request rejected with 429 if limit exceeded
  • Cons: The in-memory counter is per process, so the limit is not enforced across multiple server instances
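
The counter mechanism can be sketched as a small in-memory limiter. The service itself keeps this state in global.browserLength and global.browserLimit; the createLimiter and runLimited names and the response shape here are assumptions.

```javascript
// In-memory limiter playing the role of browserLength / browserLimit.
function createLimiter(limit = 20) {
  let active = 0;
  return {
    tryAcquire() {
      if (active >= limit) return false;
      active++;
      return true;
    },
    release() {
      if (active > 0) active--;
    },
    get active() {
      return active;
    },
  };
}

// Reject with 429 when full; always give the slot back when the job ends.
async function runLimited(limiter, job) {
  if (!limiter.tryAcquire()) {
    return { code: 429, message: 'Too Many Requests' };
  }
  try {
    return await job();
  } finally {
    limiter.release();
  }
}
```

Releasing in `finally` matters: a job that throws would otherwise hold its slot forever and slowly lock the service.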

Proxy Authentication: Built-in proxy credential handling

  • Problem: Many proxies require username/password authentication
  • Solution: Puppeteer's page.authenticate() for proxy credentials
  • Support: Configurable per-request proxy with optional authentication
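
A short sketch of applying optional proxy credentials. page.authenticate() is Puppeteer's API; the applyProxyAuth name and the `{ username, password }` fields on the proxy object are assumptions.

```javascript
// Apply proxy credentials to a page when both are provided.
// `page` is assumed to be a Puppeteer Page; returns whether auth was set.
async function applyProxyAuth(page, proxy) {
  if (proxy && proxy.username && proxy.password) {
    await page.authenticate({
      username: proxy.username,
      password: proxy.password,
    });
    return true;
  }
  return false;
}
```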

Environment Configuration

Configuration Strategy: Environment variables with sensible defaults

  • PORT: Server port (default: 3939)
  • authToken: Optional API authentication
  • browserLimit: Max concurrent browser contexts (default: 20)
  • timeOut: Request timeout in milliseconds (default: 60000)
  • SKIP_LAUNCH: Skip browser initialization for testing
  • NODE_ENV: Environment mode (development vs production)
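
Reading these variables with their defaults might look like this. The defaults for PORT, browserLimit, and timeOut are the ones listed above; the loadConfig name, treating SKIP_LAUNCH as the string 'true', and defaulting NODE_ENV to 'development' are assumptions.

```javascript
// Read settings from the environment, falling back to documented defaults.
function loadConfig(env = process.env) {
  return {
    port: Number(env.PORT) || 3939,
    authToken: env.authToken || null,            // null means auth is off
    browserLimit: Number(env.browserLimit) || 20,
    timeOut: Number(env.timeOut) || 60000,       // milliseconds
    skipLaunch: env.SKIP_LAUNCH === 'true',      // assumed flag format
    nodeEnv: env.NODE_ENV || 'development',      // assumed default
  };
}
```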

External Dependencies

Core Dependencies

puppeteer-real-browser (v1.4.0)

  • Purpose: Enhanced Puppeteer wrapper with anti-detection features
  • Key Features: Turnstile challenge solving, real browser fingerprinting, XVFB support
  • Integration: Global browser instance managed in module/createBrowser.js

Express (v4.21.0)

  • Purpose: HTTP server framework
  • Integration: Main application server with middleware stack

body-parser (v1.20.3)

  • Purpose: Parse JSON and URL-encoded request bodies
  • Integration: Middleware for POST request handling

cors (v2.8.5)

  • Purpose: Enable Cross-Origin Resource Sharing
  • Integration: Middleware to allow API access from web clients

Validation & Schema

ajv (v8.17.1) + ajv-formats (v3.0.1)

  • Purpose: JSON Schema validation with format extensions
  • Integration: Request parameter validation in module/reqValidate.js
  • Schemas: Validates URL formats, proxy configs, HTTP methods, cookie structures

Testing

jest (v29.7.0)

  • Purpose: Test framework
  • Configuration: Tests located in tests/ directory with verbose output

supertest (v7.0.0)

  • Purpose: HTTP assertion library for API testing
  • Integration: Testing Express endpoints

External Services

Cloudflare Turnstile API

  • Integration: Loaded via CDN script in fake page template
  • URL: https://challenges.cloudflare.com/turnstile/v0/api.js
  • Purpose: CAPTCHA challenge rendering and token generation

httpbin.org

  • Purpose: Header detection service used in wafSession.js to extract the Accept-Language header
  • Endpoint: https://httpbin.org/get
  • Usage: Validate browser fingerprinting fidelity
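
httpbin.org/get echoes the request back as JSON with a `headers` object, so the extraction step can be sketched as below. The extractAcceptLanguage name is hypothetical and this is not the code in wafSession.js.

```javascript
// Pull Accept-Language out of an httpbin /get response body.
// Accepts the raw JSON text or an already parsed object.
function extractAcceptLanguage(httpbinBody) {
  const parsed =
    typeof httpbinBody === 'string' ? JSON.parse(httpbinBody) : httpbinBody;
  const headers = parsed.headers || {};
  // Match the header name case-insensitively to be safe.
  const key = Object.keys(headers).find(
    (name) => name.toLowerCase() === 'accept-language'
  );
  return key ? headers[key] : null;
}
```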