# Complete API Flow Documentation

## Overview

The DocGenie API provides three endpoints for synthetic document generation, implementing a 19-stage pipeline that transforms seed images and prompts into complete datasets with OCR, ground truth, and optional handwriting/visual elements.

**Base URL**: `http://localhost:8000` (development) or Railway deployment

**Documentation**: `/docs` (FastAPI auto-generated Swagger UI)

---

## API Endpoints

### 1. `/generate` - Legacy JSON Response (POST)

**Purpose**: Generate documents and return complete JSON metadata

**Response**: JSON with HTML, PDF (base64), bounding boxes, optional handwriting/visual elements

**Use Case**: Testing, development, full metadata inspection

**Pipeline Stages**: 1-19 (configurable via parameters)

### 2. `/generate/pdf` - Sync PDF+Dataset ZIP (POST)

**Purpose**: Generate documents and return ZIP file with all artifacts

**Response**: ZIP file containing:

- `*.pdf` - Generated document PDFs
- `*_final.pdf` - PDFs with handwriting/visual elements (if enabled)
- `*.msgpack` - Dataset format (if export enabled)
- `metadata.json` - Complete generation metadata
- `handwriting/` - Individual handwriting images
- `visual_elements/` - Individual visual element images

**Use Case**: Production dataset generation, batch processing

**Pipeline Stages**: 1-19 (all features available)

### 3. `/generate/async` - Async Batch Processing (POST)

**Purpose**: Queue large batch jobs via background worker (Redis Queue)

**Response**: Task ID for status polling

**Status Check**: `GET /generate/async/status/{task_id}`

**Result Download**: `GET /generate/async/result/{task_id}` (returns ZIP)

**Use Case**: Large-scale dataset generation (100+ documents)

**Pipeline Stages**: 1-19 (via worker.py)

---

## Request Parameters

```python
from typing import List, Optional

# Base class assumed: FastAPI request bodies are typically Pydantic models,
# and HttpUrl is a Pydantic type.
from pydantic import BaseModel, HttpUrl


class PromptParameters(BaseModel):
    # Core Parameters
    language: str = "english"                # Document language
    doc_type: str = "invoice"                # Document type (invoice, receipt, form, etc.)
    gt_type: str = "qa"                      # Ground truth format (qa, kie)
    gt_format: str = "json"                  # GT encoding (json, annotation)
    num_solutions: int = 1                   # Documents per seed set

    # Feature Toggles (Stages 07-19)
    enable_handwriting: bool = False         # Stage 07-09, 12
    handwriting_ratio: float = 0.2           # Probabilistic filter (0.0-1.0)
    enable_visual_elements: bool = False     # Stage 08, 10, 13
    visual_element_types: List[str] = []     # Filter types: logo, photo, figure, barcode, etc.
    enable_ocr: bool = True                  # Stage 15
    enable_bbox_normalization: bool = True   # Stage 16
    enable_gt_verification: bool = False     # Stage 17
    enable_analysis: bool = False            # Stage 18
    enable_debug_visualization: bool = False # Stage 19
    enable_dataset_export: bool = False      # Stage 19 (msgpack format)
    dataset_export_format: str = "msgpack"   # Currently only msgpack supported

    # Reproducibility
    seed: Optional[int] = None               # Random seed (null = random, int = reproducible)


# Defined after PromptParameters so the annotation below resolves.
class GenerateDocumentRequest(BaseModel):
    seed_images: List[HttpUrl]               # 1-8 seed images from web URLs
    prompt_params: PromptParameters          # Generation configuration
```

---

## Pipeline Architecture: The 19 Stages

The API implements all 19 stages of the original batch pipeline in `docgenie/generation/`. Each stage maps to a corresponding function in `api/utils.py`.

### **Phase 1: Core Pipeline (Stages 01-06)**

Generate base documents from seed images and LLM prompts.
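
A request that drives this phase can be sketched with the Python standard library alone. The field names and defaults are taken from the Request Parameters section; the helper names `build_generate_payload` and `post_generate`, the placeholder seed URL, and the 300-second timeout are assumptions made for illustration, not part of the DocGenie codebase.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Development base URL from the Overview; adjust for a deployed instance.
BASE_URL = "http://localhost:8000"


def build_generate_payload(seed_images: List[str], **prompt_overrides: Any) -> Dict[str, Any]:
    """Build a request body shaped like GenerateDocumentRequest (hypothetical helper)."""
    # The documentation states that 1-8 seed image URLs are accepted.
    if not 1 <= len(seed_images) <= 8:
        raise ValueError("seed_images must contain between 1 and 8 URLs")

    # Documented defaults for the core parameters; optional features stay off.
    prompt_params: Dict[str, Any] = {
        "language": "english",
        "doc_type": "invoice",
        "gt_type": "qa",
        "gt_format": "json",
        "num_solutions": 1,
        "enable_handwriting": False,
        "enable_visual_elements": False,
        "seed": None,
    }
    prompt_params.update(prompt_overrides)

    # handwriting_ratio is documented as a value in the range 0.0-1.0.
    ratio = prompt_params.get("handwriting_ratio", 0.2)
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("handwriting_ratio must be within 0.0-1.0")

    return {"seed_images": list(seed_images), "prompt_params": prompt_params}


def post_generate(payload: Dict[str, Any], base_url: str = BASE_URL) -> Dict[str, Any]:
    """POST the payload to /generate and decode the JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The timeout is an assumed value; LLM-backed generation may take a while.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_generate_payload(
        ["https://example.com/seed-invoice.png"],  # placeholder URL
        doc_type="receipt",
        seed=42,
    )
    print(json.dumps(payload, indent=2))
    # result = post_generate(payload)  # requires a running API server
```

Validating the seed count and ratio on the client side only mirrors the documented limits; the server remains the authority on what it accepts.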

#### **Stage 01: Seed Selection & Download**

- **Original**: `pipeline_01_select_seeds.py`
- **API**: `download_seed_images()` in `api/utils.py:117-161`
- **Process**:
   1. Accept user-provided seed image URLs (1-8 images)
   2. Download with retry logic (3 attempts, exponential backoff)
   3. Handle transient HTTP errors (502, 503, 504, 429)
   4. Convert to base64 for LLM input
- **Error Handling**: Retry with 2s, 4s, 8s delays; raise HTTPException on failure

#### **Stage 02: Prompt LLM**

- **Original**: `pipeline_02_prompt_llm.py`
- **API**: `call_claude_api_direct()` in `api/utils.py:550-600`
- **Process**:
   1. Load prompt template: `data/prompt_templates/ClaudeRefined12/seed-based-json.txt`
   2. Build prompt with parameters: language, doc_type, gt_type, num_solutions
   3. Call Claude API (Anthropic Messages API v1)
      - Model: `claude-3-5-sonnet-20241022` (configurable)
      - Max tokens: 16,000
      - Temperature: 1.0
      - Vision: Send base64-encoded seed images
   4. Receive HTML documents with embedded ground truth
- **LLM Output Format**: Multiple `...` blocks with:
   - CSS styling with page dimensions
   - HTML elements with semantic classes
   - Handwriting markers: `class="handwritten author1"` (author1, author2, etc.)
   - Visual element placeholders: `data-placeholder="logo"`, `data-content="company-logo"`
   - Ground truth: ``

#### **Stage 03: Process Response & Extract HTML**

- **Original**: `pipeline_03_process_response.py`
- **API**: `extract_html_documents_from_response()` in `api/utils.py:605-635`
- **Process**:
   1. Parse LLM response for `...` blocks (regex)
   2. Prettify HTML with BeautifulSoup
   3. Validate HTML structure
   4. Extract ground truth JSON from `