| # DocGenie API |
|
|
| FastAPI-based REST API for generating synthetic documents using LLMs. This API is **optimized for ML dataset creation** with comprehensive handwriting and visual element support. |
|
|
| ## Features |
|
|
- **Simple REST API** - Easy to integrate with any frontend
- **URL-based seed images** - Provide seed images via URLs
- **Customizable prompts** - Control document type, language, and ground truth format
- **Handwriting Generation** - WordStylist diffusion model with 339 author styles
- **Visual Elements** - Stamps, logos, barcodes, photos, figures
- **ML-Ready Datasets** - Individual token images with complete metadata
- **Complete output** - Returns PDF, HTML, CSS, and bounding boxes
- **Async processing** - Fast and efficient document generation
|
|
| ## ML Dataset Creation |
|
|
| The API is **fully equipped for ML training dataset creation** with `output_detail: "dataset"` mode: |
|
|
### Handwriting Data
| - **Individual token images**: Each handwriting field saved as separate PNG (`hw0.png`, `hw1.png`, ...) |
| - **Author style IDs**: 339 unique writer styles (0-338) for style-consistent generation |
| - **Text content**: Original text for each handwriting field |
| - **Position data**: Precise bounding boxes (x, y, width, height) in mm |
| - **Signature detection**: Boolean flag for signature vs regular handwriting |
| - **Image dimensions**: Width and height for each generated token |
|
|
### Visual Element Data
| - **Stamps**: Generated with realistic textures, borders, and rotations |
| - Text content preserved |
| - Red/green color variants |
| - Circle/rectangle shapes |
| - **Logos**: Random selection from 6+ logo prefabs |
| - **Barcodes**: Code128 format with customizable content |
| - **Photos**: Random selection from 5+ photo prefabs |
| - **Figures/Charts**: Random selection from 6+ chart/diagram prefabs |
| - **Individual images**: Each element saved as separate PNG with transparency |
|
|
### Dataset Metadata
| - **Token mapping JSON**: Complete mapping with: |
| - Token IDs and references |
| - Style IDs for handwriting |
| - Element types for visual elements |
| - Position rectangles |
| - Image filenames |
| - Content text |
| - **Ground truth annotations**: QA pairs, classification labels, NER tags |
| - **Bounding boxes**: Word, segment, and layout-level bboxes |
| - **Normalized coordinates**: [0,1] scaled for ML frameworks |
| - **Msgpack export**: Compatible with datadings library |
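The `[0,1]` scaling mentioned above is a per-axis division by the physical page size (210 × 297 mm for A4). A minimal sketch, with field names assumed for illustration:

```python
def normalize_bbox(bbox: dict, page_width_mm: float, page_height_mm: float) -> dict:
    """Scale a bbox given in millimetres to the [0, 1] range."""
    return {
        "x": bbox["x"] / page_width_mm,
        "y": bbox["y"] / page_height_mm,
        "width": bbox["width"] / page_width_mm,
        "height": bbox["height"] / page_height_mm,
    }

# A box one-tenth of the page in each dimension, at one-tenth offset:
box = normalize_bbox({"x": 21.0, "y": 29.7, "width": 21.0, "height": 29.7}, 210.0, 297.0)
# each field is approximately 0.1
```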
|
|
### Additional ML Features
| - **OCR results**: Word-level bboxes and text for Document AI training |
| - **Layout elements**: Document structure annotations |
| - **Page dimensions**: Physical measurements (mm) and pixel dimensions |
| - **Reproducibility**: Seed-based generation for consistent results |
|
|
| ## Pipeline Overview |
|
|
| The API implements a simplified version of the DocGenie generation pipeline: |
|
|
| 1. **Download seed images** from URLs |
| 2. **Convert to base64** for LLM input |
| 3. **Build custom prompt** with user parameters |
| 4. **Call Claude API** to generate HTML documents |
| 5. **Extract HTML/CSS** and ground truth from response |
| 6. **Render to PDF** using Playwright |
| 7. **Extract bounding boxes** from PDF |
| 8. **Return results** as JSON with base64-encoded PDF |
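Steps 1-2 above amount to fetching the seed image bytes and wrapping them as a base64 image content block in the shape the Anthropic Messages API expects. A sketch (the `image_block` helper is our own name, not part of DocGenie):

```python
import base64

def image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as a base64 image content block for the LLM request."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }

# First bytes of a JPEG header, purely for illustration:
block = image_block(b"\xff\xd8\xff")
```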
|
|
| ## Installation |
|
|
| ### Prerequisites |
|
|
| - Python 3.10+ |
| - DocGenie main package installed |
| - Playwright browsers installed |
|
|
| ### Setup |
|
|
| 1. Install dependencies (all API dependencies are included in the main project): |
| ```bash |
| # Using uv (recommended) |
| uv sync |
| |
| # Or using pip |
| pip install -e . |
| |
| # Or install API-specific dependencies |
| cd api/ |
| pip install -r requirements.txt |
| ``` |
|
|
| **Note**: For async endpoint support, ensure you have: |
| - `redis>=5.0.0` and `rq>=1.15.0` (job queue) |
| - `supabase>=2.0.0` (database) |
| - `google-api-python-client>=2.100.0` (Google Drive integration) |
|
|
| 2. Install Playwright browsers: |
| ```bash |
| playwright install chromium |
| ``` |
|
|
| 3. Install Tesseract OCR (for local OCR support): |
| ```bash |
| # Ubuntu/Debian |
| sudo apt-get update && sudo apt-get install tesseract-ocr |
| |
| # macOS |
| brew install tesseract |
| |
| # Windows |
| # Download installer from: https://github.com/UB-Mannheim/tesseract/wiki |
| ``` |
|
|
| 4. Set your Anthropic API key: |
| ```bash |
| export ANTHROPIC_API_KEY="your-api-key-here" |
| ``` |
|
|
| 5. Configure OCR in `.env`: |
| ```bash |
| cp .env.example .env |
| # Edit .env and set: |
| OCR_SERVICE_ENABLED=true |
| OCR_USE_LOCAL=true # Use local Tesseract (recommended) |
| ``` |
|
|
| ## Running the API |
|
|
| ### Development Mode |
|
|
| ```bash |
| cd api |
| python main.py |
| ``` |
|
|
| The API will be available at `http://localhost:8000` |
|
|
| ### Production Mode |
|
|
| ```bash |
| cd api |
| uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4 |
| ``` |
|
|
| ## API Endpoints |
|
|
| ### Health Check |
|
|
| ```http |
| GET /health |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "status": "healthy", |
| "version": "1.0.0" |
| } |
| ``` |
|
|
| ### Generate Documents |
|
|
| ```http |
| POST /generate |
| ``` |
|
|
| **Request Body:** |
| ```json |
| { |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "business and administrative", |
| "gt_type": "Multiple questions about each document, with their answers taken **verbatim** from the document.", |
| "gt_format": "{\"<Text of question 1>\": \"<Answer to question 1>\", \"<Text of question 2>\": \"<Answer to question 2>\", ...}", |
| "num_solutions": 3 |
| }, |
| "model": "claude-sonnet-4-5-20250929", |
| "api_key": "optional-api-key" |
| } |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "success": true, |
| "message": "Successfully generated 3 documents", |
| "total_documents": 3, |
| "documents": [ |
| { |
| "document_id": "uuid-123_0", |
| "html": "<!DOCTYPE html>...", |
| "css": "body { ... }", |
| "ground_truth": { |
| "What is the invoice number?": "INV-12345", |
| "What is the total amount?": "$1,234.56" |
| }, |
| "pdf_base64": "JVBERi0xLjQK...", |
| "bboxes": [ |
| { |
| "text": "Invoice", |
| "x": 0.1, |
| "y": 0.05, |
| "width": 0.2, |
| "height": 0.03, |
| "page": 0 |
| } |
| ], |
| "page_width_mm": 210.0, |
| "page_height_mm": 297.0 |
| } |
| ] |
| } |
| ``` |
|
|
| ### Generate Documents (Async) - **Recommended for Production** |
|
|
| ```http |
| POST /generate/async |
| ``` |
|
|
**Cost Optimization**: This endpoint uses Claude's **Batch API** for **50% cost savings** ($2.50 vs $5.00 per 1M input tokens).
|
|
**Latency**: 5-30 minutes (vs 30-120 seconds for direct API)
|
|
**Best For**: Multi-user production systems with non-realtime requirements
|
|
| **Request Body:** |
| ```json |
| { |
| "user_id": 123, |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "business and administrative", |
| "num_solutions": 3, |
| "enable_handwriting": true, |
| "enable_visual_elements": true, |
| "enable_ocr": true, |
| "output_detail": "dataset" |
| } |
| } |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "queued", |
| "estimated_time_minutes": 10, |
| "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000/status", |
| "created_at": "2025-01-15T12:00:00Z" |
| } |
| ``` |
|
|
| **Workflow:** |
1. Submit generation request → get `request_id`
| 2. Poll status endpoint every 30-60 seconds |
| 3. When `status: "completed"`, download from Google Drive |
| 4. Results uploaded to user's Google Drive with shareable link |
|
|
| ### Check Job Status |
|
|
| ```http |
| GET /jobs/{request_id}/status |
| ``` |
|
|
| **Response (Queued):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "queued", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:00:00Z" |
| } |
| ``` |
|
|
| **Response (Processing):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "processing", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:05:00Z", |
| "progress": "Creating batch request..." |
| } |
| ``` |
|
|
| **Response (Completed):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "completed", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:15:00Z", |
| "download_url": "https://drive.google.com/file/d/abc123xyz/view?usp=sharing", |
| "file_size_mb": 15.4, |
| "document_count": 3 |
| } |
| ``` |
|
|
| **Response (Failed):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "failed", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:08:00Z", |
| "error_message": "Batch processing timeout" |
| } |
| ``` |
|
|
| **Status Values:** |
| - `queued`: Job submitted, waiting for worker |
| - `processing`: Worker picked up job, creating batch |
| - `generating`: Batch submitted to Claude, waiting for completion |
| - `completed`: Documents generated and uploaded to Google Drive |
| - `failed`: Error occurred (see `error_message`) |
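A client never needs to enforce these transitions itself, but a small guard like the following (our own sketch, not part of the API) makes polling loops easier to reason about:

```python
# Terminal states never change; non-terminal jobs move forward through the queue.
TERMINAL = {"completed", "failed"}
ORDER = ["queued", "processing", "generating", "completed"]

def is_terminal(status: str) -> bool:
    """Stop polling once a job reaches a terminal state."""
    return status in TERMINAL

def made_progress(old: str, new: str) -> bool:
    """True if the job advanced (or failed) between two polls."""
    if new == "failed":
        return True
    if old in ORDER and new in ORDER:
        return ORDER.index(new) > ORDER.index(old)
    return False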
|
|
| ### List User Jobs |
|
|
| ```http |
| GET /jobs/user/{user_id}?limit=50&offset=0 |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "user_id": 123, |
| "jobs": [ |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "completed", |
| "created_at": "2025-01-15T12:00:00Z", |
| "download_url": "https://drive.google.com/...", |
| "document_count": 3 |
| }, |
| { |
| "request_id": "660e8400-e29b-41d4-a716-446655440111", |
| "status": "processing", |
| "created_at": "2025-01-15T12:30:00Z" |
| } |
| ], |
| "count": 2, |
| "limit": 50, |
| "offset": 0 |
| } |
| ``` |
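Because the endpoint is paginated with `limit`/`offset`, fetching a user's full job history is a matter of advancing `offset` until fewer than `limit` jobs come back. A sketch, where the `fetch_page` callable stands in for the HTTP call:

```python
from typing import Callable

def all_jobs(fetch_page: Callable[[int, int], dict], limit: int = 50) -> list:
    """Collect every job for a user by walking limit/offset pages."""
    jobs, offset = [], 0
    while True:
        page = fetch_page(limit, offset)  # e.g. GET /jobs/user/{id}?limit=..&offset=..
        jobs.extend(page["jobs"])
        if page["count"] < limit:         # short page: no more results
            break
        offset += limit
    return jobs
```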
|
|
| ## Usage Examples |
|
|
| ### cURL |
|
|
| ```bash |
| curl -X POST http://localhost:8000/generate \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "seed_images": [ |
| "https://example.com/receipt1.jpg", |
| "https://example.com/receipt2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "receipts", |
| "num_solutions": 2 |
| } |
| }' |
| ``` |
|
|
| ### Python (Direct API) |
|
|
| ```python |
| import requests |
| import base64 |
| |
| response = requests.post( |
| "http://localhost:8000/generate", |
| json={ |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "business forms", |
| "num_solutions": 3 |
| } |
| } |
| ) |
| |
| result = response.json() |
| |
| # Save first PDF |
| if result["success"]: |
| pdf_data = base64.b64decode(result["documents"][0]["pdf_base64"]) |
| with open("generated_doc.pdf", "wb") as f: |
| f.write(pdf_data) |
| ``` |
|
|
| ### Python (Async API with Polling) - **Recommended** |
|
|
| ```python |
| import requests |
| import time |
| |
| # Step 1: Submit job |
| response = requests.post( |
| "http://localhost:8000/generate/async", |
| json={ |
| "user_id": 123, |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "receipts and invoices", |
| "num_solutions": 5, |
| "enable_handwriting": True, |
| "enable_visual_elements": True, |
| "enable_ocr": True, |
| "output_detail": "dataset" |
| } |
| } |
| ) |
| |
| job = response.json() |
| request_id = job["request_id"] |
print(f"Job submitted: {request_id}")
| print(f" Estimated time: {job['estimated_time_minutes']} minutes") |
| |
| # Step 2: Poll status until complete |
| while True: |
| status_response = requests.get( |
| f"http://localhost:8000/jobs/{request_id}/status" |
| ) |
| status = status_response.json() |
| |
| print(f" Status: {status['status']}", end="") |
| if status.get("progress"): |
| print(f" - {status['progress']}") |
| else: |
| print() |
| |
| if status["status"] == "completed": |
        print("Generation complete!")
| print(f" Download: {status['download_url']}") |
| print(f" Size: {status.get('file_size_mb', 0):.1f} MB") |
| print(f" Documents: {status.get('document_count', 0)}") |
| break |
| elif status["status"] == "failed": |
        print(f"Generation failed: {status.get('error_message')}")
| break |
| |
| # Wait 30 seconds before next poll |
| time.sleep(30) |
| |
| # Step 3: Download from Google Drive (if completed) |
| if status["status"] == "completed": |
| # User can download from their Google Drive using the shareable link |
| print(f"\nDownload your documents at:\n{status['download_url']}") |
| ``` |
|
|
| ### JavaScript |
|
|
| ```javascript |
| const response = await fetch('http://localhost:8000/generate', { |
| method: 'POST', |
| headers: { |
| 'Content-Type': 'application/json', |
| }, |
| body: JSON.stringify({ |
| seed_images: [ |
| 'https://example.com/seed1.jpg', |
| 'https://example.com/seed2.jpg' |
| ], |
| prompt_params: { |
| language: 'English', |
| doc_type: 'invoices', |
| num_solutions: 2 |
| } |
| }) |
| }); |
| |
| const result = await response.json(); |
| |
| // Convert base64 PDF to blob |
| const pdfBlob = await fetch(`data:application/pdf;base64,${result.documents[0].pdf_base64}`) |
| .then(res => res.blob()); |
| ``` |
|
|
| ## Configuration |
|
|
| ### Prompt Parameters |
|
|
| - **language**: Language for generated documents (default: "English") |
| - **doc_type**: Type of documents to generate (e.g., "business and administrative", "receipts", "forms") |
| - **gt_type**: Description of ground truth type to generate |
| - **gt_format**: Format specification for ground truth JSON |
| - **num_solutions**: Number of document variations (1-5) |
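These parameters can be validated client-side before submitting. A hedged sketch that enforces the documented `num_solutions` range, with defaults mirroring the list above (the helper name is ours):

```python
def build_prompt_params(doc_type: str,
                        language: str = "English",
                        num_solutions: int = 1,
                        **extra) -> dict:
    """Assemble prompt_params, rejecting out-of-range values early."""
    if not 1 <= num_solutions <= 5:
        raise ValueError("num_solutions must be between 1 and 5")
    return {"language": language, "doc_type": doc_type,
            "num_solutions": num_solutions, **extra}

params = build_prompt_params("receipts", num_solutions=3)
```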
|
|
| ### Stage 3-5 Advanced Features |
|
|
| The API supports advanced document synthesis and dataset packaging: |
|
|
| #### Stage 3: Handwriting & Visual Elements |
| - **enable_handwriting**: Add handwritten text using diffusion model (default: false) |
| - **handwriting_ratio**: Percentage of text to convert to handwriting 0-1 (default: 0.5) |
| - **enable_visual_elements**: Add stamps, barcodes, logos (default: false) |
| - **visual_element_types**: Types of elements to add: ["stamp", "logo", "figure", "barcode", "photo"] (default: all types) |
|
|
| #### Stage 4: OCR |
| - **enable_ocr**: Perform OCR on generated document (default: false) |
| - **ocr_language**: OCR language code (default: "en") |
|
|
| #### Stage 5: Dataset Packaging |
| - **enable_bbox_normalization**: Normalize bboxes to [0,1] scale (default: false) |
| - **enable_gt_verification**: Verify ground truth quality (default: false) |
| - **enable_analysis**: Generate dataset statistics (default: false) |
| - **enable_debug_visualization**: Create bbox overlay images (default: false) |
| |
| #### Dataset Export (Msgpack Format) |
| - **enable_dataset_export**: Export as msgpack dataset format (default: false) |
| - **dataset_export_format**: Export format - only "msgpack" is supported (default: "msgpack") |
| |
| **Note**: Only msgpack format is implemented in the current pipeline. COCO and HuggingFace export formats mentioned in some documentation are not yet available. |
| |
| #### Output Detail Level |
| - **output_detail**: Controls how much data is returned/saved (default: "minimal") |
| - `"minimal"` (default): Final outputs only (PDFs, images, metadata) - 2-5 MB per document |
| - `"dataset"`: Includes individual token images for ML training - 10-20 MB per document |
| - Individual handwriting token images (`handwriting_tokens/hw0.png`, ...) |
| - Individual visual element images (`visual_elements/logo_0.png`, ...) |
| - Token mapping JSON with style IDs and positions |
| - `"complete"`: All intermediate files and debug info - 20-50 MB per document |
| - Everything from `dataset` mode |
| - Intermediate PDFs from each processing stage |
| - Generation logs |
  - **Warning**: Can result in 50+ MB JSON responses for `/generate` endpoint
|
|
| **Recommendation**: Use `"minimal"` for production, `"dataset"` for ML research, `"complete"` for debugging (only with `/generate/pdf`). |
|
|
| **Example with dataset output detail:** |
| ```python |
| import requests |
| import base64 |
| import json |
| |
| # Generate ML training dataset |
| response = requests.post( |
| "http://localhost:8000/generate", |
| json={ |
| "seed_images": ["https://example.com/seed.jpg"], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "receipts and invoices", |
| "num_solutions": 5, |
| |
| # Enable handwriting and visual elements |
| "enable_handwriting": True, |
| "handwriting_ratio": 0.4, |
| "enable_visual_elements": True, |
| "visual_element_types": ["stamp", "logo", "figure", "barcode", "photo"], # All types by default |
| |
| # Enable dataset features |
| "enable_ocr": True, |
| "enable_bbox_normalization": True, |
| "enable_dataset_export": True, |
| |
| # IMPORTANT: Set output_detail to "dataset" for ML training |
| "output_detail": "dataset", |
| |
| # Use seed for reproducibility |
| "seed": 42 |
| } |
| } |
| ) |
| |
| result = response.json() |
| |
| # Process each generated document |
| for doc in result["documents"]: |
| doc_id = doc["document_id"] |
    print(f"\nProcessing {doc_id}:")
| |
| # 1. Save individual handwriting token images |
| if doc.get("handwriting_token_images"): |
| print(f" - Handwriting tokens: {len(doc['handwriting_token_images'])}") |
| for hw_id, img_b64 in doc["handwriting_token_images"].items(): |
| with open(f"dataset/{doc_id}/{hw_id}.png", "wb") as f: |
| f.write(base64.b64decode(img_b64)) |
| |
| # 2. Save individual visual element images |
| if doc.get("visual_element_images"): |
| print(f" - Visual elements: {len(doc['visual_element_images'])}") |
| for ve_id, img_b64 in doc["visual_element_images"].items(): |
| with open(f"dataset/{doc_id}/{ve_id}.png", "wb") as f: |
| f.write(base64.b64decode(img_b64)) |
| |
| # 3. Save token mapping for ML training |
| if doc.get("token_mapping"): |
| mapping = doc["token_mapping"] |
| print(f" - Mapping: {mapping['handwriting']['total_count']} HW + {mapping['visual_elements']['total_count']} VE") |
| with open(f"dataset/{doc_id}/token_mapping.json", "w") as f: |
| json.dump(mapping, f, indent=2) |
| |
| # 4. Save ground truth annotations |
| if doc.get("ground_truth"): |
| with open(f"dataset/{doc_id}/ground_truth.json", "w") as f: |
| json.dump(doc["ground_truth"], f, indent=2) |
| |
| # 5. Save bounding boxes (normalized coordinates) |
| if doc.get("normalized_bboxes_word"): |
| with open(f"dataset/{doc_id}/bboxes_normalized.json", "w") as f: |
| json.dump(doc["normalized_bboxes_word"], f, indent=2) |
| |
| # 6. Save final document image |
| if doc.get("image_base64"): |
| with open(f"dataset/{doc_id}/final_image.png", "wb") as f: |
| f.write(base64.b64decode(doc["image_base64"])) |
| |
| # 7. Save msgpack dataset file |
| if doc.get("dataset_export") and doc["dataset_export"].get("msgpack_base64"): |
| with open(f"dataset/{doc_id}/dataset.msgpack", "wb") as f: |
| f.write(base64.b64decode(doc["dataset_export"]["msgpack_base64"])) |
| |
print(f"\nGenerated {len(result['documents'])} ML-ready documents")
| ``` |
|
|
| ### PDF Generation Endpoint (Recommended for Large Datasets) |
|
|
| For bulk generation with comprehensive file outputs, use `/generate/pdf`: |
|
|
| ```bash |
| curl -X POST http://localhost:8000/generate/pdf \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "seed_images": ["https://example.com/seed1.jpg"], |
| "prompt_params": { |
| "num_solutions": 3, |
| "enable_handwriting": true, |
| "enable_ocr": true, |
| "enable_bbox_normalization": true, |
| "enable_dataset_export": true, |
| "output_detail": "dataset" |
| } |
| }' \ |
| --output documents.zip |
| ``` |
|
|
| #### ZIP File Contents |
|
|
| Based on `output_detail` level: |
|
|
| **Minimal (default):** |
| - `document_<id>.pdf` - Generated PDF files |
| - `document_<id>/` - Per-document directories with: |
| - `document.html`, `document.css` - Source files |
| - `ground_truth.json`, `bboxes.json` - Annotations |
| - `final_image.png` - Final rendered image (if Stage 3 enabled) |
| - `handwriting_regions.json`, `visual_elements.json` - Stage 3 metadata (if enabled) |
| - `ocr_results.json` - OCR word-level data (if OCR enabled) |
| - `README.md` - Package documentation |
| - `metadata.json` - Combined metadata |
|
|
| **Dataset (for ML training):** |
| - All files from "minimal" level, plus: |
| - `handwriting_tokens/` - Individual token images (`hw0.png`, `hw1.png`, ...) |
| - `visual_elements/` - Individual element images (`logo_0.png`, `stamp_1.png`, ...) |
| - `token_mapping.json` - Complete mapping with style IDs and positions |
| - `dataset.msgpack` - Msgpack dataset file (if export enabled) |
| - `normalized_bboxes_word.json` - Normalized coordinates (if Stage 5 enabled) |
|
|
| **Complete (for debugging):** |
| - All files from "dataset" level, plus: |
| - Intermediate PDFs from each processing stage |
| - Generation logs with timing information |
| - `debug_visualization.png` - Bbox overlay images |
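Once downloaded, the ZIP can be walked with the standard library. The sketch below reads the `token_mapping.json` of every per-document directory; the filenames follow the layout listed above, but treat them as assumptions until you inspect your own archive:

```python
import io
import json
import zipfile

def read_token_mappings(zip_bytes: bytes) -> dict:
    """Return {document_dir: token_mapping dict} for every document in the ZIP."""
    mappings = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("token_mapping.json"):
                doc_dir = name.rsplit("/", 1)[0]
                mappings[doc_dir] = json.loads(zf.read(name))
    return mappings
```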
|
|
| ### Supported Models |
|
|
| - `claude-sonnet-4-5-20250929` (default, recommended) |
| - `claude-3-5-sonnet-20241022` |
|
|
| ### Environment Variables |
|
|
| - `ANTHROPIC_API_KEY`: Your Anthropic API key (required if not provided in request) |
|
|
| ## API Documentation |
|
|
| Interactive API documentation is available when the server is running: |
|
|
| - **Swagger UI**: http://localhost:8000/docs |
| - **ReDoc**: http://localhost:8000/redoc |
|
|
| ## Error Handling |
|
|
| The API returns appropriate HTTP status codes: |
|
|
| - `200 OK`: Successful generation |
| - `400 Bad Request`: Invalid input (e.g., invalid image URLs) |
| - `401 Unauthorized`: Missing or invalid API key |
| - `500 Internal Server Error`: Processing error |
|
|
| Error response format: |
| ```json |
| { |
| "detail": "Error message describing what went wrong" |
| } |
| ``` |
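In client code, the `detail` field is the place to look when a request fails. A small sketch that turns a status code and response body into a readable diagnosis (the mapping simply mirrors the list above):

```python
EXPLANATIONS = {
    400: "invalid input (check seed image URLs and parameters)",
    401: "missing or invalid API key",
    500: "server-side processing error",
}

def explain_error(status_code: int, body: dict) -> str:
    """Combine the HTTP status with the API's detail message."""
    reason = EXPLANATIONS.get(status_code, "unexpected status")
    return f"{status_code}: {reason} - {body.get('detail', 'no detail provided')}"

msg = explain_error(401, {"detail": "Invalid API key"})
```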
|
|
| ## Performance Considerations |
|
|
| - **Concurrent requests**: The API can handle multiple requests concurrently |
| - **Image size**: Larger seed images take longer to process |
| - **Number of solutions**: More solutions = longer processing time |
| - **Model selection**: Sonnet is slower but higher quality than Haiku |
|
|
| ## Limitations |
|
|
| - Maximum 10 seed images per request |
| - Maximum 5 document variations (`num_solutions`) |
| - Single-page documents only |
| - Timeout: 60 seconds per PDF render |
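These limits can be enforced on the client before a request leaves, rather than waiting for a `400` from the server. A minimal sketch:

```python
MAX_SEED_IMAGES = 10   # documented request limit
MAX_SOLUTIONS = 5      # documented num_solutions limit

def check_request(seed_images: list, num_solutions: int) -> None:
    """Raise early instead of waiting for a 400 from the API."""
    if len(seed_images) > MAX_SEED_IMAGES:
        raise ValueError(f"at most {MAX_SEED_IMAGES} seed images per request")
    if not 1 <= num_solutions <= MAX_SOLUTIONS:
        raise ValueError(f"num_solutions must be 1-{MAX_SOLUTIONS}")
```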
|
|
| ## Troubleshooting |
|
|
| ### Playwright browser not found |
|
|
| ```bash |
| playwright install chromium |
| ``` |
|
|
| ### API key not working |
|
|
| Make sure your API key is set correctly: |
| ```bash |
| echo $ANTHROPIC_API_KEY |
| ``` |
|
|
| ### PDF rendering fails |
|
|
Ensure Chromium is installed and accessible by reinstalling it:
```bash
playwright install chromium
```
|
|
| ## Integration with Frontend |
|
|
| Example React integration: |
|
|
| ```jsx |
| const [loading, setLoading] = useState(false); |
| const [result, setResult] = useState(null); |
| |
| const generateDocuments = async () => { |
| setLoading(true); |
| |
| try { |
| const response = await fetch('http://localhost:8000/generate', { |
| method: 'POST', |
| headers: { 'Content-Type': 'application/json' }, |
| body: JSON.stringify({ |
| seed_images: seedImageUrls, |
| prompt_params: { |
| language: 'English', |
| doc_type: documentType, |
| num_solutions: 3 |
| } |
| }) |
| }); |
| |
| const data = await response.json(); |
| setResult(data); |
| } catch (error) { |
| console.error('Generation failed:', error); |
| } finally { |
| setLoading(false); |
| } |
| }; |
| ``` |
|
|
| ### React Integration (Async API with Progress) |
|
|
| ```jsx |
| import { useState, useEffect } from 'react'; |
| |
| function DocumentGenerator({ userId, seedImages }) { |
| const [requestId, setRequestId] = useState(null); |
| const [status, setStatus] = useState(null); |
| const [progress, setProgress] = useState(0); |
| |
| // Submit job |
| const handleGenerate = async () => { |
| const response = await fetch('http://localhost:8000/generate/async', { |
| method: 'POST', |
| headers: { 'Content-Type': 'application/json' }, |
| body: JSON.stringify({ |
| user_id: userId, |
| seed_images: seedImages, |
| prompt_params: { |
| language: 'English', |
| doc_type: 'receipts', |
| num_solutions: 3, |
| enable_handwriting: true, |
| output_detail: 'dataset' |
| } |
| }) |
| }); |
| |
| const job = await response.json(); |
| setRequestId(job.request_id); |
| setStatus('queued'); |
| }; |
| |
| // Poll job status |
| useEffect(() => { |
| if (!requestId || status === 'completed' || status === 'failed') return; |
| |
| const interval = setInterval(async () => { |
| const response = await fetch(`http://localhost:8000/jobs/${requestId}/status`); |
| const jobStatus = await response.json(); |
| |
| setStatus(jobStatus.status); |
| |
| // Update progress bar |
| const progressMap = { |
| 'queued': 10, |
| 'processing': 30, |
| 'generating': 60, |
| 'completed': 100, |
| 'failed': 0 |
| }; |
| setProgress(progressMap[jobStatus.status] || 0); |
| |
| if (jobStatus.status === 'completed') { |
| // Open Google Drive download link |
| window.open(jobStatus.download_url, '_blank'); |
| } |
| }, 30000); // Poll every 30 seconds |
| |
| return () => clearInterval(interval); |
| }, [requestId, status]); |
| |
| return ( |
| <div> |
| <button onClick={handleGenerate} disabled={status && status !== 'completed'}> |
| Generate Documents |
| </button> |
| |
| {status && ( |
| <div className="progress-container"> |
| <div className="progress-bar" style={{ width: `${progress}%` }} /> |
| <p>Status: {status}</p> |
| {status === 'completed' && ( |
| <a href={`http://localhost:8000/jobs/${requestId}/status`}> |
| Download Results |
| </a> |
| )} |
| </div> |
| )} |
| </div> |
| ); |
| } |
| ``` |
|
|
| ## Background Processing Setup |
|
|
| The async endpoints (`/generate/async`) require a background worker system for job processing. |
|
|
| ### Prerequisites |
|
|
| 1. **Redis** - Job queue storage |
| 2. **Supabase** - Database for job tracking and user data |
| 3. **Google Drive OAuth** - For uploading results to user's Drive |
|
|
| ### Installing Redis |
|
|
| **Ubuntu/Debian:** |
| ```bash |
| sudo apt-get update |
| sudo apt-get install redis-server |
| sudo systemctl start redis |
| sudo systemctl enable redis |
| ``` |
|
|
| **macOS:** |
| ```bash |
| brew install redis |
| brew services start redis |
| ``` |
|
|
| **Docker:** |
| ```bash |
| docker run -d -p 6379:6379 --name redis redis:7-alpine |
| ``` |
|
|
| **Verify Redis is running:** |
| ```bash |
| redis-cli ping |
| # Should return: PONG |
| ``` |
|
|
| ### Configuring Supabase |
|
|
| 1. Create a Supabase project at [supabase.com](https://supabase.com) |
|
|
| 2. Create the required tables in your Supabase SQL Editor: |
|
|
| ```sql |
| -- Document generation requests |
| CREATE TABLE document_requests ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| user_id INTEGER NOT NULL, |
| status TEXT NOT NULL CHECK (status IN ('queued', 'processing', 'generating', 'completed', 'failed')), |
| request_metadata JSONB NOT NULL, |
| error_message TEXT, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), |
| updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() |
| ); |
| |
| -- Generated documents |
| CREATE TABLE generated_documents ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| request_id UUID NOT NULL REFERENCES document_requests(id), |
| document_id TEXT NOT NULL, |
| file_url TEXT, |
| zip_url TEXT, |
| file_size_mb DECIMAL, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() |
| ); |
| |
| -- User integrations (Google Drive OAuth) |
| CREATE TABLE user_integrations ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| user_id INTEGER NOT NULL, |
| integration_type TEXT NOT NULL CHECK (integration_type IN ('google_drive', 'dropbox')), |
| access_token TEXT NOT NULL, |
| refresh_token TEXT, |
| token_expiry TIMESTAMPTZ, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), |
| updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), |
| UNIQUE(user_id, integration_type) |
| ); |
| |
| -- Analytics events |
| CREATE TABLE analytics_events ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| user_id INTEGER, |
| event_type TEXT NOT NULL, |
| entity_id UUID, |
| event_data JSONB, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() |
| ); |
| |
| -- Indexes for performance |
| CREATE INDEX idx_document_requests_user_id ON document_requests(user_id); |
| CREATE INDEX idx_document_requests_status ON document_requests(status); |
| CREATE INDEX idx_generated_documents_request_id ON generated_documents(request_id); |
| CREATE INDEX idx_user_integrations_user_id ON user_integrations(user_id); |
| CREATE INDEX idx_analytics_events_user_id ON analytics_events(user_id); |
| ``` |
|
|
| 3. Add your Supabase credentials to `.env`: |
|
|
| ```bash |
| # In api/.env |
| SUPABASE_URL=https://your-project-ref.supabase.co |
| SUPABASE_KEY=your-anon-or-service-role-key |
| ``` |
|
|
| ### Configuring Google Drive OAuth |
|
|
| Users need to connect their Google Drive account for result storage: |
|
|
| 1. Create a Google Cloud Project at [console.cloud.google.com](https://console.cloud.google.com) |
| 2. Enable Google Drive API |
| 3. Create OAuth 2.0 credentials (Web application) |
| 4. Add authorized redirect URIs (e.g., `http://localhost:3000/auth/google/callback`) |
| 5. Download credentials JSON |
|
|
| 6. Users authenticate via OAuth flow (implement in your frontend): |
|
|
| ```python |
| # Example OAuth flow (implement in your auth system) |
| from google_auth_oauthlib.flow import Flow |
| |
| flow = Flow.from_client_config( |
| client_config={ |
| "web": { |
| "client_id": "YOUR_CLIENT_ID", |
| "client_secret": "YOUR_CLIENT_SECRET", |
| "auth_uri": "https://accounts.google.com/o/oauth2/auth", |
| "token_uri": "https://oauth2.googleapis.com/token", |
| "redirect_uris": ["http://localhost:3000/auth/google/callback"] |
| } |
| }, |
| scopes=["https://www.googleapis.com/auth/drive.file"] |
| ) |
| |
| # User visits auth URL, gets redirected back with code |
| authorization_url, state = flow.authorization_url(access_type='offline', include_granted_scopes='true') |
| |
| # Exchange code for tokens |
| flow.fetch_token(code=authorization_code) |
| credentials = flow.credentials |
| |
| # Store in Supabase user_integrations table |
| supabase.table('user_integrations').insert({ |
| 'user_id': user_id, |
| 'integration_type': 'google_drive', |
| 'access_token': credentials.token, |
| 'refresh_token': credentials.refresh_token, |
| 'token_expiry': credentials.expiry |
| }).execute() |
| ``` |
|
|
| ### Starting the Background Worker |
|
|
| 1. Configure environment variables in `api/.env`: |
|
|
| ```bash |
| # Redis Configuration |
| REDIS_URL=redis://localhost:6379/0 |
| RQ_QUEUE_NAME=docgenie |
| |
| # Batch Processing |
| BATCH_POLL_INTERVAL=30 # seconds |
| BATCH_DATA_DIR=/tmp/docgenie_batches |
| MESSAGE_DATA_DIR=/tmp/docgenie_messages |
| |
| # Google Drive |
| GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents |
| |
| # Supabase (already configured above) |
| SUPABASE_URL=https://your-project.supabase.co |
| SUPABASE_KEY=your_key_here |
| |
| # Claude API |
| ANTHROPIC_API_KEY=your_api_key_here |
| ``` |
|
|
| 2. Start the worker: |
|
|
| ```bash |
| cd api/ |
| ./start_worker.sh |
| ``` |
|
|
| The worker will: |
- Check Redis connection
- Validate Supabase configuration
- Verify Claude API key
- Create temporary directories
- Start RQ worker listening on `docgenie` queue
|
|
**Output:**
```
Starting DocGenie RQ Worker...
Loading .env file...
Redis connected
Supabase configured
Claude API key configured
Temporary directories created

============================================
Worker Configuration:
  Queue: docgenie
  Redis: redis://localhost:6379/0
  Batch Data: /tmp/docgenie_batches
  Message Data: /tmp/docgenie_messages
============================================

Starting RQ worker (press Ctrl+C to stop)...

12:00:00 RQ worker 'worker-abc123' started on docgenie queue
```

### Running Multiple Workers (Production)

For production systems with high load, run multiple workers:

```bash
# Terminal 1
./start_worker.sh

# Terminal 2
./start_worker.sh

# Terminal 3
./start_worker.sh
```

Each worker processes jobs independently from the same queue.

**For detailed scaling instructions**, see [SCALING.md](SCALING.md).

### Monitoring Workers

```bash
# View queue and worker status
rq info --url redis://localhost:6379/0

# View the docgenie queue only (queue names are positional arguments)
rq info docgenie --url redis://localhost:6379/0

# Requeue failed jobs back onto the docgenie queue
rq requeue --queue docgenie --all --url redis://localhost:6379/0
```

### Architecture Overview

```
┌─────────────┐      ┌─────────────┐      ┌──────────────────┐
│  FastAPI    │─────▶│   Redis     │◀─────│   RQ Workers     │
│  Server     │      │   Queue     │      │ (1-5 instances)  │
│             │      │             │      │                  │
│ /generate/  │      │ Job Queue:  │      │ • Downloads      │
│   async     │      │  - queued   │      │ • Claude Batch   │
│             │      │  - pending  │      │ • PDF render     │
│ /jobs/      │      │  - active   │      │ • Handwriting    │
│   {id}/     │      │             │      │ • OCR            │
│   status    │      │             │      │ • ZIP creation   │
└──────┬──────┘      └─────────────┘      └────────┬─────────┘
       │                                           │
       │                                           │
       ▼                                           ▼
┌──────────────────────────────────────────────────────────────┐
│                           Supabase                           │
│  • document_requests (job tracking)                          │
│  • generated_documents (results metadata)                    │
│  • user_integrations (Google Drive OAuth)                    │
│  • analytics_events (usage tracking)                         │
└──────────────────────────────────────────────────────────────┘
                              │
                              │ Upload Results
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                         Google Drive                         │
│  • User's "DocGenie Documents" folder                        │
│  • ZIP files with generated documents                        │
│  • Shareable links returned to API                           │
└──────────────────────────────────────────────────────────────┘
```
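
From a client's perspective, the async path in this diagram reduces to "submit, then poll `/jobs/{id}/status`". A transport-agnostic sketch of that loop (the injected `fetch_status` callable and the terminal state names `completed`/`failed` are assumptions for illustration, not the confirmed API contract):

```python
import time

def wait_for_job(fetch_status, job_id, poll_interval=5.0, timeout=1800.0):
    """Poll until the job reaches a terminal state.

    fetch_status: any callable returning the current status string for a job id,
    e.g. one that GETs /jobs/{id}/status and extracts the status field.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Injecting the fetcher keeps the loop independent of the HTTP client and easy to test.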

### Cost Comparison: Direct vs Batched API

| API Type | Cost (Input) | Cost (Output) | Latency | Use Case |
|----------|--------------|---------------|---------|----------|
| Direct | $5.00/1M tokens | $15.00/1M tokens | 30-120s | Real-time, interactive |
| **Batched** | **$2.50/1M tokens** | **$7.50/1M tokens** | 5-30 min | **Background jobs (recommended)** |

**Example Cost Calculation:**
- Generate 100 documents per day
- Each request: 5,000 input tokens, 10,000 output tokens

**Direct API Cost:**
- Input: (100 × 5,000 / 1M) × $5.00 = $2.50/day
- Output: (100 × 10,000 / 1M) × $15.00 = $15.00/day
- **Total: $17.50/day = $525/month**

**Batched API Cost:**
- Input: (100 × 5,000 / 1M) × $2.50 = $1.25/day
- Output: (100 × 10,000 / 1M) × $7.50 = $7.50/day
- **Total: $8.75/day = $262.50/month**

**💰 Savings: $262.50/month (50% reduction)**
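
The arithmetic above generalizes to any volume; a small helper reproduces it (rates are the table values, in dollars per million tokens, and the month is taken as 30 days as in the figures above):

```python
def daily_cost(docs_per_day, in_tokens, out_tokens, in_rate, out_rate):
    """Daily USD cost for a given document volume at per-million-token rates."""
    return (docs_per_day * in_tokens / 1e6) * in_rate + (docs_per_day * out_tokens / 1e6) * out_rate

direct = daily_cost(100, 5_000, 10_000, 5.00, 15.00)   # $17.50/day
batched = daily_cost(100, 5_000, 10_000, 2.50, 7.50)   # $8.75/day
monthly_savings = (direct - batched) * 30              # $262.50/month
```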

## Scaling Workers

The API uses Redis Queue (RQ) workers for background job processing. Scale workers based on load:

| User Load | Workers | Redis RAM | Notes |
|-----------|---------|-----------|-------|
| < 10 req/hr | 1 | 256 MB | Development |
| 10–50 req/hr | 2–3 | 512 MB | Small production |
| 50–200 req/hr | 3–5 | 1 GB | Medium production |
| > 200 req/hr | 5+ | 2+ GB | Large production |
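
If you automate scaling decisions, the table above can be encoded as a simple heuristic (thresholds come from the table, returning the upper end of each range; this is a rule of thumb, not a tuned policy):

```python
def recommended_workers(requests_per_hour: int) -> int:
    """Suggest a worker count for a given hourly request load (upper end of the table's ranges)."""
    if requests_per_hour < 10:
        return 1
    if requests_per_hour <= 50:
        return 3
    if requests_per_hour <= 200:
        return 5
    return 6  # "5+": scale further based on observed queue depth
```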

### Starting Workers

```bash
# Single worker (development)
./start_worker.sh

# Multiple workers (production): run in separate terminals
./start_worker.sh  # Terminal 1
./start_worker.sh  # Terminal 2

# Docker Compose: scale to 3 workers
docker-compose up --scale worker=3

# Monitor
rq info --url redis://localhost:6379/0
rq info docgenie --url redis://localhost:6379/0
```

### Railway Multi-Worker (Separate Service)
1. Railway dashboard → New Service → GitHub Repo (same repo)
2. Name: `docgenie-worker`
3. Custom Start Command: `rq worker docgenie --url $REDIS_URL`
4. Add the same environment variables as the API service

> For most use cases the **combined** mode (API + worker in one service, see `railway.json`) is sufficient and cheaper.

## Contributing

This API is a simplified interface to the DocGenie pipeline. For the full pipeline with all features, see the main DocGenie documentation.

## License

Same as the main DocGenie project.