# DocGenie API

FastAPI-based REST API for generating synthetic documents using LLMs. This API is **optimized for ML dataset creation** with comprehensive handwriting and visual element support.

## Features

- πŸš€ **Simple REST API** - Easy to integrate with any frontend
- πŸ–ΌοΈ **URL-based seed images** - Provide seed images via URLs
- 🎨 **Customizable prompts** - Control document type, language, and ground truth format
- ✍️ **Handwriting Generation** - WordStylist diffusion model with 339 author styles
- 🎯 **Visual Elements** - Stamps, logos, barcodes, photos, figures
- πŸ“Š **ML-Ready Datasets** - Individual token images with complete metadata
- πŸ“„ **Complete output** - Returns PDF, HTML, CSS, and bounding boxes
- ⚑ **Async processing** - Fast and efficient document generation

## ML Dataset Creation

The API is **fully equipped for ML training dataset creation** with `output_detail: "dataset"` mode:

### βœ… Handwriting Data

- **Individual token images**: Each handwriting field saved as a separate PNG (`hw0.png`, `hw1.png`, ...)
- **Author style IDs**: 339 unique writer styles (0-338) for style-consistent generation
- **Text content**: Original text for each handwriting field
- **Position data**: Precise bounding boxes (x, y, width, height) in mm
- **Signature detection**: Boolean flag for signature vs regular handwriting
- **Image dimensions**: Width and height for each generated token

### βœ… Visual Element Data

- **Stamps**: Generated with realistic textures, borders, and rotations
  - Text content preserved
  - Red/green color variants
  - Circle/rectangle shapes
- **Logos**: Random selection from 6+ logo prefabs
- **Barcodes**: Code128 format with customizable content
- **Photos**: Random selection from 5+ photo prefabs
- **Figures/Charts**: Random selection from 6+ chart/diagram prefabs
- **Individual images**: Each element saved as a separate PNG with transparency

### βœ… Dataset Metadata

- **Token mapping JSON**: Complete mapping with:
  - Token IDs and references
  - Style IDs for handwriting
  - Element types for visual elements
  - Position rectangles
  - Image filenames
  - Content text
- **Ground truth annotations**: QA pairs, classification labels, NER tags
- **Bounding boxes**: Word, segment, and layout-level bboxes
- **Normalized coordinates**: [0,1] scaled for ML frameworks
- **Msgpack export**: Compatible with the datadings library

### βœ… Additional ML Features

- **OCR results**: Word-level bboxes and text for Document AI training
- **Layout elements**: Document structure annotations
- **Page dimensions**: Physical measurements (mm) and pixel dimensions
- **Reproducibility**: Seed-based generation for consistent results

## Pipeline Overview

The API implements a simplified version of the DocGenie generation pipeline:

1. **Download seed images** from URLs
2. **Convert to base64** for LLM input
3. **Build custom prompt** with user parameters
4. **Call Claude API** to generate HTML documents
5. **Extract HTML/CSS** and ground truth from the response
6. **Render to PDF** using Playwright
7. **Extract bounding boxes** from the PDF
8. **Return results** as JSON with base64-encoded PDF

## Installation

### Prerequisites

- Python 3.10+
- DocGenie main package installed
- Playwright browsers installed

### Setup

1. Install dependencies (all API dependencies are included in the main project):

   ```bash
   # Using uv (recommended)
   uv sync

   # Or using pip
   pip install -e .

   # Or install API-specific dependencies
   cd api/
   pip install -r requirements.txt
   ```

   **Note**: For async endpoint support, ensure you have:
   - `redis>=5.0.0` and `rq>=1.15.0` (job queue)
   - `supabase>=2.0.0` (database)
   - `google-api-python-client>=2.100.0` (Google Drive integration)

2. Install Playwright browsers:

   ```bash
   playwright install chromium
   ```

3. Install Tesseract OCR (for local OCR support):

   ```bash
   # Ubuntu/Debian
   sudo apt-get update && sudo apt-get install tesseract-ocr

   # macOS
   brew install tesseract

   # Windows
   # Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
   ```

4. Set your Anthropic API key:

   ```bash
   export ANTHROPIC_API_KEY="your-api-key-here"
   ```

5. Configure OCR in `.env`:

   ```bash
   cp .env.example .env
   # Edit .env and set:
   OCR_SERVICE_ENABLED=true
   OCR_USE_LOCAL=true  # Use local Tesseract (recommended)
   ```

## Running the API

### Development Mode

```bash
cd api
python main.py
```

The API will be available at `http://localhost:8000`.

### Production Mode

```bash
cd api
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```

## API Endpoints

### Health Check

```http
GET /health
```

**Response:**

```json
{
  "status": "healthy",
  "version": "1.0.0"
}
```

### Generate Documents

```http
POST /generate
```

**Request Body:**

```json
{
  "seed_images": [
    "https://example.com/seed1.jpg",
    "https://example.com/seed2.jpg"
  ],
  "prompt_params": {
    "language": "English",
    "doc_type": "business and administrative",
    "gt_type": "Multiple questions about each document, with their answers taken **verbatim** from the document.",
    "gt_format": "{\"\": \"\", \"\": \"\", ...}",
    "num_solutions": 3
  },
  "model": "claude-sonnet-4-5-20250929",
  "api_key": "optional-api-key"
}
```

**Response:**

```json
{
  "success": true,
  "message": "Successfully generated 3 documents",
  "total_documents": 3,
  "documents": [
    {
      "document_id": "uuid-123_0",
      "html": "...",
      "css": "body { ... }",
      "ground_truth": {
        "What is the invoice number?": "INV-12345",
        "What is the total amount?": "$1,234.56"
      },
      "pdf_base64": "JVBERi0xLjQK...",
      "bboxes": [
        {
          "text": "Invoice",
          "x": 0.1,
          "y": 0.05,
          "width": 0.2,
          "height": 0.03,
          "page": 0
        }
      ],
      "page_width_mm": 210.0,
      "page_height_mm": 297.0
    }
  ]
}
```

### Generate Documents (Async) - **Recommended for Production**

```http
POST /generate/async
```

**🎯 Cost Optimization**: This endpoint uses Claude's **Batch API** for **50% cost savings** ($2.50 vs $5.00 per 1M input tokens).
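The 50% figure can be sanity-checked with simple arithmetic. A sketch, where the 5,000/10,000 token counts per request are illustrative assumptions and the per-1M-token rates are the ones quoted in this README's cost comparison:

```python
# Per-1M-token rates from this README's cost comparison table.
DIRECT_RATES = {"input": 5.00, "output": 15.00}  # $/1M tokens, direct API
BATCH_RATES = {"input": 2.50, "output": 7.50}    # $/1M tokens, Batch API

def request_cost(rates, input_tokens, output_tokens):
    """Dollar cost of one request at the given per-1M-token rates."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Assumed per-request token counts (illustrative only).
direct = request_cost(DIRECT_RATES, 5_000, 10_000)
batched = request_cost(BATCH_RATES, 5_000, 10_000)
print(f"direct=${direct:.4f}, batched=${batched:.4f}, saved={1 - batched / direct:.0%}")
# prints: direct=$0.1750, batched=$0.0875, saved=50%
```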
**⏱️ Latency**: 5-30 minutes (vs 30-120 seconds for the direct API)

**βœ… Best For**: Multi-user production systems with non-realtime requirements

**Request Body:**

```json
{
  "user_id": 123,
  "seed_images": [
    "https://example.com/seed1.jpg",
    "https://example.com/seed2.jpg"
  ],
  "prompt_params": {
    "language": "English",
    "doc_type": "business and administrative",
    "num_solutions": 3,
    "enable_handwriting": true,
    "enable_visual_elements": true,
    "enable_ocr": true,
    "output_detail": "dataset"
  }
}
```

**Response:**

```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "estimated_time_minutes": 10,
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000/status",
  "created_at": "2025-01-15T12:00:00Z"
}
```

**Workflow:**

1. Submit a generation request β†’ get a `request_id`
2. Poll the status endpoint every 30-60 seconds
3. When `status: "completed"`, download from Google Drive
4. Results are uploaded to the user's Google Drive with a shareable link

### Check Job Status

```http
GET /jobs/{request_id}/status
```

**Response (Queued):**

```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:00:00Z"
}
```

**Response (Processing):**

```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:05:00Z",
  "progress": "Creating batch request..."
}
```

**Response (Completed):**

```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:15:00Z",
  "download_url": "https://drive.google.com/file/d/abc123xyz/view?usp=sharing",
  "file_size_mb": 15.4,
  "document_count": 3
}
```

**Response (Failed):**

```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:08:00Z",
  "error_message": "Batch processing timeout"
}
```

**Status Values:**

- `queued`: Job submitted, waiting for a worker
- `processing`: Worker picked up the job, creating the batch
- `generating`: Batch submitted to Claude, waiting for completion
- `completed`: Documents generated and uploaded to Google Drive
- `failed`: Error occurred (see `error_message`)

### List User Jobs

```http
GET /jobs/user/{user_id}?limit=50&offset=0
```

**Response:**

```json
{
  "user_id": 123,
  "jobs": [
    {
      "request_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "created_at": "2025-01-15T12:00:00Z",
      "download_url": "https://drive.google.com/...",
      "document_count": 3
    },
    {
      "request_id": "660e8400-e29b-41d4-a716-446655440111",
      "status": "processing",
      "created_at": "2025-01-15T12:30:00Z"
    }
  ],
  "count": 2,
  "limit": 50,
  "offset": 0
}
```

## Usage Examples

### cURL

```bash
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "seed_images": [
      "https://example.com/receipt1.jpg",
      "https://example.com/receipt2.jpg"
    ],
    "prompt_params": {
      "language": "English",
      "doc_type": "receipts",
      "num_solutions": 2
    }
  }'
```

### Python (Direct API)

```python
import requests
import base64

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "seed_images": [
            "https://example.com/seed1.jpg",
            "https://example.com/seed2.jpg"
        ],
        "prompt_params": {
            "language": "English",
            "doc_type": "business forms",
            "num_solutions": 3
        }
    }
)

result = response.json()

# Save the first PDF
if result["success"]:
    pdf_data = base64.b64decode(result["documents"][0]["pdf_base64"])
    with open("generated_doc.pdf", "wb") as f:
        f.write(pdf_data)
```

### Python (Async API with Polling) - **Recommended**

```python
import requests
import time

# Step 1: Submit the job
response = requests.post(
    "http://localhost:8000/generate/async",
    json={
        "user_id": 123,
        "seed_images": [
            "https://example.com/seed1.jpg",
            "https://example.com/seed2.jpg"
        ],
        "prompt_params": {
            "language": "English",
            "doc_type": "receipts and invoices",
            "num_solutions": 5,
            "enable_handwriting": True,
            "enable_visual_elements": True,
            "enable_ocr": True,
            "output_detail": "dataset"
        }
    }
)

job = response.json()
request_id = job["request_id"]
print(f"βœ“ Job submitted: {request_id}")
print(f"  Estimated time: {job['estimated_time_minutes']} minutes")

# Step 2: Poll the status until the job finishes
while True:
    status_response = requests.get(
        f"http://localhost:8000/jobs/{request_id}/status"
    )
    status = status_response.json()

    print(f"  Status: {status['status']}", end="")
    if status.get("progress"):
        print(f" - {status['progress']}")
    else:
        print()

    if status["status"] == "completed":
        print("βœ“ Generation complete!")
        print(f"  Download: {status['download_url']}")
        print(f"  Size: {status.get('file_size_mb', 0):.1f} MB")
        print(f"  Documents: {status.get('document_count', 0)}")
        break
    elif status["status"] == "failed":
        print(f"βœ— Generation failed: {status.get('error_message')}")
        break

    # Wait 30 seconds before the next poll
    time.sleep(30)

# Step 3: Download from Google Drive (if completed)
if status["status"] == "completed":
    # Download from your Google Drive using the shareable link
    print(f"\nDownload your documents at:\n{status['download_url']}")
```

### JavaScript

```javascript
const response = await fetch('http://localhost:8000/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    seed_images: [
      'https://example.com/seed1.jpg',
      'https://example.com/seed2.jpg'
    ],
    prompt_params: {
      language: 'English',
      doc_type: 'invoices',
      num_solutions: 2
    }
  })
});

const result = await response.json();

// Convert the base64 PDF to a blob
const pdfBlob = await fetch(`data:application/pdf;base64,${result.documents[0].pdf_base64}`)
  .then(res => res.blob());
```

## Configuration

### Prompt Parameters

- **language**: Language for generated documents (default: "English")
- **doc_type**: Type of documents to generate (e.g., "business and administrative", "receipts", "forms")
- **gt_type**: Description of the ground truth type to generate
- **gt_format**: Format specification for the ground truth JSON
- **num_solutions**: Number of document variations (1-5)

### Stage 3-5 Advanced Features

The API supports advanced document synthesis and dataset packaging:

#### Stage 3: Handwriting & Visual Elements

- **enable_handwriting**: Add handwritten text using the diffusion model (default: false)
- **handwriting_ratio**: Fraction of text to convert to handwriting, 0-1 (default: 0.5)
- **enable_visual_elements**: Add stamps, barcodes, logos (default: false)
- **visual_element_types**: Types of elements to add: ["stamp", "logo", "figure", "barcode", "photo"] (default: all types)

#### Stage 4: OCR

- **enable_ocr**: Perform OCR on the generated document (default: false)
- **ocr_language**: OCR language code (default: "en")

#### Stage 5: Dataset Packaging

- **enable_bbox_normalization**: Normalize bboxes to [0,1] scale (default: false)
- **enable_gt_verification**: Verify ground truth quality (default: false)
- **enable_analysis**: Generate dataset statistics (default: false)
- **enable_debug_visualization**: Create bbox overlay images (default: false)

#### Dataset Export (Msgpack Format)

- **enable_dataset_export**: Export as msgpack dataset format (default: false)
- **dataset_export_format**: Export format - only "msgpack" is supported (default: "msgpack")

**Note**: Only msgpack format is implemented in the current pipeline.
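The Stage 5 `[0,1]` bbox normalization above is a plain rescale by the page dimensions. A minimal sketch, where the helper name and the sample values are ours for illustration (the bbox fields match the shape shown in the `/generate` response):

```python
def normalize_bbox(bbox, page_width_mm, page_height_mm):
    """Rescale an mm-space bounding box to [0, 1] page-relative coordinates."""
    return {
        "text": bbox["text"],
        "x": bbox["x"] / page_width_mm,
        "y": bbox["y"] / page_height_mm,
        "width": bbox["width"] / page_width_mm,
        "height": bbox["height"] / page_height_mm,
    }

# A4 page: 210 mm x 297 mm (matches page_width_mm/page_height_mm in responses)
word = {"text": "Invoice", "x": 105.0, "y": 74.25, "width": 52.5, "height": 37.125}
print(normalize_bbox(word, 210.0, 297.0))
# {'text': 'Invoice', 'x': 0.5, 'y': 0.25, 'width': 0.25, 'height': 0.125}
```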
COCO and HuggingFace export formats mentioned in some documentation are not yet available.

#### Output Detail Level

- **output_detail**: Controls how much data is returned/saved (default: "minimal")
  - `"minimal"` (default): Final outputs only (PDFs, images, metadata) - 2-5 MB per document
  - `"dataset"`: Includes individual token images for ML training - 10-20 MB per document
    - Individual handwriting token images (`handwriting_tokens/hw0.png`, ...)
    - Individual visual element images (`visual_elements/logo_0.png`, ...)
    - Token mapping JSON with style IDs and positions
  - `"complete"`: All intermediate files and debug info - 20-50 MB per document
    - Everything from `dataset` mode
    - Intermediate PDFs from each processing stage
    - Generation logs
    - ⚠️ **Warning**: Can result in 50+ MB JSON responses for the `/generate` endpoint

**Recommendation**: Use `"minimal"` for production, `"dataset"` for ML research, and `"complete"` for debugging (only with `/generate/pdf`).

**Example with dataset output detail:**

```python
import base64
import json
import os

import requests

# Generate an ML training dataset
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "seed_images": ["https://example.com/seed.jpg"],
        "prompt_params": {
            "language": "English",
            "doc_type": "receipts and invoices",
            "num_solutions": 5,
            # Enable handwriting and visual elements
            "enable_handwriting": True,
            "handwriting_ratio": 0.4,
            "enable_visual_elements": True,
            "visual_element_types": ["stamp", "logo", "figure", "barcode", "photo"],  # All types by default
            # Enable dataset features
            "enable_ocr": True,
            "enable_bbox_normalization": True,
            "enable_dataset_export": True,
            # IMPORTANT: Set output_detail to "dataset" for ML training
            "output_detail": "dataset",
            # Use a seed for reproducibility
            "seed": 42
        }
    }
)

result = response.json()

# Process each generated document
for doc in result["documents"]:
    doc_id = doc["document_id"]
    os.makedirs(f"dataset/{doc_id}", exist_ok=True)  # Ensure the output directory exists
    print(f"\nProcessing {doc_id}:")

    # 1. Save individual handwriting token images
    if doc.get("handwriting_token_images"):
        print(f"  - Handwriting tokens: {len(doc['handwriting_token_images'])}")
        for hw_id, img_b64 in doc["handwriting_token_images"].items():
            with open(f"dataset/{doc_id}/{hw_id}.png", "wb") as f:
                f.write(base64.b64decode(img_b64))

    # 2. Save individual visual element images
    if doc.get("visual_element_images"):
        print(f"  - Visual elements: {len(doc['visual_element_images'])}")
        for ve_id, img_b64 in doc["visual_element_images"].items():
            with open(f"dataset/{doc_id}/{ve_id}.png", "wb") as f:
                f.write(base64.b64decode(img_b64))

    # 3. Save the token mapping for ML training
    if doc.get("token_mapping"):
        mapping = doc["token_mapping"]
        print(f"  - Mapping: {mapping['handwriting']['total_count']} HW + {mapping['visual_elements']['total_count']} VE")
        with open(f"dataset/{doc_id}/token_mapping.json", "w") as f:
            json.dump(mapping, f, indent=2)

    # 4. Save the ground truth annotations
    if doc.get("ground_truth"):
        with open(f"dataset/{doc_id}/ground_truth.json", "w") as f:
            json.dump(doc["ground_truth"], f, indent=2)

    # 5. Save the bounding boxes (normalized coordinates)
    if doc.get("normalized_bboxes_word"):
        with open(f"dataset/{doc_id}/bboxes_normalized.json", "w") as f:
            json.dump(doc["normalized_bboxes_word"], f, indent=2)

    # 6. Save the final document image
    if doc.get("image_base64"):
        with open(f"dataset/{doc_id}/final_image.png", "wb") as f:
            f.write(base64.b64decode(doc["image_base64"]))

    # 7. Save the msgpack dataset file
    if doc.get("dataset_export") and doc["dataset_export"].get("msgpack_base64"):
        with open(f"dataset/{doc_id}/dataset.msgpack", "wb") as f:
            f.write(base64.b64decode(doc["dataset_export"]["msgpack_base64"]))

print(f"\nβœ… Generated {len(result['documents'])} ML-ready documents")
```

### PDF Generation Endpoint (Recommended for Large Datasets)

For bulk generation with comprehensive file outputs, use `/generate/pdf`:

```bash
curl -X POST http://localhost:8000/generate/pdf \
  -H "Content-Type: application/json" \
  -d '{
    "seed_images": ["https://example.com/seed1.jpg"],
    "prompt_params": {
      "num_solutions": 3,
      "enable_handwriting": true,
      "enable_ocr": true,
      "enable_bbox_normalization": true,
      "enable_dataset_export": true,
      "output_detail": "dataset"
    }
  }' \
  --output documents.zip
```

#### ZIP File Contents

Based on the `output_detail` level:

**Minimal (default):**

- `document_.pdf` - Generated PDF files
- `document_/` - Per-document directories with:
  - `document.html`, `document.css` - Source files
  - `ground_truth.json`, `bboxes.json` - Annotations
  - `final_image.png` - Final rendered image (if Stage 3 enabled)
  - `handwriting_regions.json`, `visual_elements.json` - Stage 3 metadata (if enabled)
  - `ocr_results.json` - OCR word-level data (if OCR enabled)
- `README.md` - Package documentation
- `metadata.json` - Combined metadata

**Dataset (for ML training):**

- All files from the "minimal" level, plus:
- `handwriting_tokens/` - Individual token images (`hw0.png`, `hw1.png`, ...)
- `visual_elements/` - Individual element images (`logo_0.png`, `stamp_1.png`, ...)
- `token_mapping.json` - Complete mapping with style IDs and positions
- `dataset.msgpack` - Msgpack dataset file (if export enabled)
- `normalized_bboxes_word.json` - Normalized coordinates (if Stage 5 enabled)

**Complete (for debugging):**

- All files from the "dataset" level, plus:
- Intermediate PDFs from each processing stage
- Generation logs with timing information
- `debug_visualization.png` - Bbox overlay images

### Supported Models

- `claude-sonnet-4-5-20250929` (default, recommended)
- `claude-3-5-sonnet-20241022`

### Environment Variables

- `ANTHROPIC_API_KEY`: Your Anthropic API key (required if not provided in the request)

## API Documentation

Interactive API documentation is available when the server is running:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc

## Error Handling

The API returns appropriate HTTP status codes:

- `200 OK`: Successful generation
- `400 Bad Request`: Invalid input (e.g., invalid image URLs)
- `401 Unauthorized`: Missing or invalid API key
- `500 Internal Server Error`: Processing error

Error response format:

```json
{
  "detail": "Error message describing what went wrong"
}
```

## Performance Considerations

- **Concurrent requests**: The API can handle multiple requests concurrently
- **Image size**: Larger seed images take longer to process
- **Number of solutions**: More solutions mean longer processing time
- **Model selection**: Sonnet is slower but higher quality than Haiku

## Limitations

- Maximum 10 seed images per request
- Maximum 5 document variations (`num_solutions`)
- Single-page documents only
- Timeout: 60 seconds per PDF render

## Troubleshooting

### Playwright browser not found

```bash
playwright install chromium
```

### API key not working

Make sure your API key is set correctly:

```bash
echo $ANTHROPIC_API_KEY
```

### PDF rendering fails

Ensure Chromium is installed and accessible by reinstalling the browser:

```bash
playwright install chromium
playwright --version
```

## Integration with Frontend

Example React integration:

```jsx
const [loading, setLoading] = useState(false);
const [result, setResult] = useState(null);

const generateDocuments = async () => {
  setLoading(true);
  try {
    const response = await fetch('http://localhost:8000/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        seed_images: seedImageUrls,
        prompt_params: {
          language: 'English',
          doc_type: documentType,
          num_solutions: 3
        }
      })
    });
    const data = await response.json();
    setResult(data);
  } catch (error) {
    console.error('Generation failed:', error);
  } finally {
    setLoading(false);
  }
};
```

### React Integration (Async API with Progress)

```jsx
import { useState, useEffect } from 'react';

function DocumentGenerator({ userId, seedImages }) {
  const [requestId, setRequestId] = useState(null);
  const [status, setStatus] = useState(null);
  const [progress, setProgress] = useState(0);

  // Submit the job
  const handleGenerate = async () => {
    const response = await fetch('http://localhost:8000/generate/async', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        user_id: userId,
        seed_images: seedImages,
        prompt_params: {
          language: 'English',
          doc_type: 'receipts',
          num_solutions: 3,
          enable_handwriting: true,
          output_detail: 'dataset'
        }
      })
    });
    const job = await response.json();
    setRequestId(job.request_id);
    setStatus('queued');
  };

  // Poll the job status
  useEffect(() => {
    if (!requestId || status === 'completed' || status === 'failed') return;

    const interval = setInterval(async () => {
      const response = await fetch(`http://localhost:8000/jobs/${requestId}/status`);
      const jobStatus = await response.json();
      setStatus(jobStatus.status);

      // Update the progress bar
      const progressMap = {
        'queued': 10,
        'processing': 30,
        'generating': 60,
        'completed': 100,
        'failed': 0
      };
      setProgress(progressMap[jobStatus.status] || 0);

      if (jobStatus.status === 'completed') {
        // Open the Google Drive download link
        window.open(jobStatus.download_url, '_blank');
      }
    }, 30000); // Poll every 30 seconds

    return () => clearInterval(interval);
  }, [requestId, status]);

  return (
    <div>
      <button onClick={handleGenerate}>Generate Documents</button>
      {status && (
        <div>
          <p>Status: {status}</p>
          <progress value={progress} max="100" />
          {status === 'completed' && (
            <p>Download Results</p>
          )}
        </div>
      )}
    </div>
  );
}
```

## Background Processing Setup

The async endpoints (`/generate/async`) require a background worker system for job processing.

### Prerequisites

1. **Redis** - Job queue storage
2. **Supabase** - Database for job tracking and user data
3. **Google Drive OAuth** - For uploading results to the user's Drive

### Installing Redis

**Ubuntu/Debian:**

```bash
sudo apt-get update
sudo apt-get install redis-server
sudo systemctl start redis
sudo systemctl enable redis
```

**macOS:**

```bash
brew install redis
brew services start redis
```

**Docker:**

```bash
docker run -d -p 6379:6379 --name redis redis:7-alpine
```

**Verify Redis is running:**

```bash
redis-cli ping
# Should return: PONG
```

### Configuring Supabase

1. Create a Supabase project at [supabase.com](https://supabase.com)
2. Create the required tables in your Supabase SQL Editor:

   ```sql
   -- Required for uuid_generate_v4()
   CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

   -- Document generation requests
   CREATE TABLE document_requests (
       id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
       user_id INTEGER NOT NULL,
       status TEXT NOT NULL CHECK (status IN ('queued', 'processing', 'generating', 'completed', 'failed')),
       request_metadata JSONB NOT NULL,
       error_message TEXT,
       created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
       updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
   );

   -- Generated documents
   CREATE TABLE generated_documents (
       id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
       request_id UUID NOT NULL REFERENCES document_requests(id),
       document_id TEXT NOT NULL,
       file_url TEXT,
       zip_url TEXT,
       file_size_mb DECIMAL,
       created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
   );

   -- User integrations (Google Drive OAuth)
   CREATE TABLE user_integrations (
       id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
       user_id INTEGER NOT NULL,
       integration_type TEXT NOT NULL CHECK (integration_type IN ('google_drive', 'dropbox')),
       access_token TEXT NOT NULL,
       refresh_token TEXT,
       token_expiry TIMESTAMPTZ,
       created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
       updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
       UNIQUE(user_id, integration_type)
   );

   -- Analytics events
   CREATE TABLE analytics_events (
       id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
       user_id INTEGER,
       event_type TEXT NOT NULL,
       entity_id UUID,
       event_data JSONB,
       created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
   );

   -- Indexes for performance
   CREATE INDEX idx_document_requests_user_id ON document_requests(user_id);
   CREATE INDEX idx_document_requests_status ON document_requests(status);
   CREATE INDEX idx_generated_documents_request_id ON generated_documents(request_id);
   CREATE INDEX idx_user_integrations_user_id ON user_integrations(user_id);
   CREATE INDEX idx_analytics_events_user_id ON analytics_events(user_id);
   ```

3. Add your Supabase credentials to `.env`:

   ```bash
   # In api/.env
   SUPABASE_URL=https://your-project-ref.supabase.co
   SUPABASE_KEY=your-anon-or-service-role-key
   ```

### Configuring Google Drive OAuth

Users need to connect their Google Drive account for result storage:

1. Create a Google Cloud Project at [console.cloud.google.com](https://console.cloud.google.com)
2. Enable the Google Drive API
3. Create OAuth 2.0 credentials (Web application)
4. Add authorized redirect URIs (e.g., `http://localhost:3000/auth/google/callback`)
5. Download the credentials JSON
6. Users authenticate via the OAuth flow (implement in your frontend):

   ```python
   # Example OAuth flow (implement in your auth system)
   from google_auth_oauthlib.flow import Flow

   flow = Flow.from_client_config(
       client_config={
           "web": {
               "client_id": "YOUR_CLIENT_ID",
               "client_secret": "YOUR_CLIENT_SECRET",
               "auth_uri": "https://accounts.google.com/o/oauth2/auth",
               "token_uri": "https://oauth2.googleapis.com/token",
               "redirect_uris": ["http://localhost:3000/auth/google/callback"]
           }
       },
       scopes=["https://www.googleapis.com/auth/drive.file"],
       redirect_uri="http://localhost:3000/auth/google/callback"
   )

   # The user visits the auth URL and gets redirected back with a code
   authorization_url, state = flow.authorization_url(
       access_type='offline', include_granted_scopes='true'
   )

   # Exchange the code for tokens
   flow.fetch_token(code=authorization_code)
   credentials = flow.credentials

   # Store in the Supabase user_integrations table
   supabase.table('user_integrations').insert({
       'user_id': user_id,
       'integration_type': 'google_drive',
       'access_token': credentials.token,
       'refresh_token': credentials.refresh_token,
       'token_expiry': credentials.expiry.isoformat()
   }).execute()
   ```

### Starting the Background Worker

1. Configure environment variables in `api/.env`:

   ```bash
   # Redis Configuration
   REDIS_URL=redis://localhost:6379/0
   RQ_QUEUE_NAME=docgenie

   # Batch Processing
   BATCH_POLL_INTERVAL=30  # seconds
   BATCH_DATA_DIR=/tmp/docgenie_batches
   MESSAGE_DATA_DIR=/tmp/docgenie_messages

   # Google Drive
   GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents

   # Supabase (already configured above)
   SUPABASE_URL=https://your-project.supabase.co
   SUPABASE_KEY=your_key_here

   # Claude API
   ANTHROPIC_API_KEY=your_api_key_here
   ```

2. Start the worker:

   ```bash
   cd api/
   ./start_worker.sh
   ```

The worker will:

- βœ“ Check the Redis connection
- βœ“ Validate the Supabase configuration
- βœ“ Verify the Claude API key
- βœ“ Create temporary directories
- βœ“ Start an RQ worker listening on the `docgenie` queue

**Output:**

```
πŸš€ Starting DocGenie RQ Worker...
βœ“ Loading .env file...
βœ“ Redis connected
βœ“ Supabase configured
βœ“ Claude API key configured
βœ“ Temporary directories created
============================================
Worker Configuration:
  Queue: docgenie
  Redis: redis://localhost:6379/0
  Batch Data: /tmp/docgenie_batches
  Message Data: /tmp/docgenie_messages
============================================
βœ… Starting RQ worker (press Ctrl+C to stop)...
12:00:00 RQ worker 'worker-abc123' started on docgenie queue
```

### Running Multiple Workers (Production)

For production systems with high load, run multiple workers:

```bash
# Terminal 1
./start_worker.sh

# Terminal 2
./start_worker.sh

# Terminal 3
./start_worker.sh
```

Each worker processes jobs independently from the same queue. **For detailed scaling instructions**, see [SCALING.md](SCALING.md).

### Monitoring Workers

```bash
# View worker status
rq info --url redis://localhost:6379/0

# View the docgenie queue status
rq info docgenie --url redis://localhost:6379/0

# View failed jobs
rq info failed --url redis://localhost:6379/0
```

### Architecture Overview

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ FastAPI       │────▢│ Redis         │◀────│ RQ Workers       β”‚
β”‚ Server        β”‚     β”‚ Queue         β”‚     β”‚ (1-5 instances)  β”‚
β”‚               β”‚     β”‚               β”‚     β”‚                  β”‚
β”‚ /generate/    β”‚     β”‚ Job Queue:    β”‚     β”‚ β€’ Downloads      β”‚
β”‚   async       β”‚     β”‚ - queued      β”‚     β”‚ β€’ Claude Batch   β”‚
β”‚               β”‚     β”‚ - pending     β”‚     β”‚ β€’ PDF render     β”‚
β”‚ /jobs/{id}/   β”‚     β”‚ - active      β”‚     β”‚ β€’ Handwriting    β”‚
β”‚   status      β”‚     β”‚               β”‚     β”‚ β€’ OCR            β”‚
β”‚               β”‚     β”‚               β”‚     β”‚ β€’ ZIP creation   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                                             β”‚
        β–Ό                                             β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Supabase                                                 β”‚
β”‚ β€’ document_requests   (job tracking)                     β”‚
β”‚ β€’ generated_documents (results metadata)                 β”‚
β”‚ β€’ user_integrations   (Google Drive OAuth)               β”‚
β”‚ β€’ analytics_events    (usage tracking)                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                            β”‚ Upload Results
                                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Google Drive                                             β”‚
β”‚ β€’ User's "DocGenie Documents" folder                     β”‚
β”‚ β€’ ZIP files with generated documents                     β”‚
β”‚ β€’ Shareable links returned to the API                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

### Cost Comparison: Direct vs Batched API

| API Type | Cost (Input) | Cost (Output) | Latency | Use Case |
|----------|-------------|---------------|---------|----------|
| Direct | $5.00/1M tokens | $15.00/1M tokens | 30-120 s | Real-time, interactive |
| **Batched** | **$2.50/1M tokens** | **$7.50/1M tokens** | 5-30 min | **Background jobs (recommended)** |

**Example Cost Calculation:**

- Generate 100 documents per day
- Each request: 5,000 input tokens, 10,000 output tokens

**Direct API Cost:**

- Input: (100 Γ— 5,000 / 1M) Γ— $5.00 = $2.50/day
- Output: (100 Γ— 10,000 / 1M) Γ— $15.00 = $15.00/day
- **Total: $17.50/day = $525/month**

**Batched API Cost:**

- Input: (100 Γ— 5,000 / 1M) Γ— $2.50 = $1.25/day
- Output: (100 Γ— 10,000 / 1M) Γ— $7.50 = $7.50/day
- **Total: $8.75/day = $262.50/month**

**πŸ’° Savings: $262.50/month (50% reduction)**

## Scaling Workers

The API uses Redis Queue (RQ) workers for background job processing. Scale workers based on load:

| User Load | Workers | Redis RAM | Notes |
|-----------|---------|-----------|-------|
| < 10 req/hr | 1 | 256 MB | Development |
| 10-50 req/hr | 2-3 | 512 MB | Small production |
| 50-200 req/hr | 3-5 | 1 GB | Medium production |
| > 200 req/hr | 5+ | 2+ GB | Large production |

### Starting Workers

```bash
# Single worker (development)
./start_worker.sh

# Multiple workers (production) - run in separate terminals
./start_worker.sh  # Terminal 1
./start_worker.sh  # Terminal 2

# Docker Compose - scale to 3 workers
docker-compose up --scale worker=3

# Monitor
rq info --url redis://localhost:6379/0
rq info docgenie --url redis://localhost:6379/0
```

### Railway Multi-Worker (Separate Service)

1. Railway dashboard β†’ New Service β†’ GitHub Repo (same repo)
2. Name: `docgenie-worker`
3. Custom Start Command: `rq worker --url $REDIS_URL`
4. Add the same environment variables as the API service

> For most use cases the **combined** mode (API + worker in one service, see `railway.json`) is sufficient and cheaper.

## Contributing

This API is a simplified interface to the DocGenie pipeline. For the full pipeline with all features, see the main DocGenie documentation.

## License

Same as DocGenie main project.