| # DocGenie API |
|
|
| FastAPI-based REST API for generating synthetic documents using LLMs. This API is **optimized for ML dataset creation** with comprehensive handwriting and visual element support. |
|
|
| ## Features |
|
|
- **Simple REST API** - Easy to integrate with any frontend
- **URL-based seed images** - Provide seed images via URLs
- **Customizable prompts** - Control document type, language, and ground truth format
- **Handwriting Generation** - WordStylist diffusion model with 339 author styles
- **Visual Elements** - Stamps, logos, barcodes, photos, figures
- **ML-Ready Datasets** - Individual token images with complete metadata
- **Complete output** - Returns PDF, HTML, CSS, and bounding boxes
- **Async processing** - Fast and efficient document generation
|
|
| ## ML Dataset Creation |
|
|
| The API is **fully equipped for ML training dataset creation** with `output_detail: "dataset"` mode: |
|
|
### Handwriting Data
| - **Individual token images**: Each handwriting field saved as separate PNG (`hw0.png`, `hw1.png`, ...) |
| - **Author style IDs**: 339 unique writer styles (0-338) for style-consistent generation |
| - **Text content**: Original text for each handwriting field |
| - **Position data**: Precise bounding boxes (x, y, width, height) in mm |
| - **Signature detection**: Boolean flag for signature vs regular handwriting |
| - **Image dimensions**: Width and height for each generated token |
|
|
### Visual Element Data
| - **Stamps**: Generated with realistic textures, borders, and rotations |
| - Text content preserved |
| - Red/green color variants |
| - Circle/rectangle shapes |
| - **Logos**: Random selection from 6+ logo prefabs |
| - **Barcodes**: Code128 format with customizable content |
| - **Photos**: Random selection from 5+ photo prefabs |
| - **Figures/Charts**: Random selection from 6+ chart/diagram prefabs |
| - **Individual images**: Each element saved as separate PNG with transparency |
|
|
### Dataset Metadata
| - **Token mapping JSON**: Complete mapping with: |
| - Token IDs and references |
| - Style IDs for handwriting |
| - Element types for visual elements |
| - Position rectangles |
| - Image filenames |
| - Content text |
| - **Ground truth annotations**: QA pairs, classification labels, NER tags |
| - **Bounding boxes**: Word, segment, and layout-level bboxes |
| - **Normalized coordinates**: [0,1] scaled for ML frameworks |
| - **Msgpack export**: Compatible with datadings library |
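The `[0,1]` scaling mentioned above is a per-axis division by the physical page size (210 × 297 mm for A4). A minimal sketch, with field names assumed for illustration:

```python
def normalize_bbox(bbox: dict, page_width_mm: float, page_height_mm: float) -> dict:
    """Scale a bbox given in millimetres to the [0, 1] range."""
    return {
        "x": bbox["x"] / page_width_mm,
        "y": bbox["y"] / page_height_mm,
        "width": bbox["width"] / page_width_mm,
        "height": bbox["height"] / page_height_mm,
    }

# A box one-tenth of the page in each dimension, at one-tenth offset:
box = normalize_bbox({"x": 21.0, "y": 29.7, "width": 21.0, "height": 29.7}, 210.0, 297.0)
# each field is approximately 0.1
```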
|
|
### Additional ML Features
| - **OCR results**: Word-level bboxes and text for Document AI training |
| - **Layout elements**: Document structure annotations |
| - **Page dimensions**: Physical measurements (mm) and pixel dimensions |
| - **Reproducibility**: Seed-based generation for consistent results |
|
|
| ## Pipeline Overview |
|
|
| The API implements a simplified version of the DocGenie generation pipeline: |
|
|
| 1. **Download seed images** from URLs |
| 2. **Convert to base64** for LLM input |
| 3. **Build custom prompt** with user parameters |
| 4. **Call Claude API** to generate HTML documents |
| 5. **Extract HTML/CSS** and ground truth from response |
| 6. **Render to PDF** using Playwright |
| 7. **Extract bounding boxes** from PDF |
| 8. **Return results** as JSON with base64-encoded PDF |
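Steps 1-2 above amount to fetching the seed image bytes and wrapping them as a base64 image content block in the shape the Anthropic Messages API expects. A sketch (the `image_block` helper is our own name, not part of DocGenie):

```python
import base64

def image_block(image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as a base64 image content block for the LLM request."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }

# First bytes of a JPEG header, purely for illustration:
block = image_block(b"\xff\xd8\xff")
```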
|
|
| ## Installation |
|
|
| ### Prerequisites |
|
|
| - Python 3.10+ |
| - DocGenie main package installed |
| - Playwright browsers installed |
|
|
| ### Setup |
|
|
| 1. Install dependencies (all API dependencies are included in the main project): |
| ```bash |
| # Using uv (recommended) |
| uv sync |
| |
| # Or using pip |
| pip install -e . |
| |
| # Or install API-specific dependencies |
| cd api/ |
| pip install -r requirements.txt |
| ``` |
|
|
| **Note**: For async endpoint support, ensure you have: |
| - `redis>=5.0.0` and `rq>=1.15.0` (job queue) |
| - `supabase>=2.0.0` (database) |
| - `google-api-python-client>=2.100.0` (Google Drive integration) |
|
|
| 2. Install Playwright browsers: |
| ```bash |
| playwright install chromium |
| ``` |
|
|
| 3. Install Tesseract OCR (for local OCR support): |
| ```bash |
| # Ubuntu/Debian |
| sudo apt-get update && sudo apt-get install tesseract-ocr |
| |
| # macOS |
| brew install tesseract |
| |
| # Windows |
| # Download installer from: https://github.com/UB-Mannheim/tesseract/wiki |
| ``` |
|
|
| 4. Set your Anthropic API key: |
| ```bash |
| export ANTHROPIC_API_KEY="your-api-key-here" |
| ``` |
|
|
| 5. Configure OCR in `.env`: |
| ```bash |
| cp .env.example .env |
| # Edit .env and set: |
| OCR_SERVICE_ENABLED=true |
| OCR_USE_LOCAL=true # Use local Tesseract (recommended) |
| ``` |
|
|
| ## Running the API |
|
|
| ### Development Mode |
|
|
| ```bash |
| cd api |
| python main.py |
| ``` |
|
|
| The API will be available at `http://localhost:8000` |
|
|
| ### Production Mode |
|
|
| ```bash |
| cd api |
| uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4 |
| ``` |
|
|
| ## API Endpoints |
|
|
| ### Health Check |
|
|
| ```http |
| GET /health |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "status": "healthy", |
| "version": "1.0.0" |
| } |
| ``` |
|
|
| ### Generate Documents |
|
|
| ```http |
| POST /generate |
| ``` |
|
|
| **Request Body:** |
| ```json |
| { |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "business and administrative", |
| "gt_type": "Multiple questions about each document, with their answers taken **verbatim** from the document.", |
| "gt_format": "{\"<Text of question 1>\": \"<Answer to question 1>\", \"<Text of question 2>\": \"<Answer to question 2>\", ...}", |
| "num_solutions": 3 |
| }, |
| "model": "claude-sonnet-4-5-20250929", |
| "api_key": "optional-api-key" |
| } |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "success": true, |
| "message": "Successfully generated 3 documents", |
| "total_documents": 3, |
| "documents": [ |
| { |
| "document_id": "uuid-123_0", |
| "html": "<!DOCTYPE html>...", |
| "css": "body { ... }", |
| "ground_truth": { |
| "What is the invoice number?": "INV-12345", |
| "What is the total amount?": "$1,234.56" |
| }, |
| "pdf_base64": "JVBERi0xLjQK...", |
| "bboxes": [ |
| { |
| "text": "Invoice", |
| "x": 0.1, |
| "y": 0.05, |
| "width": 0.2, |
| "height": 0.03, |
| "page": 0 |
| } |
| ], |
| "page_width_mm": 210.0, |
| "page_height_mm": 297.0 |
| } |
| ] |
| } |
| ``` |
|
|
| ### Generate Documents (Async) - **Recommended for Production** |
|
|
| ```http |
| POST /generate/async |
| ``` |
|
|
**Cost Optimization**: This endpoint uses Claude's **Batch API** for **50% cost savings** ($2.50 vs $5.00 per 1M input tokens).
|
|
**Latency**: 5-30 minutes (vs 30-120 seconds for direct API)
|
|
**Best For**: Multi-user production systems with non-realtime requirements
|
|
| **Request Body:** |
| ```json |
| { |
| "user_id": 123, |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "business and administrative", |
| "num_solutions": 3, |
| "enable_handwriting": true, |
| "enable_visual_elements": true, |
| "enable_ocr": true, |
| "output_detail": "dataset" |
| } |
| } |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "queued", |
| "estimated_time_minutes": 10, |
| "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000/status", |
| "created_at": "2025-01-15T12:00:00Z" |
| } |
| ``` |
|
|
| **Workflow:** |
1. Submit generation request → get `request_id`
| 2. Poll status endpoint every 30-60 seconds |
| 3. When `status: "completed"`, download from Google Drive |
| 4. Results uploaded to user's Google Drive with shareable link |
|
|
| ### Check Job Status |
|
|
| ```http |
| GET /jobs/{request_id}/status |
| ``` |
|
|
| **Response (Queued):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "queued", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:00:00Z" |
| } |
| ``` |
|
|
| **Response (Processing):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "processing", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:05:00Z", |
| "progress": "Creating batch request..." |
| } |
| ``` |
|
|
| **Response (Completed):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "completed", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:15:00Z", |
| "download_url": "https://drive.google.com/file/d/abc123xyz/view?usp=sharing", |
| "file_size_mb": 15.4, |
| "document_count": 3 |
| } |
| ``` |
|
|
| **Response (Failed):** |
| ```json |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "failed", |
| "created_at": "2025-01-15T12:00:00Z", |
| "updated_at": "2025-01-15T12:08:00Z", |
| "error_message": "Batch processing timeout" |
| } |
| ``` |
|
|
| **Status Values:** |
| - `queued`: Job submitted, waiting for worker |
| - `processing`: Worker picked up job, creating batch |
| - `generating`: Batch submitted to Claude, waiting for completion |
| - `completed`: Documents generated and uploaded to Google Drive |
| - `failed`: Error occurred (see `error_message`) |
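A client never needs to enforce these transitions itself, but a small guard like the following (our own sketch, not part of the API) makes polling loops easier to reason about:

```python
# Terminal states never change; non-terminal jobs move forward through the queue.
TERMINAL = {"completed", "failed"}
ORDER = ["queued", "processing", "generating", "completed"]

def is_terminal(status: str) -> bool:
    """Stop polling once a job reaches a terminal state."""
    return status in TERMINAL

def made_progress(old: str, new: str) -> bool:
    """True if the job advanced (or failed) between two polls."""
    if new == "failed":
        return True
    if old in ORDER and new in ORDER:
        return ORDER.index(new) > ORDER.index(old)
    return False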
|
|
| ### List User Jobs |
|
|
| ```http |
| GET /jobs/user/{user_id}?limit=50&offset=0 |
| ``` |
|
|
| **Response:** |
| ```json |
| { |
| "user_id": 123, |
| "jobs": [ |
| { |
| "request_id": "550e8400-e29b-41d4-a716-446655440000", |
| "status": "completed", |
| "created_at": "2025-01-15T12:00:00Z", |
| "download_url": "https://drive.google.com/...", |
| "document_count": 3 |
| }, |
| { |
| "request_id": "660e8400-e29b-41d4-a716-446655440111", |
| "status": "processing", |
| "created_at": "2025-01-15T12:30:00Z" |
| } |
| ], |
| "count": 2, |
| "limit": 50, |
| "offset": 0 |
| } |
| ``` |
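Because the endpoint is paginated with `limit`/`offset`, fetching a user's full job history is a matter of advancing `offset` until fewer than `limit` jobs come back. A sketch, where the `fetch_page` callable stands in for the HTTP call:

```python
from typing import Callable

def all_jobs(fetch_page: Callable[[int, int], dict], limit: int = 50) -> list:
    """Collect every job for a user by walking limit/offset pages."""
    jobs, offset = [], 0
    while True:
        page = fetch_page(limit, offset)  # e.g. GET /jobs/user/{id}?limit=..&offset=..
        jobs.extend(page["jobs"])
        if page["count"] < limit:         # short page: no more results
            break
        offset += limit
    return jobs
```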
|
|
| ## Usage Examples |
|
|
| ### cURL |
|
|
| ```bash |
| curl -X POST http://localhost:8000/generate \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "seed_images": [ |
| "https://example.com/receipt1.jpg", |
| "https://example.com/receipt2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "receipts", |
| "num_solutions": 2 |
| } |
| }' |
| ``` |
|
|
| ### Python (Direct API) |
|
|
| ```python |
| import requests |
| import base64 |
| |
| response = requests.post( |
| "http://localhost:8000/generate", |
| json={ |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "business forms", |
| "num_solutions": 3 |
| } |
| } |
| ) |
| |
| result = response.json() |
| |
| # Save first PDF |
| if result["success"]: |
| pdf_data = base64.b64decode(result["documents"][0]["pdf_base64"]) |
| with open("generated_doc.pdf", "wb") as f: |
| f.write(pdf_data) |
| ``` |
|
|
| ### Python (Async API with Polling) - **Recommended** |
|
|
| ```python |
| import requests |
| import time |
| |
| # Step 1: Submit job |
| response = requests.post( |
| "http://localhost:8000/generate/async", |
| json={ |
| "user_id": 123, |
| "seed_images": [ |
| "https://example.com/seed1.jpg", |
| "https://example.com/seed2.jpg" |
| ], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "receipts and invoices", |
| "num_solutions": 5, |
| "enable_handwriting": True, |
| "enable_visual_elements": True, |
| "enable_ocr": True, |
| "output_detail": "dataset" |
| } |
| } |
| ) |
| |
| job = response.json() |
| request_id = job["request_id"] |
print(f"Job submitted: {request_id}")
| print(f" Estimated time: {job['estimated_time_minutes']} minutes") |
| |
| # Step 2: Poll status until complete |
| while True: |
| status_response = requests.get( |
| f"http://localhost:8000/jobs/{request_id}/status" |
| ) |
| status = status_response.json() |
| |
| print(f" Status: {status['status']}", end="") |
| if status.get("progress"): |
| print(f" - {status['progress']}") |
| else: |
| print() |
| |
| if status["status"] == "completed": |
        print("Generation complete!")
| print(f" Download: {status['download_url']}") |
| print(f" Size: {status.get('file_size_mb', 0):.1f} MB") |
| print(f" Documents: {status.get('document_count', 0)}") |
| break |
| elif status["status"] == "failed": |
        print(f"Generation failed: {status.get('error_message')}")
| break |
| |
| # Wait 30 seconds before next poll |
| time.sleep(30) |
| |
| # Step 3: Download from Google Drive (if completed) |
| if status["status"] == "completed": |
| # User can download from their Google Drive using the shareable link |
| print(f"\nDownload your documents at:\n{status['download_url']}") |
| ``` |
|
|
| ### JavaScript |
|
|
| ```javascript |
| const response = await fetch('http://localhost:8000/generate', { |
| method: 'POST', |
| headers: { |
| 'Content-Type': 'application/json', |
| }, |
| body: JSON.stringify({ |
| seed_images: [ |
| 'https://example.com/seed1.jpg', |
| 'https://example.com/seed2.jpg' |
| ], |
| prompt_params: { |
| language: 'English', |
| doc_type: 'invoices', |
| num_solutions: 2 |
| } |
| }) |
| }); |
| |
| const result = await response.json(); |
| |
| // Convert base64 PDF to blob |
| const pdfBlob = await fetch(`data:application/pdf;base64,${result.documents[0].pdf_base64}`) |
| .then(res => res.blob()); |
| ``` |
|
|
| ## Configuration |
|
|
| ### Prompt Parameters |
|
|
| - **language**: Language for generated documents (default: "English") |
| - **doc_type**: Type of documents to generate (e.g., "business and administrative", "receipts", "forms") |
| - **gt_type**: Description of ground truth type to generate |
| - **gt_format**: Format specification for ground truth JSON |
| - **num_solutions**: Number of document variations (1-5) |
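These parameters can be validated client-side before submitting. A hedged sketch that enforces the documented `num_solutions` range, with defaults mirroring the list above (the helper name is ours):

```python
def build_prompt_params(doc_type: str,
                        language: str = "English",
                        num_solutions: int = 1,
                        **extra) -> dict:
    """Assemble prompt_params, rejecting out-of-range values early."""
    if not 1 <= num_solutions <= 5:
        raise ValueError("num_solutions must be between 1 and 5")
    return {"language": language, "doc_type": doc_type,
            "num_solutions": num_solutions, **extra}

params = build_prompt_params("receipts", num_solutions=3)
```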
|
|
| ### Stage 3-5 Advanced Features |
|
|
| The API supports advanced document synthesis and dataset packaging: |
|
|
| #### Stage 3: Handwriting & Visual Elements |
| - **enable_handwriting**: Add handwritten text using diffusion model (default: false) |
| - **handwriting_ratio**: Percentage of text to convert to handwriting 0-1 (default: 0.5) |
| - **enable_visual_elements**: Add stamps, barcodes, logos (default: false) |
| - **visual_element_types**: Types of elements to add: ["stamp", "logo", "figure", "barcode", "photo"] (default: all types) |
|
|
| #### Stage 4: OCR |
| - **enable_ocr**: Perform OCR on generated document (default: false) |
| - **ocr_language**: OCR language code (default: "en") |
|
|
| #### Stage 5: Dataset Packaging |
| - **enable_bbox_normalization**: Normalize bboxes to [0,1] scale (default: false) |
| - **enable_gt_verification**: Verify ground truth quality (default: false) |
| - **enable_analysis**: Generate dataset statistics (default: false) |
| - **enable_debug_visualization**: Create bbox overlay images (default: false) |
| |
| #### Dataset Export (Msgpack Format) |
| - **enable_dataset_export**: Export as msgpack dataset format (default: false) |
| - **dataset_export_format**: Export format - only "msgpack" is supported (default: "msgpack") |
| |
| **Note**: Only msgpack format is implemented in the current pipeline. COCO and HuggingFace export formats mentioned in some documentation are not yet available. |
| |
| #### Output Detail Level |
| - **output_detail**: Controls how much data is returned/saved (default: "minimal") |
| - `"minimal"` (default): Final outputs only (PDFs, images, metadata) - 2-5 MB per document |
| - `"dataset"`: Includes individual token images for ML training - 10-20 MB per document |
| - Individual handwriting token images (`handwriting_tokens/hw0.png`, ...) |
| - Individual visual element images (`visual_elements/logo_0.png`, ...) |
| - Token mapping JSON with style IDs and positions |
| - `"complete"`: All intermediate files and debug info - 20-50 MB per document |
| - Everything from `dataset` mode |
| - Intermediate PDFs from each processing stage |
| - Generation logs |
  - **Warning**: Can result in 50+ MB JSON responses for `/generate` endpoint
|
|
| **Recommendation**: Use `"minimal"` for production, `"dataset"` for ML research, `"complete"` for debugging (only with `/generate/pdf`). |
|
|
| **Example with dataset output detail:** |
| ```python |
| import requests |
| import base64 |
| import json |
| |
| # Generate ML training dataset |
| response = requests.post( |
| "http://localhost:8000/generate", |
| json={ |
| "seed_images": ["https://example.com/seed.jpg"], |
| "prompt_params": { |
| "language": "English", |
| "doc_type": "receipts and invoices", |
| "num_solutions": 5, |
| |
| # Enable handwriting and visual elements |
| "enable_handwriting": True, |
| "handwriting_ratio": 0.4, |
| "enable_visual_elements": True, |
| "visual_element_types": ["stamp", "logo", "figure", "barcode", "photo"], # All types by default |
| |
| # Enable dataset features |
| "enable_ocr": True, |
| "enable_bbox_normalization": True, |
| "enable_dataset_export": True, |
| |
| # IMPORTANT: Set output_detail to "dataset" for ML training |
| "output_detail": "dataset", |
| |
| # Use seed for reproducibility |
| "seed": 42 |
| } |
| } |
| ) |
| |
| result = response.json() |
| |
| # Process each generated document |
| for doc in result["documents"]: |
| doc_id = doc["document_id"] |
    print(f"\nProcessing {doc_id}:")
| |
| # 1. Save individual handwriting token images |
| if doc.get("handwriting_token_images"): |
| print(f" - Handwriting tokens: {len(doc['handwriting_token_images'])}") |
| for hw_id, img_b64 in doc["handwriting_token_images"].items(): |
| with open(f"dataset/{doc_id}/{hw_id}.png", "wb") as f: |
| f.write(base64.b64decode(img_b64)) |
| |
| # 2. Save individual visual element images |
| if doc.get("visual_element_images"): |
| print(f" - Visual elements: {len(doc['visual_element_images'])}") |
| for ve_id, img_b64 in doc["visual_element_images"].items(): |
| with open(f"dataset/{doc_id}/{ve_id}.png", "wb") as f: |
| f.write(base64.b64decode(img_b64)) |
| |
| # 3. Save token mapping for ML training |
| if doc.get("token_mapping"): |
| mapping = doc["token_mapping"] |
| print(f" - Mapping: {mapping['handwriting']['total_count']} HW + {mapping['visual_elements']['total_count']} VE") |
| with open(f"dataset/{doc_id}/token_mapping.json", "w") as f: |
| json.dump(mapping, f, indent=2) |
| |
| # 4. Save ground truth annotations |
| if doc.get("ground_truth"): |
| with open(f"dataset/{doc_id}/ground_truth.json", "w") as f: |
| json.dump(doc["ground_truth"], f, indent=2) |
| |
| # 5. Save bounding boxes (normalized coordinates) |
| if doc.get("normalized_bboxes_word"): |
| with open(f"dataset/{doc_id}/bboxes_normalized.json", "w") as f: |
| json.dump(doc["normalized_bboxes_word"], f, indent=2) |
| |
| # 6. Save final document image |
| if doc.get("image_base64"): |
| with open(f"dataset/{doc_id}/final_image.png", "wb") as f: |
| f.write(base64.b64decode(doc["image_base64"])) |
| |
| # 7. Save msgpack dataset file |
| if doc.get("dataset_export") and doc["dataset_export"].get("msgpack_base64"): |
| with open(f"dataset/{doc_id}/dataset.msgpack", "wb") as f: |
| f.write(base64.b64decode(doc["dataset_export"]["msgpack_base64"])) |
| |
print(f"\nGenerated {len(result['documents'])} ML-ready documents")
| ``` |
|
|
| ### PDF Generation Endpoint (Recommended for Large Datasets) |
|
|
| For bulk generation with comprehensive file outputs, use `/generate/pdf`: |
|
|
| ```bash |
| curl -X POST http://localhost:8000/generate/pdf \ |
| -H "Content-Type: application/json" \ |
| -d '{ |
| "seed_images": ["https://example.com/seed1.jpg"], |
| "prompt_params": { |
| "num_solutions": 3, |
| "enable_handwriting": true, |
| "enable_ocr": true, |
| "enable_bbox_normalization": true, |
| "enable_dataset_export": true, |
| "output_detail": "dataset" |
| } |
| }' \ |
| --output documents.zip |
| ``` |
|
|
| #### ZIP File Contents |
|
|
| Based on `output_detail` level: |
|
|
| **Minimal (default):** |
| - `document_<id>.pdf` - Generated PDF files |
| - `document_<id>/` - Per-document directories with: |
| - `document.html`, `document.css` - Source files |
| - `ground_truth.json`, `bboxes.json` - Annotations |
| - `final_image.png` - Final rendered image (if Stage 3 enabled) |
| - `handwriting_regions.json`, `visual_elements.json` - Stage 3 metadata (if enabled) |
| - `ocr_results.json` - OCR word-level data (if OCR enabled) |
| - `README.md` - Package documentation |
| - `metadata.json` - Combined metadata |
|
|
| **Dataset (for ML training):** |
| - All files from "minimal" level, plus: |
| - `handwriting_tokens/` - Individual token images (`hw0.png`, `hw1.png`, ...) |
| - `visual_elements/` - Individual element images (`logo_0.png`, `stamp_1.png`, ...) |
| - `token_mapping.json` - Complete mapping with style IDs and positions |
| - `dataset.msgpack` - Msgpack dataset file (if export enabled) |
| - `normalized_bboxes_word.json` - Normalized coordinates (if Stage 5 enabled) |
|
|
| **Complete (for debugging):** |
| - All files from "dataset" level, plus: |
| - Intermediate PDFs from each processing stage |
| - Generation logs with timing information |
| - `debug_visualization.png` - Bbox overlay images |
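Once downloaded, the ZIP can be walked with the standard library. The sketch below reads the `token_mapping.json` of every per-document directory; the filenames follow the layout listed above, but treat them as assumptions until you inspect your own archive:

```python
import io
import json
import zipfile

def read_token_mappings(zip_bytes: bytes) -> dict:
    """Return {document_dir: token_mapping dict} for every document in the ZIP."""
    mappings = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("token_mapping.json"):
                doc_dir = name.rsplit("/", 1)[0]
                mappings[doc_dir] = json.loads(zf.read(name))
    return mappings
```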
|
|
| ### Supported Models |
|
|
| - `claude-sonnet-4-5-20250929` (default, recommended) |
| - `claude-3-5-sonnet-20241022` |
|
|
| ### Environment Variables |
|
|
| - `ANTHROPIC_API_KEY`: Your Anthropic API key (required if not provided in request) |
|
|
| ## API Documentation |
|
|
| Interactive API documentation is available when the server is running: |
|
|
| - **Swagger UI**: http://localhost:8000/docs |
| - **ReDoc**: http://localhost:8000/redoc |
|
|
| ## Error Handling |
|
|
| The API returns appropriate HTTP status codes: |
|
|
| - `200 OK`: Successful generation |
| - `400 Bad Request`: Invalid input (e.g., invalid image URLs) |
| - `401 Unauthorized`: Missing or invalid API key |
| - `500 Internal Server Error`: Processing error |
|
|
| Error response format: |
| ```json |
| { |
| "detail": "Error message describing what went wrong" |
| } |
| ``` |
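In client code, the `detail` field is the place to look when a request fails. A small sketch that turns a status code and response body into a readable diagnosis (the mapping simply mirrors the list above):

```python
EXPLANATIONS = {
    400: "invalid input (check seed image URLs and parameters)",
    401: "missing or invalid API key",
    500: "server-side processing error",
}

def explain_error(status_code: int, body: dict) -> str:
    """Combine the HTTP status with the API's detail message."""
    reason = EXPLANATIONS.get(status_code, "unexpected status")
    return f"{status_code}: {reason} - {body.get('detail', 'no detail provided')}"

msg = explain_error(401, {"detail": "Invalid API key"})
```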
|
|
| ## Performance Considerations |
|
|
| - **Concurrent requests**: The API can handle multiple requests concurrently |
| - **Image size**: Larger seed images take longer to process |
| - **Number of solutions**: More solutions = longer processing time |
| - **Model selection**: Sonnet is slower but higher quality than Haiku |
|
|
| ## Limitations |
|
|
| - Maximum 10 seed images per request |
| - Maximum 5 document variations (`num_solutions`) |
| - Single-page documents only |
| - Timeout: 60 seconds per PDF render |
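These limits can be enforced on the client before a request leaves, rather than waiting for a `400` from the server. A minimal sketch:

```python
MAX_SEED_IMAGES = 10   # documented request limit
MAX_SOLUTIONS = 5      # documented num_solutions limit

def check_request(seed_images: list, num_solutions: int) -> None:
    """Raise early instead of waiting for a 400 from the API."""
    if len(seed_images) > MAX_SEED_IMAGES:
        raise ValueError(f"at most {MAX_SEED_IMAGES} seed images per request")
    if not 1 <= num_solutions <= MAX_SOLUTIONS:
        raise ValueError(f"num_solutions must be 1-{MAX_SOLUTIONS}")
```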
|
|
| ## Troubleshooting |
|
|
| ### Playwright browser not found |
|
|
| ```bash |
| playwright install chromium |
| ``` |
|
|
| ### API key not working |
|
|
| Make sure your API key is set correctly: |
| ```bash |
| echo $ANTHROPIC_API_KEY |
| ``` |
|
|
| ### PDF rendering fails |
|
|
Ensure Chromium is installed and accessible by reinstalling it:
```bash
playwright install chromium
```
|
|
| ## Integration with Frontend |
|
|
| Example React integration: |
|
|
| ```jsx |
| const [loading, setLoading] = useState(false); |
| const [result, setResult] = useState(null); |
| |
| const generateDocuments = async () => { |
| setLoading(true); |
| |
| try { |
| const response = await fetch('http://localhost:8000/generate', { |
| method: 'POST', |
| headers: { 'Content-Type': 'application/json' }, |
| body: JSON.stringify({ |
| seed_images: seedImageUrls, |
| prompt_params: { |
| language: 'English', |
| doc_type: documentType, |
| num_solutions: 3 |
| } |
| }) |
| }); |
| |
| const data = await response.json(); |
| setResult(data); |
| } catch (error) { |
| console.error('Generation failed:', error); |
| } finally { |
| setLoading(false); |
| } |
| }; |
| ``` |
|
|
| ### React Integration (Async API with Progress) |
|
|
| ```jsx |
| import { useState, useEffect } from 'react'; |
| |
| function DocumentGenerator({ userId, seedImages }) { |
| const [requestId, setRequestId] = useState(null); |
| const [status, setStatus] = useState(null); |
| const [progress, setProgress] = useState(0); |
| |
| // Submit job |
| const handleGenerate = async () => { |
| const response = await fetch('http://localhost:8000/generate/async', { |
| method: 'POST', |
| headers: { 'Content-Type': 'application/json' }, |
| body: JSON.stringify({ |
| user_id: userId, |
| seed_images: seedImages, |
| prompt_params: { |
| language: 'English', |
| doc_type: 'receipts', |
| num_solutions: 3, |
| enable_handwriting: true, |
| output_detail: 'dataset' |
| } |
| }) |
| }); |
| |
| const job = await response.json(); |
| setRequestId(job.request_id); |
| setStatus('queued'); |
| }; |
| |
| // Poll job status |
| useEffect(() => { |
| if (!requestId || status === 'completed' || status === 'failed') return; |
| |
| const interval = setInterval(async () => { |
| const response = await fetch(`http://localhost:8000/jobs/${requestId}/status`); |
| const jobStatus = await response.json(); |
| |
| setStatus(jobStatus.status); |
| |
| // Update progress bar |
| const progressMap = { |
| 'queued': 10, |
| 'processing': 30, |
| 'generating': 60, |
| 'completed': 100, |
| 'failed': 0 |
| }; |
| setProgress(progressMap[jobStatus.status] || 0); |
| |
| if (jobStatus.status === 'completed') { |
| // Open Google Drive download link |
| window.open(jobStatus.download_url, '_blank'); |
| } |
| }, 30000); // Poll every 30 seconds |
| |
| return () => clearInterval(interval); |
| }, [requestId, status]); |
| |
| return ( |
| <div> |
| <button onClick={handleGenerate} disabled={status && status !== 'completed'}> |
| Generate Documents |
| </button> |
| |
| {status && ( |
| <div className="progress-container"> |
| <div className="progress-bar" style={{ width: `${progress}%` }} /> |
| <p>Status: {status}</p> |
| {status === 'completed' && ( |
| <a href={`http://localhost:8000/jobs/${requestId}/status`}> |
| Download Results |
| </a> |
| )} |
| </div> |
| )} |
| </div> |
| ); |
| } |
| ``` |
|
|
| ## Background Processing Setup |
|
|
| The async endpoints (`/generate/async`) require a background worker system for job processing. |
|
|
| ### Prerequisites |
|
|
| 1. **Redis** - Job queue storage |
| 2. **Supabase** - Database for job tracking and user data |
| 3. **Google Drive OAuth** - For uploading results to user's Drive |
|
|
| ### Installing Redis |
|
|
| **Ubuntu/Debian:** |
| ```bash |
| sudo apt-get update |
| sudo apt-get install redis-server |
| sudo systemctl start redis |
| sudo systemctl enable redis |
| ``` |
|
|
| **macOS:** |
| ```bash |
| brew install redis |
| brew services start redis |
| ``` |
|
|
| **Docker:** |
| ```bash |
| docker run -d -p 6379:6379 --name redis redis:7-alpine |
| ``` |
|
|
| **Verify Redis is running:** |
| ```bash |
| redis-cli ping |
| # Should return: PONG |
| ``` |
|
|
| ### Configuring Supabase |
|
|
| 1. Create a Supabase project at [supabase.com](https://supabase.com) |
|
|
| 2. Create the required tables in your Supabase SQL Editor: |
|
|
| ```sql |
| -- Document generation requests |
| CREATE TABLE document_requests ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| user_id INTEGER NOT NULL, |
| status TEXT NOT NULL CHECK (status IN ('queued', 'processing', 'generating', 'completed', 'failed')), |
| request_metadata JSONB NOT NULL, |
| error_message TEXT, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), |
| updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() |
| ); |
| |
| -- Generated documents |
| CREATE TABLE generated_documents ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| request_id UUID NOT NULL REFERENCES document_requests(id), |
| document_id TEXT NOT NULL, |
| file_url TEXT, |
| zip_url TEXT, |
| file_size_mb DECIMAL, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() |
| ); |
| |
| -- User integrations (Google Drive OAuth) |
| CREATE TABLE user_integrations ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| user_id INTEGER NOT NULL, |
| integration_type TEXT NOT NULL CHECK (integration_type IN ('google_drive', 'dropbox')), |
| access_token TEXT NOT NULL, |
| refresh_token TEXT, |
| token_expiry TIMESTAMPTZ, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), |
| updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), |
| UNIQUE(user_id, integration_type) |
| ); |
| |
| -- Analytics events |
| CREATE TABLE analytics_events ( |
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), |
| user_id INTEGER, |
| event_type TEXT NOT NULL, |
| entity_id UUID, |
| event_data JSONB, |
| created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() |
| ); |
| |
| -- Indexes for performance |
| CREATE INDEX idx_document_requests_user_id ON document_requests(user_id); |
| CREATE INDEX idx_document_requests_status ON document_requests(status); |
| CREATE INDEX idx_generated_documents_request_id ON generated_documents(request_id); |
| CREATE INDEX idx_user_integrations_user_id ON user_integrations(user_id); |
| CREATE INDEX idx_analytics_events_user_id ON analytics_events(user_id); |
| ``` |
|
|
| 3. Add your Supabase credentials to `.env`: |
|
|
| ```bash |
| # In api/.env |
| SUPABASE_URL=https://your-project-ref.supabase.co |
| SUPABASE_KEY=your-anon-or-service-role-key |
| ``` |
|
|
| ### Configuring Google Drive OAuth |
|
|
| Users need to connect their Google Drive account for result storage: |
|
|
| 1. Create a Google Cloud Project at [console.cloud.google.com](https://console.cloud.google.com) |
| 2. Enable Google Drive API |
| 3. Create OAuth 2.0 credentials (Web application) |
| 4. Add authorized redirect URIs (e.g., `http://localhost:3000/auth/google/callback`) |
| 5. Download credentials JSON |
|
|
| 6. Users authenticate via OAuth flow (implement in your frontend): |
|
|
| ```python |
| # Example OAuth flow (implement in your auth system) |
| from google_auth_oauthlib.flow import Flow |
| |
| flow = Flow.from_client_config( |
| client_config={ |
| "web": { |
| "client_id": "YOUR_CLIENT_ID", |
| "client_secret": "YOUR_CLIENT_SECRET", |
| "auth_uri": "https://accounts.google.com/o/oauth2/auth", |
| "token_uri": "https://oauth2.googleapis.com/token", |
| "redirect_uris": ["http://localhost:3000/auth/google/callback"] |
| } |
| }, |
| scopes=["https://www.googleapis.com/auth/drive.file"] |
| ) |
| |
| # User visits auth URL, gets redirected back with code |
| authorization_url, state = flow.authorization_url(access_type='offline', include_granted_scopes='true') |
| |
| # Exchange code for tokens |
| flow.fetch_token(code=authorization_code) |
| credentials = flow.credentials |
| |
| # Store in Supabase user_integrations table |
| supabase.table('user_integrations').insert({ |
| 'user_id': user_id, |
| 'integration_type': 'google_drive', |
| 'access_token': credentials.token, |
| 'refresh_token': credentials.refresh_token, |
| 'token_expiry': credentials.expiry |
| }).execute() |
| ``` |
|
|
| ### Starting the Background Worker |
|
|
| 1. Configure environment variables in `api/.env`: |
|
|
| ```bash |
| # Redis Configuration |
| REDIS_URL=redis://localhost:6379/0 |
| RQ_QUEUE_NAME=docgenie |
| |
| # Batch Processing |
| BATCH_POLL_INTERVAL=30 # seconds |
| BATCH_DATA_DIR=/tmp/docgenie_batches |
| MESSAGE_DATA_DIR=/tmp/docgenie_messages |
| |
| # Google Drive |
| GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents |
| |
| # Supabase (already configured above) |
| SUPABASE_URL=https://your-project.supabase.co |
| SUPABASE_KEY=your_key_here |
| |
| # Claude API |
| ANTHROPIC_API_KEY=your_api_key_here |
| ``` |
|
|
| 2. Start the worker: |
|
|
| ```bash |
| cd api/ |
| ./start_worker.sh |
| ``` |
|
|
| The worker will: |
- Check Redis connection
- Validate Supabase configuration
- Verify Claude API key
- Create temporary directories
- Start RQ worker listening on `docgenie` queue
|
|
**Output:**
```
Starting DocGenie RQ Worker...
Loading .env file...
Redis connected
Supabase configured
Claude API key configured
Temporary directories created

============================================
Worker Configuration:
  Queue: docgenie
  Redis: redis://localhost:6379/0
  Batch Data: /tmp/docgenie_batches
  Message Data: /tmp/docgenie_messages
============================================

Starting RQ worker (press Ctrl+C to stop)...

12:00:00 RQ worker 'worker-abc123' started on docgenie queue
```

### Running Multiple Workers (Production)

For production systems with high load, run multiple workers:

```bash
# Terminal 1
./start_worker.sh

# Terminal 2
./start_worker.sh

# Terminal 3
./start_worker.sh
```

Each worker processes jobs independently from the same queue.

**For detailed scaling instructions**, see [SCALING.md](SCALING.md).

### Monitoring Workers

```bash
# View queue and worker status
rq info --url redis://localhost:6379/0

# View the docgenie queue only (queue names are positional arguments)
rq info docgenie --url redis://localhost:6379/0

# Requeue failed jobs back onto the docgenie queue
rq requeue --queue docgenie --all --url redis://localhost:6379/0
```

### Architecture Overview

```
┌─────────────┐      ┌─────────────┐      ┌──────────────────┐
│  FastAPI    │─────▶│   Redis     │◀─────│   RQ Workers     │
│  Server     │      │   Queue     │      │ (1-5 instances)  │
│             │      │             │      │                  │
│ /generate/  │      │ Job Queue:  │      │ • Downloads      │
│   async     │      │  - queued   │      │ • Claude Batch   │
│             │      │  - pending  │      │ • PDF render     │
│ /jobs/      │      │  - active   │      │ • Handwriting    │
│   {id}/     │      │             │      │ • OCR            │
│   status    │      │             │      │ • ZIP creation   │
└──────┬──────┘      └─────────────┘      └────────┬─────────┘
       │                                           │
       │                                           │
       ▼                                           ▼
┌──────────────────────────────────────────────────────────────┐
│                           Supabase                           │
│  • document_requests (job tracking)                          │
│  • generated_documents (results metadata)                    │
│  • user_integrations (Google Drive OAuth)                    │
│  • analytics_events (usage tracking)                         │
└──────────────────────────────────────────────────────────────┘
                              │
                              │ Upload Results
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                         Google Drive                         │
│  • User's "DocGenie Documents" folder                        │
│  • ZIP files with generated documents                        │
│  • Shareable links returned to API                           │
└──────────────────────────────────────────────────────────────┘
```
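
From a client's perspective, the async path in this diagram reduces to "submit, then poll `/jobs/{id}/status`". A transport-agnostic sketch of that loop (the injected `fetch_status` callable and the terminal state names `completed`/`failed` are assumptions for illustration, not the confirmed API contract):

```python
import time

def wait_for_job(fetch_status, job_id, poll_interval=5.0, timeout=1800.0):
    """Poll until the job reaches a terminal state.

    fetch_status: any callable returning the current status string for a job id,
    e.g. one that GETs /jobs/{id}/status and extracts the status field.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Injecting the fetcher keeps the loop independent of the HTTP client and easy to test.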

### Cost Comparison: Direct vs Batched API

| API Type | Cost (Input) | Cost (Output) | Latency | Use Case |
|----------|--------------|---------------|---------|----------|
| Direct | $5.00/1M tokens | $15.00/1M tokens | 30-120s | Real-time, interactive |
| **Batched** | **$2.50/1M tokens** | **$7.50/1M tokens** | 5-30 min | **Background jobs (recommended)** |

**Example Cost Calculation:**
- Generate 100 documents per day
- Each request: 5,000 input tokens, 10,000 output tokens

**Direct API Cost:**
- Input: (100 × 5,000 / 1M) × $5.00 = $2.50/day
- Output: (100 × 10,000 / 1M) × $15.00 = $15.00/day
- **Total: $17.50/day = $525/month**

**Batched API Cost:**
- Input: (100 × 5,000 / 1M) × $2.50 = $1.25/day
- Output: (100 × 10,000 / 1M) × $7.50 = $7.50/day
- **Total: $8.75/day = $262.50/month**

**💰 Savings: $262.50/month (50% reduction)**
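
The arithmetic above generalizes to any volume; a small helper reproduces it (rates are the table values, in dollars per million tokens, and the month is taken as 30 days as in the figures above):

```python
def daily_cost(docs_per_day, in_tokens, out_tokens, in_rate, out_rate):
    """Daily USD cost for a given document volume at per-million-token rates."""
    return (docs_per_day * in_tokens / 1e6) * in_rate + (docs_per_day * out_tokens / 1e6) * out_rate

direct = daily_cost(100, 5_000, 10_000, 5.00, 15.00)   # $17.50/day
batched = daily_cost(100, 5_000, 10_000, 2.50, 7.50)   # $8.75/day
monthly_savings = (direct - batched) * 30              # $262.50/month
```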

## Scaling Workers

The API uses Redis Queue (RQ) workers for background job processing. Scale workers based on load:

| User Load | Workers | Redis RAM | Notes |
|-----------|---------|-----------|-------|
| < 10 req/hr | 1 | 256 MB | Development |
| 10–50 req/hr | 2–3 | 512 MB | Small production |
| 50–200 req/hr | 3–5 | 1 GB | Medium production |
| > 200 req/hr | 5+ | 2+ GB | Large production |
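
If you automate scaling decisions, the table above can be encoded as a simple heuristic (thresholds come from the table, returning the upper end of each range; this is a rule of thumb, not a tuned policy):

```python
def recommended_workers(requests_per_hour: int) -> int:
    """Suggest a worker count for a given hourly request load (upper end of the table's ranges)."""
    if requests_per_hour < 10:
        return 1
    if requests_per_hour <= 50:
        return 3
    if requests_per_hour <= 200:
        return 5
    return 6  # "5+": scale further based on observed queue depth
```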

### Starting Workers

```bash
# Single worker (development)
./start_worker.sh

# Multiple workers (production): run in separate terminals
./start_worker.sh  # Terminal 1
./start_worker.sh  # Terminal 2

# Docker Compose: scale to 3 workers
docker-compose up --scale worker=3

# Monitor
rq info --url redis://localhost:6379/0
rq info docgenie --url redis://localhost:6379/0
```

### Railway Multi-Worker (Separate Service)
1. Railway dashboard → New Service → GitHub Repo (same repo)
2. Name: `docgenie-worker`
3. Custom Start Command: `rq worker docgenie --url $REDIS_URL`
4. Add the same environment variables as the API service

> For most use cases the **combined** mode (API + worker in one service, see `railway.json`) is sufficient and cheaper.

## Contributing

This API is a simplified interface to the DocGenie pipeline. For the full pipeline with all features, see the main DocGenie documentation.

## License

Same as the main DocGenie project.