DocGenie API
FastAPI-based REST API for generating synthetic documents using LLMs. This API is optimized for ML dataset creation with comprehensive handwriting and visual element support.
Features
- Simple REST API - Easy to integrate with any frontend
- URL-based seed images - Provide seed images via URLs
- Customizable prompts - Control document type, language, and ground truth format
- Handwriting Generation - WordStylist diffusion model with 339 author styles
- Visual Elements - Stamps, logos, barcodes, photos, figures
- ML-Ready Datasets - Individual token images with complete metadata
- Complete output - Returns PDF, HTML, CSS, and bounding boxes
- Async processing - Fast and efficient document generation
ML Dataset Creation
The API is fully equipped for ML training dataset creation with output_detail: "dataset" mode:
Handwriting Data
- Individual token images: Each handwriting field saved as a separate PNG (hw0.png, hw1.png, ...)
- Author style IDs: 339 unique writer styles (0-338) for style-consistent generation
- Text content: Original text for each handwriting field
- Position data: Precise bounding boxes (x, y, width, height) in mm
- Signature detection: Boolean flag for signature vs regular handwriting
- Image dimensions: Width and height for each generated token
Visual Element Data
- Stamps: Generated with realistic textures, borders, and rotations
- Text content preserved
- Red/green color variants
- Circle/rectangle shapes
- Logos: Random selection from 6+ logo prefabs
- Barcodes: Code128 format with customizable content
- Photos: Random selection from 5+ photo prefabs
- Figures/Charts: Random selection from 6+ chart/diagram prefabs
- Individual images: Each element saved as separate PNG with transparency
Dataset Metadata
- Token mapping JSON: Complete mapping with:
- Token IDs and references
- Style IDs for handwriting
- Element types for visual elements
- Position rectangles
- Image filenames
- Content text
- Ground truth annotations: QA pairs, classification labels, NER tags
- Bounding boxes: Word, segment, and layout-level bboxes
- Normalized coordinates: [0,1] scaled for ML frameworks
- Msgpack export: Compatible with datadings library
Additional ML Features
- OCR results: Word-level bboxes and text for Document AI training
- Layout elements: Document structure annotations
- Page dimensions: Physical measurements (mm) and pixel dimensions
- Reproducibility: Seed-based generation for consistent results
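The token mapping can be consumed directly when assembling training pairs. A minimal sketch; the field names used here (tokens, image, text, style_id) are assumptions about the token_mapping.json layout, not a documented schema:

```python
import json
from pathlib import Path

def load_handwriting_samples(doc_dir):
    """Pair each handwriting token image with its text and style ID.

    Assumes token_mapping.json has a "handwriting" section whose "tokens"
    list carries "image", "text", and "style_id" fields (hypothetical
    names, not a documented schema).
    """
    mapping = json.loads(Path(doc_dir, "token_mapping.json").read_text())
    samples = []
    for token in mapping.get("handwriting", {}).get("tokens", []):
        samples.append({
            "image_path": str(Path(doc_dir, "handwriting_tokens", token["image"])),
            "text": token["text"],
            "style_id": token["style_id"],
        })
    return samples
```

With the per-document directory layout described above, the returned list can feed a handwriting-recognition or style-transfer training loop directly.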
Pipeline Overview
The API implements a simplified version of the DocGenie generation pipeline:
- Download seed images from URLs
- Convert to base64 for LLM input
- Build custom prompt with user parameters
- Call Claude API to generate HTML documents
- Extract HTML/CSS and ground truth from response
- Render to PDF using Playwright
- Extract bounding boxes from PDF
- Return results as JSON with base64-encoded PDF
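Step 2 (base64 conversion) can be sketched as follows. The content-block shape is the one Anthropic's Messages API uses for image inputs; the function name is illustrative:

```python
import base64

def seed_image_block(image_bytes, media_type="image/jpeg"):
    # Wrap raw seed-image bytes in the base64 content block that
    # Anthropic's Messages API expects for image inputs.
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }
```

One such block per downloaded seed image is placed into the user message alongside the text prompt before the Claude call in step 4.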
Installation
Prerequisites
- Python 3.10+
- DocGenie main package installed
- Playwright browsers installed
Setup
- Install dependencies (all API dependencies are included in the main project):
# Using uv (recommended)
uv sync
# Or using pip
pip install -e .
# Or install API-specific dependencies
cd api/
pip install -r requirements.txt
Note: For async endpoint support, ensure you have:
- redis>=5.0.0 and rq>=1.15.0 (job queue)
- supabase>=2.0.0 (database)
- google-api-python-client>=2.100.0 (Google Drive integration)
- Install Playwright browsers:
playwright install chromium
- Install Tesseract OCR (for local OCR support):
# Ubuntu/Debian
sudo apt-get update && sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
# Windows
# Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
- Set your Anthropic API key:
export ANTHROPIC_API_KEY="your-api-key-here"
- Configure OCR in .env:
cp .env.example .env
# Edit .env and set:
OCR_SERVICE_ENABLED=true
OCR_USE_LOCAL=true # Use local Tesseract (recommended)
Running the API
Development Mode
cd api
python main.py
The API will be available at http://localhost:8000
Production Mode
cd api
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
API Endpoints
Health Check
GET /health
Response:
{
"status": "healthy",
"version": "1.0.0"
}
Generate Documents
POST /generate
Request Body:
{
"seed_images": [
"https://example.com/seed1.jpg",
"https://example.com/seed2.jpg"
],
"prompt_params": {
"language": "English",
"doc_type": "business and administrative",
"gt_type": "Multiple questions about each document, with their answers taken **verbatim** from the document.",
"gt_format": "{\"<Text of question 1>\": \"<Answer to question 1>\", \"<Text of question 2>\": \"<Answer to question 2>\", ...}",
"num_solutions": 3
},
"model": "claude-sonnet-4-5-20250929",
"api_key": "optional-api-key"
}
Response:
{
"success": true,
"message": "Successfully generated 3 documents",
"total_documents": 3,
"documents": [
{
"document_id": "uuid-123_0",
"html": "<!DOCTYPE html>...",
"css": "body { ... }",
"ground_truth": {
"What is the invoice number?": "INV-12345",
"What is the total amount?": "$1,234.56"
},
"pdf_base64": "JVBERi0xLjQK...",
"bboxes": [
{
"text": "Invoice",
"x": 0.1,
"y": 0.05,
"width": 0.2,
"height": 0.03,
"page": 0
}
],
"page_width_mm": 210.0,
"page_height_mm": 297.0
}
]
}
Generate Documents (Async) - Recommended for Production
POST /generate/async
Cost optimization: This endpoint uses Claude's Batch API for 50% cost savings ($2.50 vs $5.00 per 1M input tokens).
Latency: 5-30 minutes (vs 30-120 seconds for the direct API)
Best for: Multi-user production systems without real-time requirements
Request Body:
{
"user_id": 123,
"seed_images": [
"https://example.com/seed1.jpg",
"https://example.com/seed2.jpg"
],
"prompt_params": {
"language": "English",
"doc_type": "business and administrative",
"num_solutions": 3,
"enable_handwriting": true,
"enable_visual_elements": true,
"enable_ocr": true,
"output_detail": "dataset"
}
}
Response:
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued",
"estimated_time_minutes": 10,
"poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000/status",
"created_at": "2025-01-15T12:00:00Z"
}
Workflow:
- Submit a generation request and receive a request_id
- Poll the status endpoint every 30-60 seconds
- When status is "completed", download the results from Google Drive
- Results are uploaded to the user's Google Drive with a shareable link
Check Job Status
GET /jobs/{request_id}/status
Response (Queued):
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued",
"created_at": "2025-01-15T12:00:00Z",
"updated_at": "2025-01-15T12:00:00Z"
}
Response (Processing):
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"created_at": "2025-01-15T12:00:00Z",
"updated_at": "2025-01-15T12:05:00Z",
"progress": "Creating batch request..."
}
Response (Completed):
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"created_at": "2025-01-15T12:00:00Z",
"updated_at": "2025-01-15T12:15:00Z",
"download_url": "https://drive.google.com/file/d/abc123xyz/view?usp=sharing",
"file_size_mb": 15.4,
"document_count": 3
}
Response (Failed):
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "failed",
"created_at": "2025-01-15T12:00:00Z",
"updated_at": "2025-01-15T12:08:00Z",
"error_message": "Batch processing timeout"
}
Status Values:
- queued: Job submitted, waiting for a worker
- processing: Worker picked up the job, creating the batch
- generating: Batch submitted to Claude, waiting for completion
- completed: Documents generated and uploaded to Google Drive
- failed: Error occurred (see error_message)
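The lifecycle can be modeled client-side. The transition table below is inferred from the status list above; it is an illustration, not part of the API:

```python
# Job lifecycle inferred from the documented status values.
VALID_TRANSITIONS = {
    "queued": {"processing", "failed"},
    "processing": {"generating", "failed"},
    "generating": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}

def is_terminal(status):
    # completed and failed are the only states with no outgoing transitions,
    # so polling can stop once one of them is reached.
    return status in VALID_TRANSITIONS and not VALID_TRANSITIONS[status]
```

A poller can use is_terminal to decide when to stop hitting the status endpoint.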
List User Jobs
GET /jobs/user/{user_id}?limit=50&offset=0
Response:
{
"user_id": 123,
"jobs": [
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"created_at": "2025-01-15T12:00:00Z",
"download_url": "https://drive.google.com/...",
"document_count": 3
},
{
"request_id": "660e8400-e29b-41d4-a716-446655440111",
"status": "processing",
"created_at": "2025-01-15T12:30:00Z"
}
],
"count": 2,
"limit": 50,
"offset": 0
}
Usage Examples
cURL
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"seed_images": [
"https://example.com/receipt1.jpg",
"https://example.com/receipt2.jpg"
],
"prompt_params": {
"language": "English",
"doc_type": "receipts",
"num_solutions": 2
}
}'
Python (Direct API)
import requests
import base64
response = requests.post(
"http://localhost:8000/generate",
json={
"seed_images": [
"https://example.com/seed1.jpg",
"https://example.com/seed2.jpg"
],
"prompt_params": {
"language": "English",
"doc_type": "business forms",
"num_solutions": 3
}
}
)
result = response.json()
# Save first PDF
if result["success"]:
pdf_data = base64.b64decode(result["documents"][0]["pdf_base64"])
with open("generated_doc.pdf", "wb") as f:
f.write(pdf_data)
Python (Async API with Polling) - Recommended
import requests
import time
# Step 1: Submit job
response = requests.post(
"http://localhost:8000/generate/async",
json={
"user_id": 123,
"seed_images": [
"https://example.com/seed1.jpg",
"https://example.com/seed2.jpg"
],
"prompt_params": {
"language": "English",
"doc_type": "receipts and invoices",
"num_solutions": 5,
"enable_handwriting": True,
"enable_visual_elements": True,
"enable_ocr": True,
"output_detail": "dataset"
}
}
)
job = response.json()
request_id = job["request_id"]
print(f"Job submitted: {request_id}")
print(f" Estimated time: {job['estimated_time_minutes']} minutes")
# Step 2: Poll status until complete
while True:
status_response = requests.get(
f"http://localhost:8000/jobs/{request_id}/status"
)
status = status_response.json()
print(f" Status: {status['status']}", end="")
if status.get("progress"):
print(f" - {status['progress']}")
else:
print()
if status["status"] == "completed":
print("Generation complete!")
print(f" Download: {status['download_url']}")
print(f" Size: {status.get('file_size_mb', 0):.1f} MB")
print(f" Documents: {status.get('document_count', 0)}")
break
elif status["status"] == "failed":
print(f"Generation failed: {status.get('error_message')}")
break
# Wait 30 seconds before next poll
time.sleep(30)
# Step 3: Download from Google Drive (if completed)
if status["status"] == "completed":
# User can download from their Google Drive using the shareable link
print(f"\nDownload your documents at:\n{status['download_url']}")
JavaScript
const response = await fetch('http://localhost:8000/generate', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
seed_images: [
'https://example.com/seed1.jpg',
'https://example.com/seed2.jpg'
],
prompt_params: {
language: 'English',
doc_type: 'invoices',
num_solutions: 2
}
})
});
const result = await response.json();
// Convert base64 PDF to blob
const pdfBlob = await fetch(`data:application/pdf;base64,${result.documents[0].pdf_base64}`)
.then(res => res.blob());
Configuration
Prompt Parameters
- language: Language for generated documents (default: "English")
- doc_type: Type of documents to generate (e.g., "business and administrative", "receipts", "forms")
- gt_type: Description of ground truth type to generate
- gt_format: Format specification for ground truth JSON
- num_solutions: Number of document variations (1-5)
Stage 3-5 Advanced Features
The API supports advanced document synthesis and dataset packaging:
Stage 3: Handwriting & Visual Elements
- enable_handwriting: Add handwritten text using diffusion model (default: false)
- handwriting_ratio: Percentage of text to convert to handwriting 0-1 (default: 0.5)
- enable_visual_elements: Add stamps, barcodes, logos (default: false)
- visual_element_types: Types of elements to add: ["stamp", "logo", "figure", "barcode", "photo"] (default: all types)
Stage 4: OCR
- enable_ocr: Perform OCR on generated document (default: false)
- ocr_language: OCR language code (default: "en")
Stage 5: Dataset Packaging
- enable_bbox_normalization: Normalize bboxes to [0,1] scale (default: false)
- enable_gt_verification: Verify ground truth quality (default: false)
- enable_analysis: Generate dataset statistics (default: false)
- enable_debug_visualization: Create bbox overlay images (default: false)
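Bounding-box normalization (enable_bbox_normalization) is a simple rescale. A minimal sketch, assuming boxes are given in millimetres as in the handwriting position data; the function name is illustrative:

```python
def normalize_bbox(bbox, page_width_mm, page_height_mm):
    # Scale millimetre coordinates into the [0, 1] range expected by most
    # ML frameworks; an A4 page is 210 x 297 mm.
    return {
        "x": bbox["x"] / page_width_mm,
        "y": bbox["y"] / page_height_mm,
        "width": bbox["width"] / page_width_mm,
        "height": bbox["height"] / page_height_mm,
    }
```

Normalized coordinates stay valid if the page is later rasterized at any DPI, which is why most layout-analysis datasets store them this way.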
Dataset Export (Msgpack Format)
- enable_dataset_export: Export as msgpack dataset format (default: false)
- dataset_export_format: Export format - only "msgpack" is supported (default: "msgpack")
Note: Only msgpack format is implemented in the current pipeline. COCO and HuggingFace export formats mentioned in some documentation are not yet available.
Output Detail Level
- output_detail: Controls how much data is returned/saved (default: "minimal")
- "minimal" (default): Final outputs only (PDFs, images, metadata) - 2-5 MB per document
- "dataset": Includes individual token images for ML training - 10-20 MB per document
  - Individual handwriting token images (handwriting_tokens/hw0.png, ...)
  - Individual visual element images (visual_elements/logo_0.png, ...)
  - Token mapping JSON with style IDs and positions
- "complete": All intermediate files and debug info - 20-50 MB per document
  - Everything from "dataset" mode
  - Intermediate PDFs from each processing stage
  - Generation logs
  - Warning: Can result in 50+ MB JSON responses for the /generate endpoint
Recommendation: Use "minimal" for production, "dataset" for ML research, "complete" for debugging (only with /generate/pdf).
Example with dataset output detail:
import requests
import base64
import json
# Generate ML training dataset
response = requests.post(
"http://localhost:8000/generate",
json={
"seed_images": ["https://example.com/seed.jpg"],
"prompt_params": {
"language": "English",
"doc_type": "receipts and invoices",
"num_solutions": 5,
# Enable handwriting and visual elements
"enable_handwriting": True,
"handwriting_ratio": 0.4,
"enable_visual_elements": True,
"visual_element_types": ["stamp", "logo", "figure", "barcode", "photo"], # All types by default
# Enable dataset features
"enable_ocr": True,
"enable_bbox_normalization": True,
"enable_dataset_export": True,
# IMPORTANT: Set output_detail to "dataset" for ML training
"output_detail": "dataset",
# Use seed for reproducibility
"seed": 42
}
}
)
result = response.json()
# Process each generated document
for doc in result["documents"]:
doc_id = doc["document_id"]
print(f"\nProcessing {doc_id}:")
# 1. Save individual handwriting token images
if doc.get("handwriting_token_images"):
print(f" - Handwriting tokens: {len(doc['handwriting_token_images'])}")
for hw_id, img_b64 in doc["handwriting_token_images"].items():
with open(f"dataset/{doc_id}/{hw_id}.png", "wb") as f:
f.write(base64.b64decode(img_b64))
# 2. Save individual visual element images
if doc.get("visual_element_images"):
print(f" - Visual elements: {len(doc['visual_element_images'])}")
for ve_id, img_b64 in doc["visual_element_images"].items():
with open(f"dataset/{doc_id}/{ve_id}.png", "wb") as f:
f.write(base64.b64decode(img_b64))
# 3. Save token mapping for ML training
if doc.get("token_mapping"):
mapping = doc["token_mapping"]
print(f" - Mapping: {mapping['handwriting']['total_count']} HW + {mapping['visual_elements']['total_count']} VE")
with open(f"dataset/{doc_id}/token_mapping.json", "w") as f:
json.dump(mapping, f, indent=2)
# 4. Save ground truth annotations
if doc.get("ground_truth"):
with open(f"dataset/{doc_id}/ground_truth.json", "w") as f:
json.dump(doc["ground_truth"], f, indent=2)
# 5. Save bounding boxes (normalized coordinates)
if doc.get("normalized_bboxes_word"):
with open(f"dataset/{doc_id}/bboxes_normalized.json", "w") as f:
json.dump(doc["normalized_bboxes_word"], f, indent=2)
# 6. Save final document image
if doc.get("image_base64"):
with open(f"dataset/{doc_id}/final_image.png", "wb") as f:
f.write(base64.b64decode(doc["image_base64"]))
# 7. Save msgpack dataset file
if doc.get("dataset_export") and doc["dataset_export"].get("msgpack_base64"):
with open(f"dataset/{doc_id}/dataset.msgpack", "wb") as f:
f.write(base64.b64decode(doc["dataset_export"]["msgpack_base64"]))
print(f"\nGenerated {len(result['documents'])} ML-ready documents")
PDF Generation Endpoint (Recommended for Large Datasets)
For bulk generation with comprehensive file outputs, use /generate/pdf:
curl -X POST http://localhost:8000/generate/pdf \
-H "Content-Type: application/json" \
-d '{
"seed_images": ["https://example.com/seed1.jpg"],
"prompt_params": {
"num_solutions": 3,
"enable_handwriting": true,
"enable_ocr": true,
"enable_bbox_normalization": true,
"enable_dataset_export": true,
"output_detail": "dataset"
}
}' \
--output documents.zip
ZIP File Contents
Based on output_detail level:
Minimal (default):
- document_<id>.pdf - Generated PDF files
- document_<id>/ - Per-document directories with:
  - document.html, document.css - Source files
  - ground_truth.json, bboxes.json - Annotations
  - final_image.png - Final rendered image (if Stage 3 enabled)
  - handwriting_regions.json, visual_elements.json - Stage 3 metadata (if enabled)
  - ocr_results.json - OCR word-level data (if OCR enabled)
- README.md - Package documentation
- metadata.json - Combined metadata
Dataset (for ML training):
- All files from the "minimal" level, plus:
  - handwriting_tokens/ - Individual token images (hw0.png, hw1.png, ...)
  - visual_elements/ - Individual element images (logo_0.png, stamp_1.png, ...)
  - token_mapping.json - Complete mapping with style IDs and positions
  - dataset.msgpack - Msgpack dataset file (if export enabled)
  - normalized_bboxes_word.json - Normalized coordinates (if Stage 5 enabled)
Complete (for debugging):
- All files from the "dataset" level, plus:
  - Intermediate PDFs from each processing stage
  - Generation logs with timing information
  - debug_visualization.png - Bbox overlay images
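Handling the returned archive is straightforward. A minimal sketch; the function name is illustrative, and fetching the bytes (e.g. with requests.post(..., json=payload).content) is left out:

```python
import io
import zipfile
from pathlib import Path

def unpack_generation_zip(zip_bytes, out_dir):
    # Extract the ZIP returned by /generate/pdf into out_dir and
    # return the list of member names for inspection.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(out)
        return zf.namelist()
```

Working from bytes in memory avoids writing documents.zip to disk first when the response is consumed programmatically.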
Supported Models
- claude-sonnet-4-5-20250929 (default, recommended)
- claude-3-5-sonnet-20241022
Environment Variables
ANTHROPIC_API_KEY: Your Anthropic API key (required if not provided in request)
API Documentation
Interactive API documentation is available when the server is running:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Error Handling
The API returns appropriate HTTP status codes:
- 200 OK: Successful generation
- 400 Bad Request: Invalid input (e.g., invalid image URLs)
- 401 Unauthorized: Missing or invalid API key
- 500 Internal Server Error: Processing error
Error response format:
{
"detail": "Error message describing what went wrong"
}
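Clients can translate these codes into exceptions. This helper is illustrative, not part of any shipped client:

```python
def raise_for_api_error(status_code, detail):
    # Map the documented status codes to Python exceptions so callers
    # can catch invalid input separately from server-side failures.
    if status_code == 400:
        raise ValueError(detail)
    if status_code == 401:
        raise PermissionError(detail)
    if status_code >= 500:
        raise RuntimeError(detail)
```

With requests, this would be called as raise_for_api_error(resp.status_code, resp.json().get("detail", "")) before touching the response body.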
Performance Considerations
- Concurrent requests: The API can handle multiple requests concurrently
- Image size: Larger seed images take longer to process
- Number of solutions: More solutions = longer processing time
- Model selection: Sonnet is slower but higher quality than Haiku
Limitations
- Maximum 10 seed images per request
- Maximum 5 document variations (num_solutions)
- Timeout: 60 seconds per PDF render
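These limits can be enforced client-side before submitting a request; a minimal sketch with an illustrative function name:

```python
def validate_request(seed_images, num_solutions):
    # Guard against the documented request limits before hitting the API.
    if len(seed_images) > 10:
        raise ValueError("maximum 10 seed images per request")
    if not 1 <= num_solutions <= 5:
        raise ValueError("num_solutions must be between 1 and 5")
```

Failing fast locally saves a round trip that would otherwise end in a 400 Bad Request.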
Troubleshooting
Playwright browser not found
playwright install chromium
API key not working
Make sure your API key is set correctly:
echo $ANTHROPIC_API_KEY
PDF rendering fails
Ensure Chromium is installed and accessible:
playwright install chromium
Integration with Frontend
Example React integration:
const [loading, setLoading] = useState(false);
const [result, setResult] = useState(null);
const generateDocuments = async () => {
setLoading(true);
try {
const response = await fetch('http://localhost:8000/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
seed_images: seedImageUrls,
prompt_params: {
language: 'English',
doc_type: documentType,
num_solutions: 3
}
})
});
const data = await response.json();
setResult(data);
} catch (error) {
console.error('Generation failed:', error);
} finally {
setLoading(false);
}
};
React Integration (Async API with Progress)
import { useState, useEffect } from 'react';
function DocumentGenerator({ userId, seedImages }) {
const [requestId, setRequestId] = useState(null);
const [status, setStatus] = useState(null);
const [progress, setProgress] = useState(0);
// Submit job
const handleGenerate = async () => {
const response = await fetch('http://localhost:8000/generate/async', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
user_id: userId,
seed_images: seedImages,
prompt_params: {
language: 'English',
doc_type: 'receipts',
num_solutions: 3,
enable_handwriting: true,
output_detail: 'dataset'
}
})
});
const job = await response.json();
setRequestId(job.request_id);
setStatus('queued');
};
// Poll job status
useEffect(() => {
if (!requestId || status === 'completed' || status === 'failed') return;
const interval = setInterval(async () => {
const response = await fetch(`http://localhost:8000/jobs/${requestId}/status`);
const jobStatus = await response.json();
setStatus(jobStatus.status);
// Update progress bar
const progressMap = {
'queued': 10,
'processing': 30,
'generating': 60,
'completed': 100,
'failed': 0
};
setProgress(progressMap[jobStatus.status] || 0);
if (jobStatus.status === 'completed') {
// Open Google Drive download link
window.open(jobStatus.download_url, '_blank');
}
}, 30000); // Poll every 30 seconds
return () => clearInterval(interval);
}, [requestId, status]);
return (
<div>
<button onClick={handleGenerate} disabled={status && status !== 'completed'}>
Generate Documents
</button>
{status && (
<div className="progress-container">
<div className="progress-bar" style={{ width: `${progress}%` }} />
<p>Status: {status}</p>
{status === 'completed' && (
<p>Done - the Google Drive download link was opened in a new tab.</p>
)}
</div>
)}
</div>
);
}
Background Processing Setup
The async endpoints (/generate/async) require a background worker system for job processing.
Prerequisites
- Redis - Job queue storage
- Supabase - Database for job tracking and user data
- Google Drive OAuth - For uploading results to user's Drive
Installing Redis
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install redis-server
sudo systemctl start redis
sudo systemctl enable redis
macOS:
brew install redis
brew services start redis
Docker:
docker run -d -p 6379:6379 --name redis redis:7-alpine
Verify Redis is running:
redis-cli ping
# Should return: PONG
Configuring Supabase
Create a Supabase project at supabase.com
Create the required tables in your Supabase SQL Editor:
-- Document generation requests
CREATE TABLE document_requests (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id INTEGER NOT NULL,
status TEXT NOT NULL CHECK (status IN ('queued', 'processing', 'generating', 'completed', 'failed')),
request_metadata JSONB NOT NULL,
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Generated documents
CREATE TABLE generated_documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
request_id UUID NOT NULL REFERENCES document_requests(id),
document_id TEXT NOT NULL,
file_url TEXT,
zip_url TEXT,
file_size_mb DECIMAL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- User integrations (Google Drive OAuth)
CREATE TABLE user_integrations (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id INTEGER NOT NULL,
integration_type TEXT NOT NULL CHECK (integration_type IN ('google_drive', 'dropbox')),
access_token TEXT NOT NULL,
refresh_token TEXT,
token_expiry TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(user_id, integration_type)
);
-- Analytics events
CREATE TABLE analytics_events (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id INTEGER,
event_type TEXT NOT NULL,
entity_id UUID,
event_data JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Indexes for performance
CREATE INDEX idx_document_requests_user_id ON document_requests(user_id);
CREATE INDEX idx_document_requests_status ON document_requests(status);
CREATE INDEX idx_generated_documents_request_id ON generated_documents(request_id);
CREATE INDEX idx_user_integrations_user_id ON user_integrations(user_id);
CREATE INDEX idx_analytics_events_user_id ON analytics_events(user_id);
- Add your Supabase credentials to .env:
# In api/.env
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_KEY=your-anon-or-service-role-key
Configuring Google Drive OAuth
Users need to connect their Google Drive account for result storage:
- Create a Google Cloud Project at console.cloud.google.com
- Enable the Google Drive API
- Create OAuth 2.0 credentials (Web application)
- Add authorized redirect URIs (e.g., http://localhost:3000/auth/google/callback)
- Download the credentials JSON
- Users authenticate via an OAuth flow (implement in your frontend):
# Example OAuth flow (implement in your auth system)
from google_auth_oauthlib.flow import Flow
flow = Flow.from_client_config(
client_config={
"web": {
"client_id": "YOUR_CLIENT_ID",
"client_secret": "YOUR_CLIENT_SECRET",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"redirect_uris": ["http://localhost:3000/auth/google/callback"]
}
},
scopes=["https://www.googleapis.com/auth/drive.file"]
)
# User visits auth URL, gets redirected back with code
authorization_url, state = flow.authorization_url(access_type='offline', include_granted_scopes='true')
# Exchange code for tokens
flow.fetch_token(code=authorization_code)
credentials = flow.credentials
# Store in Supabase user_integrations table
supabase.table('user_integrations').insert({
'user_id': user_id,
'integration_type': 'google_drive',
'access_token': credentials.token,
'refresh_token': credentials.refresh_token,
'token_expiry': credentials.expiry
}).execute()
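Stored access tokens expire (see token_expiry in user_integrations), so the worker should refresh shortly before expiry rather than at upload time. A minimal sketch of that check; the helper is illustrative, not part of the worker code:

```python
from datetime import datetime, timedelta, timezone

def needs_refresh(token_expiry, margin_minutes=5, now=None):
    # True when we are within margin_minutes of token_expiry, so the
    # refresh_token flow can run before an upload would fail mid-job.
    now = now or datetime.now(timezone.utc)
    return now >= token_expiry - timedelta(minutes=margin_minutes)
```

When this returns True, the worker would use the stored refresh_token to obtain a new access token and update the user_integrations row.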
Starting the Background Worker
- Configure environment variables in api/.env:
# Redis Configuration
REDIS_URL=redis://localhost:6379/0
RQ_QUEUE_NAME=docgenie
# Batch Processing
BATCH_POLL_INTERVAL=30 # seconds
BATCH_DATA_DIR=/tmp/docgenie_batches
MESSAGE_DATA_DIR=/tmp/docgenie_messages
# Google Drive
GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents
# Supabase (already configured above)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your_key_here
# Claude API
ANTHROPIC_API_KEY=your_api_key_here
- Start the worker:
cd api/
./start_worker.sh
The worker will:
- Check Redis connection
- Validate Supabase configuration
- Verify Claude API key
- Create temporary directories
- Start an RQ worker listening on the docgenie queue
Output:
Starting DocGenie RQ Worker...
Loading .env file...
Redis connected
Supabase configured
Claude API key configured
Temporary directories created
============================================
Worker Configuration:
  Queue: docgenie
  Redis: redis://localhost:6379/0
  Batch Data: /tmp/docgenie_batches
  Message Data: /tmp/docgenie_messages
============================================
Starting RQ worker (press Ctrl+C to stop)...
12:00:00 RQ worker 'worker-abc123' started on docgenie queue
Running Multiple Workers (Production)
For production systems with high load, run multiple workers:
# Terminal 1
./start_worker.sh
# Terminal 2
./start_worker.sh
# Terminal 3
./start_worker.sh
Each worker processes jobs independently from the same queue.
For detailed scaling instructions, see SCALING.md.
Monitoring Workers
# View worker status
rq info --url redis://localhost:6379/0
# View queue status
rq info --queue docgenie --url redis://localhost:6379/0
# View failed jobs
rq info --queue failed --url redis://localhost:6379/0
Architecture Overview
+--------------+      +--------------+      +-------------------+
|   FastAPI    |----->|    Redis     |<-----|    RQ Workers     |
|   Server     |      |    Queue     |      |  (1-5 instances)  |
|              |      |              |      |                   |
|  /generate/  |      |  Job Queue:  |      |  - Downloads      |
|    async     |      |   - queued   |      |  - Claude Batch   |
|              |      |   - pending  |      |  - PDF render     |
|  /jobs/      |      |   - active   |      |  - Handwriting    |
|    {id}/     |      |              |      |  - OCR            |
|    status    |      |              |      |  - ZIP creation   |
+------+-------+      +--------------+      +---------+---------+
       |                                              |
       |                                              |
       v                                              v
+---------------------------------------------------------------+
|                           Supabase                            |
|   - document_requests (job tracking)                          |
|   - generated_documents (results metadata)                    |
|   - user_integrations (Google Drive OAuth)                    |
|   - analytics_events (usage tracking)                         |
+---------------------------------------------------------------+
                               |
                               | Upload Results
                               v
+---------------------------------------------------------------+
|                         Google Drive                          |
|   - User's "DocGenie Documents" folder                        |
|   - ZIP files with generated documents                        |
|   - Shareable links returned to API                           |
+---------------------------------------------------------------+
Cost Comparison: Direct vs Batched API
| API Type | Cost (Input) | Cost (Output) | Latency | Use Case |
|---|---|---|---|---|
| Direct | $5.00/1M tokens | $15.00/1M tokens | 30-120s | Real-time, interactive |
| Batched | $2.50/1M tokens | $7.50/1M tokens | 5-30 min | Background jobs (recommended) |
Example Cost Calculation:
- Generate 100 documents per day
- Each request: 5,000 input tokens, 10,000 output tokens
Direct API Cost:
- Input: (100 Γ 5,000 / 1M) Γ $5.00 = $2.50/day
- Output: (100 Γ 10,000 / 1M) Γ $15.00 = $15.00/day
- Total: $17.50/day = $525/month
Batched API Cost:
- Input: (100 Γ 5,000 / 1M) Γ $2.50 = $1.25/day
- Output: (100 Γ 10,000 / 1M) Γ $7.50 = $7.50/day
- Total: $8.75/day = $262.50/month
Savings: $262.50/month (50% reduction)
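The arithmetic above generalizes to a small helper; the function name is illustrative:

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 input_price, output_price, days=30):
    # Prices are USD per 1M tokens, matching the comparison table above.
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return requests_per_day * per_request * days
```

Plugging in the example workload reproduces the figures above: direct API pricing gives $525/month, batched pricing $262.50/month.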
Scaling Workers
The API uses Redis Queue (RQ) workers for background job processing. Scale workers based on load:
| User Load | Workers | Redis RAM | Notes |
|---|---|---|---|
| < 10 req/hr | 1 | 256 MB | Development |
| 10β50 req/hr | 2β3 | 512 MB | Small production |
| 50β200 req/hr | 3β5 | 1 GB | Medium production |
| > 200 req/hr | 5+ | 2+ GB | Large production |
Starting Workers
# Single worker (development)
./start_worker.sh
# Multiple workers (production) β run in separate terminals
./start_worker.sh # Terminal 1
./start_worker.sh # Terminal 2
# Docker Compose β scale to 3 workers
docker-compose up --scale worker=3
# Monitor
rq info --url redis://localhost:6379/0
rq info --queue docgenie --url redis://localhost:6379/0
Railway Multi-Worker (Separate Service)
- Railway dashboard β New Service β GitHub Repo (same repo)
- Name:
docgenie-worker - Custom Start Command:
rq worker --url $REDIS_URL - Add the same environment variables as the API service
For most use cases the combined mode (API + worker in one service, see railway.json) is sufficient and cheaper.
Contributing
This API is a simplified interface to the DocGenie pipeline. For the full pipeline with all features, see the main DocGenie documentation.
License
Same as DocGenie main project.