DocGenie API

FastAPI-based REST API for generating synthetic documents using LLMs. This API is optimized for ML dataset creation with comprehensive handwriting and visual element support.

Features

  • πŸš€ Simple REST API - Easy to integrate with any frontend
  • πŸ–ΌοΈ URL-based seed images - Provide seed images via URLs
  • 🎨 Customizable prompts - Control document type, language, and ground truth format
  • ✍️ Handwriting Generation - WordStylist diffusion model with 339 author styles
  • 🎯 Visual Elements - Stamps, logos, barcodes, photos, figures
  • πŸ“Š ML-Ready Datasets - Individual token images with complete metadata
  • πŸ“„ Complete output - Returns PDF, HTML, CSS, and bounding boxes
  • ⚑ Async processing - Fast and efficient document generation

ML Dataset Creation

The API is fully equipped for ML training dataset creation with output_detail: "dataset" mode:

βœ… Handwriting Data

  • Individual token images: Each handwriting field saved as separate PNG (hw0.png, hw1.png, ...)
  • Author style IDs: 339 unique writer styles (0-338) for style-consistent generation
  • Text content: Original text for each handwriting field
  • Position data: Precise bounding boxes (x, y, width, height) in mm
  • Signature detection: Boolean flag for signature vs regular handwriting
  • Image dimensions: Width and height for each generated token
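If pixel coordinates are needed for training, the millimetre boxes can be converted once the render DPI is known. A minimal sketch (the 300 DPI default and the `mm_to_px` helper name are illustrative, not part of the API):

```python
def mm_to_px(bbox_mm, dpi=300):
    """Convert an (x, y, width, height) box from millimetres to pixels.

    1 inch = 25.4 mm, so px = mm / 25.4 * dpi.
    """
    return tuple(round(v / 25.4 * dpi) for v in bbox_mm)

# A handwriting field 50 mm from the left, 20 mm from the top,
# 40 mm wide and 8 mm tall, rendered at 300 DPI:
print(mm_to_px((50, 20, 40, 8)))  # (591, 236, 472, 94)
```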

βœ… Visual Element Data

  • Stamps: Generated with realistic textures, borders, and rotations
    • Text content preserved
    • Red/green color variants
    • Circle/rectangle shapes
  • Logos: Random selection from 6+ logo prefabs
  • Barcodes: Code128 format with customizable content
  • Photos: Random selection from 5+ photo prefabs
  • Figures/Charts: Random selection from 6+ chart/diagram prefabs
  • Individual images: Each element saved as separate PNG with transparency

βœ… Dataset Metadata

  • Token mapping JSON: Complete mapping with:
    • Token IDs and references
    • Style IDs for handwriting
    • Element types for visual elements
    • Position rectangles
    • Image filenames
    • Content text
  • Ground truth annotations: QA pairs, classification labels, NER tags
  • Bounding boxes: Word, segment, and layout-level bboxes
  • Normalized coordinates: [0,1] scaled for ML frameworks
  • Msgpack export: Compatible with datadings library
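Since normalized boxes are scaled to [0,1], recovering physical coordinates only needs the page dimensions returned with each document. A small sketch (the helper name is illustrative):

```python
def denormalize_bbox(bbox, page_width_mm, page_height_mm):
    """Scale a [0,1]-normalized bbox dict back to physical millimetres."""
    return {
        "x": bbox["x"] * page_width_mm,
        "y": bbox["y"] * page_height_mm,
        "width": bbox["width"] * page_width_mm,
        "height": bbox["height"] * page_height_mm,
    }

# A4 page (210 x 297 mm):
box = {"x": 0.25, "y": 0.5, "width": 0.125, "height": 0.25}
print(denormalize_bbox(box, 210.0, 297.0))
# {'x': 52.5, 'y': 148.5, 'width': 26.25, 'height': 74.25}
```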

βœ… Additional ML Features

  • OCR results: Word-level bboxes and text for Document AI training
  • Layout elements: Document structure annotations
  • Page dimensions: Physical measurements (mm) and pixel dimensions
  • Reproducibility: Seed-based generation for consistent results

Pipeline Overview

The API implements a simplified version of the DocGenie generation pipeline:

  1. Download seed images from URLs
  2. Convert to base64 for LLM input
  3. Build custom prompt with user parameters
  4. Call Claude API to generate HTML documents
  5. Extract HTML/CSS and ground truth from response
  6. Render to PDF using Playwright
  7. Extract bounding boxes from PDF
  8. Return results as JSON with base64-encoded PDF
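Under the assumption that each stage receives, updates, and returns a single payload dict, the eight steps above can be sketched as a generic composition (the function names are illustrative, not the actual DocGenie module API):

```python
def run_pipeline(seed_urls, prompt_params, stages):
    """Thread a payload dict through an ordered list of (name, stage) pairs.

    Each stage callable takes the payload dict and returns an updated copy,
    mirroring steps 1-8 above (download, base64-encode, prompt, LLM call,
    extraction, PDF render, bbox extraction, JSON assembly).
    """
    payload = {"seed_urls": seed_urls, "prompt_params": prompt_params, "trace": []}
    for name, stage in stages:
        payload = stage(payload)
        payload["trace"].append(name)  # record which stages have run
    return payload
```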

Installation

Prerequisites

  • Python 3.10+
  • DocGenie main package installed
  • Playwright browsers installed

Setup

  1. Install dependencies (all API dependencies are included in the main project):
# Using uv (recommended)
uv sync

# Or using pip
pip install -e .

# Or install API-specific dependencies
cd api/
pip install -r requirements.txt

Note: For async endpoint support, ensure you have:

  • redis>=5.0.0 and rq>=1.15.0 (job queue)
  • supabase>=2.0.0 (database)
  • google-api-python-client>=2.100.0 (Google Drive integration)
  2. Install Playwright browsers:
playwright install chromium
  3. Install Tesseract OCR (for local OCR support):
# Ubuntu/Debian
sudo apt-get update && sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Windows
# Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
  4. Set your Anthropic API key:
export ANTHROPIC_API_KEY="your-api-key-here"
  5. Configure OCR in .env:
cp .env.example .env
# Edit .env and set:
OCR_SERVICE_ENABLED=true
OCR_USE_LOCAL=true  # Use local Tesseract (recommended)

Running the API

Development Mode

cd api
python main.py

The API will be available at http://localhost:8000

Production Mode

cd api
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

API Endpoints

Health Check

GET /health

Response:

{
  "status": "healthy",
  "version": "1.0.0"
}

Generate Documents

POST /generate

Request Body:

{
  "seed_images": [
    "https://example.com/seed1.jpg",
    "https://example.com/seed2.jpg"
  ],
  "prompt_params": {
    "language": "English",
    "doc_type": "business and administrative",
    "gt_type": "Multiple questions about each document, with their answers taken **verbatim** from the document.",
    "gt_format": "{\"<Text of question 1>\": \"<Answer to question 1>\", \"<Text of question 2>\": \"<Answer to question 2>\", ...}",
    "num_solutions": 3
  },
  "model": "claude-sonnet-4-5-20250929",
  "api_key": "optional-api-key"
}

Response:

{
  "success": true,
  "message": "Successfully generated 3 documents",
  "total_documents": 3,
  "documents": [
    {
      "document_id": "uuid-123_0",
      "html": "<!DOCTYPE html>...",
      "css": "body { ... }",
      "ground_truth": {
        "What is the invoice number?": "INV-12345",
        "What is the total amount?": "$1,234.56"
      },
      "pdf_base64": "JVBERi0xLjQK...",
      "bboxes": [
        {
          "text": "Invoice",
          "x": 0.1,
          "y": 0.05,
          "width": 0.2,
          "height": 0.03,
          "page": 0
        }
      ],
      "page_width_mm": 210.0,
      "page_height_mm": 297.0
    }
  ]
}

Generate Documents (Async) - Recommended for Production

POST /generate/async

🎯 Cost Optimization: This endpoint uses Claude's Batch API for 50% cost savings ($2.50 vs $5.00 per 1M input tokens).

⏱️ Latency: 5-30 minutes (vs 30-120 seconds for direct API)

βœ… Best For: Multi-user production systems with non-realtime requirements

Request Body:

{
  "user_id": 123,
  "seed_images": [
    "https://example.com/seed1.jpg",
    "https://example.com/seed2.jpg"
  ],
  "prompt_params": {
    "language": "English",
    "doc_type": "business and administrative",
    "num_solutions": 3,
    "enable_handwriting": true,
    "enable_visual_elements": true,
    "enable_ocr": true,
    "output_detail": "dataset"
  }
}

Response:

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "estimated_time_minutes": 10,
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000/status",
  "created_at": "2025-01-15T12:00:00Z"
}

Workflow:

  1. Submit generation request β†’ Get request_id
  2. Poll status endpoint every 30-60 seconds
  3. When status: "completed", download from Google Drive
  4. Results uploaded to user's Google Drive with shareable link

Check Job Status

GET /jobs/{request_id}/status

Response (Queued):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:00:00Z"
}

Response (Processing):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:05:00Z",
  "progress": "Creating batch request..."
}

Response (Completed):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:15:00Z",
  "download_url": "https://drive.google.com/file/d/abc123xyz/view?usp=sharing",
  "file_size_mb": 15.4,
  "document_count": 3
}

Response (Failed):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:08:00Z",
  "error_message": "Batch processing timeout"
}

Status Values:

  • queued: Job submitted, waiting for worker
  • processing: Worker picked up job, creating batch
  • generating: Batch submitted to Claude, waiting for completion
  • completed: Documents generated and uploaded to Google Drive
  • failed: Error occurred (see error_message)
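Only completed and failed are terminal, so a polling client can stop on exactly those two. A sketch (not part of any shipped client library):

```python
# The five job states, in the order the worker moves through them.
STATUS_ORDER = ["queued", "processing", "generating", "completed", "failed"]
TERMINAL_STATUSES = {"completed", "failed"}

def should_keep_polling(status: str) -> bool:
    """Return True while the job is still in flight."""
    if status not in STATUS_ORDER:
        raise ValueError(f"Unknown job status: {status!r}")
    return status not in TERMINAL_STATUSES
```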

List User Jobs

GET /jobs/user/{user_id}?limit=50&offset=0

Response:

{
  "user_id": 123,
  "jobs": [
    {
      "request_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "created_at": "2025-01-15T12:00:00Z",
      "download_url": "https://drive.google.com/...",
      "document_count": 3
    },
    {
      "request_id": "660e8400-e29b-41d4-a716-446655440111",
      "status": "processing",
      "created_at": "2025-01-15T12:30:00Z"
    }
  ],
  "count": 2,
  "limit": 50,
  "offset": 0
}

Usage Examples

cURL

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "seed_images": [
      "https://example.com/receipt1.jpg",
      "https://example.com/receipt2.jpg"
    ],
    "prompt_params": {
      "language": "English",
      "doc_type": "receipts",
      "num_solutions": 2
    }
  }'

Python (Direct API)

import requests
import base64

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "seed_images": [
            "https://example.com/seed1.jpg",
            "https://example.com/seed2.jpg"
        ],
        "prompt_params": {
            "language": "English",
            "doc_type": "business forms",
            "num_solutions": 3
        }
    }
)

result = response.json()

# Save first PDF
if result["success"]:
    pdf_data = base64.b64decode(result["documents"][0]["pdf_base64"])
    with open("generated_doc.pdf", "wb") as f:
        f.write(pdf_data)

Python (Async API with Polling) - Recommended

import requests
import time

# Step 1: Submit job
response = requests.post(
    "http://localhost:8000/generate/async",
    json={
        "user_id": 123,
        "seed_images": [
            "https://example.com/seed1.jpg",
            "https://example.com/seed2.jpg"
        ],
        "prompt_params": {
            "language": "English",
            "doc_type": "receipts and invoices",
            "num_solutions": 5,
            "enable_handwriting": True,
            "enable_visual_elements": True,
            "enable_ocr": True,
            "output_detail": "dataset"
        }
    }
)

job = response.json()
request_id = job["request_id"]
print(f"βœ“ Job submitted: {request_id}")
print(f"  Estimated time: {job['estimated_time_minutes']} minutes")

# Step 2: Poll status until complete
while True:
    status_response = requests.get(
        f"http://localhost:8000/jobs/{request_id}/status"
    )
    status = status_response.json()
    
    print(f"  Status: {status['status']}", end="")
    if status.get("progress"):
        print(f" - {status['progress']}")
    else:
        print()
    
    if status["status"] == "completed":
        print(f"βœ“ Generation complete!")
        print(f"  Download: {status['download_url']}")
        print(f"  Size: {status.get('file_size_mb', 0):.1f} MB")
        print(f"  Documents: {status.get('document_count', 0)}")
        break
    elif status["status"] == "failed":
        print(f"βœ— Generation failed: {status.get('error_message')}")
        break
    
    # Wait 30 seconds before next poll
    time.sleep(30)

# Step 3: Download from Google Drive (if completed)
if status["status"] == "completed":
    # User can download from their Google Drive using the shareable link
    print(f"\nDownload your documents at:\n{status['download_url']}")

JavaScript

const response = await fetch('http://localhost:8000/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    seed_images: [
      'https://example.com/seed1.jpg',
      'https://example.com/seed2.jpg'
    ],
    prompt_params: {
      language: 'English',
      doc_type: 'invoices',
      num_solutions: 2
    }
  })
});

const result = await response.json();

// Convert base64 PDF to blob
const pdfBlob = await fetch(`data:application/pdf;base64,${result.documents[0].pdf_base64}`)
  .then(res => res.blob());

Configuration

Prompt Parameters

  • language: Language for generated documents (default: "English")
  • doc_type: Type of documents to generate (e.g., "business and administrative", "receipts", "forms")
  • gt_type: Description of ground truth type to generate
  • gt_format: Format specification for ground truth JSON
  • num_solutions: Number of document variations (1-5)

Stage 3-5 Advanced Features

The API supports advanced document synthesis and dataset packaging:

Stage 3: Handwriting & Visual Elements

  • enable_handwriting: Add handwritten text using diffusion model (default: false)
  • handwriting_ratio: Fraction of text to convert to handwriting, 0-1 (default: 0.5)
  • enable_visual_elements: Add stamps, barcodes, logos (default: false)
  • visual_element_types: Types of elements to add: ["stamp", "logo", "figure", "barcode", "photo"] (default: all types)

Stage 4: OCR

  • enable_ocr: Perform OCR on generated document (default: false)
  • ocr_language: OCR language code (default: "en")

Stage 5: Dataset Packaging

  • enable_bbox_normalization: Normalize bboxes to [0,1] scale (default: false)
  • enable_gt_verification: Verify ground truth quality (default: false)
  • enable_analysis: Generate dataset statistics (default: false)
  • enable_debug_visualization: Create bbox overlay images (default: false)

Dataset Export (Msgpack Format)

  • enable_dataset_export: Export as msgpack dataset format (default: false)
  • dataset_export_format: Export format - only "msgpack" is supported (default: "msgpack")

Note: Only msgpack format is implemented in the current pipeline. COCO and HuggingFace export formats mentioned in some documentation are not yet available.

Output Detail Level

  • output_detail: Controls how much data is returned/saved (default: "minimal")
    • "minimal" (default): Final outputs only (PDFs, images, metadata) - 2-5 MB per document
    • "dataset": Includes individual token images for ML training - 10-20 MB per document
      • Individual handwriting token images (handwriting_tokens/hw0.png, ...)
      • Individual visual element images (visual_elements/logo_0.png, ...)
      • Token mapping JSON with style IDs and positions
    • "complete": All intermediate files and debug info - 20-50 MB per document
      • Everything from dataset mode
      • Intermediate PDFs from each processing stage
      • Generation logs
      • ⚠️ Warning: Can result in 50+ MB JSON responses for /generate endpoint

Recommendation: Use "minimal" for production, "dataset" for ML research, "complete" for debugging (only with /generate/pdf).

Example with dataset output detail:

import requests
import base64
import json

# Generate ML training dataset
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "seed_images": ["https://example.com/seed.jpg"],
        "prompt_params": {
            "language": "English",
            "doc_type": "receipts and invoices",
            "num_solutions": 5,
            
            # Enable handwriting and visual elements
            "enable_handwriting": True,
            "handwriting_ratio": 0.4,
            "enable_visual_elements": True,
            "visual_element_types": ["stamp", "logo", "figure", "barcode", "photo"],  # All types by default
            
            # Enable dataset features
            "enable_ocr": True,
            "enable_bbox_normalization": True,
            "enable_dataset_export": True,
            
            # IMPORTANT: Set output_detail to "dataset" for ML training
            "output_detail": "dataset",
            
            # Use seed for reproducibility
            "seed": 42
        }
    }
)

result = response.json()

import os

# Process each generated document
for doc in result["documents"]:
    doc_id = doc["document_id"]
    os.makedirs(f"dataset/{doc_id}", exist_ok=True)  # ensure output dir exists
    print(f"\nProcessing {doc_id}:")
    
    # 1. Save individual handwriting token images
    if doc.get("handwriting_token_images"):
        print(f"  - Handwriting tokens: {len(doc['handwriting_token_images'])}")
        for hw_id, img_b64 in doc["handwriting_token_images"].items():
            with open(f"dataset/{doc_id}/{hw_id}.png", "wb") as f:
                f.write(base64.b64decode(img_b64))
    
    # 2. Save individual visual element images
    if doc.get("visual_element_images"):
        print(f"  - Visual elements: {len(doc['visual_element_images'])}")
        for ve_id, img_b64 in doc["visual_element_images"].items():
            with open(f"dataset/{doc_id}/{ve_id}.png", "wb") as f:
                f.write(base64.b64decode(img_b64))
    
    # 3. Save token mapping for ML training
    if doc.get("token_mapping"):
        mapping = doc["token_mapping"]
        print(f"  - Mapping: {mapping['handwriting']['total_count']} HW + {mapping['visual_elements']['total_count']} VE")
        with open(f"dataset/{doc_id}/token_mapping.json", "w") as f:
            json.dump(mapping, f, indent=2)
    
    # 4. Save ground truth annotations
    if doc.get("ground_truth"):
        with open(f"dataset/{doc_id}/ground_truth.json", "w") as f:
            json.dump(doc["ground_truth"], f, indent=2)
    
    # 5. Save bounding boxes (normalized coordinates)
    if doc.get("normalized_bboxes_word"):
        with open(f"dataset/{doc_id}/bboxes_normalized.json", "w") as f:
            json.dump(doc["normalized_bboxes_word"], f, indent=2)
    
    # 6. Save final document image
    if doc.get("image_base64"):
        with open(f"dataset/{doc_id}/final_image.png", "wb") as f:
            f.write(base64.b64decode(doc["image_base64"]))
    
    # 7. Save msgpack dataset file
    if doc.get("dataset_export") and doc["dataset_export"].get("msgpack_base64"):
        with open(f"dataset/{doc_id}/dataset.msgpack", "wb") as f:
            f.write(base64.b64decode(doc["dataset_export"]["msgpack_base64"]))

print(f"\n✅ Generated {len(result['documents'])} ML-ready documents")

PDF Generation Endpoint (Recommended for Large Datasets)

For bulk generation with comprehensive file outputs, use /generate/pdf:

curl -X POST http://localhost:8000/generate/pdf \
  -H "Content-Type: application/json" \
  -d '{
    "seed_images": ["https://example.com/seed1.jpg"],
    "prompt_params": {
      "num_solutions": 3,
      "enable_handwriting": true,
      "enable_ocr": true,
      "enable_bbox_normalization": true,
      "enable_dataset_export": true,
      "output_detail": "dataset"
    }
  }' \
  --output documents.zip

ZIP File Contents

Based on output_detail level:

Minimal (default):

  • document_<id>.pdf - Generated PDF files
  • document_<id>/ - Per-document directories with:
    • document.html, document.css - Source files
    • ground_truth.json, bboxes.json - Annotations
    • final_image.png - Final rendered image (if Stage 3 enabled)
    • handwriting_regions.json, visual_elements.json - Stage 3 metadata (if enabled)
    • ocr_results.json - OCR word-level data (if OCR enabled)
  • README.md - Package documentation
  • metadata.json - Combined metadata

Dataset (for ML training):

  • All files from "minimal" level, plus:
    • handwriting_tokens/ - Individual token images (hw0.png, hw1.png, ...)
    • visual_elements/ - Individual element images (logo_0.png, stamp_1.png, ...)
    • token_mapping.json - Complete mapping with style IDs and positions
    • dataset.msgpack - Msgpack dataset file (if export enabled)
    • normalized_bboxes_word.json - Normalized coordinates (if Stage 5 enabled)

Complete (for debugging):

  • All files from "dataset" level, plus:
    • Intermediate PDFs from each processing stage
    • Generation logs with timing information
    • debug_visualization.png - Bbox overlay images

Supported Models

  • claude-sonnet-4-5-20250929 (default, recommended)
  • claude-3-5-sonnet-20241022

Environment Variables

  • ANTHROPIC_API_KEY: Your Anthropic API key (required if not provided in request)

API Documentation

Interactive API documentation is available when the server is running:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Error Handling

The API returns appropriate HTTP status codes:

  • 200 OK: Successful generation
  • 400 Bad Request: Invalid input (e.g., invalid image URLs)
  • 401 Unauthorized: Missing or invalid API key
  • 500 Internal Server Error: Processing error

Error response format:

{
  "detail": "Error message describing what went wrong"
}
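A client can centralize this contract in one helper that passes 2xx payloads through and raises otherwise (the exception class is illustrative, not provided by the API):

```python
class DocGenieAPIError(Exception):
    """Raised when the API returns a non-2xx response."""
    def __init__(self, status_code, detail):
        self.status_code = status_code
        self.detail = detail
        super().__init__(f"HTTP {status_code}: {detail}")

def raise_for_api_error(status_code, body):
    """Pass 2xx payloads through; turn error responses into exceptions.

    `body` is the decoded JSON payload; error responses carry a "detail" field.
    """
    if 200 <= status_code < 300:
        return body
    detail = body.get("detail", "Unknown error") if isinstance(body, dict) else str(body)
    raise DocGenieAPIError(status_code, detail)
```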

Performance Considerations

  • Concurrent requests: The API can handle multiple requests concurrently
  • Image size: Larger seed images take longer to process
  • Number of solutions: More solutions = longer processing time
  • Model selection: Sonnet is slower but higher quality than Haiku

Limitations

  • Maximum 10 seed images per request
  • Maximum 5 document variations (num_solutions)
  • Single-page documents only
  • Timeout: 60 seconds per PDF render
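Checking these limits client-side avoids a guaranteed 400 Bad Request round trip. A sketch covering the two documented request limits:

```python
def validate_request(seed_images, num_solutions):
    """Check a request against the documented API limits before sending."""
    errors = []
    if len(seed_images) > 10:
        errors.append("maximum 10 seed images per request")
    if not 1 <= num_solutions <= 5:
        errors.append("num_solutions must be between 1 and 5")
    return errors

# 11 images and num_solutions=6 trip both limits:
print(validate_request(["https://example.com/s.jpg"] * 11, 6))
```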

Troubleshooting

Playwright browser not found

playwright install chromium

API key not working

Make sure your API key is set correctly:

echo $ANTHROPIC_API_KEY

PDF rendering fails

Verify that Chromium is installed and can launch:

python -c "from playwright.sync_api import sync_playwright; p = sync_playwright().start(); p.chromium.launch().close(); p.stop(); print('Chromium OK')"

Integration with Frontend

Example React integration:

const [loading, setLoading] = useState(false);
const [result, setResult] = useState(null);

const generateDocuments = async () => {
  setLoading(true);
  
  try {
    const response = await fetch('http://localhost:8000/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        seed_images: seedImageUrls,
        prompt_params: {
          language: 'English',
          doc_type: documentType,
          num_solutions: 3
        }
      })
    });
    
    const data = await response.json();
    setResult(data);
  } catch (error) {
    console.error('Generation failed:', error);
  } finally {
    setLoading(false);
  }
};

React Integration (Async API with Progress)

import { useState, useEffect } from 'react';

function DocumentGenerator({ userId, seedImages }) {
  const [requestId, setRequestId] = useState(null);
  const [status, setStatus] = useState(null);
  const [progress, setProgress] = useState(0);

  // Submit job
  const handleGenerate = async () => {
    const response = await fetch('http://localhost:8000/generate/async', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        user_id: userId,
        seed_images: seedImages,
        prompt_params: {
          language: 'English',
          doc_type: 'receipts',
          num_solutions: 3,
          enable_handwriting: true,
          output_detail: 'dataset'
        }
      })
    });
    
    const job = await response.json();
    setRequestId(job.request_id);
    setStatus('queued');
  };

  // Poll job status
  useEffect(() => {
    if (!requestId || status === 'completed' || status === 'failed') return;

    const interval = setInterval(async () => {
      const response = await fetch(`http://localhost:8000/jobs/${requestId}/status`);
      const jobStatus = await response.json();
      
      setStatus(jobStatus.status);
      
      // Update progress bar
      const progressMap = {
        'queued': 10,
        'processing': 30,
        'generating': 60,
        'completed': 100,
        'failed': 0
      };
      setProgress(progressMap[jobStatus.status] || 0);
      
      if (jobStatus.status === 'completed') {
        // Open Google Drive download link
        window.open(jobStatus.download_url, '_blank');
      }
    }, 30000); // Poll every 30 seconds

    return () => clearInterval(interval);
  }, [requestId, status]);

  return (
    <div>
      <button onClick={handleGenerate} disabled={status && status !== 'completed'}>
        Generate Documents
      </button>
      
      {status && (
        <div className="progress-container">
          <div className="progress-bar" style={{ width: `${progress}%` }} />
          <p>Status: {status}</p>
          {status === 'completed' && (
            <p>Results uploaded to Google Drive (link opened in a new tab).</p>
          )}
        </div>
      )}
    </div>
  );
}

Background Processing Setup

The async endpoints (/generate/async) require a background worker system for job processing.

Prerequisites

  1. Redis - Job queue storage
  2. Supabase - Database for job tracking and user data
  3. Google Drive OAuth - For uploading results to user's Drive

Installing Redis

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install redis-server
sudo systemctl start redis
sudo systemctl enable redis

macOS:

brew install redis
brew services start redis

Docker:

docker run -d -p 6379:6379 --name redis redis:7-alpine

Verify Redis is running:

redis-cli ping
# Should return: PONG

Configuring Supabase

  1. Create a Supabase project at supabase.com

  2. Create the required tables in your Supabase SQL Editor:

-- Document generation requests
CREATE TABLE document_requests (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id INTEGER NOT NULL,
  status TEXT NOT NULL CHECK (status IN ('queued', 'processing', 'generating', 'completed', 'failed')),
  request_metadata JSONB NOT NULL,
  error_message TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Generated documents
CREATE TABLE generated_documents (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  request_id UUID NOT NULL REFERENCES document_requests(id),
  document_id TEXT NOT NULL,
  file_url TEXT,
  zip_url TEXT,
  file_size_mb DECIMAL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- User integrations (Google Drive OAuth)
CREATE TABLE user_integrations (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id INTEGER NOT NULL,
  integration_type TEXT NOT NULL CHECK (integration_type IN ('google_drive', 'dropbox')),
  access_token TEXT NOT NULL,
  refresh_token TEXT,
  token_expiry TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(user_id, integration_type)
);

-- Analytics events
CREATE TABLE analytics_events (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id INTEGER,
  event_type TEXT NOT NULL,
  entity_id UUID,
  event_data JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes for performance
CREATE INDEX idx_document_requests_user_id ON document_requests(user_id);
CREATE INDEX idx_document_requests_status ON document_requests(status);
CREATE INDEX idx_generated_documents_request_id ON generated_documents(request_id);
CREATE INDEX idx_user_integrations_user_id ON user_integrations(user_id);
CREATE INDEX idx_analytics_events_user_id ON analytics_events(user_id);
  3. Add your Supabase credentials to .env:
# In api/.env
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_KEY=your-anon-or-service-role-key

Configuring Google Drive OAuth

Users need to connect their Google Drive account for result storage:

  1. Create a Google Cloud Project at console.cloud.google.com

  2. Enable Google Drive API

  3. Create OAuth 2.0 credentials (Web application)

  4. Add authorized redirect URIs (e.g., http://localhost:3000/auth/google/callback)

  5. Download credentials JSON

  6. Users authenticate via OAuth flow (implement in your frontend):

# Example OAuth flow (implement in your auth system)
from google_auth_oauthlib.flow import Flow

flow = Flow.from_client_config(
    client_config={
        "web": {
            "client_id": "YOUR_CLIENT_ID",
            "client_secret": "YOUR_CLIENT_SECRET",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "redirect_uris": ["http://localhost:3000/auth/google/callback"]
        }
    },
    scopes=["https://www.googleapis.com/auth/drive.file"]
)

# User visits auth URL, gets redirected back with code
authorization_url, state = flow.authorization_url(access_type='offline', include_granted_scopes='true')

# Exchange code for tokens
flow.fetch_token(code=authorization_code)
credentials = flow.credentials

# Store in Supabase user_integrations table
supabase.table('user_integrations').insert({
    'user_id': user_id,
    'integration_type': 'google_drive',
    'access_token': credentials.token,
    'refresh_token': credentials.refresh_token,
    'token_expiry': credentials.expiry
}).execute()

Starting the Background Worker

  1. Configure environment variables in api/.env:
# Redis Configuration
REDIS_URL=redis://localhost:6379/0
RQ_QUEUE_NAME=docgenie

# Batch Processing
BATCH_POLL_INTERVAL=30  # seconds
BATCH_DATA_DIR=/tmp/docgenie_batches
MESSAGE_DATA_DIR=/tmp/docgenie_messages

# Google Drive
GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents

# Supabase (already configured above)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your_key_here

# Claude API
ANTHROPIC_API_KEY=your_api_key_here
  2. Start the worker:
cd api/
./start_worker.sh

The worker will:

  • βœ“ Check Redis connection
  • βœ“ Validate Supabase configuration
  • βœ“ Verify Claude API key
  • βœ“ Create temporary directories
  • βœ“ Start RQ worker listening on docgenie queue

Output:

πŸš€ Starting DocGenie RQ Worker...
βœ“ Loading .env file...
βœ“ Redis connected
βœ“ Supabase configured
βœ“ Claude API key configured
βœ“ Temporary directories created

============================================
Worker Configuration:
  Queue: docgenie
  Redis: redis://localhost:6379/0
  Batch Data: /tmp/docgenie_batches
  Message Data: /tmp/docgenie_messages
============================================

βœ… Starting RQ worker (press Ctrl+C to stop)...

12:00:00 RQ worker 'worker-abc123' started on docgenie queue

Running Multiple Workers (Production)

For production systems with high load, run multiple workers:

# Terminal 1
./start_worker.sh

# Terminal 2
./start_worker.sh

# Terminal 3
./start_worker.sh

Each worker processes jobs independently from the same queue.

For detailed scaling instructions, see SCALING.md.

Monitoring Workers

# View worker status
rq info --url redis://localhost:6379/0

# View queue status
rq info --queue docgenie --url redis://localhost:6379/0

# View failed jobs
rq info --queue failed --url redis://localhost:6379/0

Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   FastAPI   │───────▢│    Redis    │◀───────│  RQ Workers     β”‚
β”‚   Server    β”‚        β”‚   Queue     β”‚        β”‚  (1-5 instances)β”‚
β”‚             β”‚        β”‚             β”‚        β”‚                 β”‚
β”‚ /generate/  β”‚        β”‚ Job Queue:  β”‚        β”‚ β€’ Downloads     β”‚
β”‚  async      β”‚        β”‚ - queued    β”‚        β”‚ β€’ Claude Batch  β”‚
β”‚             β”‚        β”‚ - pending   β”‚        β”‚ β€’ PDF render    β”‚
β”‚ /jobs/      β”‚        β”‚ - active    β”‚        β”‚ β€’ Handwriting   β”‚
β”‚  {id}/      β”‚        β”‚             β”‚        β”‚ β€’ OCR           β”‚
β”‚  status     β”‚        β”‚             β”‚        β”‚ β€’ ZIP creation  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                                               β”‚
       β”‚                                               β”‚
       β–Ό                                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          Supabase                             β”‚
β”‚  β€’ document_requests (job tracking)                           β”‚
β”‚  β€’ generated_documents (results metadata)                     β”‚
β”‚  β€’ user_integrations (Google Drive OAuth)                     β”‚
β”‚  β€’ analytics_events (usage tracking)                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”‚ Upload Results
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Google Drive                             β”‚
β”‚  β€’ User's "DocGenie Documents" folder                         β”‚
β”‚  β€’ ZIP files with generated documents                         β”‚
β”‚  β€’ Shareable links returned to API                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Cost Comparison: Direct vs Batched API

API Type | Cost (Input)    | Cost (Output)    | Latency  | Use Case
---------|-----------------|------------------|----------|------------------------------
Direct   | $5.00/1M tokens | $15.00/1M tokens | 30-120 s | Real-time, interactive
Batched  | $2.50/1M tokens | $7.50/1M tokens  | 5-30 min | Background jobs (recommended)

Example Cost Calculation:

  • Generate 100 documents per day
  • Each request: 5,000 input tokens, 10,000 output tokens

Direct API Cost:

  • Input: (100 Γ— 5,000 / 1M) Γ— $5.00 = $2.50/day
  • Output: (100 Γ— 10,000 / 1M) Γ— $15.00 = $15.00/day
  • Total: $17.50/day = $525/month

Batched API Cost:

  • Input: (100 Γ— 5,000 / 1M) Γ— $2.50 = $1.25/day
  • Output: (100 Γ— 10,000 / 1M) Γ— $7.50 = $7.50/day
  • Total: $8.75/day = $262.50/month

πŸ’° Savings: $262.50/month (50% reduction)
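The same arithmetic as a reusable helper (rates hard-coded from the comparison table above; adjust if Anthropic's pricing changes):

```python
RATES = {  # $ per 1M tokens, from the comparison table above
    "direct": {"input": 5.00, "output": 15.00},
    "batched": {"input": 2.50, "output": 7.50},
}

def daily_cost(requests_per_day, input_tokens, output_tokens, api_type="batched"):
    """Daily spend in dollars for a given request volume."""
    r = RATES[api_type]
    cost_in = requests_per_day * input_tokens / 1_000_000 * r["input"]
    cost_out = requests_per_day * output_tokens / 1_000_000 * r["output"]
    return cost_in + cost_out

# 100 requests/day, each with 5,000 input and 10,000 output tokens:
print(daily_cost(100, 5_000, 10_000, "direct"))   # 17.5
print(daily_cost(100, 5_000, 10_000, "batched"))  # 8.75
```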

Scaling Workers

The API uses Redis Queue (RQ) workers for background job processing. Scale workers based on load:

User Load     | Workers | Redis RAM | Notes
--------------|---------|-----------|------------------
< 10 req/hr   | 1       | 256 MB    | Development
10–50 req/hr  | 2–3     | 512 MB    | Small production
50–200 req/hr | 3–5     | 1 GB      | Medium production
> 200 req/hr  | 5+      | 2+ GB     | Large production

Starting Workers

# Single worker (development)
./start_worker.sh

# Multiple workers (production) β€” run in separate terminals
./start_worker.sh   # Terminal 1
./start_worker.sh   # Terminal 2

# Docker Compose β€” scale to 3 workers
docker-compose up --scale worker=3

# Monitor
rq info --url redis://localhost:6379/0
rq info --queue docgenie --url redis://localhost:6379/0

Railway Multi-Worker (Separate Service)

  1. Railway dashboard β†’ New Service β†’ GitHub Repo (same repo)
  2. Name: docgenie-worker
  3. Custom Start Command: rq worker docgenie --url $REDIS_URL
  4. Add the same environment variables as the API service

For most use cases the combined mode (API + worker in one service, see railway.json) is sufficient and cheaper.

Contributing

This API is a simplified interface to the DocGenie pipeline. For the full pipeline with all features, see the main DocGenie documentation.

License

Same as DocGenie main project.