DocGenie API

FastAPI-based REST API for generating synthetic documents using LLMs. This API is optimized for ML dataset creation with comprehensive handwriting and visual element support.

Features

  • πŸš€ Simple REST API - Easy to integrate with any frontend
  • πŸ–ΌοΈ URL-based seed images - Provide seed images via URLs
  • 🎨 Customizable prompts - Control document type, language, and ground truth format
  • ✍️ Handwriting Generation - WordStylist diffusion model with 339 author styles
  • 🎯 Visual Elements - Stamps, logos, barcodes, photos, figures
  • πŸ“Š ML-Ready Datasets - Individual token images with complete metadata
  • πŸ“„ Complete output - Returns PDF, HTML, CSS, and bounding boxes
  • ⚑ Async processing - Fast and efficient document generation

ML Dataset Creation

The API is fully equipped for ML training dataset creation with output_detail: "dataset" mode:

βœ… Handwriting Data

  • Individual token images: Each handwriting field saved as separate PNG (hw0.png, hw1.png, ...)
  • Author style IDs: 339 unique writer styles (0-338) for style-consistent generation
  • Text content: Original text for each handwriting field
  • Position data: Precise bounding boxes (x, y, width, height) in mm
  • Signature detection: Boolean flag for signature vs regular handwriting
  • Image dimensions: Width and height for each generated token
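If pixel coordinates are needed for training, the millimetre boxes can be converted once the render DPI is known. A minimal sketch (the 300 DPI default and the `mm_to_px` helper name are illustrative, not part of the API):

```python
def mm_to_px(bbox_mm, dpi=300):
    """Convert an (x, y, width, height) box from millimetres to pixels.

    1 inch = 25.4 mm, so px = mm / 25.4 * dpi.
    """
    return tuple(round(v / 25.4 * dpi) for v in bbox_mm)

# A handwriting field 50 mm from the left, 20 mm from the top,
# 40 mm wide and 8 mm tall, rendered at 300 DPI:
print(mm_to_px((50, 20, 40, 8)))  # (591, 236, 472, 94)
```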

βœ… Visual Element Data

  • Stamps: Generated with realistic textures, borders, and rotations
    • Text content preserved
    • Red/green color variants
    • Circle/rectangle shapes
  • Logos: Random selection from 6+ logo prefabs
  • Barcodes: Code128 format with customizable content
  • Photos: Random selection from 5+ photo prefabs
  • Figures/Charts: Random selection from 6+ chart/diagram prefabs
  • Individual images: Each element saved as separate PNG with transparency

βœ… Dataset Metadata

  • Token mapping JSON: Complete mapping with:
    • Token IDs and references
    • Style IDs for handwriting
    • Element types for visual elements
    • Position rectangles
    • Image filenames
    • Content text
  • Ground truth annotations: QA pairs, classification labels, NER tags
  • Bounding boxes: Word, segment, and layout-level bboxes
  • Normalized coordinates: [0,1] scaled for ML frameworks
  • Msgpack export: Compatible with datadings library
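Since normalized boxes are scaled to [0,1], recovering physical coordinates only needs the page dimensions returned with each document. A small sketch (the helper name is illustrative):

```python
def denormalize_bbox(bbox, page_width_mm, page_height_mm):
    """Scale a [0,1]-normalized bbox dict back to physical millimetres."""
    return {
        "x": bbox["x"] * page_width_mm,
        "y": bbox["y"] * page_height_mm,
        "width": bbox["width"] * page_width_mm,
        "height": bbox["height"] * page_height_mm,
    }

# A4 page (210 x 297 mm):
box = {"x": 0.25, "y": 0.5, "width": 0.125, "height": 0.25}
print(denormalize_bbox(box, 210.0, 297.0))
# {'x': 52.5, 'y': 148.5, 'width': 26.25, 'height': 74.25}
```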

βœ… Additional ML Features

  • OCR results: Word-level bboxes and text for Document AI training
  • Layout elements: Document structure annotations
  • Page dimensions: Physical measurements (mm) and pixel dimensions
  • Reproducibility: Seed-based generation for consistent results

Pipeline Overview

The API implements a simplified version of the DocGenie generation pipeline:

  1. Download seed images from URLs
  2. Convert to base64 for LLM input
  3. Build custom prompt with user parameters
  4. Call Claude API to generate HTML documents
  5. Extract HTML/CSS and ground truth from response
  6. Render to PDF using Playwright
  7. Extract bounding boxes from PDF
  8. Return results as JSON with base64-encoded PDF
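Under the assumption that each stage receives, updates, and returns a single payload dict, the eight steps above can be sketched as a generic composition (the function names are illustrative, not the actual DocGenie module API):

```python
def run_pipeline(seed_urls, prompt_params, stages):
    """Thread a payload dict through an ordered list of (name, stage) pairs.

    Each stage callable takes the payload dict and returns an updated copy,
    mirroring steps 1-8 above (download, base64-encode, prompt, LLM call,
    extraction, PDF render, bbox extraction, JSON assembly).
    """
    payload = {"seed_urls": seed_urls, "prompt_params": prompt_params, "trace": []}
    for name, stage in stages:
        payload = stage(payload)
        payload["trace"].append(name)  # record which stages have run
    return payload
```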

Installation

Prerequisites

  • Python 3.10+
  • DocGenie main package installed
  • Playwright browsers installed

Setup

  1. Install dependencies (all API dependencies are included in the main project):
# Using uv (recommended)
uv sync

# Or using pip
pip install -e .

# Or install API-specific dependencies
cd api/
pip install -r requirements.txt

Note: For async endpoint support, ensure you have:

  • redis>=5.0.0 and rq>=1.15.0 (job queue)
  • supabase>=2.0.0 (database)
  • google-api-python-client>=2.100.0 (Google Drive integration)
  2. Install Playwright browsers:
playwright install chromium
  3. Install Tesseract OCR (for local OCR support):
# Ubuntu/Debian
sudo apt-get update && sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Windows
# Download installer from: https://github.com/UB-Mannheim/tesseract/wiki
  4. Set your Anthropic API key:
export ANTHROPIC_API_KEY="your-api-key-here"
  5. Configure OCR in .env:
cp .env.example .env
# Edit .env and set:
OCR_SERVICE_ENABLED=true
OCR_USE_LOCAL=true  # Use local Tesseract (recommended)

Running the API

Development Mode

cd api
python main.py

The API will be available at http://localhost:8000

Production Mode

cd api
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

API Endpoints

Health Check

GET /health

Response:

{
  "status": "healthy",
  "version": "1.0.0"
}

Generate Documents

POST /generate

Request Body:

{
  "seed_images": [
    "https://example.com/seed1.jpg",
    "https://example.com/seed2.jpg"
  ],
  "prompt_params": {
    "language": "English",
    "doc_type": "business and administrative",
    "gt_type": "Multiple questions about each document, with their answers taken **verbatim** from the document.",
    "gt_format": "{\"<Text of question 1>\": \"<Answer to question 1>\", \"<Text of question 2>\": \"<Answer to question 2>\", ...}",
    "num_solutions": 3
  },
  "model": "claude-sonnet-4-5-20250929",
  "api_key": "optional-api-key"
}

Response:

{
  "success": true,
  "message": "Successfully generated 3 documents",
  "total_documents": 3,
  "documents": [
    {
      "document_id": "uuid-123_0",
      "html": "<!DOCTYPE html>...",
      "css": "body { ... }",
      "ground_truth": {
        "What is the invoice number?": "INV-12345",
        "What is the total amount?": "$1,234.56"
      },
      "pdf_base64": "JVBERi0xLjQK...",
      "bboxes": [
        {
          "text": "Invoice",
          "x": 0.1,
          "y": 0.05,
          "width": 0.2,
          "height": 0.03,
          "page": 0
        }
      ],
      "page_width_mm": 210.0,
      "page_height_mm": 297.0
    }
  ]
}

Generate Documents (Async) - Recommended for Production

POST /generate/async

🎯 Cost Optimization: This endpoint uses Claude's Batch API for 50% cost savings ($2.50 vs $5.00 per 1M input tokens).

⏱️ Latency: 5-30 minutes (vs 30-120 seconds for direct API)

βœ… Best For: Multi-user production systems with non-realtime requirements

Request Body:

{
  "user_id": 123,
  "seed_images": [
    "https://example.com/seed1.jpg",
    "https://example.com/seed2.jpg"
  ],
  "prompt_params": {
    "language": "English",
    "doc_type": "business and administrative",
    "num_solutions": 3,
    "enable_handwriting": true,
    "enable_visual_elements": true,
    "enable_ocr": true,
    "output_detail": "dataset"
  }
}

Response:

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "estimated_time_minutes": 10,
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000/status",
  "created_at": "2025-01-15T12:00:00Z"
}

Workflow:

  1. Submit generation request β†’ Get request_id
  2. Poll status endpoint every 30-60 seconds
  3. When status: "completed", download from Google Drive
  4. Results uploaded to user's Google Drive with shareable link

Check Job Status

GET /jobs/{request_id}/status

Response (Queued):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:00:00Z"
}

Response (Processing):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:05:00Z",
  "progress": "Creating batch request..."
}

Response (Completed):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:15:00Z",
  "download_url": "https://drive.google.com/file/d/abc123xyz/view?usp=sharing",
  "file_size_mb": 15.4,
  "document_count": 3
}

Response (Failed):

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "created_at": "2025-01-15T12:00:00Z",
  "updated_at": "2025-01-15T12:08:00Z",
  "error_message": "Batch processing timeout"
}

Status Values:

  • queued: Job submitted, waiting for worker
  • processing: Worker picked up job, creating batch
  • generating: Batch submitted to Claude, waiting for completion
  • completed: Documents generated and uploaded to Google Drive
  • failed: Error occurred (see error_message)
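Only completed and failed are terminal, so a polling client can stop on exactly those two. A sketch (not part of any shipped client library):

```python
# The five job states, in the order the worker moves through them.
STATUS_ORDER = ["queued", "processing", "generating", "completed", "failed"]
TERMINAL_STATUSES = {"completed", "failed"}

def should_keep_polling(status: str) -> bool:
    """Return True while the job is still in flight."""
    if status not in STATUS_ORDER:
        raise ValueError(f"Unknown job status: {status!r}")
    return status not in TERMINAL_STATUSES
```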

List User Jobs

GET /jobs/user/{user_id}?limit=50&offset=0

Response:

{
  "user_id": 123,
  "jobs": [
    {
      "request_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "created_at": "2025-01-15T12:00:00Z",
      "download_url": "https://drive.google.com/...",
      "document_count": 3
    },
    {
      "request_id": "660e8400-e29b-41d4-a716-446655440111",
      "status": "processing",
      "created_at": "2025-01-15T12:30:00Z"
    }
  ],
  "count": 2,
  "limit": 50,
  "offset": 0
}

Usage Examples

cURL

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "seed_images": [
      "https://example.com/receipt1.jpg",
      "https://example.com/receipt2.jpg"
    ],
    "prompt_params": {
      "language": "English",
      "doc_type": "receipts",
      "num_solutions": 2
    }
  }'

Python (Direct API)

import requests
import base64

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "seed_images": [
            "https://example.com/seed1.jpg",
            "https://example.com/seed2.jpg"
        ],
        "prompt_params": {
            "language": "English",
            "doc_type": "business forms",
            "num_solutions": 3
        }
    }
)

result = response.json()

# Save first PDF
if result["success"]:
    pdf_data = base64.b64decode(result["documents"][0]["pdf_base64"])
    with open("generated_doc.pdf", "wb") as f:
        f.write(pdf_data)

Python (Async API with Polling) - Recommended

import requests
import time

# Step 1: Submit job
response = requests.post(
    "http://localhost:8000/generate/async",
    json={
        "user_id": 123,
        "seed_images": [
            "https://example.com/seed1.jpg",
            "https://example.com/seed2.jpg"
        ],
        "prompt_params": {
            "language": "English",
            "doc_type": "receipts and invoices",
            "num_solutions": 5,
            "enable_handwriting": True,
            "enable_visual_elements": True,
            "enable_ocr": True,
            "output_detail": "dataset"
        }
    }
)

job = response.json()
request_id = job["request_id"]
print(f"βœ“ Job submitted: {request_id}")
print(f"  Estimated time: {job['estimated_time_minutes']} minutes")

# Step 2: Poll status until complete
while True:
    status_response = requests.get(
        f"http://localhost:8000/jobs/{request_id}/status"
    )
    status = status_response.json()
    
    print(f"  Status: {status['status']}", end="")
    if status.get("progress"):
        print(f" - {status['progress']}")
    else:
        print()
    
    if status["status"] == "completed":
        print(f"βœ“ Generation complete!")
        print(f"  Download: {status['download_url']}")
        print(f"  Size: {status.get('file_size_mb', 0):.1f} MB")
        print(f"  Documents: {status.get('document_count', 0)}")
        break
    elif status["status"] == "failed":
        print(f"βœ— Generation failed: {status.get('error_message')}")
        break
    
    # Wait 30 seconds before next poll
    time.sleep(30)

# Step 3: Download from Google Drive (if completed)
if status["status"] == "completed":
    # User can download from their Google Drive using the shareable link
    print(f"\nDownload your documents at:\n{status['download_url']}")

JavaScript

const response = await fetch('http://localhost:8000/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    seed_images: [
      'https://example.com/seed1.jpg',
      'https://example.com/seed2.jpg'
    ],
    prompt_params: {
      language: 'English',
      doc_type: 'invoices',
      num_solutions: 2
    }
  })
});

const result = await response.json();

// Convert base64 PDF to blob
const pdfBlob = await fetch(`data:application/pdf;base64,${result.documents[0].pdf_base64}`)
  .then(res => res.blob());

Configuration

Prompt Parameters

  • language: Language for generated documents (default: "English")
  • doc_type: Type of documents to generate (e.g., "business and administrative", "receipts", "forms")
  • gt_type: Description of ground truth type to generate
  • gt_format: Format specification for ground truth JSON
  • num_solutions: Number of document variations (1-5)

Stage 3-5 Advanced Features

The API supports advanced document synthesis and dataset packaging:

Stage 3: Handwriting & Visual Elements

  • enable_handwriting: Add handwritten text using diffusion model (default: false)
  • handwriting_ratio: Fraction of text to convert to handwriting, 0-1 (default: 0.5)
  • enable_visual_elements: Add stamps, barcodes, logos (default: false)
  • visual_element_types: Types of elements to add: ["stamp", "logo", "figure", "barcode", "photo"] (default: all types)

Stage 4: OCR

  • enable_ocr: Perform OCR on generated document (default: false)
  • ocr_language: OCR language code (default: "en")

Stage 5: Dataset Packaging

  • enable_bbox_normalization: Normalize bboxes to [0,1] scale (default: false)
  • enable_gt_verification: Verify ground truth quality (default: false)
  • enable_analysis: Generate dataset statistics (default: false)
  • enable_debug_visualization: Create bbox overlay images (default: false)

Dataset Export (Msgpack Format)

  • enable_dataset_export: Export as msgpack dataset format (default: false)
  • dataset_export_format: Export format - only "msgpack" is supported (default: "msgpack")

Note: Only msgpack format is implemented in the current pipeline. COCO and HuggingFace export formats mentioned in some documentation are not yet available.

Output Detail Level

  • output_detail: Controls how much data is returned/saved (default: "minimal")
    • "minimal" (default): Final outputs only (PDFs, images, metadata) - 2-5 MB per document
    • "dataset": Includes individual token images for ML training - 10-20 MB per document
      • Individual handwriting token images (handwriting_tokens/hw0.png, ...)
      • Individual visual element images (visual_elements/logo_0.png, ...)
      • Token mapping JSON with style IDs and positions
    • "complete": All intermediate files and debug info - 20-50 MB per document
      • Everything from dataset mode
      • Intermediate PDFs from each processing stage
      • Generation logs
      • ⚠️ Warning: Can result in 50+ MB JSON responses for /generate endpoint

Recommendation: Use "minimal" for production, "dataset" for ML research, "complete" for debugging (only with /generate/pdf).

Example with dataset output detail:

import requests
import base64
import json

# Generate ML training dataset
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "seed_images": ["https://example.com/seed.jpg"],
        "prompt_params": {
            "language": "English",
            "doc_type": "receipts and invoices",
            "num_solutions": 5,
            
            # Enable handwriting and visual elements
            "enable_handwriting": True,
            "handwriting_ratio": 0.4,
            "enable_visual_elements": True,
            "visual_element_types": ["stamp", "logo", "figure", "barcode", "photo"],  # All types by default
            
            # Enable dataset features
            "enable_ocr": True,
            "enable_bbox_normalization": True,
            "enable_dataset_export": True,
            
            # IMPORTANT: Set output_detail to "dataset" for ML training
            "output_detail": "dataset",
            
            # Use seed for reproducibility
            "seed": 42
        }
    }
)

result = response.json()

import os

# Process each generated document
for doc in result["documents"]:
    doc_id = doc["document_id"]
    os.makedirs(f"dataset/{doc_id}", exist_ok=True)  # ensure output dir exists
    print(f"\nProcessing {doc_id}:")
    
    # 1. Save individual handwriting token images
    if doc.get("handwriting_token_images"):
        print(f"  - Handwriting tokens: {len(doc['handwriting_token_images'])}")
        for hw_id, img_b64 in doc["handwriting_token_images"].items():
            with open(f"dataset/{doc_id}/{hw_id}.png", "wb") as f:
                f.write(base64.b64decode(img_b64))
    
    # 2. Save individual visual element images
    if doc.get("visual_element_images"):
        print(f"  - Visual elements: {len(doc['visual_element_images'])}")
        for ve_id, img_b64 in doc["visual_element_images"].items():
            with open(f"dataset/{doc_id}/{ve_id}.png", "wb") as f:
                f.write(base64.b64decode(img_b64))
    
    # 3. Save token mapping for ML training
    if doc.get("token_mapping"):
        mapping = doc["token_mapping"]
        print(f"  - Mapping: {mapping['handwriting']['total_count']} HW + {mapping['visual_elements']['total_count']} VE")
        with open(f"dataset/{doc_id}/token_mapping.json", "w") as f:
            json.dump(mapping, f, indent=2)
    
    # 4. Save ground truth annotations
    if doc.get("ground_truth"):
        with open(f"dataset/{doc_id}/ground_truth.json", "w") as f:
            json.dump(doc["ground_truth"], f, indent=2)
    
    # 5. Save bounding boxes (normalized coordinates)
    if doc.get("normalized_bboxes_word"):
        with open(f"dataset/{doc_id}/bboxes_normalized.json", "w") as f:
            json.dump(doc["normalized_bboxes_word"], f, indent=2)
    
    # 6. Save final document image
    if doc.get("image_base64"):
        with open(f"dataset/{doc_id}/final_image.png", "wb") as f:
            f.write(base64.b64decode(doc["image_base64"]))
    
    # 7. Save msgpack dataset file
    if doc.get("dataset_export") and doc["dataset_export"].get("msgpack_base64"):
        with open(f"dataset/{doc_id}/dataset.msgpack", "wb") as f:
            f.write(base64.b64decode(doc["dataset_export"]["msgpack_base64"]))

print(f"\n✅ Generated {len(result['documents'])} ML-ready documents")

PDF Generation Endpoint (Recommended for Large Datasets)

For bulk generation with comprehensive file outputs, use /generate/pdf:

curl -X POST http://localhost:8000/generate/pdf \
  -H "Content-Type: application/json" \
  -d '{
    "seed_images": ["https://example.com/seed1.jpg"],
    "prompt_params": {
      "num_solutions": 3,
      "enable_handwriting": true,
      "enable_ocr": true,
      "enable_bbox_normalization": true,
      "enable_dataset_export": true,
      "output_detail": "dataset"
    }
  }' \
  --output documents.zip

ZIP File Contents

Based on output_detail level:

Minimal (default):

  • document_<id>.pdf - Generated PDF files
  • document_<id>/ - Per-document directories with:
    • document.html, document.css - Source files
    • ground_truth.json, bboxes.json - Annotations
    • final_image.png - Final rendered image (if Stage 3 enabled)
    • handwriting_regions.json, visual_elements.json - Stage 3 metadata (if enabled)
    • ocr_results.json - OCR word-level data (if OCR enabled)
  • README.md - Package documentation
  • metadata.json - Combined metadata

Dataset (for ML training):

  • All files from "minimal" level, plus:
    • handwriting_tokens/ - Individual token images (hw0.png, hw1.png, ...)
    • visual_elements/ - Individual element images (logo_0.png, stamp_1.png, ...)
    • token_mapping.json - Complete mapping with style IDs and positions
    • dataset.msgpack - Msgpack dataset file (if export enabled)
    • normalized_bboxes_word.json - Normalized coordinates (if Stage 5 enabled)

Complete (for debugging):

  • All files from "dataset" level, plus:
    • Intermediate PDFs from each processing stage
    • Generation logs with timing information
    • debug_visualization.png - Bbox overlay images

Supported Models

  • claude-sonnet-4-5-20250929 (default, recommended)
  • claude-3-5-sonnet-20241022

Environment Variables

  • ANTHROPIC_API_KEY: Your Anthropic API key (required if not provided in request)

API Documentation

Interactive API documentation is available when the server is running:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Error Handling

The API returns appropriate HTTP status codes:

  • 200 OK: Successful generation
  • 400 Bad Request: Invalid input (e.g., invalid image URLs)
  • 401 Unauthorized: Missing or invalid API key
  • 500 Internal Server Error: Processing error

Error response format:

{
  "detail": "Error message describing what went wrong"
}
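A client can centralize this contract in one helper that passes 2xx payloads through and raises otherwise (the exception class is illustrative, not provided by the API):

```python
class DocGenieAPIError(Exception):
    """Raised when the API returns a non-2xx response."""
    def __init__(self, status_code, detail):
        self.status_code = status_code
        self.detail = detail
        super().__init__(f"HTTP {status_code}: {detail}")

def raise_for_api_error(status_code, body):
    """Pass 2xx payloads through; turn error responses into exceptions.

    `body` is the decoded JSON payload; error responses carry a "detail" field.
    """
    if 200 <= status_code < 300:
        return body
    detail = body.get("detail", "Unknown error") if isinstance(body, dict) else str(body)
    raise DocGenieAPIError(status_code, detail)
```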

Performance Considerations

  • Concurrent requests: The API can handle multiple requests concurrently
  • Image size: Larger seed images take longer to process
  • Number of solutions: More solutions = longer processing time
  • Model selection: Sonnet is slower but higher quality than Haiku

Limitations

  • Maximum 10 seed images per request
  • Maximum 5 document variations (num_solutions)
  • Single-page documents only
  • Timeout: 60 seconds per PDF render
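Checking these limits client-side avoids a guaranteed 400 Bad Request round trip. A sketch covering the two documented request limits:

```python
def validate_request(seed_images, num_solutions):
    """Check a request against the documented API limits before sending."""
    errors = []
    if len(seed_images) > 10:
        errors.append("maximum 10 seed images per request")
    if not 1 <= num_solutions <= 5:
        errors.append("num_solutions must be between 1 and 5")
    return errors

# 11 images and num_solutions=6 trip both limits:
print(validate_request(["https://example.com/s.jpg"] * 11, 6))
```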

Troubleshooting

Playwright browser not found

playwright install chromium

API key not working

Make sure your API key is set correctly:

echo $ANTHROPIC_API_KEY

PDF rendering fails

Verify that Chromium is installed and can launch:

python -c "from playwright.sync_api import sync_playwright; p = sync_playwright().start(); p.chromium.launch().close(); p.stop(); print('Chromium OK')"

Integration with Frontend

Example React integration:

const [loading, setLoading] = useState(false);
const [result, setResult] = useState(null);

const generateDocuments = async () => {
  setLoading(true);
  
  try {
    const response = await fetch('http://localhost:8000/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        seed_images: seedImageUrls,
        prompt_params: {
          language: 'English',
          doc_type: documentType,
          num_solutions: 3
        }
      })
    });
    
    const data = await response.json();
    setResult(data);
  } catch (error) {
    console.error('Generation failed:', error);
  } finally {
    setLoading(false);
  }
};

React Integration (Async API with Progress)

import { useState, useEffect } from 'react';

function DocumentGenerator({ userId, seedImages }) {
  const [requestId, setRequestId] = useState(null);
  const [status, setStatus] = useState(null);
  const [progress, setProgress] = useState(0);

  // Submit job
  const handleGenerate = async () => {
    const response = await fetch('http://localhost:8000/generate/async', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        user_id: userId,
        seed_images: seedImages,
        prompt_params: {
          language: 'English',
          doc_type: 'receipts',
          num_solutions: 3,
          enable_handwriting: true,
          output_detail: 'dataset'
        }
      })
    });
    
    const job = await response.json();
    setRequestId(job.request_id);
    setStatus('queued');
  };

  // Poll job status
  useEffect(() => {
    if (!requestId || status === 'completed' || status === 'failed') return;

    const interval = setInterval(async () => {
      const response = await fetch(`http://localhost:8000/jobs/${requestId}/status`);
      const jobStatus = await response.json();
      
      setStatus(jobStatus.status);
      
      // Update progress bar
      const progressMap = {
        'queued': 10,
        'processing': 30,
        'generating': 60,
        'completed': 100,
        'failed': 0
      };
      setProgress(progressMap[jobStatus.status] || 0);
      
      if (jobStatus.status === 'completed') {
        // Open Google Drive download link
        window.open(jobStatus.download_url, '_blank');
      }
    }, 30000); // Poll every 30 seconds

    return () => clearInterval(interval);
  }, [requestId, status]);

  return (
    <div>
      <button onClick={handleGenerate} disabled={status && status !== 'completed'}>
        Generate Documents
      </button>
      
      {status && (
        <div className="progress-container">
          <div className="progress-bar" style={{ width: `${progress}%` }} />
          <p>Status: {status}</p>
          {status === 'completed' && (
            <p>Results uploaded to Google Drive (link opened in a new tab).</p>
          )}
        </div>
      )}
    </div>
  );
}

Background Processing Setup

The async endpoints (/generate/async) require a background worker system for job processing.

Prerequisites

  1. Redis - Job queue storage
  2. Supabase - Database for job tracking and user data
  3. Google Drive OAuth - For uploading results to user's Drive

Installing Redis

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install redis-server
sudo systemctl start redis
sudo systemctl enable redis

macOS:

brew install redis
brew services start redis

Docker:

docker run -d -p 6379:6379 --name redis redis:7-alpine

Verify Redis is running:

redis-cli ping
# Should return: PONG

Configuring Supabase

  1. Create a Supabase project at supabase.com

  2. Create the required tables in your Supabase SQL Editor:

-- Document generation requests
CREATE TABLE document_requests (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id INTEGER NOT NULL,
  status TEXT NOT NULL CHECK (status IN ('queued', 'processing', 'generating', 'completed', 'failed')),
  request_metadata JSONB NOT NULL,
  error_message TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Generated documents
CREATE TABLE generated_documents (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  request_id UUID NOT NULL REFERENCES document_requests(id),
  document_id TEXT NOT NULL,
  file_url TEXT,
  zip_url TEXT,
  file_size_mb DECIMAL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- User integrations (Google Drive OAuth)
CREATE TABLE user_integrations (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id INTEGER NOT NULL,
  integration_type TEXT NOT NULL CHECK (integration_type IN ('google_drive', 'dropbox')),
  access_token TEXT NOT NULL,
  refresh_token TEXT,
  token_expiry TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(user_id, integration_type)
);

-- Analytics events
CREATE TABLE analytics_events (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  user_id INTEGER,
  event_type TEXT NOT NULL,
  entity_id UUID,
  event_data JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes for performance
CREATE INDEX idx_document_requests_user_id ON document_requests(user_id);
CREATE INDEX idx_document_requests_status ON document_requests(status);
CREATE INDEX idx_generated_documents_request_id ON generated_documents(request_id);
CREATE INDEX idx_user_integrations_user_id ON user_integrations(user_id);
CREATE INDEX idx_analytics_events_user_id ON analytics_events(user_id);
  3. Add your Supabase credentials to .env:
# In api/.env
SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_KEY=your-anon-or-service-role-key

Configuring Google Drive OAuth

Users need to connect their Google Drive account for result storage:

  1. Create a Google Cloud Project at console.cloud.google.com

  2. Enable Google Drive API

  3. Create OAuth 2.0 credentials (Web application)

  4. Add authorized redirect URIs (e.g., http://localhost:3000/auth/google/callback)

  5. Download credentials JSON

  6. Users authenticate via OAuth flow (implement in your frontend):

# Example OAuth flow (implement in your auth system)
from google_auth_oauthlib.flow import Flow

flow = Flow.from_client_config(
    client_config={
        "web": {
            "client_id": "YOUR_CLIENT_ID",
            "client_secret": "YOUR_CLIENT_SECRET",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "redirect_uris": ["http://localhost:3000/auth/google/callback"]
        }
    },
    scopes=["https://www.googleapis.com/auth/drive.file"]
)

# User visits auth URL, gets redirected back with code
authorization_url, state = flow.authorization_url(access_type='offline', include_granted_scopes='true')

# Exchange code for tokens
flow.fetch_token(code=authorization_code)
credentials = flow.credentials

# Store in Supabase user_integrations table
supabase.table('user_integrations').insert({
    'user_id': user_id,
    'integration_type': 'google_drive',
    'access_token': credentials.token,
    'refresh_token': credentials.refresh_token,
    'token_expiry': credentials.expiry
}).execute()

Starting the Background Worker

  1. Configure environment variables in api/.env:
# Redis Configuration
REDIS_URL=redis://localhost:6379/0
RQ_QUEUE_NAME=docgenie

# Batch Processing
BATCH_POLL_INTERVAL=30  # seconds
BATCH_DATA_DIR=/tmp/docgenie_batches
MESSAGE_DATA_DIR=/tmp/docgenie_messages

# Google Drive
GOOGLE_DRIVE_FOLDER_NAME=DocGenie Documents

# Supabase (already configured above)
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your_key_here

# Claude API
ANTHROPIC_API_KEY=your_api_key_here
  2. Start the worker:
cd api/
./start_worker.sh

The worker will:

  • βœ“ Check Redis connection
  • βœ“ Validate Supabase configuration
  • βœ“ Verify Claude API key
  • βœ“ Create temporary directories
  • βœ“ Start RQ worker listening on docgenie queue

Output:

πŸš€ Starting DocGenie RQ Worker...
βœ“ Loading .env file...
βœ“ Redis connected
βœ“ Supabase configured
βœ“ Claude API key configured
βœ“ Temporary directories created

============================================
Worker Configuration:
  Queue: docgenie
  Redis: redis://localhost:6379/0
  Batch Data: /tmp/docgenie_batches
  Message Data: /tmp/docgenie_messages
============================================

βœ… Starting RQ worker (press Ctrl+C to stop)...

12:00:00 RQ worker 'worker-abc123' started on docgenie queue

Running Multiple Workers (Production)

For production systems with high load, run multiple workers:

# Terminal 1
./start_worker.sh

# Terminal 2
./start_worker.sh

# Terminal 3
./start_worker.sh

Each worker processes jobs independently from the same queue.

For detailed scaling instructions, see SCALING.md.

Monitoring Workers

# View worker status
rq info --url redis://localhost:6379/0

# View queue status
rq info --queue docgenie --url redis://localhost:6379/0

# View failed jobs
rq info --queue failed --url redis://localhost:6379/0

Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   FastAPI   │───────▢│    Redis    │◀───────│  RQ Workers     β”‚
β”‚   Server    β”‚        β”‚   Queue     β”‚        β”‚  (1-5 instances)β”‚
β”‚             β”‚        β”‚             β”‚        β”‚                 β”‚
β”‚ /generate/  β”‚        β”‚ Job Queue:  β”‚        β”‚ β€’ Downloads     β”‚
β”‚  async      β”‚        β”‚ - queued    β”‚        β”‚ β€’ Claude Batch  β”‚
β”‚             β”‚        β”‚ - pending   β”‚        β”‚ β€’ PDF render    β”‚
β”‚ /jobs/      β”‚        β”‚ - active    β”‚        β”‚ β€’ Handwriting   β”‚
β”‚  {id}/      β”‚        β”‚             β”‚        β”‚ β€’ OCR           β”‚
β”‚  status     β”‚        β”‚             β”‚        β”‚ β€’ ZIP creation  β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                                               β”‚
       β”‚                                               β”‚
       β–Ό                                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          Supabase                             β”‚
β”‚  β€’ document_requests (job tracking)                           β”‚
β”‚  β€’ generated_documents (results metadata)                     β”‚
β”‚  β€’ user_integrations (Google Drive OAuth)                     β”‚
β”‚  β€’ analytics_events (usage tracking)                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β”‚ Upload Results
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Google Drive                             β”‚
β”‚  β€’ User's "DocGenie Documents" folder                         β”‚
β”‚  β€’ ZIP files with generated documents                         β”‚
β”‚  β€’ Shareable links returned to API                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Cost Comparison: Direct vs Batched API

API Type | Cost (Input)    | Cost (Output)    | Latency  | Use Case
---------|-----------------|------------------|----------|------------------------------
Direct   | $5.00/1M tokens | $15.00/1M tokens | 30-120 s | Real-time, interactive
Batched  | $2.50/1M tokens | $7.50/1M tokens  | 5-30 min | Background jobs (recommended)

Example Cost Calculation:

  • Generate 100 documents per day
  • Each request: 5,000 input tokens, 10,000 output tokens

Direct API Cost:

  • Input: (100 Γ— 5,000 / 1M) Γ— $5.00 = $2.50/day
  • Output: (100 Γ— 10,000 / 1M) Γ— $15.00 = $15.00/day
  • Total: $17.50/day = $525/month

Batched API Cost:

  • Input: (100 Γ— 5,000 / 1M) Γ— $2.50 = $1.25/day
  • Output: (100 Γ— 10,000 / 1M) Γ— $7.50 = $7.50/day
  • Total: $8.75/day = $262.50/month

πŸ’° Savings: $262.50/month (50% reduction)
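The same arithmetic as a reusable helper (rates hard-coded from the comparison table above; adjust if Anthropic's pricing changes):

```python
RATES = {  # $ per 1M tokens, from the comparison table above
    "direct": {"input": 5.00, "output": 15.00},
    "batched": {"input": 2.50, "output": 7.50},
}

def daily_cost(requests_per_day, input_tokens, output_tokens, api_type="batched"):
    """Daily spend in dollars for a given request volume."""
    r = RATES[api_type]
    cost_in = requests_per_day * input_tokens / 1_000_000 * r["input"]
    cost_out = requests_per_day * output_tokens / 1_000_000 * r["output"]
    return cost_in + cost_out

# 100 requests/day, each with 5,000 input and 10,000 output tokens:
print(daily_cost(100, 5_000, 10_000, "direct"))   # 17.5
print(daily_cost(100, 5_000, 10_000, "batched"))  # 8.75
```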

Scaling Workers

The API uses Redis Queue (RQ) workers for background job processing. Scale workers based on load:

User Load     | Workers | Redis RAM | Notes
--------------|---------|-----------|------------------
< 10 req/hr   | 1       | 256 MB    | Development
10–50 req/hr  | 2–3     | 512 MB    | Small production
50–200 req/hr | 3–5     | 1 GB      | Medium production
> 200 req/hr  | 5+      | 2+ GB     | Large production

Starting Workers

# Single worker (development)
./start_worker.sh

# Multiple workers (production) β€” run in separate terminals
./start_worker.sh   # Terminal 1
./start_worker.sh   # Terminal 2

# Docker Compose β€” scale to 3 workers
docker-compose up --scale worker=3

# Monitor
rq info --url redis://localhost:6379/0
rq info --queue docgenie --url redis://localhost:6379/0

Railway Multi-Worker (Separate Service)

  1. Railway dashboard β†’ New Service β†’ GitHub Repo (same repo)
  2. Name: docgenie-worker
  3. Custom Start Command: rq worker docgenie --url $REDIS_URL
  4. Add the same environment variables as the API service

For most use cases the combined mode (API + worker in one service, see railway.json) is sufficient and cheaper.

Contributing

This API is a simplified interface to the DocGenie pipeline. For the full pipeline with all features, see the main DocGenie documentation.

License

Same as DocGenie main project.