# 🏗️ DocGenie Architecture & Dependency Resolution
## 📦 Package Structure
```
docgenie/                          ← Root monorepo
├── docgenie/                      ← Core package (importable)
│   ├── __init__.py
│   ├── generation/                ← Used by API
│   │   ├── pipeline_01/
│   │   │   └── claude_batching.py ← ClaudeBatchedClient
│   │   ├── pipeline_03/
│   │   ├── pipeline_04/
│   │   └── utils/
│   ├── evaluation/
│   └── utils/
│
├── api/                           ← API Service (imports docgenie.*)
│   ├── main.py                    from docgenie import ENV
│   ├── worker.py                  from docgenie.generation.pipeline_01...
│   ├── utils.py                   from docgenie.generation...
│   └── requirements.txt           Extra: Redis, Supabase, Google
│
├── handwriting_service/           ← GPU Service (NO docgenie imports!)
│   ├── main.py                    ✓ Self-contained
│   ├── inference.py               ✓ No external deps
│   └── models.py
│
└── WordStylist/                   ← Model code (used by handwriting)
```
## 🔗 Dependency Graph
```
┌──────────────────────────────────────────────────────────────┐
│                         API Service                          │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  api/main.py                                           │  │
│  │      ↓ imports                                         │  │
│  │  api/utils.py (call_claude_api_direct)                 │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  api/worker.py                                         │  │
│  │      ↓ imports                                         │  │
│  │  from docgenie.generation.pipeline_01.claude_batching  │  │
│  │  from docgenie.generation.constants                    │  │
│  │  from docgenie.generation.pipeline_03_process_response │  │
│  │  from docgenie.generation.pipeline_04_render_pdf...    │  │
│  │  from docgenie import ENV                              │  │
│  └────────────────────────────────────────────────────────┘  │
│                              ↓                               │
│                           REQUIRES                           │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  docgenie/ package                                     │  │
│  │  (entire generation module)                            │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                     Handwriting Service                      │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  handwriting_service/main.py                           │  │
│  │      ↓ imports                                         │  │
│  │  from handwriting_service.inference import ...         │  │
│  │  from handwriting_service.models import ...            │  │
│  └────────────────────────────────────────────────────────┘  │
│                              ↓                               │
│                           REQUIRES                           │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  WordStylist/ model                                    │  │
│  │  (diffusion model code)                                │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ✓ NO docgenie imports - completely independent!             │
└──────────────────────────────────────────────────────────────┘
```
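The rule that the handwriting service never imports `docgenie` can be checked mechanically. The following is a minimal sketch using only the standard library; the helper name `imported_top_level_packages` is hypothetical, and the two source strings are stand-ins built from import lines shown in this document, not the real files.

```python
import ast


def imported_top_level_packages(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            packages.add(node.module.split(".")[0])
    return packages


# Stand-in source snippets (assumed, for illustration only).
api_worker = (
    "from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient\n"
    "from docgenie import ENV\n"
)
handwriting_main = "from handwriting_service.inference import HandwritingGenerator\n"

print("docgenie" in imported_top_level_packages(api_worker))        # True
print("docgenie" in imported_top_level_packages(handwriting_main))  # False
```

In a real check, the source strings would presumably be read from the files under `handwriting_service/`, and the build could fail if `docgenie` ever shows up in the result.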
## 🐳 Docker Build Strategy
### ❌ What Doesn't Work
```dockerfile
# ❌ WRONG: copying only the api/ folder leaves out the docgenie package
FROM python:3.11
WORKDIR /app/api
COPY api/ /app/api/
RUN pip install -r requirements.txt
# Fails at startup with: ModuleNotFoundError: No module named 'docgenie'
CMD ["uvicorn", "main:app"]
```
### ✅ What Works
```dockerfile
# ✅ CORRECT: copy the entire monorepo
FROM python:3.11
WORKDIR /app
# Copy everything, including the docgenie package and setup.py
COPY . .
# Install docgenie as an editable package so docgenie.* is importable
RUN pip install -e .
# Install API requirements
RUN pip install -r api/requirements.txt
WORKDIR /app/api
# docgenie imports now resolve
CMD ["uvicorn", "main:app"]
```
## 🚒 Deployment Strategy Comparison
### Option 1: Separate Deployments (❌ Won't Work)
```
API Deployment:
├── api/ folder only
└── ❌ Missing docgenie package → ImportError
Handwriting Deployment:
├── handwriting_service/ folder
└── WordStylist/
```
**Problem:** The API image contains no `docgenie` package, so its `from docgenie ...` imports fail at startup.
### Option 2: Monorepo Deployment (✅ Works)
```
API Deployment:
├── docgenie/ package (core)
├── api/ service (imports docgenie)
├── setup.py
└── requirements.txt
Handwriting Deployment:
├── handwriting_service/
└── WordStylist/
```
**Solution:** Deploy the entire repo for the API, and deploy the handwriting service standalone.
## 📁 File Structure in Containers
### API Container (Railway/EC2)
```
/app/
├── docgenie/          ← Installed as Python package
│   ├── __init__.py
│   ├── generation/
│   └── utils/
├── api/               ← Working directory
│   ├── main.py
│   ├── worker.py
│   └── utils.py
├── setup.py
└── pyproject.toml

Python can import:
✓ from docgenie.generation.pipeline_01 import ...
✓ from docgenie import ENV
```
### Handwriting Container (RunPod)
```
/app/
├── handwriting_service/
│   ├── main.py        ← No docgenie imports!
│   ├── inference.py
│   └── models.py
└── WordStylist/       ← Model code
    ├── ldm/
    └── wordstylist_inference.py

Python can import:
✓ from handwriting_service.inference import ...
✓ No docgenie dependencies needed!
```
## 🎯 Import Resolution Flow
### API Service Import Chain
1. **FastAPI starts:**
```bash
uvicorn main:app
```
2. **main.py imports utils:**
```python
from api.utils import call_claude_api_direct
```
3. **utils.py imports docgenie:**
```python
from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient
```
4. **Python looks for docgenie:**
- Checks `sys.path`
- Finds `/app`, the source directory that `pip install -e .` registered
- Loads `docgenie/__init__.py`
- ✓ Import succeeds!
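The lookup in step 4 can be reproduced without the real repo. The sketch below builds a throwaway directory that stands in for `/app`, puts a dummy package in it (the name `demo_core_pkg` is made up), and shows that the package only resolves once that directory is on `sys.path`. Adding the path by hand is only an approximation of what an editable install arranges; the exact mechanism depends on the setuptools version.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def resolve(name: str):
    """Return the file a top-level module would be loaded from, or None if not found."""
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


# Throwaway "repo root" with one package, standing in for /app/docgenie.
repo_root = Path(tempfile.mkdtemp())
(repo_root / "demo_core_pkg").mkdir()
(repo_root / "demo_core_pkg" / "__init__.py").write_text("ENV = 'demo'\n")

before = resolve("demo_core_pkg")   # None: the repo root is not on sys.path yet
sys.path.insert(0, str(repo_root))  # stand-in for what `pip install -e .` sets up
after = resolve("demo_core_pkg")    # now points at demo_core_pkg/__init__.py

print(before)
print(after)
```

Running `resolve("docgenie")` inside the API container should, under the same reasoning, print a path under `/app/docgenie/`.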
### Handwriting Service Import Chain
1. **FastAPI starts:**
```bash
uvicorn main:app
```
2. **main.py imports local modules:**
```python
from handwriting_service.inference import HandwritingGenerator
```
3. **inference.py imports WordStylist:**
```python
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent / "WordStylist"))
from ldm.models.diffusion.ddpm import LatentDiffusion
```
4. **Python loads local modules:**
- Needs only `handwriting_service/` and `WordStylist/`, not the `docgenie` package
- ✓ Completely self-contained!
## 🔍 Verifying Imports
### Test API Imports
```bash
# Inside API container
python3 -c "from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient; print('✓ Import works!')"
```
### Test Handwriting Imports
```bash
# Inside handwriting container
python3 -c "from handwriting_service.inference import HandwritingGenerator; print('✓ Import works!')"
```
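When more than one module needs checking, the one-liners above can be folded into a small smoke test. This is a sketch rather than part of the project: `check_imports` and `API_MODULES` are hypothetical names, and the module list only repeats import paths that appear in this document.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # broad on purpose: import-time code can raise anything
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Modules the API container is expected to resolve (taken from this document).
API_MODULES = [
    "docgenie",
    "docgenie.generation.pipeline_01.claude_batching",
]

failures = check_imports(API_MODULES)
for name in API_MODULES:
    status = f"FAILED ({failures[name]})" if name in failures else "ok"
    print(f"{name}: {status}")
```

Inside a correctly built API container every line should read `ok`; anywhere else the script reports which import failed and why instead of stopping at the first error.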
## 💡 Key Insights
1. **API needs the monorepo:** The API image must contain the `docgenie/` package and `setup.py`, not just `api/`.
2. **Handwriting is independent:** It can be deployed with only `handwriting_service/` and `WordStylist/`.
3. **Docker layer caching:** Install the docgenie package first, then the API requirements.
4. **Working directory matters:** Set `WORKDIR` to `/app/api` so that `uvicorn main:app` finds `main.py`.
5. **Python package installation:** `pip install -e .` makes `docgenie` importable from any directory in the container.
## 📊 Deployment Size Comparison
| Deployment | Size | Contents |
|------------|------|----------|
| API (Railway) | ~2GB | Python 3.11 + docgenie + API deps + Playwright |
| Worker (Railway) | ~2GB | Same as API (shares image) |
| Handwriting (RunPod) | ~8GB | CUDA 11.8 + PyTorch + Diffusers + WordStylist |
**Total:** ~12GB deployed (~10GB of unique images, since the worker reuses the API image; each image is cached independently)
## ✅ Checklist for Successful Deployment
- [ ] Dockerfile copies **entire monorepo** for API
- [ ] `pip install -e .` runs before API requirements
- [ ] WORKDIR set to /app/api for runtime
- [ ] Handwriting Dockerfile copies only handwriting_service/ + WordStylist/
- [ ] .dockerignore excludes data/ folders (too large)
- [ ] Environment variables set in Railway/EC2
- [ ] Redis URL points to Upstash
- [ ] HANDWRITING_SERVICE_URL points to RunPod endpoint
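The environment-variable items in this checklist can be verified at startup. The sketch below assumes the Redis URL is stored in a variable called `REDIS_URL`, which this document does not confirm; `HANDWRITING_SERVICE_URL` comes from the checklist. The helper name and the example values are hypothetical.

```python
import os

# HANDWRITING_SERVICE_URL is named in the checklist; REDIS_URL is an assumed name.
REQUIRED_VARS = ("REDIS_URL", "HANDWRITING_SERVICE_URL")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example with placeholder values: the handwriting URL was left empty.
example_env = {
    "REDIS_URL": "rediss://example.invalid:6379",
    "HANDWRITING_SERVICE_URL": "",
}
print(missing_env_vars(env=example_env))  # ['HANDWRITING_SERVICE_URL']
```

Calling `missing_env_vars()` with no arguments checks the real process environment, so the API or worker could refuse to start when the returned list is not empty.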
## πŸŽ‰ Result
```
✓ API can import from docgenie package
✓ Worker can use ClaudeBatchedClient
✓ Handwriting service runs independently
✓ All services communicate via HTTP
✓ No more ImportError!
```