# 🏗️ DocGenie Architecture & Dependency Resolution

## 📦 Package Structure

```
docgenie/                          ← Root monorepo
├── docgenie/                      ← Core package (importable)
│   ├── __init__.py
│   ├── generation/                ← Used by API
│   │   ├── pipeline_01/
│   │   │   └── claude_batching.py ← ClaudeBatchedClient
│   │   ├── pipeline_03/
│   │   ├── pipeline_04/
│   │   └── utils/
│   ├── evaluation/
│   └── utils/
│
├── api/                           ← API Service (imports docgenie.*)
│   ├── main.py                    from docgenie import ENV
│   ├── worker.py                  from docgenie.generation.pipeline_01...
│   ├── utils.py                   from docgenie.generation...
│   └── requirements.txt           Extra: Redis, Supabase, Google
│
├── handwriting_service/           ← GPU Service (NO docgenie imports!)
│   ├── main.py                    ✓ Self-contained
│   ├── inference.py               ✓ No docgenie deps
│   └── models.py
│
└── WordStylist/                   ← Model code (used by handwriting)
```

## 🔗 Dependency Graph

```
┌────────────────────────────────────────────────────────────┐
│                        API Service                         │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ api/main.py                                          │  │
│  │   ↓ imports                                          │  │
│  │ api/utils.py (call_claude_api_direct)                │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ api/worker.py                                        │  │
│  │   ↓ imports                                          │  │
│  │ from docgenie.generation.pipeline_01.claude_batching │  │
│  │ from docgenie.generation.constants                   │  │
│  │ from docgenie.generation.pipeline_03_process_response│  │
│  │ from docgenie.generation.pipeline_04_render_pdf...   │  │
│  │ from docgenie import ENV                             │  │
│  └──────────────────────────────────────────────────────┘  │
│                             ↓                              │
│                          REQUIRES                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                  docgenie/ package                   │  │
│  │              (entire generation module)              │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│                    Handwriting Service                     │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ handwriting_service/main.py                          │  │
│  │   ↓ imports                                          │  │
│  │ from handwriting_service.inference import ...        │  │
│  │ from handwriting_service.models import ...           │  │
│  └──────────────────────────────────────────────────────┘  │
│                             ↓                              │
│                          REQUIRES                          │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                  WordStylist/ model                  │  │
│  │                (diffusion model code)                │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
│  ✓ NO docgenie imports - completely independent!           │
└────────────────────────────────────────────────────────────┘
```

## 🐳 Docker Build Strategy

### ❌ What Doesn't Work

```dockerfile
# ❌ WRONG: Can't copy just api/ folder
FROM python:3.11

# Missing docgenie package!
COPY api/ /app/api/
RUN pip install -r requirements.txt

# Fails at startup with: ImportError: No module named 'docgenie'
CMD ["uvicorn", "main:app"]
```

### ✅ What Works

```dockerfile
# ✅ CORRECT: Copy entire monorepo
FROM python:3.11
WORKDIR /app

# Copy everything
COPY . .

# Install docgenie as a package (makes docgenie.* importable)
RUN pip install -e .

# Install API requirements
RUN pip install -r api/requirements.txt

WORKDIR /app/api

# ✓ docgenie imports work!
CMD ["uvicorn", "main:app"]
```

## 🚒 Deployment Strategy Comparison

### Option 1: Separate Deployments (❌ Won't Work)

```
API Deployment:
├── api/ folder only
└── ❌ Missing docgenie package → ImportError

Handwriting Deployment:
├── handwriting_service/ folder
└── WordStylist/
```

**Problem:** The API can't resolve its docgenie imports!

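One way to catch the Option 1 failure before building an image is to list the top-level packages a service file imports and check which ones the current environment can locate. The sketch below is illustrative only: `top_level_imports`, `unresolved`, and the `example_worker` string are hypothetical names, and the sample source merely mirrors the imports listed for `api/worker.py` above rather than the real file.

```python
import ast
import importlib.util


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def unresolved(packages: set[str]) -> list[str]:
    """Return the packages that the current environment cannot locate."""
    return sorted(p for p in packages if importlib.util.find_spec(p) is None)


# Hypothetical stand-in for api/worker.py; the real file is not reproduced here.
example_worker = """
import json
from docgenie import ENV
from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient
"""

found = top_level_imports(example_worker)
print("imports:", sorted(found))
print("missing:", unresolved(found))
```

In an image that contains only `api/`, `docgenie` would be expected to show up under `missing`, which is the same condition that produces the `ImportError` at startup.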
### Option 2: Monorepo Deployment (✅ Works)

```
API Deployment:
├── docgenie/ package (core)
├── api/ service (imports docgenie)
├── setup.py
└── requirements.txt

Handwriting Deployment:
├── handwriting_service/
└── WordStylist/
```

**Solution:** Deploy the entire repo for the API, and a standalone bundle for handwriting!

## 📁 File Structure in Containers

### API Container (Railway/EC2)

```
/app/
├── docgenie/            ← Installed as Python package
│   ├── __init__.py
│   ├── generation/
│   └── utils/
├── api/                 ← Working directory
│   ├── main.py
│   ├── worker.py
│   └── utils.py
├── setup.py
└── pyproject.toml

Python can import:
✓ from docgenie.generation.pipeline_01 import ...
✓ from docgenie import ENV
```

### Handwriting Container (RunPod)

```
/app/
├── handwriting_service/
│   ├── main.py          ← No docgenie imports!
│   ├── inference.py
│   └── models.py
└── WordStylist/         ← Model code
    ├── ldm/
    └── wordstylist_inference.py

Python can import:
✓ from handwriting_service.inference import ...
✓ No docgenie dependencies needed!
```

## 🎯 Import Resolution Flow

### API Service Import Chain

1. **FastAPI starts:**
   ```bash
   uvicorn main:app
   ```

2. **main.py imports utils:**
   ```python
   from api.utils import call_claude_api_direct
   ```

3. **utils.py imports docgenie:**
   ```python
   from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient
   ```

4. **Python looks for docgenie:**
   - Checks `sys.path`
   - Finds `/app`, which the editable install (`pip install -e .`) registered
   - Loads `docgenie/__init__.py`
   - ✓ Import succeeds!

### Handwriting Service Import Chain

1. **FastAPI starts:**
   ```bash
   uvicorn main:app
   ```

2. **main.py imports local modules:**
   ```python
   from handwriting_service.inference import HandwritingGenerator
   ```

3. **inference.py imports WordStylist:**
   ```python
   import sys
   from pathlib import Path

   # Put the sibling WordStylist/ directory on the import path
   sys.path.insert(0, str(Path(__file__).parent.parent / "WordStylist"))
   from ldm.models.diffusion.ddpm import LatentDiffusion
   ```

4. **Python loads local modules:**
   - No dependency on the docgenie package
   - ✓ Completely self-contained!

## 🔍 Verifying Imports

### Test API Imports

```bash
# Inside API container
python3 -c "from docgenie.generation.pipeline_01.claude_batching import ClaudeBatchedClient; print('✓ Import works!')"
```

### Test Handwriting Imports

```bash
# Inside handwriting container
python3 -c "from handwriting_service.inference import HandwritingGenerator; print('✓ Import works!')"
```

## 💡 Key Insights

1. **API needs monorepo:** Must deploy the entire `docgenie/` folder structure
2. **Handwriting is independent:** Can deploy just `handwriting_service/` + `WordStylist/`
3. **Docker layer caching:** Install the docgenie package first, then the API requirements
4. **Working directory matters:** Set WORKDIR to `/app/api` for the API service
5. **Python package installation:** `pip install -e .` makes docgenie importable from any working directory in the container

## 📊 Deployment Size Comparison

| Deployment | Size | Contents |
|------------|------|----------|
| API (Railway) | ~2GB | Python 3.11 + docgenie + API deps + Playwright |
| Worker (Railway) | ~2GB | Same as API (shares image) |
| Handwriting (RunPod) | ~8GB | CUDA 11.8 + PyTorch + Diffusers + WordStylist |

**Total:** ~12GB across the three deployments (but cached independently)

## ✅ Checklist for Successful Deployment

- [ ] Dockerfile copies **entire monorepo** for API
- [ ] `pip install -e .` runs before API requirements
- [ ] WORKDIR set to `/app/api` for runtime
- [ ] Handwriting Dockerfile copies only `handwriting_service/` + `WordStylist/`
- [ ] `.dockerignore` excludes `data/` folders (too large)
- [ ] Environment variables set in Railway/EC2
- [ ] Redis URL points to Upstash
- [ ] `HANDWRITING_SERVICE_URL` points to RunPod endpoint

## 🎉 Result

```
✓ API can import from docgenie package
✓ Worker can use ClaudeBatchedClient
✓ Handwriting service runs independently
✓ All services communicate via HTTP
✓ No more ImportError!
```
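The one-line import checks under "Verifying Imports" could also be gathered into a small script that reports every module in one pass. This is a minimal sketch, not part of the repository: `check_imports` is a hypothetical helper, and the module list simply reuses paths quoted in this document.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or any error raised while the module loads
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


# Module paths quoted from the API import chain; swap in the handwriting
# modules when running inside that container.
report = check_imports([
    "docgenie",
    "docgenie.generation.pipeline_01.claude_batching",
])
for name, error in report.items():
    status = "OK  " if error is None else "FAIL"
    print(status, name, error or "")
```

A container smoke test could exit non-zero whenever any value in `report` is not `None`, so a broken image fails before it receives traffic.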