---
title: IIIF Studio
emoji: π
colorFrom: blue
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
---

# IIIF Studio

A generic platform for generating AI-augmented scholarly editions from digitized heritage documents: medieval manuscripts, incunabula, cartularies, archives, charters, papyri. Any document type, any era, any language.

IIIF Studio ingests images from any [IIIF](https://iiif.io/)-compliant server, analyzes them with multimodal AI (Google Gemini, Mistral), and produces structured scholarly data: diplomatic OCR, layout detection, translations, commentaries, and iconographic analysis, all exportable as ALTO XML, METS, and IIIF Presentation 3.0 manifests.

**IIIF images are never stored locally.** The platform streams them from origin servers using the IIIF Image API and stores only the AI-generated metadata (~5 KB per page instead of ~50 MB). Only files uploaded directly are kept on disk.

---

## Features

- **IIIF-native architecture**: images streamed from origin servers (Gallica, BnF, Bodleian, etc.) with tiled deep zoom via OpenSeadragon
- **Multi-provider AI**: Google AI Studio, Vertex AI, Mistral AI. The model is selected per corpus; available providers are auto-detected from the environment
- **Profile-driven analysis**: 4 built-in corpus profiles (medieval illuminated, medieval textual, early modern print, modern handwritten), each with tailored prompts and active layers
- **Structured output**: layout regions with bounding boxes, diplomatic OCR, translations (FR/EN), scholarly and public commentary, iconographic analysis, uncertainty tracking
- **Standards-compliant export**: IIIF Presentation 3.0 manifests (with Image Service for tiled zoom), ALTO XML, METS XML, ZIP bundles
- **Human-in-the-loop**: editorial correction interface with versioned history and rollback
- **Full-text search**: accent-insensitive search across OCR text, translations, and iconographic tags
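
The README does not say how the search backend ignores accents. As a minimal sketch, assuming plain Unicode normalization rather than the project's actual implementation, accent-insensitive matching can decompose characters (NFKD), drop the combining marks and compare case-folded strings. `fold` and `search` are hypothetical names.

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold and strip combining accents (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(pages: dict[str, str], query: str) -> list[str]:
    """Return IDs of pages whose text contains the query, ignoring accents and case."""
    needle = fold(query)
    return [page_id for page_id, text in pages.items() if needle in fold(text)]


pages = {
    "f1r": "Incipit Évangile selon saint Jean",
    "f2v": "Explicit liber primus",
}
print(search(pages, "evangile"))  # ['f1r']
```

Case folding plus NFKD also maps the medieval long s (`ſ`) to `s`, which suits diplomatic transcriptions. A database-backed index could store the folded text once at write time instead of folding every page on each query.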

---

## Architecture

```
IIIF IMAGE SERVERS  (Gallica · BnF · Bodleian · Europeana · ...)
(origin: images are never copied)
 │
 ├─ info.json + tiles ──────────────────► FRONTEND (SPA): React + Vite
 │                                        ├─ OpenSeadragon: IIIF tiled zoom (info.json → deep zoom)
 │                                        ├─ Region overlays: bbox from master.json, scaled to canvas coords
 │                                        ├─ Pages: Home · Reader · Editor · Admin
 │                                        └─ REST API /api/v1/* ──► BACKEND
 │
 ├─ /full/!1500,1500/0/default.jpg ─────► BACKEND (API): FastAPI
 │  (1500px for AI)                       ├─ Ingestion: manifest URL → detect service → store metadata
 │                                        ├─ AI Pipeline: fetch 1500px in memory → send bytes to AI
 │                                        │  → discard image, keep JSON, scale bbox
 │                                        │  ◄──► AI PROVIDERS: Google Gemini · Vertex AI · Mistral AI
 │                                        │       (image bytes out, JSON back; auto-detected from env vars)
 │                                        ├─ Response Parser: raw JSON → layout, OCR, regions
 │                                        └─ Master Writer: ai_raw.json, master.json ──► LOCAL STORAGE
 │
 └─ /full/max/0/default.jpg ────────────► EXPORT GENERATORS
                                          ├─ IIIF Manifest 3.0 (with Image Service refs)
                                          ├─ METS XML (IIIF URLs, not file paths)
                                          ├─ ALTO XML (text geometry per page)
                                          └─ ZIP bundle (manifest + METS + ALTO)

LOCAL STORAGE
 ├─ SQLite (corpus, pages, manuscripts, jobs, models)
 ├─ data/corpora/{slug}/pages/
 │   ├─ {folio}/master.json
 │   ├─ {folio}/ai_raw.json
 │   └─ {folio}/alto.xml
 └─ ~5 KB per page (JSON only), NO image binaries

PIPELINE FLOW (per page):

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ 1.INGEST │───►│ 2.DETECT │───►│ 3.FETCH  │───►│ 4.AI     │───►│ 5.PARSE  │
│          │    │          │    │          │    │          │    │          │
│ manifest │    │ IIIF svc │    │ 1500px   │    │ send     │    │ layout   │
│ URL      │    │ URL +    │    │ JPEG in  │    │ image +  │    │ regions  │
│          │    │ canvas   │    │ memory   │    │ prompt   │    │ OCR      │
│          │    │ dims     │    │ (discard │    │ to       │    │ bbox     │
│          │    │          │    │ after)   │    │ provider │    │          │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └────┬─────┘
                                                                     │
┌──────────┐    ┌──────────┐    ┌──────────┐                    ┌────▼─────┐
│ 8.EXPORT │◄───│ 7.REVIEW │◄───│ 6.WRITE  │◄───────────────────│ 5b.SCALE │
│          │    │          │    │          │                    │          │
│ IIIF 3.0 │    │ human    │    │ ai_raw + │                    │ bbox     │
│ ALTO XML │    │ correct  │    │ master   │                    │ deriv →  │
│ METS XML │    │ validate │    │ .json    │                    │ canvas   │
│ ZIP      │    │ version  │    │ + ALTO   │                    │ coords   │
└──────────┘    └──────────┘    └──────────┘                    └──────────┘
```
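
Steps 1 and 2 (INGEST and DETECT) boil down to reading each canvas of a manifest. The sketch below is not IIIF Studio's ingest code: it assumes the manifest JSON has already been downloaded and parsed into a dict, handles both the Presentation 3.0 (`items`) and 2.x (`sequences`) layouts, and only looks at the first painting annotation of each canvas. `CanvasInfo` and `extract_canvases` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CanvasInfo:  # hypothetical container for what ingestion keeps
    canvas_id: str
    service_url: str  # IIIF Image API base URL, without /info.json
    width: int        # canvas dimensions, used later to scale bounding boxes
    height: int


def _service_id(service: Any) -> str:
    """Return the base URL of an Image Service declared as an object or a list."""
    if isinstance(service, list):  # Presentation 3.0 uses a list of services
        service = service[0]
    # Image API 3 services use "id"; Image API 2 services use "@id".
    return (service.get("id") or service["@id"]).rstrip("/")


def extract_canvases(manifest: dict) -> list[CanvasInfo]:
    """Read the Image Service URL and dimensions of every canvas in a manifest."""
    canvases = []
    if "items" in manifest:  # Presentation 3.0
        for canvas in manifest["items"]:
            body = canvas["items"][0]["items"][0]["body"]  # first painting annotation
            canvases.append(CanvasInfo(canvas["id"], _service_id(body["service"]),
                                       canvas["width"], canvas["height"]))
    else:  # Presentation 2.x
        for canvas in manifest["sequences"][0]["canvases"]:
            resource = canvas["images"][0]["resource"]
            canvases.append(CanvasInfo(canvas["@id"], _service_id(resource["service"]),
                                       canvas["width"], canvas["height"]))
    return canvases


manifest = {
    "items": [{
        "id": "https://example.org/canvas/f1r", "type": "Canvas",
        "width": 4000, "height": 6000,
        "items": [{"type": "AnnotationPage", "items": [{
            "type": "Annotation", "motivation": "painting",
            "body": {"type": "Image", "service": [
                {"id": "https://example.org/iiif/f1r", "type": "ImageService3"}]},
        }]}],
    }]
}
print(extract_canvases(manifest)[0].service_url)  # https://example.org/iiif/f1r
```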

### Tech stack

| Layer | Technology |
|-------|------------|
| Backend | Python 3.11+, FastAPI, Uvicorn |
| Database | SQLite via SQLAlchemy 2.0 async + aiosqlite |
| Validation | Pydantic v2 |
| AI providers | Google Gemini (google-genai SDK), Mistral AI |
| Image viewer | OpenSeadragon (IIIF tiled zoom) |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS, React Router |
| Exports | lxml (ALTO/METS XML), IIIF Presentation 3.0 |
| Deployment | Docker (HuggingFace Spaces) |

---

## Quick start

### Docker (recommended)

```bash
git clone https://github.com/maribakulj/IIIF-Studio.git && cd IIIF-Studio

# Configure at least one AI provider key
cp .env.example .env
# Edit .env and add your API key(s)

# Build and run
docker compose -f infra/docker-compose.yml up --build
# Open http://localhost:7860
```

### Local development

```bash
# Backend
cd backend
pip install -e ".[dev]"
uvicorn app.main:app --reload --port 7860

# Frontend (separate terminal)
cd frontend
npm install
npm run dev
```

The API is available at `http://localhost:7860/api/v1/`, with interactive Swagger docs at `http://localhost:7860/docs`.

---

## Usage workflow

1. **Create a corpus**: select a profile matching your document type
2. **Ingest pages**: provide a IIIF manifest URL, direct image URLs, or upload files
3. **Select an AI model**: choose a provider and model from the detected options
4. **Run the pipeline**: AI analyzes each page (layout detection, OCR, translation, commentary)
5. **Review and correct**: use the Editor to validate, correct OCR, and adjust regions
6. **Export**: download an IIIF manifest, ALTO XML, METS XML, or a ZIP bundle
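
The same workflow can be driven through the REST API listed under API reference. This sketch only builds the requests with the standard library and does not send them; the JSON field names `manifest_url`, `provider` and `model`, the example slug, and the helper names are assumptions rather than the documented schemas, so check `/docs` before relying on them.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7860/api/v1"  # local instance from Quick start


def api_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the API (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def workflow_requests(corpus_id: str, manifest_url: str,
                      provider: str, model: str) -> list[urllib.request.Request]:
    """Steps 1 to 4 of the workflow as API calls; payload field names are assumptions."""
    corpus = f"/corpora/{corpus_id}"
    return [
        # 1. Create a corpus (slug + title + profile)
        api_request("POST", "/corpora", {"slug": "my-cartulary", "title": "My cartulary",
                                         "profile": "medieval-textual"}),
        # 2. Ingest pages from a IIIF manifest
        api_request("POST", f"{corpus}/ingest/iiif-manifest", {"manifest_url": manifest_url}),
        # 3. Select the AI model for this corpus
        api_request("PUT", f"{corpus}/model", {"provider": provider, "model": model}),
        # 4. Run the pipeline on all pages
        api_request("POST", f"{corpus}/run"),
    ]
```

Each request can be sent with `urllib.request.urlopen(req)`. In a real run, the corpus ID passed to the later calls would come from the response to the first request.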

---

## Corpus profiles

Profiles control which analysis layers are active, which prompt templates are used, and what uncertainty thresholds apply.

| Profile | Script | Languages | Key layers |
|---------|--------|-----------|------------|
| `medieval-illuminated` | Caroline | Latin, French | OCR, translation, iconography, commentary, material notes |
| `medieval-textual` | Gothic | Latin, French | OCR, translation, scholarly commentary |
| `early-modern-print` | Print | French, Latin | OCR, summary |
| `modern-handwritten` | Cursive | French | OCR, summary |

Custom profiles can be added as JSON files in the `profiles/` directory with matching prompt templates in `prompts/`.
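
The profile JSON schema is not documented here. Assuming, hypothetically, that each profile file declares an `id` and its active `layers`, and that its templates live in a `prompts/<id>/` directory (the project structure describes prompts as organized by profile), a loader that rejects a profile without matching prompts might look like this sketch. `load_profiles` and the `papyri-greek` profile are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def load_profiles(profiles_dir: Path, prompts_dir: Path) -> dict[str, dict]:
    """Load profiles/*.json and check each has a prompts/<id>/ template directory.

    The "id" and "layers" keys are hypothetical; the real schema may differ.
    """
    profiles: dict[str, dict] = {}
    for path in sorted(profiles_dir.glob("*.json")):
        profile = json.loads(path.read_text(encoding="utf-8"))
        profile_id = profile["id"]
        if not (prompts_dir / profile_id).is_dir():
            raise FileNotFoundError(f"profile {profile_id!r} has no prompt templates in {prompts_dir}")
        profiles[profile_id] = profile
    return profiles


# Demo in a throwaway directory with one hypothetical custom profile.
with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    (base / "profiles").mkdir()
    (base / "prompts" / "papyri-greek").mkdir(parents=True)
    (base / "profiles" / "papyri-greek.json").write_text(
        json.dumps({"id": "papyri-greek", "layers": ["ocr", "translation"]}),
        encoding="utf-8",
    )
    print(sorted(load_profiles(base / "profiles", base / "prompts")))  # ['papyri-greek']
```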

---

## AI providers

The backend auto-detects available providers from environment variables. There is no global selector: the model is chosen per corpus from the admin interface.

| Provider | Environment variable | Notes |
|----------|----------------------|-------|
| Google AI Studio | `GOOGLE_AI_STUDIO_API_KEY` | Free tier, good for development |
| Vertex AI (service account) | `VERTEX_SERVICE_ACCOUNT_JSON` | Institutional deployments |
| Mistral AI | `MISTRAL_API_KEY` | Alternative provider |

At least **one** key is required for the pipeline to function. Keys must **never** appear in code, commits, or Docker images.
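
Environment-based detection can be as simple as checking which credential variables are set. This is a minimal sketch using the three variables from the table; the provider identifiers (`google_ai_studio`, `vertex_ai`, `mistral`) and function names are hypothetical, and the real provider factory may also validate the credentials.

```python
import os
from collections.abc import Mapping

# Credential variable -> provider identifier (identifiers are hypothetical).
PROVIDER_ENV_VARS = {
    "GOOGLE_AI_STUDIO_API_KEY": "google_ai_studio",
    "VERTEX_SERVICE_ACCOUNT_JSON": "vertex_ai",
    "MISTRAL_API_KEY": "mistral",
}


def detect_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose credential variable is set to a non-blank value."""
    return [provider for var, provider in PROVIDER_ENV_VARS.items()
            if env.get(var, "").strip()]


def require_provider(env: Mapping[str, str] = os.environ) -> list[str]:
    """Fail fast if no provider is configured."""
    providers = detect_providers(env)
    if not providers:
        raise RuntimeError("No AI provider configured: set at least one API key")
    return providers


print(detect_providers({"MISTRAL_API_KEY": "sk-test"}))  # ['mistral']
```

Passing the environment as a mapping keeps the function testable without touching the real process environment.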

---

## API reference

All endpoints are prefixed with `/api/v1/`. Full OpenAPI docs are available at `/docs`.

### Corpus management

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/corpora` | List all corpora |
| `POST` | `/corpora` | Create a corpus (slug + title + profile) |
| `GET` | `/corpora/{id}` | Get a corpus |
| `DELETE` | `/corpora/{id}` | Delete a corpus (cascades) |
| `GET` | `/corpora/{id}/manuscripts` | List manuscripts in a corpus |

### Ingestion

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/corpora/{id}/ingest/iiif-manifest` | Ingest from a IIIF manifest URL |
| `POST` | `/corpora/{id}/ingest/iiif-images` | Ingest from direct image URLs |
| `POST` | `/corpora/{id}/ingest/files` | Upload image files |

### AI pipeline

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/providers` | List detected AI providers |
| `GET` | `/providers/{type}/models` | List models for a provider |
| `PUT` | `/corpora/{id}/model` | Set AI model for a corpus |
| `POST` | `/corpora/{id}/run` | Run pipeline on all pages |
| `POST` | `/pages/{id}/run` | Run pipeline on a single page |
| `GET` | `/jobs/{id}` | Check job status |
| `POST` | `/jobs/{id}/retry` | Retry a failed job |
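
Pipeline runs are tracked as jobs. A client-side polling loop might look like the sketch below. It assumes, hypothetically, that the job JSON has a `status` field whose terminal values are `done` or `failed`, and it takes the HTTP call as a function so it stays independent of any HTTP library. `wait_for_job` is a hypothetical name.

```python
import time
from typing import Callable

# Hypothetical terminal values of the job's "status" field.
TERMINAL_STATES = {"done", "failed"}


def wait_for_job(fetch_status: Callable[[], dict], interval: float = 2.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status (e.g. a GET on /api/v1/jobs/{id}) until the job finishes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

For example, `fetch_status` could wrap `json.load(urllib.request.urlopen(...))` on the job URL. A job that ends in `failed` can then be resubmitted with `POST /jobs/{id}/retry`.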

### Pages and content

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/pages/{id}` | Page metadata |
| `GET` | `/pages/{id}/master-json` | Full page master (canonical JSON) |
| `GET` | `/pages/{id}/layers` | List annotation layers |
| `POST` | `/pages/{id}/corrections` | Apply editorial corrections |
| `GET` | `/pages/{id}/history` | Version history |
| `GET` | `/search?q=` | Full-text search across all pages |

### Export

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/manuscripts/{id}/iiif-manifest` | IIIF Presentation 3.0 manifest |
| `GET` | `/manuscripts/{id}/mets` | METS XML |
| `GET` | `/pages/{id}/alto` | ALTO XML |
| `GET` | `/manuscripts/{id}/export.zip` | ZIP bundle (manifest + METS + ALTO) |
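
To illustrate what the ALTO export contains, here is a minimal sketch that turns text regions into an ALTO v4 document. It swaps in the standard library's `xml.etree.ElementTree` for lxml, maps each region to a single `TextBlock`/`TextLine`/`String`, and is not validated against the ALTO schema. The region dict shape (`bbox`, `text`) and `regions_to_alto` are assumptions, not the project's exporter.

```python
import xml.etree.ElementTree as ET

ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"
# Serialize ALTO as the default namespace (note: this is process-wide ElementTree state).
ET.register_namespace("", ALTO_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def regions_to_alto(page_id: str, width: int, height: int, regions: list[dict]) -> str:
    """Render text regions ([x, y, w, h] in canvas pixels) as a minimal ALTO v4 document.

    Each region becomes one TextBlock with a single TextLine and String; real
    output would segment lines and words.
    """
    alto = ET.Element(_q("alto"))
    description = ET.SubElement(alto, _q("Description"))
    ET.SubElement(description, _q("MeasurementUnit")).text = "pixel"
    layout = ET.SubElement(alto, _q("Layout"))
    page = ET.SubElement(layout, _q("Page"), ID=page_id, PHYSICAL_IMG_NR="1",
                         WIDTH=str(width), HEIGHT=str(height))
    print_space = ET.SubElement(page, _q("PrintSpace"))
    for i, region in enumerate(regions, start=1):
        x, y, w, h = region["bbox"]
        # ALTO's HPOS/VPOS/WIDTH/HEIGHT map directly onto [x, y, w, h].
        geometry = {"HPOS": str(x), "VPOS": str(y), "WIDTH": str(w), "HEIGHT": str(h)}
        block = ET.SubElement(print_space, _q("TextBlock"), ID=f"{page_id}_b{i}", **geometry)
        line = ET.SubElement(block, _q("TextLine"), **geometry)
        ET.SubElement(line, _q("String"), CONTENT=region["text"])
    return ET.tostring(alto, encoding="unicode")


print(regions_to_alto("f1r", 4000, 6000,
                      [{"bbox": [120, 340, 2800, 900], "text": "In nomine Domini"}]))
```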

---

## Data model

Each analyzed page produces a `master.json`, the canonical source of truth for all exports.

```
PageMaster
├── image        IIIF service URL, canvas dimensions, provenance
├── layout       regions with bounding boxes [x, y, w, h] in absolute pixels
├── ocr          diplomatic text, confidence, uncertain segments
├── translation  French, English
├── summary      short + detailed
├── commentary   public, scholarly, sourced claims with certainty levels
├── extensions   profile-specific data (iconography, materiality, etc.)
├── processing   provider, model, prompt version, timestamp
└── editorial    status (machine_draft → validated → published), version
```

Bounding boxes follow the convention `[x, y, width, height]` in absolute pixels of the full-resolution image (canvas coordinates). Because the AI analyzes a smaller derivative, the coordinates it returns are automatically scaled up to the full canvas dimensions.
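
The scaling step is a per-axis multiplication. A minimal sketch, assuming the AI returns `[x, y, w, h]` in the pixel space of the derivative it was sent (`scale_bbox` is a hypothetical name):

```python
def scale_bbox(bbox: list[float], analysed_size: tuple[int, int],
               canvas_size: tuple[int, int]) -> list[int]:
    """Map [x, y, w, h] from the analysed derivative's pixels to full canvas pixels."""
    sx = canvas_size[0] / analysed_size[0]  # horizontal factor
    sy = canvas_size[1] / analysed_size[1]  # vertical factor
    x, y, w, h = bbox
    return [round(x * sx), round(y * sy), round(w * sx), round(h * sy)]


# A 4000 x 6000 canvas analysed as a 1000 x 1500 derivative: factor 4 on both axes.
print(scale_bbox([100, 200, 300, 150], (1000, 1500), (4000, 6000)))  # [400, 800, 1200, 600]
```

Separate horizontal and vertical factors absorb the one-pixel rounding a server may apply when it resizes the derivative.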

---

## IIIF-native image handling

IIIF Studio operates in two modes:

### IIIF-native mode (default for manifest/URL ingestion)

- Images are **never downloaded or stored** locally
- At ingestion: the IIIF Image Service URL and canvas dimensions are extracted from the manifest
- At analysis: a derivative fitting within 1500 × 1500 px is fetched into memory via the IIIF Image API (`{service}/full/!1500,1500/0/default.jpg`), sent to the AI, then discarded
- In the viewer: OpenSeadragon loads `info.json` from the IIIF server for native tiled deep zoom
- Storage per page: **~5 KB** (JSON metadata only)
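
The IIIF Image API URLs involved are plain string templates. The sketch below builds them and predicts the size of a `!1500,1500` response (fit inside the box, aspect ratio kept, no upscaling, as in Image API 3.0). The helper names are hypothetical, and servers may round differently, so downstream scaling should use the size of the image actually returned.

```python
import urllib.request


def info_url(service: str) -> str:
    """info.json for the service (what OpenSeadragon loads for tiled zoom)."""
    return f"{service.rstrip('/')}/info.json"


def derivative_url(service: str, box: int = 1500) -> str:
    """Full region, best fit inside box x box, no rotation, default quality, JPEG."""
    return f"{service.rstrip('/')}/full/!{box},{box}/0/default.jpg"


def fit_size(width: int, height: int, box: int = 1500) -> tuple[int, int]:
    """Expected size of a !box,box response under Image API 3.0 (no upscaling)."""
    scale = min(box / width, box / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def fetch_derivative(service: str, box: int = 1500) -> bytes:
    """Download the derivative into memory only; nothing is written to disk."""
    with urllib.request.urlopen(derivative_url(service, box), timeout=30) as response:
        return response.read()


print(derivative_url("https://example.org/iiif/f1r"))
# https://example.org/iiif/f1r/full/!1500,1500/0/default.jpg
print(fit_size(4000, 6000))  # (1000, 1500)
```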

### File upload mode (for non-IIIF sources)

- Uploaded images are stored locally in `data/corpora/{slug}/`
- Derivatives (1500px) and thumbnails (256px) are created on disk
- Storage per page: **~50 MB** (images + JSON)

---

## Project structure

```
IIIF-Studio/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry point
│   │   ├── config.py            # Pydantic settings from env vars
│   │   ├── api/v1/              # REST endpoints
│   │   ├── models/              # SQLAlchemy ORM models
│   │   ├── schemas/             # Pydantic v2 schemas (canonical)
│   │   └── services/
│   │       ├── ai/              # Provider factory, analyzer, prompt loader
│   │       ├── ingest/          # IIIF fetcher, service detection
│   │       ├── image/           # Normalizer (in-memory + legacy disk)
│   │       └── export/          # ALTO, METS, IIIF manifest generators
│   ├── tests/                   # 585 tests (pytest + pytest-asyncio)
│   └── pyproject.toml
├── frontend/
│   ├── src/
│   │   ├── App.tsx              # React Router (/, /admin, /reader, /editor)
│   │   ├── lib/api.ts           # Typed API client
│   │   ├── pages/               # Home, Reader, Editor, Admin
│   │   └── components/          # Viewer (OpenSeadragon), retro UI system
│   └── package.json
├── profiles/                    # 4 corpus profile JSON files
├── prompts/                     # 9 prompt templates organized by profile
├── Dockerfile                   # Multi-stage build (Node + Python)
├── infra/docker-compose.yml     # Local development
└── .env.example                 # Environment variable template
```

---

## Testing

```bash
cd backend
pip install -e ".[dev]"
pytest tests/ -v --cov=app
```

Expected result: **585 passed, 3 skipped**.

All AI calls are mocked in tests, so no API keys are required to run the test suite.

---

## Deployment

### HuggingFace Spaces

This repository is configured for [HuggingFace Spaces](https://huggingface.co/spaces) with the Docker SDK on port 7860. AI keys are stored as Space secrets (Settings → Repository secrets).

The CI pipeline (`.github/workflows/`) runs tests on every push and auto-deploys to HuggingFace Spaces on merge to `main`.

### Self-hosted

```bash
docker build -t iiif-studio .
# Bind-mount an absolute path so the data directory persists across containers
docker run -p 7860:7860 \
  -e GOOGLE_AI_STUDIO_API_KEY=your_key \
  -v "$(pwd)/data:/app/data" \
  iiif-studio
```

---

## License

[Apache License 2.0](LICENSE)