Kiri OCR Model

Kiri OCR is a lightweight OCR library for English and Khmer documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.

✨ Key Features

Lightweight: Compact model optimized for speed and efficiency
Bilingual: Native support for English and Khmer (including mixed text)
Document Processing: Automatic text line and word detection
Hybrid Decoding: CTC + Attention decoder with language model fusion

🏗️ Architecture

Component	Details
Type	Transformer Encoder-Decoder with CTC
Encoder	4 layers, 8 heads, 256 dim, 1024 FFN
Decoder	3 layers, 8 heads, 256 dim, 1024 FFN
CNN Backbone	ConvStem (4 conv layers with BatchNorm + SiLU)
Decoding	Beam search with CTC fusion + LM fusion
Input Size	48 × 640 px (height × width)
Framework	PyTorch
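
The fixed 48 × 640 input implies that line images of other sizes have to be fitted to that canvas. A minimal sketch of the geometry, assuming an aspect-preserving resize to height 48 followed by right-padding to width 640; the library's actual preprocessing may differ, and `fit_to_canvas` is a hypothetical helper, not part of `kiri_ocr`:

```python
from typing import Tuple


def fit_to_canvas(width: int, height: int,
                  target_h: int = 48, target_w: int = 640) -> Tuple[int, int]:
    """Return (resized_width, right_padding) for a line image.

    Assumption: the image is scaled so its height equals target_h while
    keeping the aspect ratio, then clipped or padded to target_w.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Width after scaling the height to target_h.
    scaled_w = max(1, round(width * target_h / height))
    # Lines wider than the canvas are squeezed to fit it.
    resized_w = min(scaled_w, target_w)
    return resized_w, target_w - resized_w
```

For example, a 1000 × 100 px line would be resized to 480 px wide and padded by 160 px under these assumptions.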

Model Diagram

Input Image (48×640)
       ↓
   ConvStem (CNN)
       ↓
  2D Positional Encoding
       ↓
  Transformer Encoder (4L)
       ↓
   ┌───┴───┐
   ↓       ↓
CTC Head   Transformer Decoder (3L)
   ↓       ↓
   └───┬───┘
       ↓
  Beam Search + CTC Fusion + LM Fusion
       ↓
    Output Text
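
The CTC head in the diagram emits one label per encoder time step, including a blank symbol. The sketch below shows the standard CTC collapse rule (merge consecutive repeats, then drop blanks) using greedy decoding, which is simpler than the beam search with fusion that the model uses. The blank index of 0 is an assumption; the real value depends on the vocabulary.

```python
from typing import List, Sequence


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse per-frame CTC labels into an output label sequence.

    frame_labels holds the argmax label of each time step.
    blank=0 is an assumed blank index, not taken from the model files.
    """
    output: List[int] = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output
```

A blank between two identical labels keeps both of them, which is how CTC represents doubled characters.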

📊 Dataset

The model is trained on the mrrtmob/khmer_english_ocr_image_line dataset, containing 12 million synthetic images of Khmer and English text lines.

💻 Usage

Installation

pip install kiri-ocr

Python API

from kiri_ocr import OCR

# Initialize (downloads from Hugging Face automatically)
ocr = OCR()

# Extract text from document
text, results = ocr.extract_text("document.jpg")
print(text)

# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
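
Since each result exposes `text` and `confidence`, low-confidence lines can be filtered out before further processing. A minimal sketch, assuming `confidence` is a float between 0 and 1 (as the `:.2%` format above suggests); `LineResult` is a hypothetical stand-in used only to make the example self-contained, and the 0.8 threshold is arbitrary:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LineResult:
    """Hypothetical stand-in for the objects returned in `results`."""
    text: str
    confidence: float


def confident_text(results: Iterable, threshold: float = 0.8) -> str:
    """Join the text of all results whose confidence meets the threshold."""
    return "\n".join(r.text for r in results if r.confidence >= threshold)


sample = [LineResult("Hello", 0.97), LineResult("w0rld", 0.41)]
print(confident_text(sample))
```

The same function should work on the `results` list from `ocr.extract_text` as long as its items carry those two attributes.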

CLI Tool

# Basic usage
kiri-ocr predict path/to/document.jpg

# With output directory
kiri-ocr predict path/to/document.jpg --output results/
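
To process several files, the CLI can be driven from a script. This sketch builds one `kiri-ocr predict` command per image using only the arguments shown above; `build_predict_commands` and `run_all` are hypothetical helper names, and whether the CLI accepts multiple inputs in one call is not assumed here.

```python
import subprocess
from typing import Iterable, List


def build_predict_commands(images: Iterable[str], output_dir: str) -> List[List[str]]:
    """Build one `kiri-ocr predict <image> --output <dir>` command per image."""
    return [["kiri-ocr", "predict", str(image), "--output", output_dir]
            for image in images]


def run_all(images: Iterable[str], output_dir: str) -> None:
    """Run the commands one after another; raises if any command fails."""
    for command in build_predict_commands(images, output_dir):
        subprocess.run(command, check=True)
```

Calling `run_all(["a.jpg", "b.jpg"], "results/")` requires `kiri-ocr` to be installed and on the `PATH`.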

📈 Benchmarks

[Figure: benchmark results on synthetic test images (10 popular fonts)]

⚙️ Configuration

Default inference parameters:

Parameter	Value	Description
`beam_width`	4	Beam search width
`ctc_fusion_alpha`	0.5	CTC score fusion weight
`lm_fusion_alpha`	0.35	Language model fusion weight
`max_length`	260	Maximum output sequence length
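
To illustrate how the two fusion weights could enter beam search, the sketch below ranks candidate transcriptions by a weighted sum of log-probabilities. The additive formula is an assumption (a common shallow-fusion form), not the library's confirmed scoring rule, and `fused_score` and `top_hypotheses` are hypothetical names.

```python
from typing import Dict, List, Tuple


def fused_score(attn_logp: float, ctc_logp: float, lm_logp: float,
                ctc_fusion_alpha: float = 0.5,
                lm_fusion_alpha: float = 0.35) -> float:
    """Combine decoder, CTC and language-model log-probabilities.

    Assumed form: attention score plus weighted CTC and LM scores.
    """
    return attn_logp + ctc_fusion_alpha * ctc_logp + lm_fusion_alpha * lm_logp


def top_hypotheses(candidates: Dict[str, Tuple[float, float, float]],
                   beam_width: int = 4) -> List[str]:
    """Keep the beam_width candidates with the highest fused score.

    candidates maps text -> (attn_logp, ctc_logp, lm_logp).
    """
    ranked = sorted(candidates,
                    key=lambda text: fused_score(*candidates[text]),
                    reverse=True)
    return ranked[:beam_width]
```

Under this form, raising `ctc_fusion_alpha` favours hypotheses the CTC head agrees with, while raising `lm_fusion_alpha` favours text the language model finds likely.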

📁 Model Files

kiri-ocr/
├── config.json          # Model configuration
├── vocab.json           # Character vocabulary
├── model.safetensors    # Model weights
└── README.md            # This file
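
When working with a local copy of the model, it can help to verify that the files listed above are present before loading. A minimal sketch using only the standard library; `missing_model_files` is a hypothetical helper, and the internal format of each file is not assumed.

```python
from pathlib import Path
from typing import List

# File names taken from the listing above (README.md is not needed to run the model).
EXPECTED_FILES = ("config.json", "vocab.json", "model.safetensors")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all three files were found.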

🔗 Links

GitHub: github.com/mrrtmob/kiri-ocr
Dataset: mrrtmob/khmer_english_ocr_image_line
PyPI: pypi.org/project/kiri-ocr

Discord: discord.gg/Vcrw274RVC

📄 License

This model is released under the Apache 2.0 License.
