Kiri OCR is a lightweight OCR library for English and Khmer documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
| Component | Details |
|---|---|
| Type | Transformer Encoder-Decoder with CTC |
| Encoder | 4 layers, 8 heads, 256 dim, 1024 FFN |
| Decoder | 3 layers, 8 heads, 256 dim, 1024 FFN |
| CNN Backbone | ConvStem (4 conv layers with BatchNorm + SiLU) |
| Decoding | Beam search with CTC fusion + LM fusion |
| Input Size | 48 × 640 px (height × width) |
| Framework | PyTorch |
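The fixed 48 × 640 input size means a line image of arbitrary dimensions has to be mapped onto that canvas before inference. The sketch below shows one common way to compute the geometry: scale to the target height, cap the width, and pad the rest. This strategy is an assumption, not a description of Kiri OCR's actual preprocessing, and `fit_to_canvas` is a hypothetical helper.

```python
TARGET_H = 48   # model input height from the table above
TARGET_W = 640  # model input width from the table above


def fit_to_canvas(width: int, height: int) -> tuple[int, int, int]:
    """Return (new_width, new_height, right_padding) for a line image.

    Assumed strategy (not confirmed by the Kiri OCR docs): scale to the
    target height keeping the aspect ratio, cap the width at TARGET_W
    (which squeezes very long lines), and pad the remaining columns on
    the right.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = TARGET_H / height
    new_width = max(1, min(TARGET_W, round(width * scale)))
    return new_width, TARGET_H, TARGET_W - new_width


print(fit_to_canvas(1200, 96))   # wide line that fits after scaling
print(fit_to_canvas(4000, 100))  # very long line, width gets capped
```

Lines that would exceed 640 px after scaling lose their aspect ratio under this scheme, so splitting very long lines before recognition may be the better choice.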
        Input Image (48×640)
                 ↓
          ConvStem (CNN)
                 ↓
      2D Positional Encoding
                 ↓
     Transformer Encoder (4L)
                 ↓
         ┌───────┴───────┐
         ↓               ↓
     CTC Head   Transformer Decoder (3L)
         ↓               ↓
         └───────┬───────┘
                 ↓
Beam Search + CTC Fusion + LM Fusion
                 ↓
            Output Text
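The CTC head in the diagram produces one prediction per encoder frame, so its raw output is longer than the final text. The standard CTC rule for turning frames into a sequence is to merge consecutive repeats and then drop blanks. The sketch below shows that rule with greedy decoding, which is simpler than the beam search the model uses. The blank index and the toy vocabulary are assumptions for illustration only.

```python
BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_collapse(frame_ids: list[int], blank: int = BLANK) -> list[int]:
    """Collapse per-frame argmax ids: merge repeats, then drop blanks."""
    out = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out


# Toy vocabulary, purely illustrative (not the contents of vocab.json).
vocab = {1: "k", 2: "i", 3: "r"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0, 2, 2]
decoded = "".join(vocab[i] for i in ctc_greedy_collapse(frames))
print(decoded)  # prints "kiri"
```

A blank between two identical symbols keeps them separate, which is how CTC can represent doubled characters.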
The model is trained on the mrrtmob/khmer_english_ocr_image_line dataset, containing 12 million synthetic images of Khmer and English text lines.
pip install kiri-ocr
from kiri_ocr import OCR
# Initialize (downloads from Hugging Face automatically)
ocr = OCR()
# Extract text from document
text, results = ocr.extract_text("document.jpg")
print(text)
# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
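Since each result exposes `text` and `confidence`, low-confidence items can be filtered out before further processing. The following is a minimal sketch: `FakeResult` is a hypothetical stand-in so the example runs without the library, the 0.8 threshold is arbitrary, and confidences are assumed to lie in the range 0 to 1 (as the percent formatting above suggests).

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Stand-in for an item of `results`; only .text and .confidence are used."""
    text: str
    confidence: float


def keep_confident(results, threshold=0.8):
    """Return the items whose confidence is at least `threshold`."""
    return [r for r in results if r.confidence >= threshold]


sample = [FakeResult("Hello", 0.97), FakeResult("w0rld", 0.41)]
for r in keep_confident(sample):
    print(f"{r.text} ({r.confidence:.2%})")  # prints "Hello (97.00%)"
```

With the real library, the `results` list returned by `ocr.extract_text(...)` would be passed in place of `sample`.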
# Basic usage
kiri-ocr predict path/to/document.jpg
# With output directory
kiri-ocr predict path/to/document.jpg --output results/
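To process a whole folder, the CLI call above can be driven from a short script. This sketch only builds the argument lists, using nothing beyond the `predict` subcommand and `--output` option shown above. `build_predict_commands` is a hypothetical helper, and writing to one output subdirectory per image is an assumption made to avoid files from different runs overwriting each other.

```python
from pathlib import Path


def build_predict_commands(image_dir, output_dir, suffixes=(".jpg", ".png")):
    """Build one `kiri-ocr predict` argument list per image in `image_dir`."""
    commands = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in suffixes:
            # One output subdirectory per image, named after the file stem.
            target = Path(output_dir) / path.stem
            commands.append(["kiri-ocr", "predict", str(path), "--output", str(target)])
    return commands


# To actually run them (requires kiri-ocr to be installed):
#   import subprocess
#   for cmd in build_predict_commands("scans", "results"):
#       subprocess.run(cmd, check=True)
```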
Results on synthetic test images (10 popular fonts):
Default inference parameters:
| Parameter | Value | Description |
|---|---|---|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |
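The two fusion weights control how much the CTC head and the language model influence the ranking of beam candidates. The sketch below shows one common way such weights are applied: interpolate decoder and CTC log-probabilities, then add a weighted language model term. The exact formula used by Kiri OCR is not documented here, so `fused_score` and the candidate scores are assumptions for illustration.

```python
import math

CTC_FUSION_ALPHA = 0.5   # default from the table above
LM_FUSION_ALPHA = 0.35   # default from the table above


def fused_score(dec_logp, ctc_logp, lm_logp,
                ctc_alpha=CTC_FUSION_ALPHA, lm_alpha=LM_FUSION_ALPHA):
    """Combine the log-probabilities of one hypothesis (assumed formula).

    Interpolates decoder and CTC scores, then adds a weighted LM term.
    """
    return (1.0 - ctc_alpha) * dec_logp + ctc_alpha * ctc_logp + lm_alpha * lm_logp


# Two hypothetical beam candidates: (decoder, CTC, LM) log-probabilities.
candidates = {
    "kiri": (math.log(0.60), math.log(0.70), math.log(0.20)),
    "kirl": (math.log(0.65), math.log(0.30), math.log(0.01)),
}
best = max(candidates, key=lambda text: fused_score(*candidates[text]))
print(best)  # prints "kiri"
```

In this toy case the decoder alone slightly prefers "kirl", but the CTC and language model terms pull the ranking toward "kiri", which is the kind of correction fusion is meant to provide.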
kiri-ocr/
├── config.json        # Model configuration
├── vocab.json         # Character vocabulary
├── model.safetensors  # Model weights
└── README.md          # This file
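When working from a local copy of the repository, a quick check that the files listed above are present can save a confusing load error later. This is a minimal sketch with a hypothetical helper, `missing_model_files`; it only tests for file presence and makes no assumption about the contents of `config.json` or `vocab.json`.

```python
from pathlib import Path

# File names taken from the layout above.
EXPECTED_FILES = ("config.json", "vocab.json", "model.safetensors")


def missing_model_files(model_dir):
    """Return the expected files that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_model_files("kiri-ocr")
if missing:
    print("Missing files:", ", ".join(missing))
```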
[Join our Discord Community](https://discord.gg/Vcrw274RVC)
This model is released under the Apache 2.0 License.