LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23, 2025 • 71
LightOnOCR-2 🦉 LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family lightonai/LightOnOCR-2-1B Image-Text-to-Text • 1B • Updated 5 days ago • 15.3k • 327 lightonai/LightOnOCR-2-1B-bbox Image-Text-to-Text • 1B • Updated 3 days ago • 2.17k • 13 Running on Zero Featured 58 LightOnOCR 2 1B Demo 🐨 58 Extract and recognize text from images and PDFs lightonai/LightOnOCR-2-1B-base Image-Text-to-Text • 1B • Updated 5 days ago • 1.71k • 7
Embeddings datasets ⚡️ This collection gathers datasets for embeddings pre-training and fine-tuning. lightonai/embeddings-pre-training Viewer • Updated 21 days ago • 1.38B • 1.81k • 18 lightonai/nanobeir-multilingual Viewer • Updated Sep 16, 2025 • 522k • 732 • 11
ModernBERT Bringing BERT into modernity via both architecture changes and scaling answerdotai/ModernBERT-base Fill-Mask • 0.1B • Updated Jan 15, 2025 • 734k • 986 lightonai/GTE-ModernColBERT-v1 Sentence Similarity • 0.1B • Updated 5 days ago • 8.24k • 149 lightonai/Reason-ModernColBERT Sentence Similarity • 0.1B • Updated Sep 9, 2025 • 10.8k • 206 lightonai/modernbert-embed-large Sentence Similarity • 0.4B • Updated May 14, 2025 • 44.9k • 28
RITA 🧿 A suite of autoregressive generative models for protein sequences, with up to 1.2B parameters, trained on over 280 million protein sequences. lightonai/RITA_s Text Generation • 85.1M • Updated Nov 13, 2024 • 71 • 3 lightonai/RITA_m Text Generation • 0.3B • Updated Jan 6, 2025 • 16 lightonai/RITA_l Text Generation • Updated May 19, 2022 • 25 lightonai/RITA_xl Text Generation • 1B • Updated Dec 10, 2024 • 31 • 3
ArabicWeb24-ablation-models 900M models trained on 25BT to compare different data processing choices (filtering, sentence dedup, minhash, etc.) lightonai/ArabicWeb24-ablation-model-v1 Text Generation • Updated Aug 19, 2024 • 4 lightonai/ArabicWeb24-ablation-model-v5 Text Generation • Updated Aug 19, 2024 • 4
LightOnOCR 🦉 The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR lightonai/LightOnOCR-1B-1025 Image-to-Text • Updated 5 days ago • 40k • 218 lightonai/LightOnOCR-0.9B-16k-1025 Updated 5 days ago • 1.77k • 11 lightonai/LightOnOCR-0.9B-32k-1025 Updated 5 days ago • 357 • 18 Running 39 LightOnOCR 1B Demo 💬 39 Extract text from images and PDFs
PyLate 🐕 lightonai/Reason-ModernColBERT Sentence Similarity • 0.1B • Updated Sep 9, 2025 • 10.8k • 206 lightonai/GTE-ModernColBERT-v1 Sentence Similarity • 0.1B • Updated 5 days ago • 8.24k • 149 lightonai/answerai-colbert-small-v1 Sentence Similarity • 33.4M • Updated Jun 30, 2025 • 216 • 3 lightonai/colbertv2.0 Sentence Similarity • 0.1B • Updated Feb 10, 2025 • 3.21k • 4
PAGnol 🇫🇷 French language models. These models were trained in early 2021 following the then-current scaling laws and using the exact same training data as CamemBERT. lightonai/pagnol-small Text Generation • Updated Mar 21, 2024 • 52 • 1 lightonai/pagnol-medium Text Generation • 0.4B • Updated Jan 6, 2025 • 9 • 1 lightonai/pagnol-large Text Generation • Updated Mar 24, 2024 • 16 • 1 lightonai/pagnol-xl Text Generation • 2B • Updated Nov 7, 2024 • 7 • 1