مِسراج - Misraj AI

Built on Trust. Measured by Impact.
The next-generation Arabic AI lab, building the foundational infrastructure for Arabic language understanding, generation, and document intelligence.


🧭 About Us

Misraj AI is the AI research division of Misraj Technology, a Saudi-based technology group with over 10 years of experience delivering enterprise digital solutions across 15 sectors. Our AI lab is dedicated to a singular mission: making Arabic a first-class language in the modern AI era.

We develop open models, large-scale datasets, rigorous benchmarks, and production-ready AI systems. All are purpose-built for Arabic, a morphologically rich language that has long been underserved by mainstream AI research.

From our research lab to operational products, we build a comprehensive system that enables governments and enterprises to adopt AI with confidence, depth, and speed.

📊 15+ research papers · 35 billion open Arabic data tokens · Honored by AI Pioneers


๐Ÿข Areas of Expertise

Our AI solutions span critical industry verticals, combining deep domain knowledge with state-of-the-art Arabic NLP:

  • 🏥 Healthcare Technology: Clinical documentation and Arabic medical NLP
  • 🏦 Financial Technology: Document intelligence for banking and finance
  • ⚖️ Legal Technology: Contract analysis and legal document processing
  • 🎓 Educational Technology: Arabic learning and knowledge systems
  • 🏛️ Administrative Technology: Government and enterprise document automation

📦 Open Datasets

We are committed to releasing high-quality, openly available Arabic AI resources to empower the global research community.

| Dataset | Description | Scale |
| --- | --- | --- |
| Misraj-DocOCR | Expert-verified Arabic document OCR benchmark | 400 images |
| KITAB PDF-to-Markdown | Corrected Arabic PDF-to-Markdown corpus | 62 documents |
| msdd | Misraj Structured Document Dataset | 26.4M rows |
| mudd | Misraj Unstructured Document Dataset | 4.76M rows |
| Tarjama-25 | Bidirectional Arabic-English translation benchmark | 5,000 expert-reviewed sentence pairs |
| Arabic-Image-Captioning 100M | First large-scale Arabic multimodal captioning dataset | 100M caption pairs |
| SadeedDiac-25 | Arabic diacritization benchmark | 1.2K samples |
| Sadeed Tashkeela | Large-scale Arabic diacritization corpus | 1.05M samples |

35+ billion open Arabic data tokens released and growing.


📬 Connect With Us

| Platform | Link |
| --- | --- |
| 🌐 Misraj AI | misraj.ai/en |
| 🌐 Misraj Technology | misraj.sa/en |
| 🔵 Baseer OCR | baseerocr.com |
| 🤗 Hugging Face | huggingface.co/Misraj |
| 💼 LinkedIn | linkedin.com/company/aimisraj |
| 🐦 X / Twitter | @aimisraj |
| 💻 GitHub | github.com/misraj-ai |
| 📸 Instagram | @misraj__ai |