shoaibmohd's Collections
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper
• 2509.22186
• Published
• 146
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper
• 2509.16506
• Published
• 22
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper
• 2510.00428
• Published
• 8
Extract-0: A Specialized Language Model for Document Information Extraction
Paper
• 2509.22906
• Published
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
• 2510.14528
• Published
• 118
RL makes MLLMs see better than SFT
Paper
• 2510.16333
• Published
• 49
NVIDIA Nemotron Parse 1.1
Paper
• 2511.20478
• Published
• 23
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
• 2511.16334
• Published
• 93
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI
Paper
• 2502.17092
• Published
• 3
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper
• 2503.11576
• Published
• 150
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
Paper
• 2601.21639
• Published
• 50
DeepSeek-OCR 2: Visual Causal Flow
Paper
• 2601.20552
• Published
• 64
FireRed-OCR Technical Report
Paper
• 2603.01840
• Published
• 6