AI-Ready Multimodal Data Foundation for Scientific Discovery
AI & ML interests
OpenDataLab provides high-quality open datasets and tools for large models. It is the designated open-source data service platform of the China Large Model Corpus Data Alliance.
Papers
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale