Image-to-Text

IDEA-OCSR

IDEA-OCSR is a deep learning model for Optical Chemical Structure Recognition (OCSR). It is built on the MolScribe architecture and has been further trained with a specific focus on recognizing complex molecular structures.

Model Summary

IDEA-OCSR keeps the core architecture of MolScribe but introduces a completely new pipeline for preprocessing molecular graph data and postprocessing predictions. Combined with an extensively optimized training strategy, this allows the model to outperform the original MolScribe on the majority of public datasets.

Because of commercial patent restrictions, we are releasing the model weights in a format compatible with the original open-source license. The IDEA-OCSR-v1.0.0 weights can be loaded with exactly the same inference logic as the original MolScribe, so existing MolScribe users can switch over seamlessly.

Key Improvements

  • Enhanced Complex Scenarios: Recognition accuracy is significantly higher for polycyclic structures, chiral molecules, and other highly complex molecular graphs.

  • Optimized Algorithms: We have implemented advanced logic for the image input stage and refined the postprocessing validation of decoded molecular topologies. The source code for these optimized algorithms is not currently open source. Because these proprietary postprocessing algorithms are not part of this release, inference with these weights alone will score slightly below the benchmarks reported for the full IDEA-OCSR system.

  • Superior Performance: The model achieves higher benchmark scores than the original MolScribe across various public datasets.

Quick Start

Because IDEA-OCSR is architecturally compatible with MolScribe, you can use the official MolScribe codebase and simply point its checkpoint path to the weights in this repository.

Hugging Face Space: https://huggingface.co/spaces/IDEA-AI4S/IDEA-OCSR
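The snippet that follows is a minimal sketch of that workflow, assuming the `molscribe` package from the MolScribe repository, `torch`, and `huggingface_hub` are installed. The repository ID comes from this card, but the checkpoint file name `IDEA-OCSR-v1.0.0.pth` is a hypothetical placeholder (check the repository's file list for the real one), and `needs_review` with its 0.5 threshold is an illustrative helper, not part of MolScribe or IDEA-OCSR.

```python
# Minimal sketch: run the released weights through MolScribe's inference code.
# Assumptions: molscribe, torch and huggingface_hub are installed; the
# checkpoint file name below is hypothetical.
from typing import Any, Dict

REPO_ID = "IDEA-AI4S/IDEA-OCSR"
# Hypothetical file name: replace with the actual .pth file in the repository.
CKPT_FILENAME = "IDEA-OCSR-v1.0.0.pth"


def predict(image_path: str, device_name: str = "cpu") -> Dict[str, Any]:
    """Download the checkpoint, load it with MolScribe, and predict one image."""
    # Heavy imports live here so needs_review can be used without them.
    import torch
    from huggingface_hub import hf_hub_download
    from molscribe import MolScribe

    ckpt_path = hf_hub_download(repo_id=REPO_ID, filename=CKPT_FILENAME)
    model = MolScribe(ckpt_path, device=torch.device(device_name))
    # The result dict holds "smiles" and "molfile"; "confidence" is included
    # because return_confidence=True.
    return model.predict_image_file(image_path, return_confidence=True)


def needs_review(output: Dict[str, Any], threshold: float = 0.5) -> bool:
    """Flag predictions that are empty, lack a confidence, or fall below a
    hypothetical confidence threshold."""
    smiles = output.get("smiles")
    confidence = output.get("confidence")
    if not smiles or confidence is None:
        return True
    return confidence < threshold
```

For example, `result = predict("molecule.png")` followed by `needs_review(result)` would route low-confidence recognitions to manual checking. The threshold is arbitrary and should be tuned on your own data.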

Limitations and Disclaimer

While IDEA-OCSR performs well in most scenarios, its accuracy may still be limited by poor quality in the original image or by extremely rare chemical configurations. These weights are released for academic exchange and research purposes only. For commercial use, please consult the open-source license and the relevant legal regulations.

Model tree for IDEA-AI4S/IDEA-OCSR

Base model: yujieq/MolScribe
Fine-tuned from the base model (3 fine-tunes listed, including this model)