---
license: apache-2.0
language:
- en
base_model:
- openmmlab/mask-rcnn
- microsoft/swin-base-patch4-window7-224-in22k
pipeline_tag: image-segmentation
---
# Model Card for ChartPointNet-InstanceSeg

ChartPointNet-InstanceSeg is a high-precision instance segmentation model for data points in scientific charts. Built on Mask R-CNN with a Swin Transformer backbone, it detects and segments individual data points, particularly in the dense, small-object settings common to scientific figures.

## Model Details

### Model Description

ChartPointNet-InstanceSeg performs pixel-precise instance segmentation of data points in scientific charts (e.g., scatter plots). It leverages Mask R-CNN with a Swin Transformer backbone, trained on an enhanced COCO-style dataset with instance masks for data points. The model is intended for extracting quantitative data from scientific figures and for downstream chart analysis.

- **Developed by:** Hansheng Zhu
- **Model type:** Instance segmentation
- **License:** Apache-2.0
- **Finetuned from model:** openmmlab/mask-rcnn

### Model Sources

- **Repository:** [https://github.com/hanszhu/ChartSense](https://github.com/hanszhu/ChartSense)
- **Paper:** [https://arxiv.org/abs/2106.01841](https://arxiv.org/abs/2106.01841)

## Uses

### Direct Use

- Instance segmentation of data points in scientific charts
- Automated extraction of quantitative data from figures
- Preprocessing for downstream chart understanding and data mining

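Quantitative data extraction ultimately requires mapping segmented data points from pixel space back to axis coordinates. As a minimal sketch (the calibration points below are hypothetical and would come from a separate axis-detection step, which this model does not provide), a linear two-point calibration per axis could look like:

```python
import numpy as np

def pixel_to_data(centroids, x_px, x_val, y_px, y_val):
    """Map pixel-space centroids to data coordinates via linear
    interpolation between two calibration points per axis.

    centroids: (N, 2) array of (x, y) pixel positions
    x_px, x_val: pixel and data values of two x-axis reference ticks
    y_px, y_val: pixel and data values of two y-axis reference ticks
    """
    c = np.asarray(centroids, dtype=float)
    xs = x_val[0] + (c[:, 0] - x_px[0]) * (x_val[1] - x_val[0]) / (x_px[1] - x_px[0])
    ys = y_val[0] + (c[:, 1] - y_px[0]) * (y_val[1] - y_val[0]) / (y_px[1] - y_px[0])
    return np.stack([xs, ys], axis=1)

# A point halfway along both axes maps to the midpoint of each data range
# (image y coordinates grow downward, hence the flipped y_px pair)
coords = pixel_to_data([[50, 50]], x_px=(0, 100), x_val=(0.0, 10.0),
                       y_px=(100, 0), y_val=(0.0, 1.0))
# coords ≈ [[5.0, 0.5]]
```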
### Downstream Use

- As a preprocessing step for chart structure parsing or data extraction
- Integration into document parsing, digital library, or accessibility systems

### Out-of-Scope Use

- Segmentation of non-data-point elements (e.g., axes, legends, or text)
- Use on figures outside the supported chart types
- Medical or legal decision making

## Bias, Risks, and Limitations

- The model is limited to data point segmentation in scientific charts.
- It may not generalize to figures with highly unusual styles or poor image quality.
- Potential dataset bias: the training data is sourced from scientific literature, so chart styles uncommon in that domain may be underrepresented.

### Recommendations

Users should verify predictions on out-of-domain data and be aware of the model's limitations regarding chart style and domain.

## How to Get Started with the Model

```python
from mmdet.apis import inference_detector, init_detector

# Paths to the model config and trained weights
config_file = 'legend_match_swin/mask_rcnn_swin_datapoint.py'
checkpoint_file = 'chart_datapoint.pth'

# Build the model and load weights (use device='cpu' if no GPU is available)
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on a chart image
result = inference_detector(model, 'example_chart.png')
# result contains per-class detected boxes, scores, and segmentation masks
```

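For mask models, MMDetection 2.x returns `result` as a `(bbox_result, segm_result)` pair of per-class lists. A small helper to flatten that structure into scored detections might look like the following (a sketch against the 2.x result layout; the mock arrays stand in for real model output):

```python
import numpy as np

def collect_detections(bbox_result, segm_result, score_thr=0.5):
    """Flatten MMDetection 2.x per-class results into
    (class_id, bbox, score, mask) tuples above a confidence threshold."""
    detections = []
    for class_id, (bboxes, masks) in enumerate(zip(bbox_result, segm_result)):
        for bbox, mask in zip(bboxes, masks):
            score = float(bbox[4])  # each bbox row is [x1, y1, x2, y2, score]
            if score >= score_thr:
                detections.append((class_id, bbox[:4], score, mask))
    return detections

# Mock single-class output: two detections, one below the threshold
bbox_result = [np.array([[10.0, 10.0, 20.0, 20.0, 0.92],
                         [30.0, 30.0, 35.0, 35.0, 0.20]])]
segm_result = [[np.zeros((64, 64), bool), np.zeros((64, 64), bool)]]
kept = collect_detections(bbox_result, segm_result)  # keeps only the 0.92 detection
```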
## Training Details

### Training Data

- **Dataset:** Enhanced COCO-style scientific chart dataset with instance masks
- Data point class annotated with pixel-precise segmentation masks
- Images and annotations filtered and preprocessed for optimal Swin Transformer performance

### Training Procedure

- Images resized to 1120x672
- Mask R-CNN with Swin Transformer backbone
- **Training regime:** fp32
- **Optimizer:** AdamW
- **Batch size:** 8
- **Epochs:** 36
- **Learning rate:** 1e-4

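In MMDetection 2.x config terms, the hyperparameters above correspond roughly to the following fields (a sketch using standard MMDetection option names; the actual `legend_match_swin/mask_rcnn_swin_datapoint.py` may organize these differently):

```python
# Sketch of MMDetection 2.x config fields matching the stated hyperparameters
optimizer = dict(type='AdamW', lr=1e-4)
runner = dict(type='EpochBasedRunner', max_epochs=36)   # 36 epochs
data = dict(samples_per_gpu=8)                          # batch size 8
img_scale = (1120, 672)                                 # training resize target
fp16 = None                                             # fp32 training (no fp16 hook)
```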
## Evaluation

### Testing Data, Factors & Metrics

- **Testing Data:** Held-out split from the enhanced COCO-style dataset
- **Factors:** Data point density, image quality
- **Metrics:** mAP (mean Average Precision), AP50, AP75, per-class AP

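These metrics score a predicted instance as correct when its mask overlaps a ground-truth mask beyond an IoU threshold (0.50 for AP50, 0.75 for AP75). The underlying mask IoU computation is simply:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two boolean instance masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum()) / float(union)

# Half-overlapping masks give IoU = 0.5
pred = np.ones((2, 2), bool)
gt = np.array([[True, True], [False, False]])
iou = mask_iou(pred, gt)  # → 0.5
```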
### Results

| Category   | mAP   | mAP_50 | mAP_75 | mAP_s | mAP_m | mAP_l |
|------------|-------|--------|--------|-------|-------|-------|
| data-point | 0.485 | 0.687  | 0.581  | 0.487 | 0.05  | nan   |

#### Summary

The model reaches 0.485 mAP (0.687 AP50) on data point segmentation, with its strongest performance on the small, densely packed points typical of scientific figures (mAP_s = 0.487). Performance on medium-sized objects is low (mAP_m = 0.05), and no large objects appear in the evaluation set (mAP_l = nan).

## Environmental Impact

- **Hardware Type:** NVIDIA V100 GPU
- **Hours used:** 10
- **Cloud Provider:** Google Cloud
- **Compute Region:** us-central1
- **Carbon Emitted:** ~15 kg CO2eq (estimated)

## Technical Specifications

### Model Architecture and Objective

- Mask R-CNN with Swin Transformer backbone
- Instance segmentation head for the data point class

### Compute Infrastructure

- **Hardware:** NVIDIA V100 GPU
- **Software:** PyTorch 1.13, MMDetection 2.x, Python 3.9

## Citation

**BibTeX:**

```bibtex
@article{DocFigure2021,
  title={DocFigure: A Dataset for Scientific Figure Classification},
  author={Afzal, S. and others},
  journal={arXiv preprint arXiv:2106.01841},
  year={2021}
}
```

**APA:**

Afzal, S., et al. (2021). DocFigure: A Dataset for Scientific Figure Classification. arXiv preprint arXiv:2106.01841.

## Glossary

- **Data Point:** An individual visual marker representing a value in a scientific chart (e.g., a dot in a scatter plot)

## More Information

- [DocFigure Paper](https://arxiv.org/abs/2106.01841)

## Model Card Authors

Hansheng Zhu

## Model Card Contact

hanszhu05@gmail.com