PP-Chart2Table

You are viewing the main version, which requires installation from source. For a regular pip install, check out the latest stable version (v5.3.0).

This model was released on 2025-05-20 and added to Hugging Face Transformers on 2026-03-18.

PyTorch

Overview

PP-Chart2Table is a state-of-the-art multimodal model developed by the PaddlePaddle team that specializes in parsing charts in both Chinese and English. Its performance is driven by a novel “Shuffled Chart Data Retrieval” training task which, combined with a refined token masking strategy, significantly improves its efficiency in converting charts to data tables. The model is further strengthened by a data synthesis pipeline that uses high-quality seed data, retrieval-augmented generation (RAG), and LLM persona design to create a richer, more diverse training set. To handle large-scale unlabeled, out-of-distribution (OOD) data, the team implemented a two-stage distillation process that improves adaptability and generalization on real-world data.

Model Architecture

PP-Chart2Table adopts a multimodal fusion architecture that combines a vision tower for chart feature extraction and a language model for table structure generation, enabling end-to-end chart-to-table conversion.

Usage

Single input inference

The example below demonstrates how to convert a chart image into a data table with PP-Chart2Table using Pipeline.

from transformers import pipeline

pipe = pipeline("image-text-to-text", model="PaddlePaddle/PP-Chart2Table_safetensors")

# No text prompt is needed: the PPChart2TableProcessor chat template adds a hardcoded "Chart to table" instruction
conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png",
            },
        ],
    },
]
result = pipe(text=conversation)
print(result[0]["generated_text"])
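
The generated text is a plain string, so downstream code usually has to split it into rows and cells. The sketch below is a minimal, non-authoritative example. It assumes the model emits one table row per line with cells separated by `|`; check the actual output of your checkpoint before relying on that. The helper name `parse_chart_table` and the sample string are hypothetical and not part of Transformers.

```python
def parse_chart_table(generated_text: str, sep: str = "|") -> list[list[str]]:
    """Split delimited text into rows of stripped cell strings.

    Assumption: one table row per line, cells separated by `sep`.
    """
    rows = []
    for line in generated_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Drop optional leading/trailing separators (Markdown-style rows).
        cells = [cell.strip() for cell in line.strip(sep).split(sep)]
        # Skip Markdown header rules such as "--- | ---".
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


# Hypothetical output, used only for illustration.
sample = "Year | Revenue | Profit\n2021 | 10.5 | 2.1\n2022 | 12.0 | 3.4"
table = parse_chart_table(sample)
header, body = table[0], table[1:]
print(header)
print(body)
```

In practice, `sample` would be replaced by `result[0]["generated_text"]` from the pipeline call above.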

Batched inference

To process several charts at once, pass a list of conversations to Pipeline. The result then contains one list of generations per conversation:

from transformers import pipeline

pipe = pipeline("image-text-to-text", model="PaddlePaddle/PP-Chart2Table_safetensors")

# No text prompt is needed: the PPChart2TableProcessor chat template adds a hardcoded "Chart to table" instruction
conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png",
            },
        ],
    },
]
result = pipe(text=[conversation, conversation])
print(result[0][0]["generated_text"])
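
When batching many charts, it can be convenient to build the conversations programmatically instead of repeating the nested dictionaries. The sketch below wraps the message structure from the example above in a small helper. `build_conversations` is a hypothetical name, and the second URL is a placeholder rather than a real chart.

```python
def build_conversations(image_urls: list[str]) -> list[list[dict]]:
    """Return one single-turn, image-only conversation per chart URL."""
    return [
        [{"role": "user", "content": [{"type": "image", "url": url}]}]
        for url in image_urls
    ]


urls = [
    "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/chart_parsing_02.png",
    "https://example.com/another_chart.png",  # placeholder, not a real chart
]
conversations = build_conversations(urls)
print(len(conversations))
```

The resulting list can be passed as `pipe(text=conversations)`, in the same way as the two-element list in the batched example.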

PPChart2TableConfig

class transformers.PPChart2TableConfig

(
    output_hidden_states: bool | None = False,
    return_dict: bool | None = True,
    dtype: str | torch.dtype | None = None,
    chunk_size_feed_forward: int = 0,
    is_encoder_decoder: bool = False,
    id2label: dict[int, str] | dict[str, str] | None = None,
    label2id: dict[str, int] | dict[str, str] | None = None,
    problem_type: typing.Optional[typing.Literal['regression', 'single_label_classification', 'multi_label_classification']] = None,
    tokenizer_class: str | transformers.tokenization_utils_base.PreTrainedTokenizerBase | None = None,
    vision_config: dict | transformers.configuration_utils.PreTrainedConfig | None = None,
    text_config: dict | transformers.configuration_utils.PreTrainedConfig | None = None,
    image_token_index: int = 151859,
    image_seq_length: int = 576,
    tie_word_embeddings: bool = True,
)

Parameters

  • output_hidden_states (bool, optional, defaults to False) — Whether or not the model should return all hidden-states.
  • return_dict (bool, optional, defaults to True) — Whether to return a ModelOutput (dataclass) instead of a plain tuple.
  • dtype (Union[str, torch.dtype], optional) — The dtype of the weights. This attribute can be used to initialize the model to a non-default dtype (which is normally float32) and thus allow for optimal storage allocation. For example, if the saved model is float16, ideally we want to load it back using the minimal amount of memory needed to load float16 weights.
  • chunk_size_feed_forward (int, optional, defaults to 0) — The chunk size of all feed forward layers in the residual attention blocks. A chunk size of 0 means that the feed forward layer is not chunked. A chunk size of n means that the feed forward layer processes n < sequence_length embeddings at a time. For more information on feed forward chunking, see How does Feed Forward Chunking work?.
  • is_encoder_decoder (bool, optional, defaults to False) — Whether the model is used as an encoder/decoder or not.
  • id2label (Union[dict[int, str], dict[str, str]], optional) — A map from index (for instance prediction index, or target index) to label.
  • label2id (Union[dict[str, int], dict[str, str]], optional) — A map from label to index for the model.
  • problem_type (Literal[regression, single_label_classification, multi_label_classification], optional) — Problem type for XxxForSequenceClassification models. Can be one of "regression", "single_label_classification" or "multi_label_classification".
  • tokenizer_class (Union[str, ~tokenization_utils_base.PreTrainedTokenizerBase], optional) — The class name of model’s tokenizer.
  • vision_config (Union[dict, ~configuration_utils.PreTrainedConfig], optional) — The config object or dictionary of the vision backbone.
  • text_config (Union[dict, ~configuration_utils.PreTrainedConfig], optional) — The config object or dictionary of the text backbone.
  • image_token_index (int, optional, defaults to 151859) — The image token index used as a placeholder for input images.
  • image_seq_length (int, optional, defaults to 576) — Sequence length of one image embedding.
  • tie_word_embeddings (bool, optional, defaults to True) — Whether to tie weight embeddings according to model’s tied_weights_keys mapping.

This is the configuration class that stores the configuration of a PP-Chart2Table model. It is used to instantiate a PP-Chart2Table model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the PaddlePaddle/PP-Chart2Table_safetensors checkpoint.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Example:

>>> from transformers import GotOcr2ForConditionalGeneration, PPChart2TableConfig

>>> # Initializing a PPChart2Table style configuration
>>> configuration = PPChart2TableConfig()

>>> # Initializing a model from the PaddlePaddle/PP-Chart2Table_safetensors style configuration
>>> model = GotOcr2ForConditionalGeneration(configuration)  # underlying architecture is Got Ocr 2

>>> # Accessing the model configuration
>>> configuration = model.config
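
The `image_token_index` and `image_seq_length` parameters describe how an image is represented in the token sequence. As a rough illustration (not the library's implementation), the sketch below assumes the common vision-language convention in which each image contributes `image_seq_length` placeholder tokens that are later replaced by vision features. The helper `expand_image_placeholder` and the text token IDs are hypothetical.

```python
IMAGE_TOKEN_INDEX = 151859  # default from PPChart2TableConfig
IMAGE_SEQ_LENGTH = 576      # default from PPChart2TableConfig


def expand_image_placeholder(
    input_ids: list[int],
    image_token_index: int = IMAGE_TOKEN_INDEX,
    image_seq_length: int = IMAGE_SEQ_LENGTH,
) -> list[int]:
    """Repeat each image placeholder so it spans `image_seq_length` positions."""
    expanded = []
    for token_id in input_ids:
        if token_id == image_token_index:
            # Assumption: one image occupies image_seq_length token slots.
            expanded.extend([image_token_index] * image_seq_length)
        else:
            expanded.append(token_id)
    return expanded


# Hypothetical prompt: three text tokens around a single image placeholder.
prompt_ids = [101, IMAGE_TOKEN_INDEX, 102, 103]
expanded_ids = expand_image_placeholder(prompt_ids)
print(len(expanded_ids))  # 3 text tokens + 576 image positions = 579
print(expanded_ids.count(IMAGE_TOKEN_INDEX))  # 576
```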

PPChart2TableImageProcessor

class transformers.PPChart2TableImageProcessor

( **kwargs: typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs] )

Parameters

  • **kwargs (ImagesKwargs, optional) — Additional image preprocessing options. See the ImagesKwargs TypedDict class for the complete list of supported arguments.

Constructs a PP-Chart2Table image processor.

PPChart2TableImageProcessorPil

class transformers.PPChart2TableImageProcessorPil

( **kwargs: typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs] )

Parameters

  • **kwargs (ImagesKwargs, optional) — Additional image preprocessing options. See the ImagesKwargs TypedDict class for the complete list of supported arguments.

Constructs a PPChart2TableImageProcessorPil image processor.

PPChart2TableProcessor

class transformers.PPChart2TableProcessor

( image_processor = None, tokenizer = None, chat_template = None, **kwargs )

Parameters

  • image_processor (PPChart2TableImageProcessor) — The image processor is a required input.
  • tokenizer (tokenizer_class) — The tokenizer is a required input.
  • chat_template (str) — A Jinja template to convert lists of messages in a chat into a tokenizable string.

Constructs a PPChart2TableProcessor, which wraps an image processor and a tokenizer into a single processor.

PPChart2TableProcessor offers all the functionalities of PPChart2TableImageProcessor and of the wrapped tokenizer. See PPChart2TableImageProcessor and the tokenizer class for more information.
