This model was contributed to Hugging Face Transformers on 2026-03-13.

PP-LCNet

Overview

PP-LCNet is a family of efficient, lightweight convolutional neural networks designed for real-world document understanding and OCR tasks. It balances accuracy, speed, and model size, making it suitable for both server-side and edge deployment. PP-LCNet has three main variants, each optimized for a specific document processing task.

Model Architecture

The Document Image Orientation Classification Module determines the orientation of a document image so that it can be corrected in post-processing. During document scanning or ID photo capture, the device may be rotated to get a clearer image, producing images in various orientations that standard OCR pipelines may not handle well. Classifying the orientation of documents or IDs that contain text regions ahead of time lets the image be rotated upright, which improves OCR accuracy.
The Table Classification Module classifies input table images, and its performance directly affects the accuracy and efficiency of the whole table recognition process. It takes a table image as input and assigns it to a predefined category, such as wired table or wireless table, based on the characteristics and content of the image. Table recognition pipelines then consume the classification result.
The Text Line Orientation Classification Module determines the orientation of text lines so that they can be corrected in post-processing. During document scanning or license/certificate photography, the capture device may be rotated to get a clearer image, producing text lines in various orientations that standard OCR pipelines do not handle well. Classifying the orientation of each text line ahead of time lets it be rotated upright, which improves OCR accuracy.
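The post-processing correction described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes the classifier emits one of the labels "0", "90", "180", or "270", and that the label gives the clockwise angle by which the content is rotated away from upright. The helper names and the toy grid standing in for an image are hypothetical.

```python
def rotate_cw(grid):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_ccw(grid):
    """Rotate a 2D grid (list of rows) 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def correct_orientation(grid, label):
    """Undo the rotation named by `label`.

    Assumption: `label` is the clockwise angle (in degrees) by which the
    content is rotated away from upright, so we rotate counter-clockwise
    by the same angle to restore it.
    """
    if label not in {"0", "90", "180", "270"}:
        raise ValueError(f"unexpected orientation label: {label!r}")
    for _ in range(int(label) // 90):
        grid = rotate_ccw(grid)
    return grid


# Toy "image": a 2x3 grid of pixel ids.
page = [[1, 2, 3],
        [4, 5, 6]]

# Simulate a page captured upside down, then correct it.
upside_down = rotate_cw(rotate_cw(page))
restored = correct_orientation(upside_down, "180")
print(restored == page)  # True
```

With a real image, the same idea applies with an image library's rotation routine in place of the grid helpers, once the label-to-angle convention of the chosen checkpoint has been confirmed.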

Usage

Single input inference

The example below demonstrates how to classify an image with PP-LCNet using Pipeline or AutoModel.

Pipeline

AutoModel
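As a minimal sketch of the final step that both approaches share, the snippet below turns raw class scores into ranked predictions. It assumes the model yields one raw score per class; the function name `scores_to_predictions`, the example scores, and the `wired_table` label are illustrative assumptions, not part of the Transformers API. The result mirrors the list of label/score dictionaries that image classification pipelines return.

```python
import math


def scores_to_predictions(scores, id2label, top_k=2):
    """Convert raw class scores to a ranked list of label/score dicts."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [{"label": id2label[i], "score": probs[i]} for i in ranked[:top_k]]


# Illustrative label map and scores (assumed values, not real model output).
id2label = {0: "wired_table", 1: "wireless_table"}
preds = scores_to_predictions([0.3, 2.1], id2label)
print(preds[0]["label"])  # wireless_table
```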

Batched inference

The example below demonstrates how to classify a batch of images with PP-LCNet using Pipeline or AutoModel.

Pipeline

AutoModel
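As a rough sketch of the batching idea itself, independent of the Transformers API, the helper below groups a list of inputs into fixed-size batches so that each batch can be processed in one call. The helper name `batched` and the file names are hypothetical.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical image paths; the last batch may be smaller than the rest.
image_paths = ["a.jpg", "b.jpg", "c.jpg"]
batches = list(batched(image_paths, batch_size=2))
print(batches)  # [['a.jpg', 'b.jpg'], ['c.jpg']]
```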

PPLCNetForImageClassification

class transformers.PPLCNetForImageClassification

( config: PPLCNetConfig )

Parameters

config (PPLCNetConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The PP-LCNet model with an image classification head on top, e.g. for ImageNet.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( pixel_values: torch.FloatTensor | None = None, **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → BaseModelOutputWithNoAttention or tuple(torch.FloatTensor)

Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, image_size, image_size), optional) — The tensors corresponding to the input images. Pixel values can be obtained using PPLCNetImageProcessor. See PPLCNetImageProcessor.__call__() for details (processor_class uses PPLCNetImageProcessor for processing images).

Returns

BaseModelOutputWithNoAttention or tuple(torch.FloatTensor)

A BaseModelOutputWithNoAttention or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PPLCNetConfig) and inputs.

last_hidden_state (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width).

Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

The PPLCNetForImageClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Examples:

>>> import httpx
>>> from io import BytesIO
>>> from PIL import Image
>>> from transformers import AutoModelForImageClassification, AutoImageProcessor

>>> model_path = "PaddlePaddle/PP-LCNet_x1_0_table_cls_safetensors"
>>> model = AutoModelForImageClassification.from_pretrained(model_path)
>>> image_processor = AutoImageProcessor.from_pretrained(model_path)

>>> url = "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/img_rot180_demo.jpg"
>>> with httpx.stream("GET", url) as response:
...     image = Image.open(BytesIO(response.read()))

>>> inputs = image_processor(images=image, return_tensors="pt")
>>> outputs = model(**inputs)
>>> predicted_label = outputs.last_hidden_state.argmax(-1).item()
>>> print(model.config.id2label[predicted_label])
wireless_table

PPLCNetConfig

class transformers.PPLCNetConfig

( transformers_version: str | None = None, architectures: list[str] | None = None, output_hidden_states: bool | None = False, return_dict: bool | None = True, dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None, chunk_size_feed_forward: int = 0, is_encoder_decoder: bool = False, id2label: dict[int, str] | dict[str, str] | None = None, label2id: dict[str, int] | dict[str, str] | None = None, problem_type: typing.Optional[typing.Literal['regression', 'single_label_classification', 'multi_label_classification']] = None, scale: float | int = 1.0, block_configs: list | None = None, stem_channels: int = 16, stem_stride: int = 2, reduction: int = 4, class_expand: int = 1280, divisor: int = 8, hidden_act: str = 'hardswish', _out_features: list[str] | None = None, _out_indices: list[int] | None = None, hidden_dropout_prob: float | int = 0.2 )

Parameters

scale (float, optional, defaults to 1.0) — The scaling factor for the model’s channel dimensions, used to adjust the model size and computational cost without changing the overall architecture (e.g., 0.25, 0.5, 1.0, 1.5).
block_configs (list[list[tuple]], optional, defaults to None) — Configuration for each block in each stage. Each tuple contains: (kernel_size, in_channels, out_channels, stride, use_squeeze_excitation). If None, uses the default PP-LCNet configuration.
stem_channels (int, optional, defaults to 16) — The number of output channels for the stem layer.
stem_stride (int, optional, defaults to 2) — The stride for the stem convolution layer.
reduction (int, optional, defaults to 4) — The reduction factor for feature channel dimensions in the squeeze-and-excitation (SE) blocks, used to reduce the number of model parameters and computational complexity while maintaining feature representability.
class_expand (int, optional, defaults to 1280) — The number of hidden units in the expansion layer of the classification head, used to enhance the model’s feature representation capability before the final classification layer.
divisor (int, optional, defaults to 8) — The divisor used to ensure that various model parameters (e.g., channel dimensions, kernel sizes) are multiples of this value, promoting efficient model implementation and resource utilization.
hidden_act (str, optional, defaults to "hardswish") — The non-linear activation function (function or string) used in the model. For example, "gelu", "relu", "silu", etc.
hidden_dropout_prob (Union[float, int], optional, defaults to 0.2) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
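To make the interaction between scale, divisor, reduction, and block_configs more concrete, here is a minimal sketch. It assumes channel counts are scaled and then rounded with the "make divisible" helper that is common in MobileNet-style networks; whether the Transformers implementation rounds in exactly this way is an assumption. The function names and the example block tuples are illustrative, not the default PP-LCNet configuration.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round `value` to the nearest multiple of `divisor`.

    The result never drops below `min_value` and never falls more than
    10% below the requested value (the usual MobileNet-style rule).
    """
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


def scale_blocks(block_configs, scale, divisor=8):
    """Apply a width multiplier to the channel fields of each block tuple.

    Each tuple is (kernel_size, in_channels, out_channels, stride,
    use_squeeze_excitation), matching the block_configs description.
    """
    return [
        (k, make_divisible(c_in * scale, divisor),
         make_divisible(c_out * scale, divisor), stride, use_se)
        for (k, c_in, c_out, stride, use_se) in block_configs
    ]


def se_hidden_channels(channels, reduction=4):
    """Width of the squeeze-and-excitation bottleneck for `channels`."""
    return channels // reduction


# Illustrative stage, not the library default.
example_stage = [(3, 16, 32, 1, False), (5, 32, 64, 2, True)]
half_width = scale_blocks(example_stage, scale=0.5)
print(half_width)              # [(3, 8, 16, 1, False), (5, 16, 32, 2, True)]
print(se_hidden_channels(32))  # 8
```

Rounding to a multiple of the divisor keeps channel counts aligned to sizes that hardware kernels tend to handle efficiently, which is the motivation the divisor description gives.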

This is the configuration class to store the configuration of a PP-LCNet model. It is used to instantiate a PP-LCNet model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the PaddlePaddle/PP-LCNet_x1_0_doc_ori_safetensors checkpoint.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

PPLCNetBackbone

class transformers.PPLCNetBackbone

( config: PPLCNetConfig )

Parameters

config (PPLCNetConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

PPLCNet backbone model for feature extraction.

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( pixel_values: Tensor, **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → BackboneOutput or tuple(torch.FloatTensor)

Parameters

pixel_values (torch.Tensor of shape (batch_size, num_channels, image_size, image_size)) — The tensors corresponding to the input images. Pixel values can be obtained using PPLCNetImageProcessor. See PPLCNetImageProcessor.__call__() for details (processor_class uses PPLCNetImageProcessor for processing images).

Returns

BackboneOutput or tuple(torch.FloatTensor)

A BackboneOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PPLCNetConfig) and inputs.

feature_maps (tuple(torch.FloatTensor) of shape (batch_size, num_channels, height, width)) — Feature maps of the stages.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size) or (batch_size, num_channels, height, width), depending on the backbone.

Hidden-states of the model at the output of each stage plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Only applicable if the backbone uses attention.

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The PPLCNetBackbone forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Examples:

>>> from transformers import PPLCNetConfig, PPLCNetBackbone
>>> import torch

>>> config = PPLCNetConfig()
>>> model = PPLCNetBackbone(config)

>>> pixel_values = torch.randn(1, 3, 224, 224)

>>> with torch.no_grad():
...     outputs = model(pixel_values)

>>> feature_maps = outputs.feature_maps
>>> list(feature_maps[-1].shape)
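
The spatial size of a feature map can be estimated from the strides applied before it. The sketch below is a rough aid for reasoning about shapes, not a statement of what the backbone returns: it assumes every strided convolution uses an odd kernel with padding of kernel_size // 2, so each one maps a side length H to ceil(H / stride). The stem stride of 2 is the documented default; the per-stage strides are hypothetical.

```python
import math


def output_size(input_size, strides):
    """Side length after applying convolutions with the given strides.

    Assumes odd kernels padded by kernel_size // 2, for which a stride-s
    convolution maps a side length H to ceil(H / s).
    """
    size = input_size
    for stride in strides:
        size = math.ceil(size / stride)
    return size


stem_stride = 2                  # documented default for stem_stride
stage_strides = [1, 2, 2, 2, 2]  # hypothetical per-stage strides
side = output_size(224, [stem_stride] + stage_strides)
print(side)  # 7
```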

PPLCNetImageProcessor

class transformers.PPLCNetImageProcessor

( **kwargs: typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs] )

Parameters

**kwargs (ImagesKwargs, optional) — Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments.

Constructs a PPLCNetImageProcessor image processor.

preprocess

( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']], *args, **kwargs: typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs] ) → BatchFeature

Parameters

images (Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list[PIL.Image.Image], list[numpy.ndarray], list[torch.Tensor]]) — Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
return_tensors (str or TensorType, optional) — Returns stacked tensors if set to 'pt', otherwise returns a list of tensors.
**kwargs (ImagesKwargs, optional) — Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments.
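
The do_rescale note above can be illustrated with a small sketch. It assumes rescaling means dividing 0 to 255 pixel values by 255 to reach the 0 to 1 range, which is the usual convention; the function name and the sample values are hypothetical, and the real processor operates on arrays or tensors instead of Python lists.

```python
def rescale_pixels(pixels, do_rescale=True, max_value=255.0):
    """Map 0-255 pixel values to the 0-1 range.

    Pass do_rescale=False when the input is already in the 0-1 range,
    otherwise the values would be shrunk a second time.
    """
    if not do_rescale:
        return [float(p) for p in pixels]
    return [p / max_value for p in pixels]


raw = [0, 51, 255]              # 8-bit pixel values
already_scaled = [0.0, 0.2, 1.0]

print(rescale_pixels(raw))                               # [0.0, 0.2, 1.0]
print(rescale_pixels(already_scaled, do_rescale=False))  # [0.0, 0.2, 1.0]
```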

Returns

BatchFeature

data (dict) — Dictionary of lists/arrays/tensors returned by the call method (‘pixel_values’, etc.).
tensor_type (Union[None, str, TensorType], optional) — You can give a tensor_type here to convert the lists of integers into PyTorch/NumPy tensors at initialization.
