PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper: arXiv:2510.14528
This model is the ONNX-converted version of PaddlePaddle/PP-DocLayoutV2, converted with Paddle2ONNX v2.1.0 at ONNX Opset 17. No PaddlePaddle installation is required; inference runs directly through ONNX Runtime.
Inputs:

| Name | Shape | Type | Description |
|---|---|---|---|
| `im_shape` | [batch, 2] | float32 | Model input size, fixed at [800, 800] (not the original image size) |
| `image` | [batch, 3, 800, 800] | float32 | Preprocessed image tensor (CHW, RGB, values in [0, 1]) |
| `scale_factor` | [batch, 2] | float32 | Scale ratios [scale_h, scale_w], where scale_h = 800 / original_height and scale_w = 800 / original_width |
Note: `im_shape` takes the model's input resolution (800, 800), not the original image size. `scale_factor` is "target size / original size"; do not reverse the direction.
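As a quick sanity check on the direction, a minimal sketch using a hypothetical 600x1000 (H x W) page:

```python
import numpy as np

original_height, original_width = 600, 1000  # hypothetical page size (H, W)

# Correct direction: target size / original size, ordered [scale_h, scale_w]
scale_factor = np.array([[800 / original_height, 800 / original_width]], dtype=np.float32)
# scale_h is ~1.333 (upscaling the height), scale_w is 0.8 (downscaling the width)
```

If the two factors were swapped, restored box coordinates would be stretched along the wrong axis.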
Outputs:

| Name | Shape | Type | Description |
|---|---|---|---|
| `fetch_name_0` | [N, 8] | float32 | Detection results: [label_id, score, xmin, ymin, xmax, ymax, -, -] |
| `fetch_name_1` | [batch] | int32 | Number of valid detections per image |
Coordinates are already restored to the original image's pixel space.
Per the original model's inference.yml (norm_type: none, so no ImageNet mean/std normalization is applied):
```python
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Load the image
image = cv2.imread("document.jpg")  # BGR
original_height, original_width = image.shape[:2]

# Preprocess: resize to the fixed 800x800 input, convert BGR -> RGB, scale to [0, 1]
resized = cv2.resize(image, (800, 800))
resized_rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
image_tensor = np.transpose(resized_rgb, (2, 0, 1))[np.newaxis, :]  # [1, 3, 800, 800]

# Prepare inputs
# im_shape is the model's input size (fixed 800x800), not the original image size
im_shape = np.array([[800, 800]], dtype=np.float32)
# scale_factor = target size / original size
scale_factor = np.array([[800 / original_height, 800 / original_width]], dtype=np.float32)

# Run inference
outputs = session.run(
    ["fetch_name_0", "fetch_name_1"],
    {
        "im_shape": im_shape,
        "image": image_tensor,
        "scale_factor": scale_factor,
    },
)

detections = outputs[0]  # shape: (N, 8)
valid_count = int(outputs[1][0])  # number of valid detections
results = detections[:valid_count]
# results[:, 0] = label_id, results[:, 1] = score
# results[:, 2:6] = [xmin, ymin, xmax, ymax] (already in original image space)
```
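The raw detections can then be filtered and clipped before use. A minimal sketch, using a dummy `results` array in place of real model output and a hypothetical confidence threshold:

```python
import numpy as np

# Dummy stand-in for model output, in the [label_id, score, xmin, ymin, xmax, ymax, -, -] layout
results = np.array([
    [0, 0.95, 10.0, 20.0, 300.0, 400.0, 0, 0],
    [2, 0.30, 50.0, 60.0, 200.0, 250.0, 0, 0],    # below the threshold, dropped
    [1, 0.80, -5.0, 700.0, 900.0, 1300.0, 0, 0],  # exceeds the image bounds, clipped
], dtype=np.float32)
original_height, original_width = 1200, 850  # hypothetical page size

score_threshold = 0.5  # hypothetical; tune per use case
kept = results[results[:, 1] >= score_threshold]

# Clip x coordinates to [0, width] and y coordinates to [0, height]
kept[:, [2, 4]] = np.clip(kept[:, [2, 4]], 0, original_width)
kept[:, [3, 5]] = np.clip(kept[:, [3, 5]], 0, original_height)

for label_id, score, xmin, ymin, xmax, ymax in kept[:, :6]:
    print(f"class {int(label_id)}: score={score:.2f}, "
          f"box=({xmin:.0f}, {ymin:.0f}, {xmax:.0f}, {ymax:.0f})")
```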
```bibtex
@misc{cui2025paddleocrvlboostingmultilingualdocument,
  title={PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model},
  author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Handong Zheng and Jing Zhang and Jun Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
  year={2025},
  eprint={2510.14528},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.14528},
}
```
Base model: PaddlePaddle/PP-DocLayoutV2