Kreuzberg Layout Models
ONNX models used by Kreuzberg for document layout detection and table structure recognition.
Models
RT-DETR (Document Layout Detection)
| Property | Value |
|---|---|
| Path | rtdetr/model.onnx |
| Size | 169 MB |
| Precision | FP32 |
| Architecture | RT-DETR v2 (Real-Time Detection Transformer) |
| Input | images: [batch, 3, 640, 640] f32 (ImageNet-normalized, letterboxed) |
| Input | orig_target_sizes: [batch, 2] i64 (original [height, width]) |
| Outputs | labels i64, boxes f32 [batch, N, 4], scores f32 |
| Classes | 17 document layout classes |
| SHA256 | 3bf2fb0ee6df87435b7ae47f0f3930ec3dc97ec56fd824acc6d57bc7a6b89ef2 |
Layout Classes: Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, DocumentIndex, Code, CheckboxSelected, CheckboxUnselected, Form, KeyValueRegion
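The three outputs can be turned into labelled detections with a small post-processing step. The sketch below is illustrative only and is not Kreuzberg's own code. It assumes that the integer label IDs index the class list in the order given above, that the batch dimension has already been removed, and that each box is an opaque group of four floats. The helper name `filter_detections` and the `min_score` default of 0.5 are hypothetical choices for this example.

```python
# Class names in the order listed in this README.
# Assumption: the model's integer label IDs follow this same order.
LAYOUT_CLASSES = [
    "Caption", "Footnote", "Formula", "ListItem", "PageFooter", "PageHeader",
    "Picture", "SectionHeader", "Table", "Text", "Title", "DocumentIndex",
    "Code", "CheckboxSelected", "CheckboxUnselected", "Form", "KeyValueRegion",
]


def filter_detections(labels, boxes, scores, min_score=0.5):
    """Pair each detection with its class name and drop low-confidence ones.

    labels, boxes and scores are the per-image model outputs as plain
    Python sequences (batch dimension already removed).
    min_score is an arbitrary illustrative threshold, not a documented value.
    """
    results = []
    for label_id, box, score in zip(labels, boxes, scores):
        if score < min_score:
            continue
        results.append(
            {"label": LAYOUT_CLASSES[label_id], "box": tuple(box), "score": score}
        )
    return results
```

With `onnxruntime`, the inputs to this helper would typically come from calling `.tolist()` on the first batch element of each output array.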
SLANet-plus (Table Structure Recognition)
| Property | Value |
|---|---|
| Path | slanet-plus/model.onnx |
| Size | 7.8 MB |
| Precision | FP32 |
| Architecture | SLANet-plus (Sequence-to-Sequence table decoder) |
| Input | x: [1, 3, 488, 488] f32 (BGR channel order, ImageNet-normalized) |
| Outputs | [1, seq_len, 8] cell bbox corners, [1, seq_len, 50] HTML token probabilities |
| Vocabulary | 50 tokens (HTML structure tags, rowspan/colspan 1-20, sos/eos) |
| SHA256 | e0bff8da087f9b83629f1e1a6e0f8252fc2de85a7d80415b3510fc521338da3d |
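The token-probability output can be read with a greedy decode: take the most likely token at each step and stop at the end-of-sequence marker. The following is a minimal sketch under stated assumptions, not the decoder Kreuzberg or PaddleOCR actually uses. The function name `greedy_decode`, the `vocab` list, and the `"sos"`/`"eos"` token spellings are placeholders; the real 50-token vocabulary and its ordering are not reproduced here.

```python
def greedy_decode(token_probs, vocab, sos_token="sos", eos_token="eos"):
    """Greedy-decode one table structure sequence.

    token_probs: list of per-step probability rows, one value per vocab entry
                 (the [seq_len, 50] output with the batch dimension removed).
    vocab:       list mapping token index to token string (hypothetical here).
    Returns the decoded structure tokens, excluding sos/eos.
    """
    tokens = []
    for step in token_probs:
        # Index of the highest-probability token at this step.
        best = max(range(len(step)), key=step.__getitem__)
        token = vocab[best]
        if token == eos_token:
            break  # end of sequence: ignore any remaining steps
        if token == sos_token:
            continue  # start marker carries no structure
        tokens.append(token)
    return tokens
```

For example, with a toy five-token vocabulary `["sos", "<tr>", "<td></td>", "</tr>", "eos"]` and one-hot rows selecting indices 1, 2, 3, 4, the function returns `["<tr>", "<td></td>", "</tr>"]`. The `[seq_len, 8]` box output has one row per decoding step, so the same step index can be used to look up the box for a given token.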
Attribution & Provenance
RT-DETR
This model is mirrored from docling-project/docling-layout-heron-onnx, created by the Docling team at IBM Research.
- Original repository: docling-project/docling-layout-heron-onnx
- License: Apache-2.0
- Architecture paper: Zhao et al., "DETRs Beat YOLOs on Real-time Object Detection" (arXiv:2304.08069)
- Training data: DocLayNet and internal IBM document datasets
SLANet-plus
This model was converted from PaddlePaddle format to ONNX using Paddle2ONNX. The original model is from the PaddleOCR project by PaddlePaddle.
- Original repository: PaddlePaddle/SLANet_plus
- License: Apache-2.0
- Architecture paper: "PP-StructureV2: A Stronger Document Analysis System" (arXiv:2210.05391)
- Conversion: PaddlePaddle inference format → ONNX via Paddle2ONNX (opset 17)
Usage
These models are automatically downloaded and cached by the Kreuzberg document extraction library. See the layout extraction documentation for details.
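If you download the files manually, or want to check a cached copy, you can compare each file against the SHA256 values listed in the tables above. The sketch below uses only the Python standard library. The function names and the `root` directory argument are hypothetical; the location of Kreuzberg's cache is not specified here.

```python
import hashlib
from pathlib import Path

# Checksums copied from the model tables in this README.
EXPECTED_SHA256 = {
    "rtdetr/model.onnx":
        "3bf2fb0ee6df87435b7ae47f0f3930ec3dc97ec56fd824acc6d57bc7a6b89ef2",
    "slanet-plus/model.onnx":
        "e0bff8da087f9b83629f1e1a6e0f8252fc2de85a7d80415b3510fc521338da3d",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(root, relative_path):
    """Return True if the file under `root` matches the published checksum.

    `root` is whatever directory holds the downloaded models (an assumption
    of this sketch, not a documented Kreuzberg path).
    """
    return sha256_of(Path(root) / relative_path) == EXPECTED_SHA256[relative_path]
```

Reading in chunks keeps memory use flat, which matters for the 169 MB RT-DETR file.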
License
All models in this repository are distributed under the Apache-2.0 License, consistent with the licenses of the original models.