
	---
	license: mit
	widget:
	- src: https://www.invoicesimple.com/wp-content/uploads/2018/06/Sample-Invoice-printable.png
	  example_title: Invoice
	---

	# Table Transformer (fine-tuned for Table Detection)

	Table Transformer (DETR) model trained on PubTables-1M. It was introduced in the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Smock et al. and first released in [this repository](https://github.com/microsoft/table-transformer).

	Disclaimer: The team releasing Table Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.

	## Model description

	The Table Transformer is equivalent to [DETR](https://huggingface.co/docs/transformers/model_doc/detr), a Transformer-based object detection model. Note that the authors decided to use the "normalize before" setting of DETR, which means that layernorm is applied before self- and cross-attention.
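
To make the "normalize before" ordering concrete, here is a minimal sketch in plain Python that contrasts it with the post-norm ordering of the original Transformer. It is an illustration only, not the model's code: `toy_sublayer` is a hypothetical stand-in for an attention sublayer, and `layer_norm` omits the learned scale and shift.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x, sublayer):
    """Post-norm ordering: LayerNorm(x + sublayer(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_block(x, sublayer):
    """Pre-norm ("normalize before") ordering: x + sublayer(LayerNorm(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def toy_sublayer(x):
    """Hypothetical stand-in for attention: scales every element by 0.5."""
    return [0.5 * v for v in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print("post-norm:", post_norm_block(x, toy_sublayer))
    print("pre-norm: ", pre_norm_block(x, toy_sublayer))
```

In the pre-norm variant the residual path carries `x` through unnormalized, which is the practical difference between the two orderings.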

	## Usage

	You can use the raw model for detecting tables in documents. See the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/table-transformer) for more info.
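
A minimal inference sketch with the `transformers` library might look like the following. It assumes that `torch`, `transformers`, and `Pillow` are installed and that the checkpoint id is `microsoft/table-transformer-detection` (an assumption, since the id is not stated in this card). `format_detection` is a hypothetical helper added only for readable output.

```python
def detect_tables(image_path, threshold=0.9,
                  checkpoint="microsoft/table-transformer-detection"):
    """Return a list of (label, score, [x0, y0, x1, y1]) for one image."""
    # Imported lazily so format_detection below works without these packages.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, TableTransformerForObjectDetection

    image = Image.open(image_path).convert("RGB")
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = TableTransformerForObjectDetection.from_pretrained(checkpoint)

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]

    return [
        (model.config.id2label[label.item()], score.item(), box.tolist())
        for score, label, box in zip(
            result["scores"], result["labels"], result["boxes"]
        )
    ]


def format_detection(label, score, box):
    """Hypothetical helper: render one detection as a single line of text."""
    x0, y0, x1, y1 = (round(v, 1) for v in box)
    return f"{label} ({score:.2f}) at [{x0}, {y0}, {x1}, {y1}]"
```

Calling `detect_tables("invoice.png")` on a local document image (the file name is a placeholder) returns one tuple per detected table, which can be passed to `format_detection(*detection)` for printing.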

	### Run as ONNX (CPU / NPU / GPU)

	Exporting this model to ONNX and quantizing it to w8a16 lets it detect tables about 14× faster on a Windows NPU than the fp32 PyTorch checkpoint on CPU, at roughly half the model size and with mAP within 1% of the original. You can also export fp32 ONNX builds to run on CPU or GPU.

	Benchmarked on an Intel Core Ultra 7 258V (PubTables-1M validation, 1000 samples):

	| Model   | Device       | Precision   | mAP    | Mean latency (ms) | p50 latency (ms) | Size (MB) |
	|---------|--------------|-------------|--------|-------------------|------------------|-----------|
	| PyTorch | CPU          | fp32        | 0.9887 | 620.9             | 600.3            | 115       |
	| ONNX    | OpenVINO NPU | w8a16 (QDQ) | 0.9822 | 44.1              | 41.6             | 58        |
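
The headline figures follow directly from the benchmark table above. The short calculation below reproduces them; the dictionary keys are illustrative names, and the numbers are copied from the table unchanged.

```python
# Figures copied from the benchmark table above.
pytorch_cpu = {"map": 0.9887, "mean_ms": 620.9, "p50_ms": 600.3, "size_mb": 115}
onnx_npu = {"map": 0.9822, "mean_ms": 44.1, "p50_ms": 41.6, "size_mb": 58}

# Latency speedup: PyTorch CPU time divided by ONNX NPU time.
mean_speedup = pytorch_cpu["mean_ms"] / onnx_npu["mean_ms"]
p50_speedup = pytorch_cpu["p50_ms"] / onnx_npu["p50_ms"]

# Size ratio: quantized ONNX file relative to the PyTorch checkpoint.
size_ratio = onnx_npu["size_mb"] / pytorch_cpu["size_mb"]

# Accuracy cost: absolute and relative mAP difference.
map_drop = pytorch_cpu["map"] - onnx_npu["map"]
relative_map_drop = map_drop / pytorch_cpu["map"]

print(f"mean latency speedup: {mean_speedup:.1f}x")   # 14.1x
print(f"p50 latency speedup:  {p50_speedup:.1f}x")    # 14.4x
print(f"size ratio:           {size_ratio:.2f}")      # 0.50
print(f"absolute mAP drop:    {map_drop:.4f}")        # 0.0065
print(f"relative mAP drop:    {relative_map_drop:.2%}")  # 0.66%
```

Note that the comparison changes device, runtime, and precision at the same time, so the speedup should not be attributed to the ONNX export alone.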

	- How to convert — Export and quantize with [Microsoft's WinML CLI](https://github.com/microsoft/winml-cli). The NPU build is QDQ-quantized to w8a16; fp32 builds for CPU and GPU are also supported. End-to-end build, evaluation, and a Python inference example: [examples/microsoft-table-transformer-detection](https://github.com/microsoft/winml-cli/blob/main/examples/microsoft-table-transformer-detection/README.md).
	- How to run on Windows — Use [Windows ML](https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview), which manages execution providers for NPU / GPU / CPU and routes ONNX inference to the right backend automatically.
	- How to run on other platforms — Use [ONNX Runtime](https://onnxruntime.ai/docs/) with the execution provider of your choice (OpenVINO, QNN, DirectML, CUDA, CPU, etc.).
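
For the ONNX Runtime route, a rough sketch of choosing an execution provider and running one inference is shown below. It assumes the `onnxruntime` package is installed, that the exported model takes a single preprocessed image tensor as input, and that the model file path is supplied by the caller; none of these details come from this card. `pick_providers` is a hypothetical helper, and device selection within a provider (for example, targeting the NPU) is configured through provider options that are not shown here.

```python
def pick_providers(preferred, available):
    """Keep preferred providers that are available, in order; CPU is always last."""
    chosen = [
        p for p in preferred
        if p in available and p != "CPUExecutionProvider"
    ]
    chosen.append("CPUExecutionProvider")
    return chosen


def run_onnx(model_path, pixel_values,
             preferred=("OpenVINOExecutionProvider", "QNNExecutionProvider",
                        "DmlExecutionProvider", "CUDAExecutionProvider")):
    """Run one forward pass and return a dict of output name -> array."""
    # Imported lazily so pick_providers can be used without onnxruntime.
    import onnxruntime as ort

    providers = pick_providers(preferred, ort.get_available_providers())
    session = ort.InferenceSession(model_path, providers=providers)

    # Read the input name from the model instead of hard-coding it.
    # Assumption: the model has exactly one input (the image tensor).
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: pixel_values})

    output_names = [o.name for o in session.get_outputs()]
    return dict(zip(output_names, outputs))
```

Keeping the CPU provider at the end of the list gives the session a fallback when none of the preferred accelerators is available on the machine.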