docs: add Windows ONNX Runtime usage (CPU / NPU / GPU) with WinML CLI

#21
by xieofxie - opened

This PR adds a short "Run as ONNX" subsection to the Usage section.
It points users to three tools for running the model outside the
original PyTorch path:

- Microsoft's WinML CLI for conversion + quantization
- Windows ML for on-device inference (NPU / GPU / CPU)
- ONNX Runtime for cross-platform inference
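For reviewers unfamiliar with the ONNX Runtime path, here is a minimal sketch of how hardware selection could look from Python. It uses only the plain `onnxruntime` package (`get_available_providers` and `InferenceSession`), not the WinML CLI or the Windows ML API. The preference order, the helper names `pick_providers` and `create_session`, and the `model.onnx` path are assumptions for illustration and are not taken from this PR.

```python
from typing import List, Sequence

# Assumed preference order for illustration only: NPU-capable providers
# first, then DirectML (GPU), then CPU as the fallback.
PREFERRED: Sequence[str] = (
    "QNNExecutionProvider",       # Qualcomm NPU
    "OpenVINOExecutionProvider",  # Intel hardware
    "DmlExecutionProvider",       # DirectML (GPU)
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to CPU when none of the preferred providers is present.
    return chosen or ["CPUExecutionProvider"]


def create_session(model_path: str):
    """Open an ONNX Runtime session on the best available provider."""
    import onnxruntime as ort  # imported lazily so pick_providers has no dependency

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


if __name__ == "__main__":
    # Hypothetical file name; substitute the path of the converted model.
    session = create_session("model.onnx")
    print(session.get_providers())
```

ONNX Runtime tries the listed providers in order, so keeping `CPUExecutionProvider` last gives a working fallback on machines without an NPU or GPU provider installed.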

Why merge:

- Reaches a much wider audience. Windows is the largest desktop
  install base, and Windows ML now ships built-in NPU support on
  Copilot+ PCs. Surfacing an NPU path on the model card makes this
  model directly discoverable to Windows app developers who would
  otherwise skip it because the existing card only shows the
  PyTorch CPU path.

- Removes the biggest deployment blocker. PyTorch CPU inference
  takes ~621 ms per image on an Intel Core Ultra 7 258V; the same
  model on that chip's NPU takes ~44 ms (~14x faster), with a model
  file roughly half the size and mAP within 1% of the baseline.
  That moves this model from "batch / offline" into "interactive UX"
  territory for document / PDF tools.
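The speedup figure follows directly from the two latencies quoted above. A small sketch of the arithmetic, which also converts the latencies into throughput; the helper names are illustrative, and the inputs are the approximate numbers from this description, not new measurements:

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


def images_per_second(latency_ms: float) -> float:
    """Single-image throughput implied by a per-image latency."""
    return 1000.0 / latency_ms


cpu_ms = 621.0  # approximate PyTorch CPU latency quoted above
npu_ms = 44.0   # approximate NPU latency quoted above

print(f"speedup: {speedup(cpu_ms, npu_ms):.1f}x")           # speedup: 14.1x
print(f"CPU: {images_per_second(cpu_ms):.1f} images/s")     # CPU: 1.6 images/s
print(f"NPU: {images_per_second(npu_ms):.1f} images/s")     # NPU: 22.7 images/s
```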

Benchmark numbers and a full reproduction walkthrough live at:
https://github.com/microsoft/winml-cli/blob/main/examples/microsoft-table-transformer-detection/README.md
