# Embedl SAM3 (Quantized)

An optimized version of `facebook/sam3` for edge deployment: mixed-precision INT8/FP16 quantization with hardware-aware optimizations, ready for NVIDIA Jetson AGX Orin and other TensorRT-capable platforms.
## Highlights

- **Format:** ONNX with external weights (`embedl_sam3_quant.onnx` + `.onnx.data`)
- **Precision:** INT8 with sensitive layers kept in FP16
- **Runtime:** TensorRT (FP16 + INT8 mode)
- **Target hardware:** NVIDIA Jetson AGX Orin, desktop/server GPUs with TensorRT
## Quick Start

### 1. Download the model

```shell
hf download embedl/sam3 embedl_sam3_quant.onnx embedl_sam3_quant.onnx.data infer_trt.py --local-dir .
```
### 2. Build the TensorRT engine

> **Warning:** Validated with TensorRT 10.1 and 10.3 only. Newer TensorRT versions produce incorrect segmentation masks for this model.

```shell
/usr/src/tensorrt/bin/trtexec --onnx=embedl_sam3_quant.onnx \
    --fp16 --int8 \
    --builderOptimizationLevel=5 \
    --memPoolSize=workspace:4294967296 \
    --timingCacheFile=embedl_sam3_timing_cache.bin \
    --saveEngine=embedl_sam3_quant.engine
```
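The version warning above can be enforced programmatically before an engine build. A minimal sketch in pure Python (the helper name and the validated set are taken from the warning; adapt as needed):

```python
# Guard sketch for the TensorRT version constraint: only 10.1 and 10.3
# were validated; newer releases produce incorrect masks for this model.
VALIDATED_MINORS = {(10, 1), (10, 3)}

def is_validated(version: str) -> bool:
    """True if the major.minor of a TensorRT version string was validated."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) in VALIDATED_MINORS

print(is_validated("10.3.0.26"))  # True
print(is_validated("10.7.0"))     # False
```

In a deployment script, `tensorrt.__version__` (available once `tensorrt` is importable) can be passed to this check before invoking the build.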
### 3. Run inference

See `infer_trt.py` for a complete example that runs text-prompted video segmentation, measures latency, and saves an output video with mask overlays.

```shell
python3 -m venv venv --system-site-packages  # Use system TensorRT
source venv/bin/activate
pip install opencv-python transformers av
python infer_trt.py
```
## Files

| File | Description |
|---|---|
| `embedl_sam3_quant.onnx` | ONNX model graph |
| `embedl_sam3_quant.onnx.data` | External weights (~3.1 GB) |
| `infer_trt.py` | TensorRT inference example |
## Performance

The input resolution is reduced from the model's default to 924 px to enable TensorRT layer fusions that are not possible at the original size. All benchmarks below use this resolution.
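To illustrate what the 924 px setting implies for input frames, here is a hypothetical sizing helper. It assumes an aspect-preserving resize where the longer side becomes 924; `infer_trt.py` defines the actual preprocessing, which may instead use a square 924x924 input:

```python
# Hypothetical helper (assumption: aspect-preserving resize to 924 px on the
# longer side; check infer_trt.py for the real preprocessing pipeline).
TARGET = 924

def scaled_size(width: int, height: int, target: int = TARGET) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `target`."""
    scale = target / max(width, height)
    return round(width * scale), round(height * scale)

print(scaled_size(1920, 1080))  # a 1080p frame -> (924, 520)
```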
### NVIDIA L4 GPU

Environment: NVIDIA L4, Driver 570.211.01, CUDA 12.8, TensorRT 10.3

| Configuration | Latency | Speedup |
|---|---|---|
| torch.compile (FP16) | 137 ms | 1.0x |
| Embedl Deploy (this model) | 104 ms | 1.32x |
### NVIDIA Jetson AGX Orin
| Configuration | Latency | Throughput | Speedup |
|---|---|---|---|
| Baseline (FP16, resized to 924) | 763 ms | 1.31 qps | 1.0x |
| Embedl Deploy (this model) | 462 ms | 2.17 qps | 1.65x |
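The columns in the Jetson table relate as follows: throughput is the inverse of latency, and speedup is the ratio of baseline to optimized latency. (The table's 2.17 qps is presumably measured directly; deriving it from the 462 ms latency lands at ~2.16.)

```python
# Derive throughput and speedup from the Jetson AGX Orin latencies above.
baseline_ms, optimized_ms = 763.0, 462.0

throughput_qps = 1000.0 / optimized_ms  # queries per second
speedup = baseline_ms / optimized_ms    # relative to the FP16 baseline

print(round(throughput_qps, 2), round(speedup, 2))  # 2.16 1.65
```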
## Accuracy (SA-Co/Gold)

Evaluated on the SA-Co/Gold instance segmentation benchmark (Table 30 in the SAM3 paper). On average, the quantized model stays within about 1.8 cgF1 points of the FP32 ONNX baseline.
Average across all subsets:
| Model | cgF1 | IL_MCC | pos_µF1 |
|---|---|---|---|
| SAM3 (paper, Table 30) | 54.1 | 0.82 | 66.1 |
| SAM3 ONNX FP32 (ours) | 55.56 | 0.823 | 67.45 |
| Embedl SAM3 INT8 (this model) | 53.77 | 0.809 | 66.36 |
Per-subset breakdown:
| Subset | cgF1 (FP32) | cgF1 (INT8) | pos_µF1 (FP32) | pos_µF1 (INT8) |
|---|---|---|---|---|
| Metaclip | 47.92 | 47.07 | 59.24 | 58.54 |
| SA-1B | 53.44 | 52.33 | 61.70 | 61.31 |
| Crowded | 60.28 | 59.09 | 67.54 | 67.25 |
| FG Food | 58.76 | 56.28 | 72.01 | 70.02 |
| Sports Equipment | 67.85 | 65.61 | 75.15 | 73.91 |
| Attributes | 55.11 | 54.12 | 73.08 | 72.57 |
| WikiCommon | 45.57 | 41.85 | 63.46 | 60.88 |
| Average | 55.56 | 53.77 | 67.45 | 66.36 |
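The per-subset table makes the quantization cost easy to check directly; a quick computation of the average cgF1 drop, with the values copied from the table:

```python
# Average cgF1 drop from FP32 to INT8 across the seven subsets above.
cgf1_fp32 = [47.92, 53.44, 60.28, 58.76, 67.85, 55.11, 45.57]
cgf1_int8 = [47.07, 52.33, 59.09, 56.28, 65.61, 54.12, 41.85]

drops = [fp32 - int8 for fp32, int8 in zip(cgf1_fp32, cgf1_int8)]
avg_drop = sum(drops) / len(drops)

print(round(avg_drop, 2))  # 1.8
```

WikiCommon shows the largest drop (3.72 cgF1 points), so subset-level checks are worth running if that domain matters for your workload.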
## Creating Your Own Optimized Models

Deployment-ready models can be created from any supported base model using `embedl-deploy`, available on PyPI. Detailed tutorials will follow.
## License

This model is a derivative of `facebook/sam3`.
| Component | License |
|---|---|
| Upstream (Meta SAM3) | SAM License |
| Optimized components | Embedl Models Community Licence v1.0 (no redistribution as a hosted service) |
## Contact
- Enterprise & commercial inquiries: models@embedl.com
- Technical issues & early access: github.com/embedl/embedl-deploy
We offer engineering support for on-prem/edge deployments and partner co-marketing opportunities.