Depth Anything 3 (small+base) Core AI bundles + card
- .gitattributes +4 -0
- README.md +84 -0
- base/da3-base_float16.aimodel/main.hash +1 -0
- base/da3-base_float16.aimodel/main.mlirb +3 -0
- base/da3-base_float16.aimodel/metadata.json +7 -0
- base/da3-base_float32.aimodel/main.hash +1 -0
- base/da3-base_float32.aimodel/main.mlirb +3 -0
- base/da3-base_float32.aimodel/metadata.json +7 -0
- small/da3-small_float16.aimodel/main.hash +2 -0
- small/da3-small_float16.aimodel/main.mlirb +3 -0
- small/da3-small_float16.aimodel/metadata.json +7 -0
- small/da3-small_float32.aimodel/main.hash +1 -0
- small/da3-small_float32.aimodel/main.mlirb +3 -0
- small/da3-small_float32.aimodel/metadata.json +7 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
base/da3-base_float16.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
base/da3-base_float32.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
small/da3-small_float16.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
small/da3-small_float32.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,84 @@
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
tags:
|
| 4 |
+
- depth-estimation
|
| 5 |
+
- monocular-depth
|
| 6 |
+
- core-ai
|
| 7 |
+
- coreai
|
| 8 |
+
- apple
|
| 9 |
+
- on-device
|
| 10 |
+
- depth-anything
|
| 11 |
+
pipeline_tag: depth-estimation
|
| 12 |
+
base_model:
|
| 13 |
+
- depth-anything/DA3-SMALL
|
| 14 |
+
- depth-anything/DA3-BASE
|
| 15 |
+
library_name: coreai
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
# Depth Anything 3 — Core AI
|
| 19 |
+
|
| 20 |
+
**The [coreai-model-zoo](https://github.com/john-rocky/coreai-model-zoo)'s first depth model.**
|
| 21 |
+
Monocular (single-image) **relative depth** estimation running fully on-device on Apple's Core AI
|
| 22 |
+
runtime, as a single static `.aimodel`. A conversion of ByteDance's
|
| 23 |
+
[Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3)
|
| 24 |
+
([`depth-anything/DA3-SMALL`](https://huggingface.co/depth-anything/DA3-SMALL) /
|
| 25 |
+
[`DA3-BASE`](https://huggingface.co/depth-anything/DA3-BASE), Apache-2.0): a DINOv2 ViT backbone +
|
| 26 |
+
DPT-style head. Drop in an RGB image, get a depth map (and a confidence map). No NMS, no sampling —
|
| 27 |
+
host post-processing is just a colormap.
|
| 28 |
+
|
| 29 |
+
## Bundles
|
| 30 |
+
|
| 31 |
+
| dir | variant | params | dtype | size | M4 Max GPU |
|
| 32 |
+
|---|---|---|---|---|---|
|
| 33 |
+
| `small/da3-small_float16.aimodel` | ViT-S | 34.3M | fp16 | **54 MB** | **65.7 FPS** |
|
| 34 |
+
| `small/da3-small_float32.aimodel` | ViT-S | 34.3M | fp32 | 105 MB | 56.5 FPS |
|
| 35 |
+
| `base/da3-base_float16.aimodel` | ViT-B | 135.4M | fp16 | 202 MB | 26.5 FPS |
|
| 36 |
+
| `base/da3-base_float32.aimodel` | ViT-B | 135.4M | fp32 | 402 MB | 23.0 FPS |
|
| 37 |
+
|
| 38 |
+
`small · fp16` is the on-device hero: 54 MB and 65.7 FPS at 504 × 504 on an M4 Max GPU, a rate that
|
| 39 |
+
should leave headroom for real-time use on iPhone-class GPUs (not benchmarked in this card). Each `.aimodel` is a directory bundle (`main.mlirb`, `main.hash`, and `metadata.json`).
|
| 40 |
+
|
| 41 |
+
## I/O contract
|
| 42 |
+
|
| 43 |
+
```
|
| 44 |
+
input : image [1, 3, 504, 504] RGB, raw [0, 1] (ImageNet normalization is folded into the graph)
|
| 45 |
+
output: depth [1, 504, 504] relative depth (exp-activated; larger = nearer)
|
| 46 |
+
depth_conf [1, 504, 504] confidence
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
Host: resize the RGB image to 504 × 504 (e.g. cv2 `INTER_AREA`), feed raw [0, 1], run, then resize
|
| 50 |
+
the depth map back to the original H × W. For display, the DA3 convention is inverse-depth →
|
| 51 |
+
percentile 2–98 normalize → `Spectral` colormap.
|
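The display convention described above (inverse depth, then 2–98 percentile normalization) can be sketched with NumPy alone. This is a minimal sketch, not the DA3 viewer's code: the helper name `normalize_for_display`, the `eps` guard, and the final clipping step are assumptions made for illustration, and the `Spectral` colormap lookup itself is left to a plotting library.

```python
import numpy as np

def normalize_for_display(depth, lo_pct=2.0, hi_pct=98.0, eps=1e-6):
    """Map a depth map to [0, 1] for colormapping (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    # Inverse depth; eps guards against division by zero (an assumption).
    inv = 1.0 / np.maximum(depth, eps)
    # Robust range: ignore the lowest and highest 2% of values.
    lo, hi = np.percentile(inv, [lo_pct, hi_pct])
    if hi - lo < eps:
        # Flat map: nothing to stretch.
        return np.zeros_like(inv)
    return np.clip((inv - lo) / (hi - lo), 0.0, 1.0)

# Synthetic stand-in for a model output, used only to exercise the function.
demo = np.linspace(0.5, 10.0, 504 * 504, dtype=np.float32).reshape(504, 504)
vis = normalize_for_display(demo)
```

If matplotlib is available, `matplotlib.colormaps["Spectral"](vis)` turns the normalized map into an RGBA image.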
| 52 |
+
|
| 53 |
+
## Fidelity
|
| 54 |
+
|
| 55 |
+
- **Numerically matching conversion:** the Core AI engine matches the PyTorch reference at cosine similarity **1.000000**
|
| 56 |
+
(per-pixel differences ≤ ~1e-5 for fp32 and ≤ ~1e-2 for fp16) on both CPU and GPU, at any fixed input shape.
|
| 57 |
+
- **vs the official DA3 viewer:** **mean Pearson r ≈ 0.98** across diverse aspect ratios (square
|
| 58 |
+
inputs r = 1.000) — within DA3's own resolution sensitivity (its 504-vs-518 outputs differ by
|
| 59 |
+
r ≈ 0.975–0.984).
|
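The metrics quoted in the fidelity list above (cosine similarity, per-pixel difference, Pearson r) can be computed for any pair of depth maps as follows. This is a sketch under assumptions: the helper names are hypothetical, and the card does not say exactly how its figures were produced (for example, whether maps were resized or masked first).

```python
import numpy as np

def _flat(a):
    # Flatten to float64 so fp16 inputs do not lose precision in the sums.
    return np.asarray(a, dtype=np.float64).ravel()

def cosine_similarity(a, b):
    a, b = _flat(a), _flat(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_r(a, b):
    # Pearson r is the cosine similarity of the mean-centred maps.
    a, b = _flat(a), _flat(b)
    return cosine_similarity(a - a.mean(), b - b.mean())

def max_abs_diff(a, b):
    return float(np.max(np.abs(_flat(a) - _flat(b))))

# Synthetic maps standing in for a reference output and a converted output.
rng = np.random.default_rng(0)
ref = rng.random((504, 504)) + 0.5
out = ref + rng.normal(scale=1e-3, size=ref.shape)

cos = cosine_similarity(ref, out)
r = pearson_r(ref, out)
err = max_abs_diff(ref, out)
```

Pearson r is unchanged by a positive rescaling or a constant offset of either map, which suits relative depth; cosine similarity is not offset-invariant, so it is the stricter check for a like-for-like conversion.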
| 60 |
+
|
| 61 |
+
## Usage (CoreAIKit / coreai.runtime)
|
| 62 |
+
|
| 63 |
+
```python
|
| 64 |
+
import coreai.runtime as rt, numpy as np
|
| 65 |
+
from PIL import Image
|
| 66 |
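+
# Top-level `await` needs an async context (a notebook, `python -m asyncio`, or an `async def` run via `asyncio.run`).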
|
| 67 |
+
m = await rt.AIModel.load("small/da3-small_float16.aimodel",
|
| 68 |
+
rt.SpecializationOptions.from_preferred_compute_unit_kind(rt.ComputeUnitKind.gpu()))
|
| 69 |
+
fn = m.load_function("main")
|
| 70 |
+
|
| 71 |
+
img = np.asarray(Image.open("photo.jpg").convert("RGB").resize((504, 504)))
|
| 72 |
+
x = (img.astype(np.float16) / 255.0).transpose(2, 0, 1)[None] # raw [0,1], NCHW
|
| 73 |
+
depth = (await fn({"image": rt.NDArray(x)}))["depth"].numpy().reshape(504, 504)
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
## Links
|
| 77 |
+
|
| 78 |
+
- Conversion script + model card: [coreai-model-zoo `zoo/depth-anything-3.md`](https://github.com/john-rocky/coreai-model-zoo/blob/main/zoo/depth-anything-3.md)
|
| 79 |
+
- Source: [Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3) · Apache-2.0
|
| 80 |
+
|
| 81 |
+
---
|
| 82 |
+
|
| 83 |
+
*On-device ML / Core ML / Core AI model porting — get in touch: open an issue on the
|
| 84 |
+
[zoo](https://github.com/john-rocky/coreai-model-zoo).*
|
base/da3-base_float16.aimodel/main.hash
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
[binary hash data; not renderable as text]
|
base/da3-base_float16.aimodel/main.mlirb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ed385ced49b3e2975621b8d54bead7a410d1aa19d818bc4ac150a1c04767f7bb
|
| 3 |
+
size 201980350
|
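Each `main.mlirb` entry in this commit is a Git LFS pointer: a three-line text stub that records the real file's SHA-256 digest and byte size. The sketch below parses such a pointer and checks a downloaded file against it using only the standard library. `parse_lfs_pointer` and `verify_file` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ed385ced49b3e2975621b8d54bead7a410d1aa19d818bc4ac150a1c04767f7bb
size 201980350
"""

def parse_lfs_pointer(text):
    """Split 'key value' lines into a dict and unpack the oid field."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def verify_file(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and digest."""
    h = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["digest"]

ptr = parse_lfs_pointer(POINTER)
size_mb = ptr["size"] / 1e6  # decimal megabytes, as in the README's size column
```

Here `size_mb` comes out just under 202, which agrees with the 202 MB listed for `base/da3-base_float16.aimodel` in the README table.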
base/da3-base_float16.aimodel/metadata.json
ADDED
|
@@ -0,0 +1,7 @@
|
| 1 |
+
{
|
| 2 |
+
"license" : "Apache-2.0",
|
| 3 |
+
"creationDate" : "20260623T154004Z",
|
| 4 |
+
"assetVersion" : "2.0",
|
| 5 |
+
"description" : "Depth Anything 3 (DINOv2 ViT-B backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3",
|
| 6 |
+
"author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo"
|
| 7 |
+
}
|
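The `metadata.json` shown above uses two conventions worth noting: forward slashes are escaped as `\/` (valid JSON, decoded to a plain `/`), and `creationDate` is written in a compact ISO 8601 basic form. A small sketch of reading both with the standard library follows. The variable names are illustrative, the description string is abbreviated for the example, and the format string is inferred from the value shown rather than taken from any Core AI specification.

```python
import json
from datetime import datetime, timezone

# Raw string so the JSON escape sequence \/ reaches the parser unchanged.
RAW = r'''{
  "license" : "Apache-2.0",
  "creationDate" : "20260623T154004Z",
  "assetVersion" : "2.0",
  "description" : "Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3",
  "author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo"
}'''

meta = json.loads(RAW)

# "20260623T154004Z" -> timezone-aware UTC datetime (format inferred, see above).
created = datetime.strptime(meta["creationDate"], "%Y%m%dT%H%M%SZ").replace(
    tzinfo=timezone.utc
)

# The source URL is the last whitespace-separated token of the description.
url = meta["description"].split()[-1]
```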
base/da3-base_float32.aimodel/main.hash
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
[binary hash data; not renderable as text]
|
base/da3-base_float32.aimodel/main.mlirb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8896cd06d05ae7e12ca85d2d5423d7556ff0f8c15a43c10b07c983751f092e70
|
| 3 |
+
size 401763991
|
base/da3-base_float32.aimodel/metadata.json
ADDED
|
@@ -0,0 +1,7 @@
|
| 1 |
+
{
|
| 2 |
+
"creationDate" : "20260623T153804Z",
|
| 3 |
+
"license" : "Apache-2.0",
|
| 4 |
+
"assetVersion" : "2.0",
|
| 5 |
+
"description" : "Depth Anything 3 (DINOv2 ViT-B backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3",
|
| 6 |
+
"author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo"
|
| 7 |
+
}
|
small/da3-small_float16.aimodel/main.hash
ADDED
|
@@ -0,0 +1,2 @@
|
| 1 |
+
[binary hash data; not renderable as text]
|
| 2 |
+
[binary hash data, continued]
|
small/da3-small_float16.aimodel/main.mlirb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6049d6818a0a694c6ded26b38849ae3f4e5060ef71171f975bb29cfa3426e307
|
| 3 |
+
size 54518253
|
small/da3-small_float16.aimodel/metadata.json
ADDED
|
@@ -0,0 +1,7 @@
|
| 1 |
+
{
|
| 2 |
+
"author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo",
|
| 3 |
+
"license" : "Apache-2.0",
|
| 4 |
+
"assetVersion" : "2.0",
|
| 5 |
+
"creationDate" : "20260623T153758Z",
|
| 6 |
+
"description" : "Depth Anything 3 (DINOv2 ViT-S backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3"
|
| 7 |
+
}
|
small/da3-small_float32.aimodel/main.hash
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
[binary hash data; not renderable as text]
|
small/da3-small_float32.aimodel/main.mlirb
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1aaaa46646686447e525b60752628a5bcdb56ecb5dd3575b6ffcb354f3093fab
|
| 3 |
+
size 104852846
|
small/da3-small_float32.aimodel/metadata.json
ADDED
|
@@ -0,0 +1,7 @@
|
| 1 |
+
{
|
| 2 |
+
"description" : "Depth Anything 3 (DINOv2 ViT-S backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3",
|
| 3 |
+
"author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo",
|
| 4 |
+
"creationDate" : "20260623T152907Z",
|
| 5 |
+
"license" : "Apache-2.0",
|
| 6 |
+
"assetVersion" : "2.0"
|
| 7 |
+
}
|