mlboydaisuke committed on
Commit
ee6b5cf
·
verified ·
1 Parent(s): accfbdb

Depth Anything 3 (small+base) Core AI bundles + card

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ base/da3-base_float16.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
37
+ base/da3-base_float32.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
38
+ small/da3-small_float16.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
39
+ small/da3-small_float32.aimodel/main.mlirb filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,84 @@
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - depth-estimation
5
+ - monocular-depth
6
+ - core-ai
7
+ - coreai
8
+ - apple
9
+ - on-device
10
+ - depth-anything
11
+ pipeline_tag: depth-estimation
12
+ base_model:
13
+ - depth-anything/DA3-SMALL
14
+ - depth-anything/DA3-BASE
15
+ library_name: coreai
16
+ ---
17
+
18
+ # Depth Anything 3 — Core AI
19
+
20
+ **The [coreai-model-zoo](https://github.com/john-rocky/coreai-model-zoo)'s first depth model.**
21
+ Monocular (single-image) **relative depth** estimation running fully on-device on Apple's Core AI
22
+ runtime, as a single static `.aimodel`. A conversion of ByteDance's
23
+ [Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3)
24
+ ([`depth-anything/DA3-SMALL`](https://huggingface.co/depth-anything/DA3-SMALL) /
25
+ [`DA3-BASE`](https://huggingface.co/depth-anything/DA3-BASE), Apache-2.0): a DINOv2 ViT backbone +
26
+ DPT-style head. Drop in an RGB image, get a depth map (and a confidence map). No NMS, no sampling —
27
+ host post-processing is just a colormap.
28
+
29
+ ## Bundles
30
+
31
+ | dir | variant | params | dtype | size | M4 Max GPU |
32
+ |---|---|---|---|---|---|
33
+ | `small/da3-small_float16.aimodel` | ViT-S | 34.3M | fp16 | **54 MB** | **65.7 FPS** |
34
+ | `small/da3-small_float32.aimodel` | ViT-S | 34.3M | fp32 | 105 MB | 56.5 FPS |
35
+ | `base/da3-base_float16.aimodel` | ViT-B | 135.4M | fp16 | 202 MB | 26.5 FPS |
36
+ | `base/da3-base_float32.aimodel` | ViT-B | 135.4M | fp32 | 402 MB | 23.0 FPS |
37
+
38
+ `small · fp16` is the on-device hero: 54 MB, 65.7 FPS at 504 × 504 on an M4 Max, comfortably real-time on
39
+ iPhone-class GPUs. Each `.aimodel` is a directory bundle (`main.mlirb` + `metadata.json`).
40
+
41
+ ## I/O contract
42
+
43
+ ```
44
+ input : image [1, 3, 504, 504] RGB, raw [0, 1] (ImageNet normalization is folded into the graph)
45
+ output: depth [1, 504, 504] relative depth (exp-activated; larger = nearer)
46
+ depth_conf [1, 504, 504] confidence
47
+ ```
48
+
49
+ Host: resize the RGB image to 504 × 504 (e.g. cv2 `INTER_AREA`), feed raw [0, 1], run, then resize
50
+ the depth map back to the original H × W. For display, the DA3 convention is inverse-depth →
51
+ percentile 2–98 normalize → `Spectral` colormap.
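
The host-side steps described above can be sketched in NumPy. This is a minimal illustration, not the zoo's own code: the function names `to_model_input` and `normalize_for_display` are hypothetical, "inverse-depth" is read here as the reciprocal of the `depth` output, and resizing and the `Spectral` colormap are left to the caller's image library.

```python
import numpy as np

def to_model_input(rgb_u8: np.ndarray, dtype=np.float16) -> np.ndarray:
    """HWC uint8 RGB, already resized to 504 x 504 -> NCHW float in raw [0, 1].

    No ImageNet mean/std is applied here because the card states that
    normalization is folded into the graph.
    """
    assert rgb_u8.shape == (504, 504, 3), "resize to 504 x 504 first"
    x = rgb_u8.astype(np.float32) / 255.0          # raw [0, 1]
    return x.transpose(2, 0, 1)[None].astype(dtype)  # [1, 3, 504, 504]

def normalize_for_display(depth: np.ndarray, lo_pct: float = 2.0,
                          hi_pct: float = 98.0, eps: float = 1e-6) -> np.ndarray:
    """Reciprocal of depth, then percentile 2-98 normalization to [0, 1].

    Assumption: 'inverse-depth' means 1 / depth. The result is ready to be
    passed to a colormap lookup.
    """
    inv = 1.0 / np.maximum(depth.astype(np.float32), eps)
    lo, hi = np.percentile(inv, [lo_pct, hi_pct])
    return np.clip((inv - lo) / max(hi - lo, eps), 0.0, 1.0)
```

Clipping to the 2nd and 98th percentiles keeps a few extreme pixels from compressing the rest of the range, which is presumably why the display convention uses percentiles rather than the raw min and max.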
52
+
53
+ ## Fidelity
54
+
55
+ - **Numerically faithful conversion:** the Core AI engine matches the PyTorch reference at cosine similarity **1.000000** (per-pixel error ≤
56
+ ~1e-5 for fp32, ~1e-2 for fp16) on both CPU and GPU, at any fixed input shape.
57
+ - **vs the official DA3 viewer:** **mean Pearson r ≈ 0.98** across diverse aspect ratios (square
58
+ inputs r = 1.000) — within DA3's own resolution sensitivity (its 504-vs-518 outputs differ by
59
+ r ≈ 0.975–0.984).
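
For readers who want to reproduce this kind of comparison, the three quantities quoted above (cosine similarity, per-pixel error, Pearson r) can be computed as below. This is a sketch under assumptions: the helper names are hypothetical, and the card does not say exactly how its numbers were computed (for example whether the per-pixel error is a maximum or a mean), so the maximum absolute difference is used here as one plausible reading.

```python
import numpy as np

def _flat(a) -> np.ndarray:
    """Flatten to float64 so fp16 inputs do not lose precision in the sums."""
    return np.asarray(a, dtype=np.float64).ravel()

def cosine_similarity(a, b) -> float:
    a, b = _flat(a), _flat(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson_r(a, b) -> float:
    # Pearson r is the cosine similarity of the mean-centered maps,
    # so it is invariant to an affine rescaling of either map.
    a, b = _flat(a), _flat(b)
    return cosine_similarity(a - a.mean(), b - b.mean())

def max_abs_error(a, b) -> float:
    return float(np.max(np.abs(_flat(a) - _flat(b))))
```

Pearson r suits relative depth because it ignores global scale and shift, whereas cosine similarity and the absolute error are sensitive to both and so are the stricter check for a conversion.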
60
+
61
+ ## Usage (CoreAIKit / coreai.runtime)
62
+
63
+ ```python
64
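+ import coreai.runtime as rt, numpy as np  # the `await` calls below need an async context (a notebook cell, or an `async def` run via `asyncio.run`)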
65
+ from PIL import Image
66
+
67
+ m = await rt.AIModel.load("small/da3-small_float16.aimodel",
68
+ rt.SpecializationOptions.from_preferred_compute_unit_kind(rt.ComputeUnitKind.gpu()))
69
+ fn = m.load_function("main")
70
+
71
+ img = np.asarray(Image.open("photo.jpg").convert("RGB").resize((504, 504)))
72
+ x = (img.astype(np.float16) / 255.0).transpose(2, 0, 1)[None] # raw [0,1], NCHW
73
+ depth = (await fn({"image": rt.NDArray(x)}))["depth"].numpy().reshape(504, 504)
74
+ ```
75
+
76
+ ## Links
77
+
78
+ - Conversion script + model card: [coreai-model-zoo `zoo/depth-anything-3.md`](https://github.com/john-rocky/coreai-model-zoo/blob/main/zoo/depth-anything-3.md)
79
+ - Source: [Depth Anything 3](https://github.com/ByteDance-Seed/depth-anything-3) · Apache-2.0
80
+
81
+ ---
82
+
83
+ *On-device ML / Core ML / Core AI model porting — get in touch: open an issue on the
84
+ [zoo](https://github.com/john-rocky/coreai-model-zoo).*
base/da3-base_float16.aimodel/main.hash ADDED
@@ -0,0 +1 @@
1
+ (binary data; not displayable as text)
base/da3-base_float16.aimodel/main.mlirb ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ed385ced49b3e2975621b8d54bead7a410d1aa19d818bc4ac150a1c04767f7bb
3
+ size 201980350
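
Each `main.mlirb` in the repository is stored as a Git LFS pointer like the one above: a `version` line, an `oid` with the hash algorithm and digest, and a `size` in bytes. A downloaded file can be checked against its pointer with the standard library alone. The function names below are hypothetical and the sketch assumes a well-formed three-line pointer.

```python
import hashlib

# Pointer text copied from base/da3-base_float16.aimodel/main.mlirb above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ed385ced49b3e2975621b8d54bead7a410d1aa19d818bc4ac150a1c04767f7bb
size 201980350
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines into a dict and unpack the oid field."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def matches_pointer(path: str, pointer: dict, chunk: int = 1 << 20) -> bool:
    """Stream the file in chunks and compare byte count and digest."""
    h = hashlib.new(pointer["algo"])
    n = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
            n += len(block)
    return n == pointer["size"] and h.hexdigest() == pointer["digest"]
```

Calling `matches_pointer("base/da3-base_float16.aimodel/main.mlirb", parse_lfs_pointer(POINTER))` on a fully fetched checkout should return `True`; a `False` result usually means the file on disk is still the small pointer stub rather than the real payload.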
base/da3-base_float16.aimodel/metadata.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "license" : "Apache-2.0",
3
+ "creationDate" : "20260623T154004Z",
4
+ "assetVersion" : "2.0",
5
+ "description" : "Depth Anything 3 (DINOv2 ViT-B backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3",
6
+ "author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo"
7
+ }
base/da3-base_float32.aimodel/main.hash ADDED
@@ -0,0 +1 @@
1
+ (binary data; not displayable as text)
base/da3-base_float32.aimodel/main.mlirb ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8896cd06d05ae7e12ca85d2d5423d7556ff0f8c15a43c10b07c983751f092e70
3
+ size 401763991
base/da3-base_float32.aimodel/metadata.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "creationDate" : "20260623T153804Z",
3
+ "license" : "Apache-2.0",
4
+ "assetVersion" : "2.0",
5
+ "description" : "Depth Anything 3 (DINOv2 ViT-B backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3",
6
+ "author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo"
7
+ }
small/da3-small_float16.aimodel/main.hash ADDED
@@ -0,0 +1,2 @@
1
+ (binary data; not displayable as text)
small/da3-small_float16.aimodel/main.mlirb ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6049d6818a0a694c6ded26b38849ae3f4e5060ef71171f975bb29cfa3426e307
3
+ size 54518253
small/da3-small_float16.aimodel/metadata.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo",
3
+ "license" : "Apache-2.0",
4
+ "assetVersion" : "2.0",
5
+ "creationDate" : "20260623T153758Z",
6
+ "description" : "Depth Anything 3 (DINOv2 ViT-S backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3"
7
+ }
small/da3-small_float32.aimodel/main.hash ADDED
@@ -0,0 +1 @@
1
+ (binary data; not displayable as text)
small/da3-small_float32.aimodel/main.mlirb ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1aaaa46646686447e525b60752628a5bcdb56ecb5dd3575b6ffcb354f3093fab
3
+ size 104852846
small/da3-small_float32.aimodel/metadata.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "description" : "Depth Anything 3 (DINOv2 ViT-S backbone + DualDPT head), monocular depth. Input: RGB [0,1]; outputs: relative depth + confidence map. https:\/\/github.com\/ByteDance-Seed\/depth-anything-3",
3
+ "author" : "ByteDance (Depth Anything 3); Core AI export: coreai-model-zoo",
4
+ "creationDate" : "20260623T152907Z",
5
+ "license" : "Apache-2.0",
6
+ "assetVersion" : "2.0"
7
+ }