OccuFly's Aerial DepthAnythingV2
Introduction
Following the acceptance of OccuFly as a CVPR 2026 Oral, we release our fine-tuned DepthAnythingV2 model, specialized for aerial imagery. It was trained on the OccuFly dataset, the first large-scale, real-world benchmark for aerial Metric Monocular Depth Estimation and Semantic Scene Completion.
This model is the depth estimation component of our OccuFly project, in which we fine-tuned DepthAnythingV2-ViT-S to infer accurate metric depth (in meters) from a single aerial image.
Key Features
- Aerial-specialized: Fine-tuned on diverse aerial imagery from urban, industrial, and rural environments.
- Multi-altitude performance: Trained on data from 50m, 40m, and 30m altitudes.
- Seasonal robustness: Trained on data captured across all four seasons for improved generalization.
- Lightweight: Uses the ViT-S backbone for efficient inference.
Installation
git clone https://huggingface.co/spaces/depth-anything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt
Quickstart
Download the model checkpoint, then load it and run inference (adjust the checkpoint path to wherever you saved the file):
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2
# Load the fine-tuned aerial model
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(torch.load('OccuFly-DepthAnything2.pth', map_location='cpu'))
model.eval()
# Inference
with torch.no_grad():
    raw_img = cv2.imread('example.jpg')
    depth = model.infer_image(raw_img) # HxW metric depth map
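Once a depth map is available, a quick sanity check and a compact storage format are often useful. The following is a minimal sketch, assuming `depth` is an HxW floating-point NumPy array in meters as described above. The helper names `summarize_depth` and `depth_to_uint16_mm` are hypothetical and not part of the DepthAnythingV2 code base, and the millimeter/uint16 encoding is one common convention, not something the OccuFly release prescribes.

```python
import numpy as np


def summarize_depth(depth: np.ndarray) -> dict:
    """Return basic statistics (in meters) over finite, positive pixels."""
    valid = np.isfinite(depth) & (depth > 0)
    if not valid.any():
        raise ValueError("depth map contains no valid pixels")
    values = depth[valid]
    return {
        "min_m": float(values.min()),
        "median_m": float(np.median(values)),
        "max_m": float(values.max()),
        "valid_fraction": float(valid.mean()),
    }


def depth_to_uint16_mm(depth: np.ndarray) -> np.ndarray:
    """Convert meters to millimeters stored as uint16.

    Assumption: 0 marks invalid pixels, and values beyond 65.535 m are
    clipped, which may matter for oblique views at higher altitudes.
    """
    valid = np.isfinite(depth) & (depth > 0)
    mm = np.where(valid, depth * 1000.0, 0.0)
    return np.clip(np.rint(mm), 0, 65535).astype(np.uint16)


if __name__ == "__main__":
    # Synthetic stand-in for a model output, with one invalid pixel.
    demo = np.array([[30.0, 42.5], [np.nan, 50.25]], dtype=np.float32)
    print(summarize_depth(demo))
    print(depth_to_uint16_mm(demo))
```

The uint16 array can be written as a 16-bit single-channel PNG with `cv2.imwrite`. Dividing by 1000 after loading recovers meters, up to the 1 mm rounding and the clipping noted above.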
OccuFly Dataset
The model is fine-tuned on OccuFly, which includes:
- 20,000+ aerial RGB images with corresponding depth maps
- Multiple flight altitudes: 30m, 40m, 50m
- Seasonal diversity: Spring, Summer, Fall, Winter
- Multiple environments: Urban, industrial, rural
- 21 semantic classes with dense voxel grid annotations
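Because the dataset pairs each RGB image with a depth map, predictions can be compared against ground truth directly. The sketch below computes three metrics that are widely used for metric depth estimation (absolute relative error, RMSE, and the δ < 1.25 accuracy). It is an illustration under stated assumptions, not the evaluation protocol of the OccuFly paper: the function name `depth_metrics` is hypothetical, both inputs are assumed to be same-shaped arrays in meters, and non-positive or non-finite pixels are assumed to be invalid.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth: float = 1e-3) -> dict:
    """Compare a predicted depth map with ground truth (both in meters)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Assumption: pixels that are non-finite or not above min_depth are
    # invalid. Predictions are masked too, to avoid dividing by zero.
    valid = (
        np.isfinite(gt) & (gt > min_depth)
        & np.isfinite(pred) & (pred > min_depth)
    )
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")

    p, g = pred[valid], gt[valid]
    ratio = np.maximum(p / g, g / p)
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse_m": float(np.sqrt(np.mean((p - g) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),
    }


if __name__ == "__main__":
    # Tiny synthetic example; the 0 in gt marks an invalid pixel.
    gt = [[40.0, 50.0], [0.0, 30.0]]
    pred = [[44.0, 50.0], [10.0, 45.0]]
    print(depth_metrics(pred, gt))
```

Averaging these per-image values over a split gives one simple aggregate. Pooling all valid pixels before averaging is an equally common alternative and yields slightly different numbers.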
Citation
If you find our work helpful, please consider citing our paper and the original DepthAnythingV2 work, or giving the repository a like ❤️
@inproceedings{gross2026occufly,
title={{OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective}},
author={Markus Gross and Sai B. Matha and Aya Fahmy and Rui Song and Daniel Cremers and Henri Meess},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026},
}
@article{depth_anything_v2,
title={Depth Anything V2},
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
journal={arXiv preprint arXiv:2406.09414},
year={2024}
}
Related Resources
- OccuFly Project Page
- OccuFly Dataset on HuggingFace
- OccuFly Paper
- Original DepthAnythingV2
License
This work is licensed under the CC BY-NC-SA 4.0 license. See the LICENSE file for the full legal terms.