OccuFly's Aerial DepthAnythingV2

Introduction

Following the acceptance of OccuFly as a CVPR 2026 Oral, we release our fine-tuned DepthAnythingV2 model, specialized for aerial imagery. It was trained on the OccuFly dataset, the first large-scale, real-world benchmark for aerial Metric Monocular Depth Estimation and Semantic Scene Completion.

This model is the depth estimation component of our OccuFly project, in which we fine-tuned DepthAnythingV2-ViT-S to infer accurate metric depth (in meters) from a single aerial image.

Key Features

Aerial-specialized: Fine-tuned on diverse aerial imagery from urban, industrial, and rural environments.
Multi-altitude performance: Trained on data captured at flight altitudes of 30 m, 40 m, and 50 m.
Seasonal robustness: Trained on data captured across all four seasons for improved generalization.
Lightweight: Uses the ViT-S backbone for efficient inference.

Installation

git clone https://huggingface.co/spaces/depth-anything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt

Quickstart

Download the model checkpoint, place it in a directory of your choice, and pass its path to torch.load when loading the model:

import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

# Load the fine-tuned aerial model
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(torch.load('OccuFly-DepthAnything2.pth', map_location='cpu'))
model.eval()

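# Inference
raw_img = cv2.imread('example.jpg')  # BGR image; cv2.imread returns None if the file cannot be read
if raw_img is None:
    raise FileNotFoundError('Could not read example.jpg')
with torch.no_grad():
    depth = model.infer_image(raw_img)  # HxW metric depth map (in meters)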
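
The predicted depth map holds floating-point values in meters, so it usually has to be converted before it can be saved or viewed as an image. The following is a minimal sketch of two common conversions, not part of the OccuFly or DepthAnythingV2 code: a 16-bit encoding that preserves metric values and an 8-bit preview for quick inspection. The function names and the `units_per_meter` parameter are hypothetical and chosen for illustration only. The input is assumed to be a NumPy array like the one produced in the Quickstart.

```python
import numpy as np


def depth_to_uint16(depth_m: np.ndarray, units_per_meter: float = 100.0) -> np.ndarray:
    """Encode a metric depth map (meters) as uint16 for lossless image formats.

    With the assumed default of 100 units per meter (centimeters), the
    representable range is 0 to 655.35 m; larger values saturate at 65535.
    Non-finite values are mapped to 0.
    """
    scaled = np.nan_to_num(depth_m, nan=0.0, posinf=0.0, neginf=0.0) * units_per_meter
    max_val = np.iinfo(np.uint16).max
    return np.clip(np.rint(scaled), 0, max_val).astype(np.uint16)


def depth_to_uint8_preview(depth_m: np.ndarray) -> np.ndarray:
    """Min-max normalize a finite depth map to 0..255 for visual inspection only."""
    d = depth_m.astype(np.float64)
    d_min, d_max = float(d.min()), float(d.max())
    if d_max - d_min < 1e-8:
        # Constant depth map: nothing to normalize.
        return np.zeros(d.shape, dtype=np.uint8)
    return np.rint((d - d_min) / (d_max - d_min) * 255.0).astype(np.uint8)
```

The 16-bit array can be written with `cv2.imwrite('depth.png', encoded)` and converted back to meters by dividing by `units_per_meter`. The 8-bit preview discards the metric scale and is only suitable for visualization.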

OccuFly Dataset

The model is fine-tuned on OccuFly, which includes:

20,000+ aerial RGB images with corresponding depth maps
Multiple flight altitudes: 30 m, 40 m, 50 m
Seasonal diversity: Spring, Summer, Fall, Winter
Multiple environments: Urban, industrial, rural
21 semantic classes with dense voxel grid annotations
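
Because every RGB image comes with a depth map, predictions can be compared against the ground truth directly. The sketch below computes three metrics that are widely used for monocular depth estimation: absolute relative error, RMSE, and the δ < 1.25 accuracy. It is an illustration only. The function name, the `min_depth` threshold, and the convention that non-positive or non-finite pixels are invalid are assumptions, and the exact evaluation protocol of the OccuFly benchmark may differ.

```python
import numpy as np


def depth_metrics(pred_m, gt_m, min_depth: float = 1e-3) -> dict:
    """Compute AbsRel, RMSE (meters) and delta<1.25 accuracy over valid pixels.

    Assumption: pixels where either map is non-finite or not greater than
    min_depth are treated as invalid and ignored.
    """
    pred = np.asarray(pred_m, dtype=np.float64)
    gt = np.asarray(gt_m, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")

    valid = np.isfinite(pred) & np.isfinite(gt) & (pred > min_depth) & (gt > min_depth)
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")
    pred, gt = pred[valid], gt[valid]

    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    # Fraction of pixels whose prediction is within a factor of 1.25 of the truth.
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = float(np.mean(ratio < 1.25))
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

Lower is better for `abs_rel` and `rmse`, and higher is better for `delta1`. If the prediction and the ground-truth depth map have different resolutions, one of them has to be resized first.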

Citation

If our work is helpful to you, please consider citing our paper and the original DepthAnythingV2 work, or giving the repository a like ❤️

@inproceedings{gross2026occufly,
    title={{OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective}},
    author={Markus Gross and Sai B. Matha and Aya Fahmy and Rui Song and Daniel Cremers and Henri Meess},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2026}
}

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}

Related Resources

🌐 OccuFly Project Page
🤗 OccuFly Dataset on HuggingFace
📜 OccuFly Paper
🌐 Original DepthAnythingV2

License

This work is licensed under the CC BY-NC-SA 4.0 license. See the LICENSE file for the full legal terms.

