OccuFly's Aerial DepthAnythingV2

Introduction

Following the acceptance of our paper as a CVPR 2026 Oral, we release our fine-tuned DepthAnythingV2 model, specialized for aerial imagery. It was trained on the OccuFly dataset, the first large-scale, real-world benchmark for aerial Metric Monocular Depth Estimation and Semantic Scene Completion.

This model is the depth estimation component of our OccuFly project, in which we fine-tuned DepthAnythingV2 with the ViT-S encoder (DepthAnythingV2-ViT-S) to infer accurate metric depth (in meters) from a single aerial image.

Key Features

  • Aerial-specialized: Fine-tuned on diverse aerial imagery from urban, industrial, and rural environments.
  • Multi-altitude performance: Trained on data from 50m, 40m, and 30m altitudes.
  • Seasonal robustness: Trained on data captured in all four seasons for improved generalization.
  • Lightweight: Uses the ViT-S backbone for efficient inference.

Installation

git clone https://huggingface.co/spaces/depth-anything/Depth-Anything-V2
cd Depth-Anything-V2
pip install -r requirements.txt

Quickstart

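Download the model checkpoint (OccuFly-DepthAnything2.pth) and place it in your desired directory, adjusting the path in the snippet below accordingly. Run the snippet from the root of the cloned repository so that the depth_anything_v2 package can be imported: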

import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

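# Load the fine-tuned aerial model (ViT-S configuration)
model = DepthAnythingV2(encoder='vits', features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(torch.load('OccuFly-DepthAnything2.pth', map_location='cpu'))
model.eval()

# Inference: cv2.imread returns a BGR image, which is what infer_image expects
raw_img = cv2.imread('example.jpg')
if raw_img is None:
    raise FileNotFoundError('Could not read example.jpg')
with torch.no_grad():
    depth = model.infer_image(raw_img)  # HxW NumPy array of metric depth (meters)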
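Because the output is metric, each pixel can be lifted into 3D, which is a common step when relating a depth map to a voxel grid. The following is a minimal sketch of pinhole back-projection, not part of the OccuFly or DepthAnythingV2 code. It assumes the depth values are z-depth along the optical axis (not distance along the ray) and that you supply your own camera intrinsics; the function name depth_to_points and the demo intrinsics are hypothetical.

```python
import numpy as np


def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project an HxW metric depth map into an Nx3 point cloud.

    Assumes z-depth in meters and a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), all in pixels. Points are
    returned in camera coordinates (x right, y down, z forward).
    """
    if depth.ndim != 2:
        raise ValueError("depth must be a 2-D HxW array")
    h, w = depth.shape
    # Pixel grid: u indexes columns (image x), v indexes rows (image y)
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels without a valid (finite, positive) depth
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]


if __name__ == "__main__":
    # Synthetic example: flat ground 40 m from a hypothetical camera
    demo = np.full((4, 6), 40.0)
    demo[0, 0] = 0.0  # mark one pixel as invalid
    pts = depth_to_points(demo, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
    print(pts.shape)  # (23, 3)
```

With the quickstart output, you would call depth_to_points(depth, fx, fy, cx, cy) using the calibration of the camera that captured the image.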

OccuFly Dataset

The model is fine-tuned on OccuFly, which includes:

  • 20,000+ aerial RGB images with corresponding depth maps
  • Multiple flight altitudes: 30m, 40m, 50m
  • Seasonal diversity: Spring, Summer, Fall, Winter
  • Multiple environments: Urban, industrial, rural
  • 21 semantic classes with dense voxel grid annotations
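To compare predictions against ground-truth depth maps, monocular depth work commonly reports AbsRel, RMSE, and the δ < 1.25 accuracy. The following is a minimal sketch of those three measures. It is not the official OccuFly evaluation protocol, whose depth range and masking rules are not given here. It assumes both maps are HxW arrays in meters and that non-positive or non-finite ground-truth values mark invalid pixels; the function name depth_metrics is hypothetical.

```python
import numpy as np


def depth_metrics(pred: np.ndarray, gt: np.ndarray,
                  min_depth: float = 1e-3) -> dict:
    """AbsRel, RMSE (meters) and delta<1.25 over pixels with valid ground truth."""
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Assumption: gt <= min_depth or non-finite gt means "no ground truth"
    valid = np.isfinite(gt) & (gt > min_depth) & np.isfinite(pred)
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")
    p = pred[valid].astype(np.float64)
    g = gt[valid].astype(np.float64)

    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    # Clip predictions only for the ratio test, to avoid dividing by zero
    p_safe = np.clip(p, min_depth, None)
    ratio = np.maximum(p_safe / g, g / p_safe)
    delta1 = np.mean(ratio < 1.25)
    return {"abs_rel": float(abs_rel), "rmse": float(rmse),
            "delta1": float(delta1)}


if __name__ == "__main__":
    gt = np.array([[10.0, 20.0], [0.0, 40.0]])  # 0.0 = no ground truth
    print(depth_metrics(gt * 1.1, gt))
```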

Citation

If you find our work helpful, please consider citing our paper and the original DepthAnythingV2 work, or giving the repository a like ❤️

@inproceedings{gross2026occufly,
    title={{OccuFly: A 3D Vision Benchmark for Semantic Scene Completion from the Aerial Perspective}}, 
    author={Markus Gross and Sai B. Matha and Aya Fahmy and Rui Song and Daniel Cremers and Henri Meess},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2026},
}

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2406.09414},
  year={2024}
}

Related Resources

๐ŸŒ OccuFly Project Page
๐Ÿค— OccuFly Dataset on HuggingFace
๐Ÿ“œ OccuFly Paper
๐ŸŒ Original DepthAnythingV2

License

This work is licensed under the CC BY-NC-SA 4.0 license. See the LICENSE file for the full legal terms.
