---
language: en
license: apache-2.0
tags:
- depth-estimation
- computer-vision
- pytorch
- absolute depth
pipeline_tag: depth-estimation
library_name: transformers
---

# Depth-CHM Model

A fine-tuned Depth Anything V2 model for depth estimation, trained on forest canopy height data.

## Model Description

This model is based on [Depth-Anything-V2-Metric-Indoor-Base](https://huggingface.co/depth-anything/Depth-Anything-V2-Metric-Indoor-Base-hf) and fine-tuned for estimating depth/canopy height from aerial imagery.

### Training Details

- **Base Model**: depth-anything/Depth-Anything-V2-Metric-Indoor-Base-hf
- **Max Depth**: 40.0 meters
- **Loss Function**: SiLog + 0.1 * L1 Loss
- **Hyperparameter Tuning**: Optuna (50 trials)

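The exact training code is not included in this repository, but the objective above can be sketched with the standard scale-invariant log (SiLog) formulation of Eigen et al. combined with an L1 term. The `lambd` value and the masking of zero-depth pixels are assumptions, not taken from the actual training setup:

```python
import torch

def silog_l1_loss(pred, target, lambd=0.5, alpha=0.1, eps=1e-6):
    """Sketch of SiLog + 0.1 * L1 (lambd and masking are assumed, not confirmed)."""
    valid = target > eps                          # ignore empty / zero-depth pixels
    d = torch.log(pred[valid] + eps) - torch.log(target[valid] + eps)
    # Scale-invariant log term: sqrt(mean(d^2) - lambd * mean(d)^2)
    silog = torch.sqrt((d ** 2).mean() - lambd * d.mean() ** 2)
    l1 = torch.abs(pred[valid] - target[valid]).mean()
    return silog + alpha * l1

# Toy example with random depth maps in the model's 0-40 m range
pred = torch.rand(1, 256, 256) * 40.0 + 0.1
target = torch.rand(1, 256, 256) * 40.0
loss = silog_l1_loss(pred, target)
print(loss.item())
```

By Cauchy-Schwarz, the term under the square root is non-negative for any `lambd <= 1`, so the loss is well defined; a perfect prediction drives both terms to zero.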
## Installation

```bash
pip install transformers torch pillow numpy
```

## Usage

### Method 1: Using Pipeline (Recommended)

The simplest way to use the model:

```python
from transformers import pipeline
from PIL import Image
import numpy as np

# Load pipeline
pipe = pipeline(task="depth-estimation", model="Boxiang/depth_chm")

# Load image
image = Image.open("your_image.png").convert("RGB")

# Run inference
result = pipe(image)
depth_image = result["depth"]  # PIL Image (normalized 0-255)
# The raw tensor is also available as result["predicted_depth"]

# Convert to a numpy array and scale to actual depth (0-40 m)
max_depth = 40.0
depth = np.array(depth_image).astype(np.float32) / 255.0 * max_depth

print(f"Depth shape: {depth.shape}")
print(f"Depth range: [{depth.min():.2f}, {depth.max():.2f}] meters")
```

### Method 2: Using AutoImageProcessor + Model

For more control over the inference process:

```python
import torch
import torch.nn.functional as F
from transformers import AutoImageProcessor, DepthAnythingForDepthEstimation
from PIL import Image

# Configuration
model_id = "Boxiang/depth_chm"
max_depth = 40.0

# Load model and processor
processor = AutoImageProcessor.from_pretrained(model_id)
model = DepthAnythingForDepthEstimation.from_pretrained(model_id)

# Use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

# Load and process image
image = Image.open("your_image.png").convert("RGB")
original_size = image.size  # (width, height)

# Prepare input
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs["pixel_values"].to(device)

# Run inference
with torch.no_grad():
    outputs = model(pixel_values)
    predicted_depth = outputs.predicted_depth

# Scale by max_depth
pred_scaled = predicted_depth * max_depth

# Resize to original image size
depth = F.interpolate(
    pred_scaled.unsqueeze(0),
    size=(original_size[1], original_size[0]),  # (height, width)
    mode="bilinear",
    align_corners=True,
).squeeze().cpu().numpy()

print(f"Depth shape: {depth.shape}")
print(f"Depth range: [{depth.min():.2f}, {depth.max():.2f}] meters")
```

### Method 3: Local Model Path

If you have the model saved locally:

```python
from transformers import AutoImageProcessor, DepthAnythingForDepthEstimation

# Load from local path
model_path = "./depth_chm_trained"
processor = AutoImageProcessor.from_pretrained(model_path, local_files_only=True)
model = DepthAnythingForDepthEstimation.from_pretrained(model_path, local_files_only=True)
```

## Output Format

- **Pipeline output**: Returns a PIL Image with depth normalized to 0-255. Multiply by `max_depth / 255.0` to recover depth in meters; for exact values, use the raw `predicted_depth` tensor also present in the pipeline result.
- **Model output**: Returns a `predicted_depth` tensor with values in the range [0, 1]. Multiply by `max_depth` (40.0) to get actual depth in meters.

## Depth vs Height Conversion

The model outputs **depth** (distance from the camera). To convert to **height**, as in a Canopy Height Model (CHM):

```python
height = max_depth - depth
```

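Applied to a full depth map from Method 1 or 2, the conversion is elementwise; a small sketch on a toy array (the clip to non-negative heights is my own precaution against depth predictions slightly above `max_depth`, not part of the documented conversion):

```python
import numpy as np

max_depth = 40.0
depth = np.array([[40.0, 35.5],
                  [12.0, 40.0]])  # toy depth map in meters (40.0 = ground level)

# Elementwise conversion: height above ground, clipped to be non-negative
height = np.clip(max_depth - depth, 0.0, None)
print(height)  # [[ 0.   4.5] [28.   0. ]]
```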
## Model Files

- `model.safetensors` - Model weights
- `config.json` - Model configuration
- `preprocessor_config.json` - Image processor configuration
- `training_info.json` - Training hyperparameters

## Citation

If you use this model, please cite:

```bibtex
@misc{depth_chm_2024,
  title={Depth-CHM: Fine-tuned Depth Anything V2 for Canopy Height Estimation},
  author={Boxiang},
  year={2024},
  url={https://huggingface.co/Boxiang/depth_chm}
}
```

## License

This model inherits the license from the base Depth Anything V2 model.