---
language:
- en
license: apache-2.0
tags:
- computer-vision
- image-matching
- overlap-detection
- feature-extraction
datasets:
- SSSSphinx/SCoDe
---
SCoDe: Scale-aware Co-visible Region Detection for Image Matching
Overview
SCoDe is a scale-aware co-visible region detection model designed for robust image matching. Given an image pair, it detects the regions that are visible in both images and is designed to remain robust to large scale differences between the two views, which makes it well suited to structure-from-motion and 3D reconstruction pipelines.
This model is built upon the CCOE (Co-visible region detection with Overlap Estimation) architecture and has been trained on the MegaDepth dataset.
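As an illustration of what "co-visible region" can mean in practice, suppose the region in each image is represented as an axis-aligned bounding box. This representation is an assumption borrowed from overlap-estimation work in general; the card does not specify SCoDe's output format. Under that assumption, a prediction can be scored against ground truth with intersection-over-union (IoU). `Box`, `box_area` and `box_iou` are hypothetical helpers, not part of the repository.

```python
from typing import Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area(b: Box) -> float:
    """Area of a box; degenerate boxes (x1 <= x0 or y1 <= y0) have zero area."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def box_iou(pred: Box, gt: Box) -> float:
    """Intersection-over-union between a predicted and a ground-truth co-visible box."""
    inter = (
        max(pred[0], gt[0]),
        max(pred[1], gt[1]),
        min(pred[2], gt[2]),
        min(pred[3], gt[3]),
    )
    inter_area = box_area(inter)  # zero when the boxes do not overlap
    union = box_area(pred) + box_area(gt) - inter_area
    return inter_area / union if union > 0 else 0.0


# Two 100x100 boxes shifted by half their width overlap with IoU 1/3.
print(box_iou((0, 0, 100, 100), (50, 0, 150, 100)))
```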
Model Details
- Architecture: CCOE-based transformer with multi-scale attention
- Backbone: ResNet-50
- Input Size: 1024×1024 (configurable)
- Training Dataset: MegaDepth
- Framework: PyTorch
Key Features
- Scale-aware overlap region detection
- Rotation-invariant matching capabilities
- End-to-end trainable pipeline
- Compatible with various feature extractors (SIFT, SuperPoint, D2-Net, R2D2, DISK)
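One plausible way to exploit scale-aware co-visible regions, common in overlap-guided matching, is to crop each image to its detected region and resize both crops to a shared size, so that the common content appears at a similar scale before feature extraction. The sketch below only computes those resize factors. The box format, the helper `crop_scales`, and the 640-pixel target are hypothetical choices; this is not presented as SCoDe's exact procedure.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) format


def crop_scales(box1: Box, box2: Box, target_long_side: int = 640) -> Tuple[float, float]:
    """Resize factors that bring both co-visible crops to the same longest side."""

    def long_side(b: Box) -> float:
        side = max(b[2] - b[0], b[3] - b[1])
        if side <= 0:
            raise ValueError(f"degenerate box {b}")
        return side

    return target_long_side / long_side(box1), target_long_side / long_side(box2)


# A close-up view (large region) and a distant view (small region) of the same content:
# the distant crop is upsampled 4x more than the close-up one.
s1, s2 = crop_scales((0, 0, 800, 600), (300, 200, 500, 350))
print(s1, s2)  # 0.8 3.2
```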
Usage
Installation
```bash
pip install torch torchvision
git clone https://github.com/SSSSphinx/SCoDe.git
cd SCoDe
pip install -r requirements.txt
```
Quick Start
```python
import torch
from src.config.default import get_cfg_defaults
from src.model import CCOE

# Load configuration
cfg = get_cfg_defaults()
cfg.merge_from_file('configs/scode_config.py')

# Initialize model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = CCOE(cfg.CCOE).eval().to(device)

# Load pre-trained weights
model.load_state_dict(torch.load('weights/scode.pth', map_location=device))

# Model is ready for inference
with torch.no_grad():
    # Random tensors stand in for a real, preprocessed image pair of shape (B, C, H, W)
    image1 = torch.randn(1, 3, 1024, 1024).to(device)
    image2 = torch.randn(1, 3, 1024, 1024).to(device)
    output = model({'image1': image1, 'image2': image2})
```
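The example above feeds random 1024×1024 tensors. For real photographs of arbitrary size, one common preprocessing choice is to resize so the longer side fits the 1024-pixel input while keeping the aspect ratio, pad the remainder on the right and bottom, and keep the scale factor so that predicted coordinates can be mapped back to the original image. Whether SCoDe's own loader resizes, pads, or stretches is not stated in this card; `fit_and_pad` and `to_original` are hypothetical helpers that only compute the geometry.

```python
from typing import Tuple


def fit_and_pad(width: int, height: int, size: int = 1024) -> Tuple[int, int, float]:
    """Return (new_w, new_h, scale) for resizing an image to fit inside size x size.

    The resized image is assumed to be zero-padded on the right/bottom up to size x size,
    so pixel (0, 0) keeps its position and no offset is needed when mapping back.
    """
    scale = size / max(width, height)
    return round(width * scale), round(height * scale), scale


def to_original(x: float, y: float, scale: float) -> Tuple[float, float]:
    """Map a pixel coordinate in the padded network input back to the original image."""
    return x / scale, y / scale


new_w, new_h, scale = fit_and_pad(1600, 1200)
print(new_w, new_h, scale)  # 1024 768 0.64
print(to_original(512, 384, scale))  # approximately (800.0, 600.0)
```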
Training
```bash
# Single GPU training
python train_scode.py --num_workers 4 --epoch 15 --batch_size 4 --validation --learning_rate 1e-5

# Multi-GPU distributed training (4 GPUs)
python -m torch.distributed.launch --nproc_per_node 4 --master_port=29501 train_scode.py \
    --num_workers 4 --epoch 15 --batch_size 4 --validation --learning_rate 1e-5
```
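To estimate the training schedule implied by these commands, the arithmetic below assumes two things the card does not confirm: that `--batch_size` is per process (per GPU), as is common with `torch.distributed.launch`, and that `cfg.DATASET.TRAIN.PAIRS_LENGTH = 128000` is the number of pairs sampled per epoch. Check `train_scode.py` before relying on these numbers; `training_schedule` is a hypothetical helper.

```python
from typing import Tuple


def training_schedule(num_gpus: int, batch_size_per_gpu: int,
                      pairs_per_epoch: int, epochs: int) -> Tuple[int, int, int]:
    """Return (effective batch size, iterations per epoch, total iterations).

    Assumes per-GPU batches and that an incomplete final batch is dropped (floor division).
    """
    effective_batch = num_gpus * batch_size_per_gpu
    iters_per_epoch = pairs_per_epoch // effective_batch
    return effective_batch, iters_per_epoch, iters_per_epoch * epochs


print(training_schedule(4, 4, 128_000, 15))  # (16, 8000, 120000)  four-GPU command
print(training_schedule(1, 4, 128_000, 15))  # (4, 32000, 480000)  single-GPU command
```

Under these assumptions the four-GPU run sees four times as many pairs per optimizer step, so it completes the same number of epochs in a quarter of the steps while keeping the same learning rate.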
Evaluation
Rotation Invariance Evaluation
```bash
python rot_inv_eval.py \
    --extractors superpoint d2net r2d2 disk \
    --image_pairs path/to/image/pairs \
    --output_dir outputs/scode_rot_eval
```
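A rotation-invariance evaluation typically rotates one image of a pair by a sweep of angles and checks matches against keypoints rotated by the same transform. The card does not describe how `rot_inv_eval.py` builds its ground truth, so the helper below (`rotate_points`, hypothetical) only shows the standard geometry: rotating pixel coordinates about the image centre on a canvas of unchanged size.

```python
import math
from typing import List, Tuple


def rotate_points(points: List[Tuple[float, float]], angle_deg: float,
                  width: int, height: int) -> List[Tuple[float, float]]:
    """Rotate pixel coordinates by angle_deg about the image centre.

    Uses the standard rotation matrix; because image y points down, a positive
    angle appears clockwise on screen. The canvas size is kept, so for
    non-square images some rotated points can fall outside the frame.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = width / 2.0, height / 2.0
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return out


# Sweep a keypoint through a range of test angles (the 30-degree step is an arbitrary choice).
for angle in range(0, 360, 30):
    x, y = rotate_points([(150.0, 50.0)], angle, 200, 100)[0]
    print(angle, round(x, 1), round(y, 1))
```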
Pose Estimation Evaluation
```bash
python eval_pose_estimation.py \
    --results_dir outputs/megadepth_results \
    --dataset megadepth
```
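The card does not say which metric `eval_pose_estimation.py` reports. MegaDepth pose benchmarks commonly report the area under the cumulative pose-error curve (AUC) at 5°, 10° and 20°, where each pair's error is the maximum of its rotation and translation angular errors. The stdlib sketch below mirrors the widely used `pose_auc` routine from the SuperGlue evaluation code; treat it as an assumption about the metric, not as the script's implementation.

```python
import bisect
from typing import List, Sequence


def pose_auc(errors: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Normalized area under the recall-vs-error curve, one value per positive threshold."""
    n = len(errors)
    errs = [0.0] + sorted(errors)          # prepend a zero-error point
    recall = [i / n for i in range(n + 1)]  # fraction of pairs with error <= errs[i]
    aucs = []
    for t in thresholds:
        last = bisect.bisect_left(errs, t)  # first index with errs[i] >= t
        r = recall[:last] + [recall[last - 1]]  # hold recall flat up to the threshold
        e = errs[:last] + [t]
        # Trapezoidal integration of recall over error, normalized by the threshold.
        area = sum((e[i + 1] - e[i]) * (r[i + 1] + r[i]) / 2 for i in range(len(e) - 1))
        aucs.append(area / t)
    return aucs


# Per-pair pose errors in degrees (illustrative values, not SCoDe results).
print(pose_auc([1.0, 2.0, 3.0, 4.0], [5.0, 10.0, 20.0]))
```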
Radar Evaluation
```bash
python eval_radar.py \
    --results_dir outputs/radar_results
```
Configuration
Main configuration files:
- `configs/scode_config.py`: SCoDe model configuration
- `src/config/default.py`: default configuration template
Key Parameters
```python
# Training
cfg.DATASET.TRAIN.IMAGE_SIZE = [1024, 1024]
cfg.DATASET.TRAIN.BATCH_SIZE = 4
cfg.DATASET.TRAIN.PAIRS_LENGTH = 128000

# Validation
cfg.DATASET.VAL.IMAGE_SIZE = [1024, 1024]

# Model
cfg.CCOE.BACKBONE.NUM_LAYERS = 50
cfg.CCOE.BACKBONE.STRIDE = 32
cfg.CCOE.CCA.DEPTH = [2, 2, 2, 2]
cfg.CCOE.CCA.NUM_HEADS = [8, 8, 8, 8]
```
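To see how these parameters interact, the sketch below assumes that `STRIDE` is the total downsampling factor of the ResNet-50 backbone (one feature cell, or attention token, per 32×32 pixel patch) and that the entries of `CCA.DEPTH` count attention blocks per stage. Both readings are assumptions, not documented behaviour. Under them, a 1024×1024 input yields a 32×32 grid of 1024 tokens per image, and choosing a configurable input size that is a multiple of the stride keeps the grid exact. `backbone_grid` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def backbone_grid(image_size: Sequence[int], stride: int) -> Tuple[int, int]:
    """Feature-map (rows, cols) for an input of image_size = [H, W] at the given stride."""
    h, w = image_size
    if h % stride or w % stride:
        # Hypothetical strictness: an uneven size would make the grid a floored approximation.
        raise ValueError(f"image size {tuple(image_size)} is not divisible by stride {stride}")
    return h // stride, w // stride


grid = backbone_grid([1024, 1024], 32)  # cfg.DATASET.TRAIN.IMAGE_SIZE, cfg.CCOE.BACKBONE.STRIDE
tokens_per_image = grid[0] * grid[1]
attention_blocks = sum([2, 2, 2, 2])    # cfg.CCOE.CCA.DEPTH
print(grid, tokens_per_image, attention_blocks)  # (32, 32) 1024 8
```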
Dataset
The model is trained on the MegaDepth dataset with scale-aware pair generation.
Dataset preparation:
```bash
python dataset_preparation.py \
    --base_path dataset/megadepth/MegaDepth \
    --num_per_scene 5000
```
Validation pairs are automatically generated and evaluated during training.
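The `--num_per_scene 5000` flag suggests a per-scene cap on the number of generated pairs, which keeps large MegaDepth scenes from dominating training. The sketch below shows only that capping step under this interpretation, with a fixed seed for reproducibility. The candidate pairs (for example, already filtered by overlap or scale criteria) are assumed to be given, and `cap_pairs_per_scene` is hypothetical; the actual scale-aware selection in `dataset_preparation.py` is not described here.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (image name A, image name B)


def cap_pairs_per_scene(pairs_by_scene: Dict[str, List[Pair]], num_per_scene: int,
                        seed: int = 0) -> Dict[str, List[Pair]]:
    """Keep at most num_per_scene pairs per scene, sampled uniformly without replacement."""
    rng = random.Random(seed)  # local RNG so results do not depend on global state
    capped: Dict[str, List[Pair]] = {}
    for scene, pairs in pairs_by_scene.items():
        if len(pairs) <= num_per_scene:
            capped[scene] = list(pairs)  # small scenes are kept whole
        else:
            capped[scene] = rng.sample(pairs, num_per_scene)
    return capped


candidates = {
    "scene_a": [(f"{i}.jpg", f"{i + 1}.jpg") for i in range(12)],
    "scene_b": [("x.jpg", "y.jpg")],
}
print({k: len(v) for k, v in cap_pairs_per_scene(candidates, 5).items()})
```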
Model Performance
SCoDe demonstrates strong performance on:
- Rotation Invariance: Robust to image rotations across the full 360° range
- Scale Invariance: Effective across multiple image scales
- Pose Estimation: Improved camera pose estimation on the MegaDepth benchmark
- Feature Matching: Enhanced matching accuracy with various feature extractors
Supported Feature Extractors
The model can be combined with the following feature extractors:
- SIFT (with brute-force matcher)
- SuperPoint (with NN matcher)
- D2-Net
- R2D2
- DISK
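The brute-force and nearest-neighbour (NN) matchers listed above reduce to nearest-neighbour search over descriptors. As a generic illustration, not the repository's matcher, the stdlib sketch below implements mutual nearest-neighbour matching with an optional Lowe ratio test (the ratio test is commonly paired with SIFT). `mutual_nn_match` is a hypothetical helper, and descriptors are plain lists of floats.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mutual_nn_match(desc1: Sequence[Sequence[float]], desc2: Sequence[Sequence[float]],
                    ratio: Optional[float] = None) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs where desc1[i] and desc2[j] are each other's nearest neighbour.

    If ratio is given, a match is also rejected unless best < ratio * second-best distance.
    """
    if not desc1 or not desc2:
        return []
    nn12 = []  # (best j, best distance, second-best distance) for each descriptor in desc1
    for d in desc1:
        dists = [_dist(d, e) for e in desc2]
        order = sorted(range(len(desc2)), key=dists.__getitem__)
        second = dists[order[1]] if len(order) > 1 else math.inf
        nn12.append((order[0], dists[order[0]], second))
    # Nearest neighbour in desc1 for each descriptor in desc2 (for the mutual check).
    nn21 = [min(range(len(desc1)), key=lambda i: _dist(desc1[i], e)) for e in desc2]
    matches = []
    for i, (j, best, second) in enumerate(nn12):
        if nn21[j] != i:
            continue  # not mutual
        if ratio is not None and best >= ratio * second:
            continue  # ambiguous match rejected by the ratio test
        matches.append((i, j))
    return matches


print(mutual_nn_match([[0, 0], [10, 10]], [[10, 9], [0, 1]], ratio=0.8))  # [(0, 1), (1, 0)]
```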
Citation
If you find this project useful in your research, please cite our paper:
```bibtex
@article{pan2025scale,
  title={Scale-aware co-visible region detection for image matching},
  author={Pan, Xu and Xia, Zimin and Zheng, Xianwei},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={229},
  pages={122--137},
  year={2025},
  publisher={Elsevier}
}
```
License
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
Acknowledgments
- MegaDepth - Dataset and benchmarks
- OETR - Model initialization strategies
- PyTorch team for the excellent framework
Contact
For questions or issues, please visit the GitHub repository or contact the authors.
Paper: Scale-aware Co-visible Region Detection for Image Matching
Project Page: https://xupan.top/Projects/scode