Enhance model card for TransOSS: Add metadata, paper, code, and usage details #1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,3 +1,140 @@
---
license: cc-by-4.0
pipeline_tag: image-feature-extraction
library_name: transformers
tags:
- re-identification
- remote-sensing
- ship-reid
- cross-modal
- optical
- sar
- vision-transformer
---

# Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

This repository contains the **TransOSS** model, a baseline method for cross-modal ship re-identification, presented in the paper [Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method](https://huggingface.co/papers/2506.22027).

**TransOSS** is built upon the Vision Transformer architecture: it refines the patch embedding structure, incorporates additional embeddings, and uses contrastive learning to pre-train on large-scale optical-SAR image pairs. This enables the model to extract the modality-invariant features needed to track ships with low-Earth-orbit constellations of optical and SAR sensors.

**[Paper (arXiv)](https://arxiv.org/abs/2506.22027)** | **[Paper (Hugging Face)](https://huggingface.co/papers/2506.22027)** | **[Code](https://github.com/Alioth2000/TransOSS)** | **[Dataset](https://zenodo.org/records/15860212)**

## Abstract

Detecting and tracking ground objects using earth observation imagery remains a significant challenge in the field of remote sensing. Continuous maritime ship tracking is crucial for applications such as maritime search and rescue, law enforcement, and shipping analysis. However, most current ship tracking methods rely on geostationary satellites or video satellites. The former offer low resolution and are susceptible to weather conditions, while the latter have short filming durations and limited coverage areas, making them less suitable for the real-world requirements of ship tracking. To address these limitations, we present the Hybrid Optical and Synthetic Aperture Radar (SAR) Ship Re-Identification Dataset (HOSS ReID dataset), designed to evaluate the effectiveness of ship tracking using low-Earth orbit constellations of optical and SAR sensors. This approach ensures shorter re-imaging cycles and enables all-weather tracking. HOSS ReID dataset includes images of the same ship captured over extended periods under diverse conditions, using different satellites of different modalities at varying times and angles. Furthermore, we propose a baseline method for cross-modal ship re-identification, TransOSS, which is built on the Vision Transformer architecture. It refines the patch embedding structure to better accommodate cross-modal tasks, incorporates additional embeddings to introduce more reference information, and employs contrastive learning to pre-train on large-scale optical-SAR image pairs, ensuring the model's ability to extract modality-invariant features.

## HOSS ReID Dataset

The HOSS ReID dataset and the associated pretraining dataset are publicly available on [Zenodo](https://zenodo.org/records/15860212).
The pretraining dataset is constructed based on the [SEN1-2](https://www.kaggle.com/datasets/requiemonk/sentinel12-image-pairs-segregated-by-terrain) and [DFC23](https://ieee-dataport.org/competitions/2023-ieee-grss-data-fusion-contest-large-scale-fine-grained-building-classification) datasets.
To run TransOSS, please organize the data in the following structure under the `data` directory:
```
data
├── HOSS
│   ├── bounding_box_test
│   ├── bounding_box_train
│   └── ...
└── OptiSar_Pair
    ├── 0001
    ├── 0002
    └── ...
```


## Pipeline



## Requirements and Installation

The code uses Python 3.9 and PyTorch 2.2.2; versions lower than these are not recommended.

```bash
pip install -r requirements.txt
```

## Sample Usage (Feature Extraction)

You can use the `TransOSS` model to extract features from optical and SAR images for re-identification. The original repository uses a custom PyTorch setup; if the model is integrated into the Hugging Face `transformers` ecosystem, a snippet like the following can be used:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Model ID on the Hugging Face Hub
model_id = "Alioth2000/TransOSS"

try:
    # Load processor and model.
    # For custom models, you might need trust_remote_code=True,
    # or to load a specific model class (e.g., if TransOSS is a custom ViT).
    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)

    # Example: create dummy images for demonstration.
    # In a real scenario, load your actual optical and SAR images using Image.open().
    optical_image = Image.new('RGB', (256, 256), color='blue')  # Placeholder for an optical image
    sar_image = Image.new('RGB', (256, 256), color='red')       # Placeholder for a SAR image

    # Preprocess images
    inputs = processor(images=[optical_image, sar_image], return_tensors="pt")

    # Move model and inputs to device (GPU if available)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    inputs = {k: v.to(device) for k, v in inputs.items()}
    model.to(device)

    # Extract features
    model.eval()  # Set model to evaluation mode
    with torch.no_grad():
        outputs = model(**inputs)

    # Typically for feature extraction, the pooler_output or the CLS token's hidden state is used.
    # Adjust based on the actual model output structure.
    # Here we assume the model returns pooler_output as a global feature vector.
    optical_features = outputs.pooler_output[0] if hasattr(outputs, 'pooler_output') else outputs.last_hidden_state[:, 0, :][0]
    sar_features = outputs.pooler_output[1] if hasattr(outputs, 'pooler_output') else outputs.last_hidden_state[:, 0, :][1]

    print(f"Features from optical image (shape): {optical_features.shape}")
    print(f"Features from SAR image (shape): {sar_features.shape}")

    # For re-identification, you would compare these features, e.g., using cosine similarity:
    # similarity = torch.nn.functional.cosine_similarity(optical_features, sar_features, dim=0)
    # print(f"Cosine similarity: {similarity.item():.4f}")

except Exception as e:
    print("Failed to run sample usage. Please ensure `transformers` and `Pillow` are installed (`pip install transformers Pillow`).")
    print(f"Also, confirm that `Alioth2000/TransOSS` is compatible with `AutoModel`/`AutoImageProcessor`, or refer to the official GitHub repository for detailed setup: {e}")
```
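
After features are extracted, re-identification reduces to ranking gallery embeddings by similarity to each query. The sketch below is independent of the exact output head; it assumes you have stacked query and gallery feature vectors into two tensors (random tensors stand in for real features here).

```python
import torch
import torch.nn.functional as F

# Sketch: rank gallery images for each query by cosine similarity.
query_feats = torch.randn(4, 768)      # e.g. features of optical query crops
gallery_feats = torch.randn(100, 768)  # e.g. features of SAR gallery crops

q = F.normalize(query_feats, dim=-1)
g = F.normalize(gallery_feats, dim=-1)
similarity = q @ g.t()                                # (num_query, num_gallery)
ranking = similarity.argsort(dim=1, descending=True)  # best match first
print(ranking[:, :5])                                 # top-5 gallery indices per query
```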

## Training and Evaluation

For detailed instructions on pretraining and fine-tuning the model, please refer to the [official GitHub repository](https://github.com/Alioth2000/TransOSS).

An example command for evaluation is provided below:

```bash
python test.py --config_file configs/hoss_transoss.yml MODEL.DEVICE_ID "('0')" TEST.WEIGHT 'weights/HOSS_TransOSS.pth'
```
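
If you want to inspect the released checkpoint outside of `test.py`, you can load it directly with PyTorch. This is a sketch that assumes `weights/HOSS_TransOSS.pth` stores a plain state dict (or a dict wrapping one); the actual packaging may differ, in which case refer to the loading code in the GitHub repository.

```python
import torch

# Sketch: peek at the released checkpoint before wiring it into the model code.
ckpt = torch.load("weights/HOSS_TransOSS.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} entries")
for name, value in list(state_dict.items())[:5]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```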

## Citation

If you find this work useful, please cite the paper:

```bibtex
@misc{wang2025crossmodal,
      title={Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method},
      author={Han Wang and Shengyang Li and Jian Yang and Yuxuan Liu and Yixuan Lv and Zhuang Zhou},
      year={2025},
      eprint={2506.22027},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.22027},
}
```

## Acknowledgement

Codebase from [TransReID](https://github.com/damo-cv/TransReID/tree/main), [reid-strong-baseline](https://github.com/michuanhaohao/reid-strong-baseline), and [pytorch-image-models](https://github.com/rwightman/pytorch-image-models).