Enhance model card for TransOSS: Add metadata, paper, code, and usage details

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +140 -3
README.md CHANGED
@@ -1,3 +1,140 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ pipeline_tag: image-feature-extraction
+ library_name: transformers
+ tags:
+ - re-identification
+ - remote-sensing
+ - ship-reid
+ - cross-modal
+ - optical
+ - sar
+ - vision-transformer
+ ---
+
+ # Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method
+
+ This repository contains the **TransOSS** model, a baseline method for cross-modal ship re-identification, presented in the paper [Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method](https://huggingface.co/papers/2506.22027).
+
+ **TransOSS** builds on the Vision Transformer architecture: it refines the patch embedding structure to better accommodate cross-modal inputs, incorporates additional embeddings to provide more reference information, and uses contrastive learning to pre-train on large-scale optical-SAR image pairs. This equips the model to extract the modality-invariant features needed for ship tracking with low-Earth-orbit optical and SAR sensors.
+
+ **[πŸ“ Paper (arXiv)](https://arxiv.org/abs/2506.22027)** | **[πŸ“„ Paper (Hugging Face)](https://huggingface.co/papers/2506.22027)** | **[πŸ’» Code](https://github.com/Alioth2000/TransOSS)** | **[πŸ—ƒοΈ Dataset](https://zenodo.org/records/15860212)**
+
+ ## Abstract
+
+ Detecting and tracking ground objects using earth observation imagery remains a significant challenge in the field of remote sensing. Continuous maritime ship tracking is crucial for applications such as maritime search and rescue, law enforcement, and shipping analysis. However, most current ship tracking methods rely on geostationary satellites or video satellites. The former offer low resolution and are susceptible to weather conditions, while the latter have short filming durations and limited coverage areas, making them less suitable for the real-world requirements of ship tracking. To address these limitations, we present the Hybrid Optical and Synthetic Aperture Radar (SAR) Ship Re-Identification Dataset (HOSS ReID dataset), designed to evaluate the effectiveness of ship tracking using low-Earth orbit constellations of optical and SAR sensors. This approach ensures shorter re-imaging cycles and enables all-weather tracking. HOSS ReID dataset includes images of the same ship captured over extended periods under diverse conditions, using different satellites of different modalities at varying times and angles. Furthermore, we propose a baseline method for cross-modal ship re-identification, TransOSS, which is built on the Vision Transformer architecture. It refines the patch embedding structure to better accommodate cross-modal tasks, incorporates additional embeddings to introduce more reference information, and employs contrastive learning to pre-train on large-scale optical-SAR image pairs, ensuring the model's ability to extract modality-invariant features.
+
+ ## HOSS ReID Dataset
+
+ The HOSS ReID dataset and the associated pretraining dataset are publicly available on [Zenodo](https://zenodo.org/records/15860212).
+ The pretraining dataset is constructed based on the [SEN1-2](https://www.kaggle.com/datasets/requiemonk/sentinel12-image-pairs-segregated-by-terrain) and [DFC23](https://ieee-dataport.org/competitions/2023-ieee-grss-data-fusion-contest-large-scale-fine-grained-building-classification) datasets.
+ To run TransOSS, please organize the data in the following structure under the `data` directory:
+ ```
+ data
+ β”œβ”€β”€ HOSS
+ β”‚   β”œβ”€β”€ bounding_box_test
+ β”‚   β”œβ”€β”€ bounding_box_train
+ β”‚   └── ...
+ └── OptiSar_Pair
+     β”œβ”€β”€ 0001
+     β”œβ”€β”€ 0002
+     └── ...
+ ```
+ ![HOSS ReID Dataset structure](https://github.com/Alioth2000/TransOSS/raw/main/figs/dataset.png)
+
+ ## Pipeline
+
+ ![TransOSS Framework](https://github.com/Alioth2000/TransOSS/raw/main/figs/framework.png)
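+
+ As a rough illustration of the idea behind the refined patch embedding and the extra embeddings described above, the sketch below adds a learnable modality embedding to ViT-style patch tokens. This is an assumption-laden sketch, not the actual TransOSS module; the class name, default sizes, and the single modality embedding are all placeholders, and the real design is described in the paper and repository.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class ModalityAwarePatchEmbed(nn.Module):
+     """Illustrative sketch: ViT-style patch embedding plus a learnable
+     modality embedding (0 = optical, 1 = SAR) added to every patch token.
+     The actual TransOSS design is described in the paper and repository."""
+     def __init__(self, patch_size=16, in_chans=3, embed_dim=768):
+         super().__init__()
+         self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
+         self.modality_embed = nn.Embedding(2, embed_dim)  # optical vs. SAR
+
+     def forward(self, x, modality_id):
+         # x: (B, C, H, W) -> (B, N, D) patch tokens
+         tokens = self.proj(x).flatten(2).transpose(1, 2)
+         # add the image's modality embedding to every one of its tokens
+         return tokens + self.modality_embed(modality_id)[:, None, :]
+
+ # e.g. a batch of two optical crops (modality id 0)
+ # tokens = ModalityAwarePatchEmbed()(torch.randn(2, 3, 256, 256), torch.zeros(2, dtype=torch.long))
+ ```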
+
+ ## Requirements and Installation
+
+ The code was developed with Python 3.9 and PyTorch 2.2.2; using versions lower than these is not recommended.
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Sample Usage (Feature Extraction)
+
+ You can use the `TransOSS` model to extract features from optical and SAR images for re-identification. The original repository uses a custom PyTorch setup; if the model is integrated into the Hugging Face `transformers` ecosystem, a snippet like the following can be used:
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoImageProcessor, AutoModel
+
+ # Model ID on the Hugging Face Hub
+ model_id = "Alioth2000/TransOSS"
+
+ try:
+     # Load the processor and model.
+     # For custom models, you might need trust_remote_code=True,
+     # or to load a specific model class (e.g., if TransOSS is a custom ViT).
+     processor = AutoImageProcessor.from_pretrained(model_id)
+     model = AutoModel.from_pretrained(model_id)
+
+     # Example: create dummy images for demonstration.
+     # In a real scenario, load your actual optical and SAR images with Image.open().
+     optical_image = Image.new('RGB', (256, 256), color='blue')  # placeholder for an optical image
+     sar_image = Image.new('RGB', (256, 256), color='red')       # placeholder for a SAR image
+
+     # Preprocess the images
+     inputs = processor(images=[optical_image, sar_image], return_tensors="pt")
+
+     # Move the model and inputs to the GPU if one is available
+     device = "cuda" if torch.cuda.is_available() else "cpu"
+     inputs = {k: v.to(device) for k, v in inputs.items()}
+     model.to(device)
+
+     # Extract features
+     model.eval()  # set the model to evaluation mode
+     with torch.no_grad():
+         outputs = model(**inputs)
+
+     # For feature extraction, the pooler output or the CLS token's hidden state is typically used.
+     # Adjust this based on the actual output structure of the model.
+     if getattr(outputs, "pooler_output", None) is not None:
+         features = outputs.pooler_output               # (2, hidden_dim) global feature vectors
+     else:
+         features = outputs.last_hidden_state[:, 0, :]  # (2, hidden_dim) CLS token features
+     optical_features, sar_features = features[0], features[1]
+
+     print(f"Features from optical image (shape): {optical_features.shape}")
+     print(f"Features from SAR image (shape): {sar_features.shape}")
+
+     # For re-identification, you would compare these features, e.g., using cosine similarity:
+     # similarity = torch.nn.functional.cosine_similarity(optical_features, sar_features, dim=0)
+     # print(f"Cosine similarity: {similarity.item():.4f}")
+
+ except Exception as e:
+     print("Failed to run the sample usage. Please ensure `transformers` and `Pillow` are installed (`pip install transformers Pillow`).")
+     print(f"Also, confirm that `Alioth2000/TransOSS` is compatible with `AutoModel`/`AutoImageProcessor`, or refer to the official GitHub repository for the detailed setup: {e}")
+ ```
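+
+ In practice, re-identification compares a query feature against a gallery of features and ranks the gallery by similarity. Below is a minimal, hypothetical sketch of that ranking step; the feature dimension and the random tensors are placeholders rather than actual TransOSS outputs.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ # Placeholder tensors: one optical query feature and a gallery of SAR features
+ query_feat = torch.randn(768)          # (dim,)
+ gallery_feats = torch.randn(10, 768)   # (num_gallery, dim)
+
+ # L2-normalize so that dot products equal cosine similarities
+ query = F.normalize(query_feat, dim=0)
+ gallery = F.normalize(gallery_feats, dim=1)
+
+ scores = gallery @ query                          # (num_gallery,) cosine similarities
+ ranking = torch.argsort(scores, descending=True)  # best match first
+ print("Gallery indices ranked by similarity:", ranking.tolist())
+ ```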
+
+ ## Training and Evaluation
+
+ For detailed instructions on pretraining and fine-tuning the model, please refer to the [official GitHub repository](https://github.com/Alioth2000/TransOSS).
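+
+ Pre-training uses contrastive learning on optical-SAR image pairs so that matching views of the same scene are pulled together in a shared feature space. For intuition only, a symmetric InfoNCE-style objective over a batch of paired embeddings is sketched below; this is not the loss implementation from the repository, and the batch size, feature dimension, and temperature are placeholder assumptions.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def paired_contrastive_loss(opt_feats, sar_feats, temperature=0.07):
+     """Illustrative symmetric InfoNCE-style loss; matching optical/SAR pairs
+     lie on the diagonal of the similarity matrix and act as positives."""
+     opt = F.normalize(opt_feats, dim=1)
+     sar = F.normalize(sar_feats, dim=1)
+     logits = opt @ sar.t() / temperature                    # (B, B) similarity matrix
+     targets = torch.arange(opt.size(0), device=opt.device)  # positives on the diagonal
+     return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
+
+ # e.g. embeddings from a batch of 8 optical-SAR pairs
+ loss = paired_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
+ print(f"Contrastive loss: {loss.item():.4f}")
+ ```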
+
+ An example command for evaluation is provided below:
+
+ ```bash
+ python test.py --config_file configs/hoss_transoss.yml MODEL.DEVICE_ID "('0')" TEST.WEIGHT 'weights/HOSS_TransOSS.pth'
+ ```
+
+ ## Citation
+
+ If you find this work useful, please cite the paper:
+
+ ```bibtex
+ @misc{wang2025crossmodal,
+   title={Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method},
+   author={Han Wang and Shengyang Li and Jian Yang and Yuxuan Liu and Yixuan Lv and Zhuang Zhou},
+   year={2025},
+   eprint={2506.22027},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2506.22027},
+ }
+ ```
+
+ ## Acknowledgement
+
+ The codebase builds on [TransReID](https://github.com/damo-cv/TransReID/tree/main), [reid-strong-baseline](https://github.com/michuanhaohao/reid-strong-baseline), and [pytorch-image-models](https://github.com/rwightman/pytorch-image-models).