# Methane Benchmark Dataset (PINEAPPLE + Clean)

This folder contains the **Methane Benchmark Dataset** in two variants:

- **balanced**: a balanced mix of methane and non-methane patches
- **clean**: **no-methane only** (negative patches)

The dataset combines multiple imaging modalities (HSI and RGB), **simulated Sentinel-2 BOA reflectance (S2 BOA refl)** derived from HSI, **TerraMind TiM-generated products** (including **S2L2A** and **LULC**), text captions, and labels produced by different sources (LLM, human, and TiM/TerraMind). The clean split additionally contains **Intuition-1 simulated data**.

---
## 1. Dataset overview

### 1.1 balanced (PINEAPPLE: methane + non-methane)

- **178 patches**, **27 flights**
- **HSI**: AVIRIS-NG
- **RGB**: RGB renderings / visualizations aligned with the patches
- **Simulated Sentinel-2 (BOA reflectance)**: derived from HSI and stored under `simulated_s2_boarefl_balanced/`
- **TerraMind TiM products** (derived from simulated S2 BOA reflectance; stored under `tim_generation_balanced/`):
  - **S2L2A** (TiM-generated)
  - **LULC** (TiM-generated, pixel-level)
  - Plots and auxiliary outputs
- **Annotations**:
  - Urban vs. non-urban (image-level): **LLM**
  - Urban vs. non-urban (image-level): **human**
  - Textual description: **LLM**
### 1.2 clean (no-methane only)

- **261 patches** (neighboring patches; the center patch is excluded), **20 flights**
- **HSI**: AVIRIS-NG
- **RGB**: RGB renderings / visualizations aligned with the patches
- **Simulated Sentinel-2 (BOA reflectance)**: derived from HSI and stored under `simulated_s2_boareflclean/` (folder name preserved as exported)
- **TerraMind TiM products** (derived from simulated S2 BOA reflectance; stored under `tim_generation_clean/`):
  - **S2L2A** (TiM-generated)
  - **LULC** (TiM-generated, pixel-level)
  - Plots and auxiliary outputs
- **Intuition-1 simulated data (clean only)**: additional simulated modality for extended ablations and robustness checks (see notes in Section 2)
- **Annotations**:
  - Urban vs. non-urban (image-level): **LLM**
  - Urban vs. non-urban (image-level): **human**
  - Textual description: **LLM**

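
The split sizes listed above can be checked against a local copy. The sketch below is a minimal example, assuming you have already built a mapping from patch ID to flight ID for each split; how to extract the flight ID depends on the file naming convention, which this README does not specify. `EXPECTED`, `split_summary`, and `check_split` are hypothetical names, and the toy IDs are synthetic, not real file names.

```python
# Documented sizes from Section 1: split -> (patches, flights).
EXPECTED = {"balanced": (178, 27), "clean": (261, 20)}


def split_summary(patch_to_flight: dict) -> tuple:
    """Return (number of patches, number of distinct flights) for one split."""
    return len(patch_to_flight), len(set(patch_to_flight.values()))


def check_split(name: str, patch_to_flight: dict) -> list:
    """Compare observed split sizes with the documented ones.

    `patch_to_flight` is a hypothetical mapping patch ID -> flight ID that you
    build from the file names yourself. Returns a list of human-readable
    mismatches (empty if everything matches).
    """
    exp_patches, exp_flights = EXPECTED[name]
    n_patches, n_flights = split_summary(patch_to_flight)
    problems = []
    if n_patches != exp_patches:
        problems.append(f"{name}: expected {exp_patches} patches, found {n_patches}")
    if n_flights != exp_flights:
        problems.append(f"{name}: expected {exp_flights} flights, found {n_flights}")
    return problems


# Toy usage: 261 synthetic patches spread over 20 synthetic flights.
toy = {f"patch_{i:03d}": f"flight_{i % 20:02d}" for i in range(261)}
print(check_split("clean", toy))  # -> []
```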

---

## 2. Folder structure

Top-level directories:

- `aviris_hsi_balanced/`
  AVIRIS-NG hyperspectral patches for the balanced split.
- `aviris_hsi_clean/`
  AVIRIS-NG hyperspectral patches for the clean (no-methane) split.
- `rgb_balanced/`
  RGB images for the balanced split (aligned to patches).
- `rgb_clean/`
  RGB images for the clean split (aligned to patches).
- `captions_balanced/`
  LLM-generated text captions/descriptions for the balanced split.
- `captions_clean/`
  LLM-generated text captions/descriptions for the clean split.
- `simulated_s2_boarefl_balanced/`
  Simulated Sentinel-2 BOA reflectance images for the balanced split (simulated from HSI).
- `simulated_s2_boareflclean/`
  Simulated Sentinel-2 BOA reflectance images for the clean split (simulated from HSI; folder name preserved as exported).
- `tim_generation_balanced/`
  TerraMind TiM outputs generated from simulated S2 BOA reflectance (balanced split).
  Contains (at least): `s2l2a/`, `lulc/`, `classes/`, `plots/`, and auxiliary files (e.g., a legend script).
- `tim_generation_clean/`
  TerraMind TiM outputs generated from simulated S2 BOA reflectance (clean split).
  Contains the same product types as the balanced split.
- `I1_simulation/`
  Additional Intuition-1 simulated data aligned with the clean-split patches.


Other files:

- `truth_false_labels.xlsx`
  A compact label file (yes/no style) aggregating selected annotations (LLM, human, TiM classes); the exact contents depend on the export.

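
A quick way to confirm that a local copy matches the layout above is to check for each listed entry. The following is a sketch using only `pathlib`, assuming the entry names are exactly as listed in this section (including the `simulated_s2_boareflclean` spelling), that `I1_simulation` is a directory, and that both `tim_generation_*` folders contain the `s2l2a/`, `lulc/`, `classes/`, and `plots/` subfolders named for the balanced split. `missing_entries` is a hypothetical helper; call it with the dataset root.

```python
from pathlib import Path

# Top-level entries as listed in Section 2 (names kept exactly as exported).
EXPECTED_DIRS = [
    "aviris_hsi_balanced", "aviris_hsi_clean",
    "rgb_balanced", "rgb_clean",
    "captions_balanced", "captions_clean",
    "simulated_s2_boarefl_balanced", "simulated_s2_boareflclean",
    "tim_generation_balanced", "tim_generation_clean",
    "I1_simulation",  # assumed to be a directory
]
EXPECTED_FILES = ["truth_false_labels.xlsx"]
TIM_DIRS = ["tim_generation_balanced", "tim_generation_clean"]
TIM_SUBDIRS = ["s2l2a", "lulc", "classes", "plots"]  # assumed for both splits


def missing_entries(root) -> list:
    """Return the expected entries (relative paths) that are missing under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    # Only inspect TiM subfolders when the parent folder exists,
    # so a missing parent is reported once rather than five times.
    for tim in TIM_DIRS:
        if (root / tim).is_dir():
            missing += [f"{tim}/{sub}" for sub in TIM_SUBDIRS
                        if not (root / tim / sub).is_dir()]
    return missing
```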

---

## 3. Labels and annotation sources

The dataset provides yes/no labels and/or categorical classes from the following sources.

### 3.1 LLM labels (image-level)

- Urban vs. non-urban classification at the image/patch level
- Stored in the exported label file and/or in per-sample metadata (depending on the pipeline)

### 3.2 Human labels (image-level)

- Urban vs. non-urban classification at the image/patch level
- Available for at least the clean split (and, depending on the export, the balanced split)

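
Because the LLM and human labels are both image-level urban/non-urban decisions, a natural first check is how often they agree. The sketch below assumes each source has already been loaded into a dict mapping patch ID to a yes/no-style value (for example from `truth_false_labels.xlsx`); the accepted spellings in `YES`/`NO` and the helpers `parse_yes_no` and `agreement` are hypothetical and should be adjusted to the actual export.

```python
from typing import Optional

# Hypothetical spellings of the yes/no labels; adjust to the actual export.
YES = {"yes", "y", "true", "1"}
NO = {"no", "n", "false", "0"}


def parse_yes_no(value) -> Optional[bool]:
    """Map a yes/no-style cell to True/False; return None if empty or unrecognized."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in YES:
        return True
    if text in NO:
        return False
    return None


def agreement(llm: dict, human: dict) -> tuple:
    """Agreement rate between two label sources on their shared, parseable patches.

    Returns (rate, disagreeing_patch_ids); rate is NaN if no patch can be compared.
    """
    compared = []
    for pid in sorted(llm.keys() & human.keys()):
        a, b = parse_yes_no(llm[pid]), parse_yes_no(human[pid])
        if a is not None and b is not None:  # skip blanks and unknown spellings
            compared.append((pid, a == b))
    if not compared:
        return float("nan"), []
    n_agree = sum(same for _, same in compared)
    return n_agree / len(compared), [pid for pid, same in compared if not same]


rate, diff = agreement({"p1": "yes", "p2": "no"}, {"p1": "Yes", "p2": "yes", "p3": "no"})
print(rate, diff)  # -> 0.5 ['p2']
```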
### 3.3 TerraMind TiM products (pixel-level and per-image products)

- **S2L2A** generated by TerraMind TiM from simulated S2 BOA reflectance
- **LULC** (pixel-level) generated by TerraMind TiM from simulated S2 BOA reflectance
- Stored under `tim_generation_*` (subfolders `s2l2a/`, `lulc/`, and `classes/`)

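
Pixel-level LULC maps can be summarized per patch, for example to compare them with the image-level urban labels. The sketch below assumes a LULC map has already been read into an integer NumPy array (the on-disk format is not specified here); the class IDs, `num_classes`, and the default 10% threshold are hypothetical and should be taken from the legend script in `tim_generation_*`. It is an illustration only, not necessarily how the `classes/` outputs were produced.

```python
import numpy as np


def class_fractions(lulc: np.ndarray, num_classes: int) -> np.ndarray:
    """Fraction of pixels per class for one integer-coded LULC map.

    Class IDs must be non-negative integers below `num_classes`.
    """
    labels = np.asarray(lulc).astype(np.int64).ravel()
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("LULC class IDs outside [0, num_classes)")
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()


def has_class(lulc: np.ndarray, class_id: int, num_classes: int,
              min_fraction: float = 0.1) -> bool:
    """Image-level flag: does `class_id` cover at least `min_fraction` of the patch?"""
    return bool(class_fractions(lulc, num_classes)[class_id] >= min_fraction)


# Toy 2x2 map with hypothetical class IDs 0..3.
toy = np.array([[0, 1], [1, 2]])
print(class_fractions(toy, 4).tolist())  # -> [0.25, 0.5, 0.25, 0.0]
```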

---

## 4. Modality relationships

- **HSI (AVIRIS-NG)** is the primary observation modality.
- **RGB** is a visualization or derived view aligned to the same patch footprint.
- **Simulated Sentinel-2 BOA reflectance (S2 BOA refl)** is simulated from HSI and used as input to TerraMind TiM.
- **S2L2A** is not stored as a standalone simulation at the top level; it is generated by **TerraMind TiM** and stored inside `tim_generation_*` (subfolder `s2l2a/`).
- **LULC** is generated by **TerraMind TiM** (pixel-level) and stored inside `tim_generation_*` (subfolder `lulc/`).
- **Captions** provide text descriptions for multimodal experiments (retrieval, captioning, instruction-following, VLM/LLM alignment).
- **Intuition-1 simulated data** (clean only) provides an extra modality for robustness and domain-shift experiments.

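
Since all of these modalities are defined per patch, most experiments start by joining the modality folders on a shared patch identifier. This is a minimal sketch, assuming each modality folder holds exactly one file per patch and that files for the same patch share a file stem (the name without its extension); that is precisely the kind of naming assumption Section 5 warns about, so verify it against the dataset class first. `index_by_stem` and `pair_modalities` are hypothetical helpers, and the modality names are up to you.

```python
from pathlib import Path


def index_by_stem(folder) -> dict:
    """Map file stem -> path for the regular files directly inside `folder`."""
    return {p.stem: p for p in sorted(Path(folder).iterdir()) if p.is_file()}


def pair_modalities(folders: dict) -> tuple:
    """Join several modality folders on a shared file stem.

    `folders` maps a modality name to a folder (at least one entry). Returns
    (paired, missing): `paired` maps stem -> {modality: path} for stems found in
    every folder; `missing` maps modality -> sorted stems seen elsewhere but
    absent from that folder, a quick sign of a naming-convention mismatch.
    """
    indexes = {name: index_by_stem(path) for name, path in folders.items()}
    all_stems = set().union(*indexes.values())
    common = set.intersection(*(set(idx) for idx in indexes.values()))
    paired = {s: {name: idx[s] for name, idx in indexes.items()} for s in sorted(common)}
    missing = {name: sorted(all_stems - set(idx)) for name, idx in indexes.items()}
    return paired, missing
```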

---

## 5. Warning

Before using the dataset, check the dataset class to see whether the file naming convention has changed.