Flux, Explained: Benchmarking Wetland Methane Emissions

Community Article Published January 26, 2026

Introducing X-MethaneWet, the first global, temporal, multi-scale wetland methane dataset, designed to make methane emissions models more trustworthy and interpretable by grounding them in real-world site measurements.

Pictured: The site of a 2024 study on wetland methane emissions at the
St. Jones Reserve in Delaware. Photo Credit: Rodrigo Vargas,
Journal of Geophysical Research: Biogeosciences

Methane (CH₄) is the second most abundant anthropogenic greenhouse gas after carbon dioxide (CO₂) and plays an outsized role in climate change. With emissions largely driven by human activities, CH₄ has a warming impact approximately 86 times greater than CO₂ over a 20-year period, making it a dominant contributor to near-term warming 1. Beyond its well-documented effects on crop yields, vegetation health, and tropospheric ozone, it also poses significant risks to human health. Because CH₄ is relatively short-lived in the atmosphere, with an average lifetime of about 12 years, reducing emissions could rapidly lower ground-level ozone and deliver broad public health benefits 2.

Wetlands are the largest natural source of methane, yet their emissions vary widely across space and time 3. Understanding when, where, and how much methane wetlands release (i.e., methane flux) is essential for managing their impact, but measuring and modeling it is nontrivial. Emissions are influenced by factors ranging from salinity to vegetation, and scarce methane flux measurements make these interactions difficult to model at scale 4.

X-MethaneWet, a new Hugging Face dataset, bridges this critical gap by providing the first global, temporal, multi-scale wetland methane dataset that pairs physics-based methane simulations with site-level observations. This creative approach enables rigorous, reproducible evaluation of machine learning (ML) models for methane flux, supporting more reliable forecasting and climate research.

What's in the data?

X-MethaneWet synthesizes two complementary data sources: FluxNet-CH4 (observed data, 2024) and TEM-MDM (simulated data, 2018). Let's begin by taking a look at the contents of the FluxNet-CH4 directory.

1. Observed Data: FluxNet-CH4

$ tree ymsum99/X-MethaneWet/
ymsum99/X-MethaneWet/
├── README.md
├── FLUXNET-CH4
│   ├── FLUX_CH4_2024.csv          # Site metadata
│   └── FLUXNET_T1_DD.csv          # Daily flux time series
└── TEM-MDM
    └── ...

FluxNet-CH4 is a harmonized collection of wetland ecosystem measurements collected with eddy covariance towers at 79 sites spanning a range of regions and climates. Each row holds the features for one site and one daily timestamp, recorded between 2006 and 2018. The core target variable is CH₄ flux, expressed as a molar flux in nanomoles of methane per square meter per second (nmol CH₄ m⁻² s⁻¹). Key influencers of methane production, oxidation, and transport are included alongside site metadata to enable predictive modeling.
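To make the unit concrete, here is a minimal sketch that converts a molar flux in nmol CH₄ m⁻² s⁻¹ into a daily mass flux in mg CH₄ m⁻² d⁻¹. The function name and the example value are illustrative and not part of the dataset; the only physical inputs are the molar mass of methane (about 16.04 g/mol) and the 86,400 seconds in a day.

```python
# Molar mass of CH4 in g/mol (12.011 for C + 4 * 1.008 for H).
CH4_MOLAR_MASS_G_PER_MOL = 16.043
SECONDS_PER_DAY = 86_400


def nmol_per_m2_s_to_mg_per_m2_day(flux_nmol: float) -> float:
    """Convert nmol CH4 m^-2 s^-1 to mg CH4 m^-2 day^-1."""
    mol_per_day = flux_nmol * 1e-9 * SECONDS_PER_DAY  # nmol/s -> mol/day
    grams_per_day = mol_per_day * CH4_MOLAR_MASS_G_PER_MOL
    return grams_per_day * 1e3  # g -> mg


# Hypothetical flux value, chosen only for illustration.
example_flux = 100.0
print(round(nmol_per_m2_s_to_mg_per_m2_day(example_flux), 1))
```

As a rule of thumb, 1 nmol CH₄ m⁻² s⁻¹ works out to roughly 1.39 mg CH₄ m⁻² d⁻¹.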


In addition to aggregated site-level metadata and summaries (FLUXNET_CH4_2024), the authors provide standardized, quality-controlled, gap-filled daily data (DD) prepared for modeling (FLUXNET_T1_DD), which prioritizes consistency and completeness (see their paper for details). In practice, this file holds a daily flux time series for each site along with the meteorological variables that may drive changes in flux. The recommended target variable for downstream modeling is FCH4_F_ANNOPTLM, as it represents post-processed, quality-controlled estimates.
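As a rough sketch of how the daily file might be summarized, the snippet below averages the target column per site using only the Python standard library. It runs on a tiny in-memory sample rather than the real file: SITE_ID and TIMESTAMP are assumed column names, the site labels and values are made up, and only FCH4_F_ANNOPTLM comes from the dataset description.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-in for FLUXNET_T1_DD.csv. SITE_ID and TIMESTAMP are
# assumed column names; FCH4_F_ANNOPTLM is the target named in the article.
# All site labels and values are made up for illustration.
SAMPLE_CSV = """SITE_ID,TIMESTAMP,FCH4_F_ANNOPTLM
SITE-A,20180301,12.0
SITE-A,20180302,18.0
SITE-B,20180301,40.0
SITE-B,20180302,
"""


def mean_flux_by_site(csv_text: str, target: str = "FCH4_F_ANNOPTLM") -> dict:
    """Average the target column per site, skipping empty cells."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row[target].strip()
        if cell:  # ignore missing values
            values[row["SITE_ID"]].append(float(cell))
    return {site: sum(v) / len(v) for site, v in values.items()}


site_means = mean_flux_by_site(SAMPLE_CSV)
print(site_means)
```

To try this on the real data, you would pass the contents of the CSV file instead of SAMPLE_CSV and adjust the column names to whatever the file's header actually uses.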

Here's a simple look into one variable: mean annual temperature, by site.

[Figure: mean annual temperature by site]

2. Simulated Data: TEM-MDM

$ tree ymsum99/X-MethaneWet/
ymsum99/X-MethaneWet/
├── README.md
├── FLUXNET-CH4
│   └── ...
└── TEM-MDM
    ├── phh2o.nc                               # Soil pH
    ├── topsoil_bulk_density.nc                # Topsoil bulk density
    ├── clelev.nc                              # Surface elevation
    ├── clfaotxt.nc                            # Soil texture class
    ├── cltveg.nc                              # Fractional vegetation cover
    ├── vegetation_type_11.nc                  # Vegetation type (11 classes)
    ├── wetlandtype.nc                         # Wetland type classification
    ├── climatetype.nc                         # Climate zone classification
    ├── ch4-1979-2018.txt                      # Atmospheric methane concentration
    ├── kco21979-2018.txt                      # Atmospheric CO₂ concentration
    ├── monthly_NPP_{1979–2018}.nc             # Net primary productivity
    ├── daily_ecmwf_PREC_{1979–2018}.nc        # Daily precipitation
    ├── daily_ecmwf_SOLR_{1979–2018}.nc        # Daily solar radiation
    ├── daily_ecmwf_TAIR_{1979–2018}.nc        # Daily air temperature
    ├── daily_ecmwf_VAPR_{1979–2018}.nc        # Daily atmospheric humidity
    └── CH4_emission_intensity_{1979–2018}.nc  # Methane emission intensity

TEM-MDM is a global, physics-based ecosystem model that generates spatially complete estimates of wetland CH₄ emissions by simulating interactions between soils, water, vegetation, and climate. In X-MethaneWet, TEM-MDM provides gridded methane flux outputs from 1979-2018, along with the drivers used to produce them, enabling ML models to be trained and evaluated at a planetary scale. Each grid cell represents aggregated wetland behavior under local climate conditions. When paired with high-fidelity field measures, these simulations support reproducible benchmarking, cross-scale generalization, and forecasting in data-scarce regions.


The 2010 emissions file (one of the CH4_emission_intensity_{1979–2018}.nc files) contains modeled daily wetland methane emissions, mapped on a 0.5° latitude–longitude global grid (roughly 55 km on a side at the equator, narrowing toward the poles). Emissions are provided for every grid cell, every day. The CH4_emission variable captures the spatial and seasonal variability of methane flux, enabling a direct comparison with site-level observations.
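Comparing a tower site with the gridded output means finding the grid cell that contains the site. Here is a minimal sketch of that lookup for a 0.5° global grid (360 × 720 cells). It assumes row 0 starts at 90°S and column 0 at 180°W, which may not match how the actual files order their axes, so check the coordinate variables before relying on it. The function names and example coordinates are hypothetical.

```python
import math

CELL_DEG = 0.5
N_ROWS = int(180 / CELL_DEG)  # 360 latitude bands
N_COLS = int(360 / CELL_DEG)  # 720 longitude bands
KM_PER_DEG = 111.32           # approx. length of one degree at the equator


def grid_index(lat: float, lon: float) -> tuple:
    """Return (row, col) of the 0.5-degree cell containing a point.

    Assumes row 0 starts at 90S and col 0 starts at 180W; the real files
    may order their axes differently.
    """
    row = min(int((lat + 90.0) / CELL_DEG), N_ROWS - 1)
    col = min(int((lon + 180.0) / CELL_DEG), N_COLS - 1)
    return row, col


def cell_width_km(lat: float) -> float:
    """Approximate east-west width of a cell at a given latitude."""
    return CELL_DEG * KM_PER_DEG * math.cos(math.radians(lat))


# Hypothetical coordinates, chosen only for illustration.
print(grid_index(-19.0, 22.4))
print(round(cell_width_km(0.0), 1))
```

The second helper shows why a single "cell size" in kilometers is only approximate: the east-west width shrinks with the cosine of latitude, so a cell at 60°N is about half as wide as one at the equator.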

Here's a look at CH₄ emissions per location, averaged over 2010:

[Figure: CH₄ emissions per grid cell, averaged over 2010]

What's possible?

Bridging observational and simulated data enables supervised and weakly supervised learning as well as forecasting. The authors provide baseline performance evaluations using various sequential deep learning models (e.g., LSTM, Transformer), and explore transfer learning strategies to improve generalization from simulated to observed data.

Below, you can begin to explore cross-scale methane emissions dynamics by comparing these two powerful datasets. This demo only scratches the surface of what's possible once you bring these datasets together.

The interactive web app can be accessed here.

The BW-Gum site is a flux tower located in Botswana at the edge of a permanently inundated papyrus swamp in the Okavango Delta. A simple comparison of observed versus simulated CH₄ flux for this site shows the two diverging from March to June 2018, with observed flux (blue) rising clearly above the simulations (red). This discrepancy likely reflects delayed seasonal flooding: when rising water levels inundate drier soils, methane emissions can spike. That's exactly the kind of mismatch site-level flux data can reveal.
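A mismatch like this can be put into numbers with two simple metrics: mean bias (does the model run high or low on average?) and root-mean-square error (how far off is it overall?). The sketch below uses made-up values rather than the BW-Gum data, and it assumes the two series are already aligned in time and expressed in the same units.

```python
import math


def mean_bias(observed, simulated):
    """Mean of (simulated - observed); negative means the model runs low."""
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)


def rmse(observed, simulated):
    """Root-mean-square error between paired values."""
    squared = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up monthly mean fluxes in matching units, for illustration only.
observed = [30.0, 42.0, 55.0, 48.0]
simulated = [28.0, 30.0, 35.0, 32.0]

print(mean_bias(observed, simulated))
print(round(rmse(observed, simulated), 2))
```

A negative bias, as in this toy example, corresponds to the pattern described above, where simulations sit below the observations.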

What's next?

By pairing site-level methane flux measurements with global, process-based simulations, X-MethaneWet enables rigorous model benchmarking. This new dataset supports ML workflows that:

  • Learn emissions drivers from climate, vegetation, hydrology, and soil properties
  • Diagnose model biases across time and space
  • Measure models against real-world observations
  • Develop hybrid physics-ML approaches
  • Build ML-based corrections

It also enables users to answer questions like:

  • Which wetland types contribute disproportionately to methane emissions?
  • What conditions most strongly drive methane release from wetlands?
  • At what spatial or temporal scales do model predictions tend to break down?
  • Where would new measurements most reduce uncertainty in methane estimates?

Credible environmental policy requires scientific models that can be tested against reality. X-MethaneWet provides the data needed to do exactly that for methane, one of the highest-impact levers for near-term climate action. I can't wait to see what you build with this new dataset on Hugging Face!
