The checkpoint conversion has not been fully validated. If you encounter a pipeline loading failure or undesired output, please contact me at bili_sakura@zju.edu.cn.

GeoSynth-ControlNets

We maintain two repositories—one per base checkpoint—each with its compatible ControlNets:

Repo	Base Model	ControlNets
This repo	GeoSynth (text encoder & UNet same as SD 2.1)	GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM
GeoSynth-ControlNets-Location	GeoSynth-Location (adds CoordNet branch)	GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny

* The GeoSynth-Location-SAM ControlNet checkpoint is missing from the source.

This repository

GeoSynth checkpoint — A remote sensing visual generative model. The text encoder and UNet are the same as Stable Diffusion 2.1 (not fine-tuned).
ControlNet models — OSM, Canny, and SAM conditioning, located under controlnet/.

Architecture note: location-conditioned models

Location-conditioned variants (GeoSynth-Location-*) use a different base checkpoint that adds a CoordNet branch. The branch takes [lon, lat] as input, encodes it with a SatCLIP location encoder, and passes the result through a CoordNet (13 stacked cross-attention blocks, inner dim 256, 4 heads). ControlNet and CoordNet both condition the UNet. See Figure 3 of the GeoSynth paper.
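To make the numbers above concrete, here is a minimal sketch that records the CoordNet hyperparameters and builds a [lon, lat] input. The names `CoordNetConfig` and `make_location_input` are hypothetical and not part of this repository or of diffusers. The per-head dimension assumes the inner dimension is split evenly across heads, as in standard multi-head attention, which the actual implementation may or may not do.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CoordNetConfig:
    """Hypothetical container for the CoordNet hyperparameters quoted above."""

    num_blocks: int = 13  # stacked cross-attention blocks
    inner_dim: int = 256  # attention inner dimension
    num_heads: int = 4    # attention heads per block

    def __post_init__(self) -> None:
        if self.inner_dim % self.num_heads != 0:
            raise ValueError("inner_dim must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        # Assumption: the inner dimension is split evenly across heads.
        return self.inner_dim // self.num_heads


def make_location_input(lon: float, lat: float) -> List[float]:
    """Return [lon, lat] in the order the location branch expects."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return [lon, lat]
```

The range checks are mainly there to catch swapped arguments: a longitude passed as the latitude is rejected whenever its magnitude exceeds 90.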

ControlNet variants (this repo)

Control	Subfolder	Status
OSM	`controlnet/GeoSynth-OSM`	✅ Integrated
Canny	`controlnet/GeoSynth-Canny`	✅ Integrated
SAM	`controlnet/GeoSynth-SAM`	✅ Integrated

Use it with 🧨 diffusers or the Stable Diffusion repository.

Model Sources

Source: GeoSynth
Paper: GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis
Base model: Stable Diffusion 2.1

Examples

Text-to-Image (base GeoSynth)

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("BiliSakura/GeoSynth-ControlNets")
pipe = pipe.to("cuda")

image = pipe("Satellite image features a city neighborhood").images[0]
image.save("generated_city.jpg")

ControlNet (diffusers integration)

Use the 🧨 diffusers ControlNetModel wrapper with StableDiffusionControlNetPipeline:

GeoSynth-OSM — synthesizes satellite images from OpenStreetMap tiles (RGB):

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-OSM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("osm_tile.jpeg")  # OSM tile (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")

GeoSynth-Canny — synthesizes satellite images from Canny edge maps:

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-Canny",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("canny_edges.jpeg")  # Canny edge image (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")

GeoSynth-SAM — synthesizes satellite images from SAM (Segment Anything Model) segmentation masks:

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    subfolder="controlnet/GeoSynth-SAM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "BiliSakura/GeoSynth-ControlNets",
    controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("sam_segmentation.jpeg")  # SAM mask (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
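The three examples above differ only in the ControlNet subfolder and the conditioning image, so they can be folded into one helper. The following is a sketch, not part of this repository: `CONTROLNET_SUBFOLDERS`, `resolve_subfolder`, and `load_controlnet_pipeline` are hypothetical names, and only the `from_pretrained` calls are taken from the examples above.

```python
REPO_ID = "BiliSakura/GeoSynth-ControlNets"

# Subfolders as listed in the "ControlNet variants (this repo)" table.
CONTROLNET_SUBFOLDERS = {
    "osm": "controlnet/GeoSynth-OSM",
    "canny": "controlnet/GeoSynth-Canny",
    "sam": "controlnet/GeoSynth-SAM",
}


def resolve_subfolder(control: str) -> str:
    """Map a control name (case-insensitive) to its subfolder in this repo."""
    key = control.strip().lower()
    if key not in CONTROLNET_SUBFOLDERS:
        valid = ", ".join(sorted(CONTROLNET_SUBFOLDERS))
        raise ValueError(f"unknown control {control!r}; expected one of: {valid}")
    return CONTROLNET_SUBFOLDERS[key]


def load_controlnet_pipeline(control: str, device: str = "cuda"):
    """Load the base GeoSynth pipeline with the requested ControlNet attached."""
    # Validate the name first so a typo fails before any download starts.
    subfolder = resolve_subfolder(control)

    # Imported lazily so the lookup above works without diffusers installed.
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(REPO_ID, subfolder=subfolder)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        REPO_ID,
        controlnet=controlnet,
    )
    return pipe.to(device)
```

With this helper, `pipe = load_controlnet_pipeline("canny")` replaces the loading lines in each example, and the call to `pipe(...)` stays the same.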

For location-conditioned variants (GeoSynth-Location-OSM, GeoSynth-Location-SAM, GeoSynth-Location-Canny), see the separate GeoSynth-ControlNets-Location repo.

Citation

If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.

@inproceedings{sastry2024geosynth,
  title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
  author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
  booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
  year={2024}
}

@article{klemmer2025satclip,
  title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
  author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={4},
  pages={4347--4355},
  year={2025},
  doi={10.1609/aaai.v39i4.32457}
}

Paper: GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis (arXiv 2404.06637, published Apr 9, 2024)