The checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or undesired output, please contact me at bili_sakura@zju.edu.cn.
GeoSynth-ControlNets
We maintain two repositories—one per base checkpoint—each with its compatible ControlNets:
| Repo | Base Model | ControlNets |
|---|---|---|
| This repo | GeoSynth (text encoder & UNet same as SD 2.1) | GeoSynth-OSM, GeoSynth-Canny, GeoSynth-SAM |
| GeoSynth-ControlNets-Location | GeoSynth-Location (adds CoordNet branch) | GeoSynth-Location-OSM, GeoSynth-Location-SAM*, GeoSynth-Location-Canny |
*GeoSynth-Location-SAM: the ControlNet checkpoint is missing from the upstream source.
This repository
- GeoSynth checkpoint — A remote sensing visual generative model. The text encoder and UNet are the same as Stable Diffusion 2.1 (not fine-tuned).
- ControlNet models — OSM, Canny, and SAM conditioning, located under controlnet/.
Architecture note: location-conditioned models
Location-conditioned variants (GeoSynth-Location-*) use a different base checkpoint that adds a CoordNet branch. The branch takes [lon, lat] as input, passes it through a SatCLIP location encoder, then through a CoordNet (13 stacked cross-attention blocks, inner dim 256, 4 heads). ControlNet and CoordNet both condition the UNet. See the GeoSynth paper Figure 3.
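The data flow in this note can be sketched in PyTorch. This is an illustrative sketch only, not the GeoSynth implementation: the small MLP is a stand-in for the SatCLIP location encoder, and the `context` tensor the blocks attend to, the feed-forward layer, and all class names are assumptions. Only the depth (13), inner dimension (256), and head count (4) come from the description above.

```python
import torch
from torch import nn


class CrossAttentionBlock(nn.Module):
    """One pre-norm cross-attention block with a residual feed-forward layer."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, x, context):
        # Queries come from the location token; keys and values from `context`.
        attn_out, _ = self.attn(self.norm(x), context, context)
        x = x + attn_out
        return x + self.ff(x)


class CoordNetSketch(nn.Module):
    """Hypothetical stand-in: [lon, lat] -> location embedding -> 13 blocks."""

    def __init__(self, dim=256, heads=4, depth=13):
        super().__init__()
        # Assumption: a tiny MLP replaces the real SatCLIP location encoder.
        self.location_encoder = nn.Sequential(
            nn.Linear(2, dim), nn.SiLU(), nn.Linear(dim, dim)
        )
        self.blocks = nn.ModuleList(
            [CrossAttentionBlock(dim, heads) for _ in range(depth)]
        )

    def forward(self, lonlat, context):
        # lonlat: (batch, 2); context: (batch, tokens, dim), a placeholder here.
        x = self.location_encoder(lonlat).unsqueeze(1)  # (batch, 1, dim)
        for block in self.blocks:
            x = block(x, context)
        return x


model = CoordNetSketch()
lonlat = torch.tensor([[120.12, 30.26]])  # example [lon, lat] pair
context = torch.zeros(1, 77, 256)  # placeholder context tokens
features = model(lonlat, context)  # shape: (1, 1, 256)
```

How the resulting features are injected into the UNet alongside the ControlNet residuals is not covered by this sketch; Figure 3 of the GeoSynth paper shows the actual wiring.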
ControlNet variants (this repo)
| Control | Subfolder | Status |
|---|---|---|
| OSM | controlnet/GeoSynth-OSM | ✅ Integrated |
| Canny | controlnet/GeoSynth-Canny | ✅ Integrated |
| SAM | controlnet/GeoSynth-SAM | ✅ Integrated |
Use it with 🧨 diffusers or the Stable Diffusion repository.
Model Sources
- Source: GeoSynth
- Paper: GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis
- Base model: Stable Diffusion 2.1
Examples
Text-to-Image (base GeoSynth)
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("BiliSakura/GeoSynth-ControlNets")
pipe = pipe.to("cuda")
image = pipe("Satellite image features a city neighborhood").images[0]
image.save("generated_city.jpg")
ControlNet (diffusers integration)
Use the 🧨 diffusers ControlNetModel wrapper with StableDiffusionControlNetPipeline:
GeoSynth-OSM — synthesizes satellite images from OpenStreetMap tiles (RGB):
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch
controlnet = ControlNetModel.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
subfolder="controlnet/GeoSynth-OSM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("osm_tile.jpeg") # OSM tile (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
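The examples in this section assume a 512x512 RGB control image. If your tile, edge map, or mask has a different size or mode, a small Pillow helper along these lines can normalize it first. This is a minimal sketch: `prepare_control_image` is a hypothetical name, and center-cropping with bicubic resampling is an assumption, not a documented GeoSynth preprocessing step.

```python
from PIL import Image, ImageOps


def prepare_control_image(image, size=512, resample=Image.BICUBIC):
    """Return an RGB, size x size copy of `image` (center-cropped, then resized)."""
    rgb = image.convert("RGB")
    # ImageOps.fit crops to the target aspect ratio around the center, then resizes.
    return ImageOps.fit(rgb, (size, size), method=resample)


# Usage with an in-memory grayscale image standing in for a real file:
source = Image.new("L", (640, 480), color=128)
control = prepare_control_image(source)
```

For label-like inputs such as segmentation masks, passing `resample=Image.NEAREST` avoids blending colors across region boundaries.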
GeoSynth-Canny — synthesizes satellite images from Canny edge maps:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch
controlnet = ControlNetModel.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
subfolder="controlnet/GeoSynth-Canny",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("canny_edges.jpeg") # Canny edge image (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
GeoSynth-SAM — synthesizes satellite images from SAM (Segment Anything Model) segmentation masks:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch
controlnet = ControlNetModel.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
subfolder="controlnet/GeoSynth-SAM",
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"BiliSakura/GeoSynth-ControlNets",
controlnet=controlnet,
)
pipe = pipe.to("cuda")
img = Image.open("sam_segmentation.jpeg") # SAM mask (RGB, 512x512)
generator = torch.manual_seed(42)
image = pipe("Satellite image features a city neighborhood", image=img, generator=generator, num_inference_steps=20).images[0]
image.save("generated_city.jpg")
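The three ControlNet examples differ only in the subfolder they load, so the shared steps can be wrapped in one helper. The following is a sketch under the assumption that the repository layout matches the table of ControlNet variants; `controlnet_subfolder` and `load_geosynth_pipeline` are hypothetical names, not part of diffusers or of this repository.

```python
REPO_ID = "BiliSakura/GeoSynth-ControlNets"

# Control type -> subfolder, as listed in the ControlNet variants table.
CONTROLNET_SUBFOLDERS = {
    "osm": "controlnet/GeoSynth-OSM",
    "canny": "controlnet/GeoSynth-Canny",
    "sam": "controlnet/GeoSynth-SAM",
}


def controlnet_subfolder(control):
    """Map a control name ("osm", "canny", "sam") to its subfolder in the repo."""
    key = control.strip().lower()
    if key not in CONTROLNET_SUBFOLDERS:
        valid = ", ".join(sorted(CONTROLNET_SUBFOLDERS))
        raise ValueError(f"Unknown control {control!r}; expected one of: {valid}")
    return CONTROLNET_SUBFOLDERS[key]


def load_geosynth_pipeline(control, device="cuda"):
    """Load the base GeoSynth pipeline with the requested ControlNet attached."""
    # Imported lazily so the mapping above works without diffusers installed.
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        REPO_ID, subfolder=controlnet_subfolder(control)
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        REPO_ID, controlnet=controlnet
    )
    return pipe.to(device)
```

With this in place, `load_geosynth_pipeline("canny")` would replace the two `from_pretrained` calls and the `.to("cuda")` line in the Canny example; the rest of each example stays the same.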
For location-conditioned variants (GeoSynth-Location-OSM, GeoSynth-Location-SAM, GeoSynth-Location-Canny), see the separate GeoSynth-ControlNets-Location repo.
Citation
If you use this model, please cite the GeoSynth paper. For location-conditioned variants, also cite SatCLIP.
@inproceedings{sastry2024geosynth,
title={GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis},
author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Jacobs, Nathan},
booktitle={IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION)},
year={2024}
}
@article{klemmer2025satclip,
title={{SatCLIP}: {Global}, General-Purpose Location Embeddings with Satellite Imagery},
author={Klemmer, Konstantin and Rolf, Esther and Robinson, Caleb and Mackey, Lester and Ru{\ss}wurm, Marc},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={4},
pages={4347--4355},
year={2025},
doi={10.1609/aaai.v39i4.32457}
}