---
license: apache-2.0
tags:
- controlnet
- stable-diffusion
- satellite-imagery
- osm
- image-to-image
- diffusers
base_model: stabilityai/stable-diffusion-2-1-base
pipeline_tag: image-to-image
library_name: diffusers
---

# VectorSynth

**VectorSynth** is a ControlNet model that generates satellite imagery from OpenStreetMap (OSM) vector data embeddings. It conditions [Stable Diffusion 2.1 Base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) on rendered OSM text to synthesize realistic aerial imagery.

## Model Description

VectorSynth uses a two-stage pipeline:

1. **RenderEncoder**: Projects 768-dim CLIP text embeddings of OSM text to 3-channel control images
2. **ControlNet**: Conditions Stable Diffusion 2.1 Base on the rendered control images

This model uses standard CLIP embeddings. For the COSA embedding variant, see [VectorSynth-COSA](https://huggingface.co/MVRL/VectorSynth-COSA).

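To make the shapes in stage 1 concrete, here is a minimal sketch of projecting 768-dim embeddings to a 3-channel control image. It is not the actual `RenderEncoder` from `render.py`: the per-pixel embedding grid, the single linear layer, the sigmoid, and the random weights are all assumptions made for illustration, and `project_to_control` is a hypothetical name.

```python
import numpy as np

EMBED_DIM = 768        # CLIP text embedding size stated in this card
CONTROL_CHANNELS = 3   # control image channels stated in this card


def project_to_control(embeddings, weight, bias):
    """Map an (H, W, 768) embedding grid to a (3, H, W) control image.

    Hypothetical stand-in for RenderEncoder: one linear layer applied at
    every pixel, followed by a sigmoid that keeps values in [0, 1].
    """
    if embeddings.shape[-1] != weight.shape[0]:
        raise ValueError("embedding size does not match projection weight")
    logits = embeddings @ weight + bias        # (H, W, 3)
    control = 1.0 / (1.0 + np.exp(-logits))    # squash to [0, 1]
    return np.transpose(control, (2, 0, 1))    # channels-first (3, H, W)


# Random data stands in for real OSM text embeddings and trained weights.
rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 8, EMBED_DIM)).astype(np.float32)
weight = (rng.standard_normal((EMBED_DIM, CONTROL_CHANNELS)) * 0.02).astype(np.float32)
bias = np.zeros(CONTROL_CHANNELS, dtype=np.float32)

control_image = project_to_control(grid, weight, bias)
print(control_image.shape)  # (3, 8, 8)
```

The real encoder is defined in `render.py` and may differ in architecture and input layout. The sketch only shows how a 768-dim embedding per location can reduce to the 3 channels that ControlNet consumes.
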
## Files

- `config.json` - ControlNet configuration
- `diffusion_pytorch_model.safetensors` - ControlNet weights
- `render_encoder/clip-render_encoder.pth` - RenderEncoder weights
- `render.py` - RenderEncoder class definition

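The files above could be loaded roughly as follows. This is a sketch under assumptions: the repository id `MVRL/VectorSynth` is inferred from the related model links and is not stated in this card, and the helper names `download_render_encoder_files` and `load_pipeline` are hypothetical. The library calls (`hf_hub_download`, `ControlNetModel.from_pretrained`, `StableDiffusionControlNetPipeline.from_pretrained`) are standard `huggingface_hub` and `diffusers` APIs.

```python
REPO_ID = "MVRL/VectorSynth"  # assumed repo id; this card does not state it
BASE_MODEL = "stabilityai/stable-diffusion-2-1-base"  # from the card metadata

# File names exactly as listed in this card.
RENDER_ENCODER_FILES = (
    "render.py",
    "render_encoder/clip-render_encoder.pth",
)


def download_render_encoder_files(repo_id=REPO_ID):
    """Fetch the RenderEncoder class file and weights; return local paths."""
    from huggingface_hub import hf_hub_download  # imported lazily

    return {
        name: hf_hub_download(repo_id=repo_id, filename=name)
        for name in RENDER_ENCODER_FILES
    }


def load_pipeline(repo_id=REPO_ID, base_model=BASE_MODEL):
    """Build a Stable Diffusion pipeline that uses the ControlNet weights."""
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Reads config.json and diffusion_pytorch_model.safetensors from the repo root.
    controlnet = ControlNetModel.from_pretrained(repo_id)
    return StableDiffusionControlNetPipeline.from_pretrained(
        base_model, controlnet=controlnet
    )
```

The `RenderEncoder` constructor arguments live in `render.py` and are deliberately not guessed at here. Once the encoder has produced a control image, it would typically be passed to the pipeline call through the `image` argument alongside a `prompt`.
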
## Citation

```bibtex
@inproceedings{cher2025vectorsynth,
  title={VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics},
  author={Cher, Daniel and Wei, Brian and Sastry, Srikumar and Jacobs, Nathan},
  year={2025},
  eprint={arXiv:2511.07744},
  note={arXiv preprint}
}
```

## Related Models

- [VectorSynth-COSA](https://huggingface.co/MVRL/VectorSynth-COSA) - COSA embedding variant
- [GeoSynth](https://huggingface.co/MVRL/GeoSynth) - Text-to-satellite image generation