SynWTS: Synthetic Woven Traffic Safety Dataset

SynWTS is a high-fidelity synthetic dataset built as a Digital Twin of the Woven Traffic Safety (WTS) dataset. It is developed for the 2026 AI City Challenge (Track 2) to advance Sim2Real research in transportation safety understanding.

Dataset Summary

Participants in the Sim2Real challenge must train models exclusively on this synthetic data and evaluate performance on real-world video. SynWTS provides a geometric match to real-world test locations, focusing on pedestrian-involved incidents with multi-view 1080p video, structured temporal captions, and complex Visual Question Answering (VQA) pairs.

Key Features

  • Sim2Real Benchmark: Specifically designed to bridge the gap between NVIDIA Isaac Sim environments and real-world traffic scenarios.
  • Multi-View Perception: Synchronized views from overhead infrastructure cameras and vehicle-ego perspectives.
  • Temporal Segmentation: Scenarios are partitioned into five safety-critical phases: Pre-recognition, Recognition, Judgment, Action, and Avoidance (a sketch of the corresponding label mapping follows this list).
  • Structured Annotations: Descriptions cover four pillars: Location, Attention, Behavior, and Context.
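
The caption annotations reference these phases by numeric label (see the "labels" field in the caption sample below). A minimal sketch of the mapping, assuming labels "0" through "4" follow the phase order listed above; verify against the released annotation files before relying on it:

# Assumed mapping from caption "labels" values to phase names.
# The ordering is an assumption based on the phase list above;
# confirm against the released annotations.
PHASE_NAMES = {
    "0": "Pre-recognition",
    "1": "Recognition",
    "2": "Judgment",
    "3": "Action",
    "4": "Avoidance",
}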

Dataset Structure

Directory Layout

data/
├── videos/
│   └── {split}/{scenario}/{view}/*.mp4
├── annotations/
│   ├── caption/
│   │   └── {split}/{scenario}/{view}/{scenario}_caption.json
│   ├── bbox_annotated/
│   │   ├── pedestrian/{split}/{scenario}/{view}/{scenario}_{camera_id}_bbox.json
│   │   └── vehicle/{split}/{scenario}/overhead_view/{scenario}_{camera_id}_bbox.json
│   └── vqa/
│       └── {split}/{scenario}/{view}/{scenario}.json

{split} = train | val | test

{view} = overhead_view | vehicle_view | environment

{camera_id} = {camera_ip_address}_{direction_id} | vehicle_view
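
For convenience, a minimal sketch of resolving the annotation paths above for one scenario; the annotation_paths helper and its root argument are illustrative, not part of the dataset:

import os

# Hypothetical helper: builds the caption and VQA paths for one scenario
# following the directory layout above. Adjust `root` to your local copy.
def annotation_paths(root, split, scenario, view):
    base = os.path.join(root, "annotations")
    return {
        "caption": os.path.join(base, "caption", split, scenario, view,
                                f"{scenario}_caption.json"),
        "vqa": os.path.join(base, "vqa", split, scenario, view,
                            f"{scenario}.json"),
    }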

Data Fields & Samples

1. Fine-Grained Captions

Captions are generated from a checklist of more than 170 traffic items. Each event phase contains a distinct caption for the pedestrian and for the vehicle. The annotations are carried over from the WTS dataset, updated only where details could not be reproduced in the current simulation.

Sample (from overhead_view_caption.json):

{
    "id": 765,
    "event_phase": [
        {
            "labels": ["4"],
            "caption_pedestrian": "The pedestrian was a male in his 30s walking slowly... He was standing close behind a vehicle... Although he almost noticed the vehicle, he seemed unaware of it.",
            "caption_vehicle": "The vehicle was on the left side of the pedestrian and was close to them... The vehicle slightly collided with the pedestrian while moving at a speed of 0 km/h.",
            "start_time": "8.993",
            "end_time": "14.903"
        }
    ]
}
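
As a sketch, the per-phase captions can be read with the standard json module (the file name below is illustrative; see the directory layout above):

import json

# Load one caption file and print a short summary of each event phase.
with open("overhead_view_caption.json") as f:
    record = json.load(f)

for phase in record["event_phase"]:
    span = f'{phase["start_time"]}s to {phase["end_time"]}s'
    print(span, phase["labels"], phase["caption_pedestrian"][:60], "...")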

2. Visual Question Answering (VQA)

Includes multiple-choice questions covering position, distance, visibility, and actions.

Sample (from vqa-vehicle_view.json):

{
    "question": "What is the action taken by vehicle?",
    "a": "Swerved to the left to avoid",
    "b": "Swerved to the right, but could not avoid",
    "c": "Tried sudden braking but could not avoid",
    "d": "Collided with the pedestrian",
    "correct": "d"
}
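
Assuming each VQA file holds a list of such question objects, a minimal sketch of scoring predicted option letters against the correct field (the predictions mapping is hypothetical, standing in for a model's outputs):

import json

# Score multiple-choice predictions against the "correct" field.
# `predictions` maps a question index to a chosen option letter
# ("a"-"d"); how those letters are produced is up to the participant.
def vqa_accuracy(vqa_path, predictions):
    with open(vqa_path) as f:
        questions = json.load(f)
    hits = sum(1 for i, q in enumerate(questions)
               if predictions.get(i) == q["correct"])
    return hits / max(len(questions), 1)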

Technical Specifications & Limitations

Digital Twin Characteristics

  • Environmental Fidelity: Roads and buildings are a close geometric match to real-world WTS locations.
  • No 3D Gaze: Unlike the original WTS, 3D gaze and head bounding boxes are not included due to simulation constraints.
  • Character Dynamics: Poses are simulated and may not perfectly replicate real-world physics.
  • Object Limitations: Characters do not hold hand-held objects (umbrellas, phones) that may appear in the real-world test set. Labels/VQA have been adjusted accordingly.

Test Set

This release includes only the train and val splits. The test set will be the "internal" ("main") subset of the WTS dataset. Note that WTS also contains a BDD_PC_5K subset in its train/val/test splits; it will not be used in this challenge, since synthetic versions of those scenarios are not included in our training and validation sets.


Release Schedule

  • Initial Release: 80 scenarios (May 1, 2026)
  • Mid-May Update: 144 scenarios (May 11, 2026)
  • Final Dataset: ~249 scenarios total (expected May 25, 2026)

Team & Credits

Santa Clara University

Dhanishtha Patil, Ridham Kachhadiya, Andrew Vattuone, and David C. Anastasiu

NVIDIA

Haoquan Liang, Jiajun Li, Yuxing Wang, and Thomas Tang

Woven by Toyota

Ashutosh Kumar and Quan Kong

Point of Contact:

For questions regarding the SynWTS dataset or the AI City Challenge Track 2, please contact:

David C. Anastasiu

Email: danastasiu@scu.edu


Citation

Please cite the original WTS paper and the 2026 AI City Challenge:

@article{kong2024wts,
  title={WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding},
  author={Kong, Quan and Kumar, Ashutosh and others},
  journal={arXiv preprint arXiv:2407.15350},
  year={2024}
}

Stay tuned for an updated citation to our dataset paper.
