Model Card for PointWorld
Description
PointWorld is an action-conditioned 3D world model for robotic manipulation. Pre-trained on 500 hours of in-the-wild 3D interactions, it predicts environment dynamics from one or more RGB-D observations and robot actions, using a unified state-action representation based on 3D point flows.
This model card covers the pretrained checkpoints released under the PointWorld checkpoint package.
License/Terms of Use
Deployment Geography
Global
Use Case
Given one or a few RGB-D observations and a sequence of robot actions, PointWorld predicts environment dynamics as 3D point flows, a unified state-action representation. The model is intended for research and development in robotics, computer vision, and world modeling.
Release Date
- Paper: 01/07/2026 (arXiv:2601.03782)
- Checkpoint release: TBD
Reference(s)
Model Architecture
Architecture Type: Transformer
Network Architecture: Point Transformer V3
Input
Input Type(s): RGB-D Images, Robot Actions
Input Format(s): RGB image, depth image, action/state tensors
Other Properties Related to Input: RGB and depth images have a resolution of 320×180.
Output
Output Type(s): 3D point flows
Output Format: 3D point trajectories
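The input/output contract above can be sketched in PyTorch. Everything below other than the 320×180 resolution is an illustrative assumption: the number of tracked points, the action dimension, the horizon length, and the function name `predict_point_flows` are hypothetical, not the released PointWorld API.

```python
import torch

# Hypothetical shapes for a single forward pass (batch of 1).
B, T = 1, 8            # batch size, predicted timesteps (assumed)
H, W = 180, 320        # RGB/depth resolution stated in this model card
N = 4096               # number of tracked 3D points (assumed)
A = 7                  # action dim, e.g. 6-DoF end-effector delta + gripper (assumed)

rgb    = torch.zeros(B, 3, H, W)   # RGB observation
depth  = torch.zeros(B, 1, H, W)   # aligned depth map
action = torch.zeros(B, T, A)      # robot action sequence

def predict_point_flows(rgb, depth, action):
    # Stand-in for the model: the real network is a Point Transformer V3;
    # this only illustrates the expected input/output tensor shapes.
    b, t = action.shape[0], action.shape[1]
    return torch.zeros(b, N, t, 3)  # per-point 3D trajectories over the horizon

flows = predict_point_flows(rgb, depth, action)
print(tuple(flows.shape))  # (1, 4096, 8, 3)
```

Each output trajectory gives the predicted 3D position of one scene point at every future timestep, which is what "3D point flows" denotes here.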
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration
Runtime Engine(s)
- PyTorch
Supported Hardware Microarchitecture Compatibility
- NVIDIA Ampere
- NVIDIA Hopper
Preferred Operating System(s)
- Linux
Model Version(s)
v1.0
Training, Testing, and Evaluation Datasets
We perform training, testing, and evaluation on the DROID and BEHAVIOR datasets with custom 3D annotations.
DROID
Link: https://droid-dataset.github.io/
Data Collection method: Manual
Labeling Method by dataset: N/A (no labels)
Properties: We use a subset of the DROID dataset filtered by the quality of our custom 3D annotations.
BEHAVIOR
Link: https://behavior.stanford.edu/
Data Collection method: Manual
Labeling Method by dataset: N/A (no labels)
Properties: We use a subset of the BEHAVIOR dataset filtered by the interaction quality.
Inference
Acceleration Engine: PyTorch
Test Hardware: NVIDIA RTX 4090, NVIDIA H100, NVIDIA A100
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.