
Model Card for PointWorld

Description

PointWorld is an action-conditioned 3D world model for robotic manipulation. Pre-trained on 500 hours of in-the-wild 3D interactions, PointWorld predicts environment dynamics from one or more RGB-D captures and robot actions, using a unified state-action representation in the form of 3D point flows.
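To make "unified state-action representation" concrete, here is a minimal sketch of how robot actions could be expressed in the same form as scene state: points sampled on the gripper are moved by a sequence of end-effector poses, producing one 3D trajectory per point. This is an illustrative assumption, not PointWorld's actual action encoding; the function name `actions_to_point_flow`, the use of 4x4 world-frame poses, and the gripper point sampling are all hypothetical.

```python
import numpy as np


def actions_to_point_flow(ee_poses: np.ndarray, gripper_points: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn end-effector poses into gripper point trajectories.

    ee_poses:       (T, 4, 4) homogeneous end-effector poses in the world frame.
    gripper_points: (N, 3) points sampled on the gripper, in the gripper frame.
    Returns:        (T, N, 3) world-frame position of every gripper point at each step.
    """
    # Lift gripper points to homogeneous coordinates: (N, 4)
    ones = np.ones((gripper_points.shape[0], 1))
    pts_h = np.concatenate([gripper_points, ones], axis=1)
    # Apply every pose to every point: (T, 4, 4) @ (4, N) -> (T, 4, N)
    world_h = ee_poses @ pts_h.T
    # Drop the homogeneous coordinate and reorder to (T, N, 3)
    return np.transpose(world_h[:, :3, :], (0, 2, 1))


# Usage: the gripper translates 10 cm along x between two timesteps.
poses = np.stack([np.eye(4), np.eye(4)])
poses[1, 0, 3] = 0.1
pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.05]])
flow = actions_to_point_flow(poses, pts)
print(flow.shape)  # (2, 2, 3)
```

Because actions become point trajectories, they share a format with the predicted scene flows, so a single point-based network can consume both.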

This model card covers the pretrained checkpoints released under the PointWorld checkpoint package.

License/Terms of Use

NVIDIA Open Model License

Deployment Geography

Global

Use Case

Given one or a few RGB-D observations and robot actions, PointWorld predicts environment dynamics using a unified state-action representation in the form of 3D point flows. It is intended for research and development in robotics, computer vision, and world modeling.

Release Date

Reference(s)

Model Architecture

Architecture Type: Transformer
Network Architecture: Point Transformer V3
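Point Transformer V3 organizes unstructured point clouds by serializing them along space-filling curves such as the Z-order (Morton) curve, so that nearby points tend to land next to each other in a 1D sequence. The sketch below illustrates that idea on a voxel grid. It is a simplified standalone example, not the PTv3 code and not PointWorld's preprocessing; the voxel size and the 10-bit-per-axis grid limit are assumptions.

```python
import numpy as np


def morton_code_3d(grid: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of integer (x, y, z) grid coordinates into Z-order codes.

    Assumes every coordinate is non-negative and fits in `bits` bits
    (10 bits -> 1024 voxels per axis).
    """
    grid = grid.astype(np.uint64)
    codes = np.zeros(grid.shape[0], dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            # Take bit b of this axis and place it at position 3*b + axis
            bit = (grid[:, axis] >> np.uint64(b)) & np.uint64(1)
            codes |= bit << np.uint64(3 * b + axis)
    return codes


def serialize_points(points: np.ndarray, voxel_size: float = 0.01) -> np.ndarray:
    """Return indices that order (N, 3) points along a Z-order curve over a voxel grid."""
    # Quantize to non-negative integer voxel coordinates
    grid = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    codes = morton_code_3d(grid)
    # Stable sort keeps the input order among points sharing a voxel
    return np.argsort(codes, kind="stable")


points = np.array([[0.015, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.015]])
order = serialize_points(points)
print(order.tolist())  # [1, 0, 2]
```

Once points are serialized, attention can be computed over fixed-size windows of the sequence instead of expensive explicit neighbor searches, which is the main efficiency argument behind this design.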

Input

Input Type(s): RGB-D Images, Robot Actions
Input Format(s): RGB image, depth image, action/state tensors
Other Properties Related to Input: Resolution is 320x180 for RGB/depth images.
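Point-based models such as this one consume 3D points rather than raw pixels, so an RGB-D frame is typically lifted to a colored point cloud first. The sketch below shows standard pinhole back-projection for a 320x180 frame. The card does not specify camera intrinsics, depth units, or the exact preprocessing, so the intrinsics, the use of meters, the width-by-height reading of "320x180", and the name `backproject_rgbd` are all assumptions for illustration.

```python
import numpy as np


def backproject_rgbd(rgb: np.ndarray, depth: np.ndarray,
                     fx: float, fy: float, cx: float, cy: float):
    """Hypothetical helper: lift an RGB-D frame to a colored point cloud (camera frame).

    rgb:   (H, W, 3) uint8 image.
    depth: (H, W) depth in meters; 0 marks invalid pixels (assumed convention).
    Returns (points (M, 3), colors (M, 3) in [0, 1]) for valid pixels only.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y); both are (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)
    colors = rgb[valid].astype(np.float32) / 255.0
    return points, colors


# Usage with a 320x180 frame (H=180, W=320) and made-up intrinsics.
rgb = np.zeros((180, 320, 3), dtype=np.uint8)
depth = np.zeros((180, 320))
depth[90, 160] = 2.0  # principal point -> lies on the optical axis
depth[90, 260] = 2.0  # 100 px right of center
pts, cols = backproject_rgbd(rgb, depth, 200.0, 200.0, 160.0, 90.0)
print(pts)  # [[0. 0. 2.] [1. 0. 2.]]
```

With several calibrated cameras, each cloud would additionally be transformed by its camera-to-world extrinsics before merging.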

Output

Output Type(s): 3D point flows
Output Format: 3D point trajectories
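The output is described both as point flows and as point trajectories; the two views carry the same information. A trajectory gives each point's absolute position over time, while the flow gives its displacement. The sketch below converts between them and computes a generic mean endpoint error for comparing predictions to ground truth. The `(T, N, 3)` array layout, the choice of t = 0 as the reference frame, and the metric itself are assumptions for illustration, not PointWorld's documented output schema or evaluation protocol.

```python
import numpy as np


def trajectories_to_flows(traj: np.ndarray) -> np.ndarray:
    """Convert (T, N, 3) point trajectories to (T, N, 3) displacements from t = 0."""
    return traj - traj[:1]


def mean_endpoint_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average Euclidean distance between predicted and ground-truth point positions."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


# Usage: two points pushed along x over three timesteps.
traj = np.zeros((3, 2, 3))
traj[1, :, 0] = 0.1
traj[2, :, 0] = 0.3
flow = trajectories_to_flows(traj)
err = mean_endpoint_error(traj + np.array([0.0, 0.0, 0.03]), traj)
print(round(err, 6))  # 0.03
```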

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration

Runtime Engine(s)

  • PyTorch

Supported Hardware Microarchitecture Compatibility

  • NVIDIA Ampere
  • NVIDIA Hopper

Preferred Operating System(s)

  • Linux

Model Version(s)

v1.0

Training, Testing, and Evaluation Datasets

We perform training, testing, and evaluation on the DROID and BEHAVIOR datasets with custom 3D annotations.

DROID

Link: https://droid-dataset.github.io/

Data Collection Method: Manual

Labeling Method by dataset: N/A (no labels)

Properties: We use a subset of the DROID dataset filtered by the quality of our custom 3D annotations.

BEHAVIOR

Link: https://behavior.stanford.edu/

Data Collection Method: Manual

Labeling Method by dataset: N/A (no labels)

Properties: We use a subset of the BEHAVIOR dataset filtered by interaction quality.

Inference

Acceleration Engine: PyTorch
Test Hardware: NVIDIA RTX 4090, NVIDIA H100, NVIDIA A100

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.


Paper for nvidia/PointWorld_models