kmo-nvidia committed · Commit 9e9aff7 · verified · 1 Parent(s): 587721f

Upload README.md with huggingface_hub

Files changed (1): README.md (+98 -6)
README.md CHANGED
---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
language:
- en
---

# Model Card for PointWorld

## Description
PointWorld is an action-conditioned 3D world model for robotic manipulation.
Pre-trained on 500 hours of in-the-wild 3D interactions, PointWorld predicts environment dynamics from one or more RGB-D captures and robot actions, using a unified state-action representation based on 3D point flows.

This model card covers the pretrained checkpoints released in the PointWorld checkpoint package.

## License/Terms of Use
[NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)

## Deployment Geography
Global

## Use Case
Given one or a few RGB-D observations and a sequence of robot actions, PointWorld predicts environment dynamics as 3D point flows under its unified state-action representation.
PointWorld is intended for research and development in robotics, computer vision, and world modeling.

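The prediction pattern described above can be sketched as an autoregressive rollout: predict a per-point flow for each action, then advance the 3D state. Everything below (function names, point count, action dimension, the stand-in model) is a hypothetical illustration of the interaction pattern, not the released API:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
NUM_POINTS = 4096   # 3D points lifted from the RGB-D observation
ACTION_DIM = 7      # e.g., 6-DoF end-effector delta + gripper (assumed)

def predict_point_flow(points, action, rng):
    """Stand-in for the world model: maps current 3D points plus an
    action to per-point 3D displacements (the 'point flow')."""
    # A real model would be a learned network; a tiny random
    # displacement keeps this sketch runnable.
    return 0.01 * rng.standard_normal(points.shape)

def rollout(points, actions, rng):
    """Autoregressively apply predicted flows to the 3D point state."""
    trajectory = [points]
    for action in actions:
        flow = predict_point_flow(points, action, rng)
        points = points + flow          # advance the state by the flow
        trajectory.append(points)
    return np.stack(trajectory)          # (T + 1, NUM_POINTS, 3)

rng = np.random.default_rng(0)
points0 = rng.standard_normal((NUM_POINTS, 3))
actions = rng.standard_normal((5, ACTION_DIM))
traj = rollout(points0, actions, rng)
print(traj.shape)  # (6, 4096, 3)
```

The point here is only the control flow: one flow prediction per action, with the predicted points fed back in as the next state.
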
## Release Date
- Paper: 01/07/2026 ([arXiv:2601.03782](https://arxiv.org/abs/2601.03782))
- Checkpoint release: TBD

## Reference(s)
- [Project Website](https://point-world.github.io/)
- [Paper](https://arxiv.org/abs/2601.03782)
- [Code](https://github.com/NVlabs/PointWorld)

## Model Architecture
**Architecture Type:** Transformer
**Network Architecture:** Point Transformer V3

## Input
**Input Type(s):** RGB-D Images, Robot Actions
**Input Format(s):** RGB image, depth image, action/state tensors
**Other Properties Related to Input:** Resolution is `320x180` for RGB/depth images.

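A minimal sketch of packing one observation at the stated `320x180` resolution. The dtypes, normalization, and action layout here are assumptions for illustration, not the released preprocessing pipeline:

```python
import numpy as np

# Model card lists resolution as 320x180 (width x height).
H, W = 180, 320

def pack_observation(rgb, depth, action):
    """Validate and bundle one RGB-D frame plus an action vector."""
    assert rgb.shape == (H, W, 3), "RGB must be 180x320x3"
    assert depth.shape == (H, W), "depth must be 180x320"
    return {
        "rgb": rgb.astype(np.float32) / 255.0,   # [0, 1] scaling (assumed)
        "depth": depth.astype(np.float32),       # metric depth (assumed)
        "action": action.astype(np.float32),
    }

rgb = np.zeros((H, W, 3), dtype=np.uint8)
depth = np.ones((H, W), dtype=np.float32)
action = np.zeros(7, dtype=np.float32)           # action dim is assumed
obs = pack_observation(rgb, depth, action)
print(obs["rgb"].shape, obs["depth"].shape)      # (180, 320, 3) (180, 320)
```
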
## Output
**Output Type(s):** 3D point flows
**Output Format:** 3D point trajectories

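Point flows are per-step 3D displacements; accumulating them onto the initial points yields the point trajectories listed as the output format. A minimal numpy sketch with assumed shapes (the released output spec may differ):

```python
import numpy as np

# Assumed layout: flows are (T, N, 3) per-step displacements for
# N points over T prediction steps.
rng = np.random.default_rng(1)
T, N = 4, 1024
points0 = rng.standard_normal((N, 3))
flows = 0.05 * rng.standard_normal((T, N, 3))

# Trajectory = initial points plus the running sum of flows: (T + 1, N, 3).
trajectory = np.concatenate(
    [points0[None], points0[None] + np.cumsum(flows, axis=0)], axis=0
)
print(trajectory.shape)  # (5, 1024, 3)
```
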
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## Software Integration

### Runtime Engine(s)
- PyTorch

### Supported Hardware Microarchitecture Compatibility
- NVIDIA Ampere
- NVIDIA Hopper

### Preferred Operating System(s)
- Linux

## Model Version(s)
v1.0

## Training, Testing, and Evaluation Datasets

We perform training, testing, and evaluation on the DROID and BEHAVIOR datasets with custom 3D annotations.

### DROID

**Link**: https://droid-dataset.github.io/

**Data Collection Method**: Manual

**Labeling Method**: N/A (no labels)

**Properties**: We use a subset of the DROID dataset, filtered by the quality of our custom 3D annotations.

### BEHAVIOR

**Link**: https://behavior.stanford.edu/

**Data Collection Method**: Manual

**Labeling Method**: N/A (no labels)

**Properties**: We use a subset of the BEHAVIOR dataset, filtered by interaction quality.

## Inference
**Acceleration Engine:** PyTorch
**Test Hardware:** NVIDIA RTX 4090, NVIDIA H100, NVIDIA A100

## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).