---
library_name: opentau
tags:
- robotics
- vla
- pi05
- robocasa
- manipulation
- flow-matching
- pytorch
base_model: williamyue/pi05_base
license: apache-2.0
datasets:
- robocasa/CloseToasterOvenDoor
- robocasa/CloseDishwasher
- robocasa/CloseOven
repo_url: https://github.com/TensorAuto/OpenTau
---

# Robocasa_navigatekitchen

A **pi0.5 (π₀.₅)** Vision-Language-Action (VLA) model, finetuned on the **RoboCasa** robotic manipulation/navigation benchmark using the **OpenTau** training framework. The model follows natural language instructions to perform manipulation and navigation tasks in a simulated kitchen environment.

**For full documentation, evaluation results, and inference code, visit the repository:**
<br>
👉 **[https://github.com/TensorAuto/OpenTau](https://github.com/TensorAuto/OpenTau)**

---

## Model Details

### Description
- **Model Type:** Vision-Language-Action (VLA) Model
- **Base Architecture:** π₀.₅ (pi0.5) by Physical Intelligence
- **Backbone:** PaliGemma-3B (VLM) + Gemma-300M (Action Expert)
- **Training Data:** RoboCasa Benchmark
- **Framework:** OpenTau

### Architecture
The pi0.5 architecture uses a flow-matching-based policy designed for open-world generalization. It combines a Vision-Language Model (VLM) for high-level semantic understanding with a smaller "action expert" model that generates continuous joint trajectories (10-step action chunks) via flow matching.
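
To give a feel for how flow-matching action generation works at inference time, here is a minimal, self-contained sketch: starting from Gaussian noise, an action chunk is produced by Euler-integrating a learned velocity field from t=0 to t=1. This is an illustration only, not the OpenTau or pi0.5 implementation; the `toy_field` below is a stand-in for the trained action expert, and the horizon/action dimensions are placeholders.

```python
import numpy as np

def sample_action_chunk(velocity_field, horizon=10, action_dim=7, steps=10, seed=0):
    """Euler-integrate a flow-matching velocity field from noise to an action chunk.

    velocity_field(a, t) returns da/dt; in a real pi0.5-style policy this
    would be the trained action expert conditioned on VLM features.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((horizon, action_dim))  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        a = a + dt * velocity_field(a, t)  # one Euler integration step
    return a

# Toy velocity field that flows every sample toward a fixed target chunk
# (a stand-in for a learned network, purely for illustration).
target = np.zeros((10, 7))
toy_field = lambda a, t: target - a

chunk = sample_action_chunk(toy_field)
print(chunk.shape)  # → (10, 7)
```

In the real model, the ten rows of the returned chunk would be executed (or re-planned) as consecutive joint-space actions; integrating with more steps trades compute for a closer approximation of the learned flow.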

---

## Training and Evaluation

### Dataset
This model was finetuned on the **RoboCasa** benchmark dataset. The RoboCasa suite consists of human-teleoperated and MimicGen-generated demonstrations for manipulation and navigation, covering:
- **CloseToasterOvenDoor** (Atomic)
- **CloseDishwasher** (Atomic)
- **CloseOven** (Atomic)

### Results
Trained on 100 human demonstrations, the model achieves success rates of **70%**, **90%**, and **90%** on the CloseToasterOvenDoor, CloseDishwasher, and CloseOven tasks, respectively.
For detailed usage instructions, success rates, baseline comparisons, and evaluation protocols, please refer to the [OpenTau GitHub Repository](https://github.com/TensorAuto/OpenTau).