SmolVLA-CaP-StackBlock-50epochs

This repository contains a SmolVLA policy fine-tuned with LeRobot on the SO101 CAP task "Stack RGB Blocks on a Blue Dish". The policy was initialized from `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` and trained for 50 epochs (17100 steps) on `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps`.

Model Details

| Field | Value |
| --- | --- |
| Policy type | `smolvla` |
| Task | stack red, green, and blue blocks on the blue dish from bottom to top |
| Robot | SO101 follower |
| Dataset | `CoRL2026-CSI/SO101-cap_stack_RGBblock_on_bluedish_10fps` |
| Base model | `CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep` |
| Training steps | `17100` |
| Completed step | `17100` |
| Batch size | `128` per GPU |
| Effective batch size | `256` |
| Action chunk size | `50` |
| Action horizon | `50` |
| Observation steps | `1` |
| Inference denoising steps | `50` |
| Model weights | `model.safetensors` (864.7 MiB) |
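
The step, batch, and chunk figures in the table can be cross-checked with a little arithmetic. The sketch below assumes that every optimizer step consumes one full effective batch and that the 10 fps rate (inferred only from the `_10fps` suffix of the dataset name) is also the control rate. The derived frames-per-epoch value is an estimate under those assumptions, not a figure reported by the authors.

```python
# Values copied from the Model Details table.
total_steps = 17100
epochs = 50
per_process_batch = 128
num_processes = 2
chunk_size = 50

# Assumption: 10 fps is inferred from the dataset name suffix "_10fps".
fps = 10

effective_batch = per_process_batch * num_processes
steps_per_epoch = total_steps // epochs
# Upper bound on frames seen per epoch if every step uses a full effective batch.
frames_per_epoch = steps_per_epoch * effective_batch
# Time span covered by one predicted action chunk at the assumed rate.
chunk_seconds = chunk_size / fps

print(effective_batch, steps_per_epoch, frames_per_epoch, chunk_seconds)
```

Under these assumptions, one action chunk covers about five seconds of motion, which is worth keeping in mind when deciding how often to re-query the policy.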

Training Setup

The run used two CUDA processes with batch_size=128 per process, image augmentation enabled, and camera key remapping from the dataset's raw cameras to the SmolVLA camera names:

observation.images.left_wrist -> observation.images.camera1
observation.images.top        -> observation.images.camera2
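
At inference time the same renaming has to be applied to whatever the robot's cameras produce. A minimal sketch of that step follows; `CAMERA_KEY_MAP` and `remap_camera_keys` are hypothetical names introduced here for illustration and are not part of LeRobot's API.

```python
# Mapping taken from the Training Setup section of this card.
CAMERA_KEY_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def remap_camera_keys(observation: dict) -> dict:
    """Return a copy of `observation` with raw camera keys renamed.

    Hypothetical helper for illustration; keys that are not in the
    map (for example the robot state) pass through unchanged.
    """
    return {CAMERA_KEY_MAP.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for image tensors and the state vector.
raw = {
    "observation.images.left_wrist": "wrist_frame",
    "observation.images.top": "top_frame",
    "observation.state": "joint_positions",
}
remapped = remap_camera_keys(raw)
print(sorted(remapped))
```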

The checkpoint was saved at step 17100. LeRobot's preprocessor and postprocessor artifacts for that checkpoint are included in this repository.

Files

model.safetensors
config.json
train_config.json
policy_preprocessor.json
policy_preprocessor_step_5_normalizer_processor.safetensors
policy_postprocessor.json
policy_postprocessor_step_0_unnormalizer_processor.safetensors
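
After downloading the repository, it can be useful to confirm that none of the listed files is missing before loading the policy. The sketch below is one way to do that with the standard library only; `missing_files` is a hypothetical helper, and the directory you pass in is whatever local path holds your copy of the checkpoint.

```python
from pathlib import Path

# File names copied from the Files section of this card.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_5_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
]


def missing_files(checkpoint_dir) -> list:
    """Return the expected file names that are absent from `checkpoint_dir`."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every file named above is present; any returned names point to an incomplete download.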

Usage

from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs")

For robot deployment, use the same camera mapping, normalization pipeline, and SO101 action/state conventions used by the training dataset.
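
One quick pre-deployment check is to compare the downloaded `config.json` against the values in the Model Details table. The sketch below assumes the table rows correspond to the JSON keys `chunk_size`, `n_action_steps`, `n_obs_steps`, and `num_steps`; that key mapping is an assumption and should be verified against the actual file. `config_mismatches` is a hypothetical helper.

```python
import json
from pathlib import Path

# Expected values from the Model Details table.
# Assumption: these JSON key names match the fields stored in config.json.
EXPECTED = {
    "chunk_size": 50,      # Action chunk size
    "n_action_steps": 50,  # Action horizon
    "n_obs_steps": 1,      # Observation steps
    "num_steps": 50,       # Inference denoising steps
}


def config_mismatches(config_path) -> dict:
    """Map each differing key to a (expected, found) pair; missing keys show found=None."""
    config = json.loads(Path(config_path).read_text())
    return {
        key: (expected, config.get(key))
        for key, expected in EXPECTED.items()
        if config.get(key) != expected
    }
```

An empty result means the file agrees with the table on these four fields; it says nothing about the camera mapping or normalization statistics, which still need to match the training setup.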

Intended Use

This model is intended for imitation-learning experiments and SO101 tabletop manipulation research on the specified CAP task. It is not a general-purpose robot policy and should be validated in a controlled workspace before any hardware deployment.

Limitations

The model was trained on a single task dataset with fixed camera views, object set, action space, and workspace assumptions. No official evaluation success rate is included in this repository.

Safetensors: model size 0.5B params; tensor types F32 and BF16.

Model tree for CoRL2026-CSI/SmolVLA-CaP-StackBlock-50epochs

Base model: lerobot/smolvla_base
Finetuned: CoRL2026-CSI/smolvla_isaaclab_so101_11task_basecap_3300epi_8ep