| --- |
| license: mit |
| library_name: stable-baselines3 |
| tags: |
| - reinforcement-learning |
| - robotics |
| - autonomous-navigation |
| - ros2 |
| - gazebo |
| - sac |
| - lidar |
| - camera |
| - multi-input |
| pipeline_tag: reinforcement-learning |
| --- |
| |
| # RC Car Autonomous Navigation — SAC (Camera + LiDAR) |
|
|
| A **Soft Actor-Critic (SAC)** agent trained to autonomously navigate an RC car in a simulated Gazebo environment using both **camera images** and **LiDAR sensor data** as observations. The agent learns to reach target positions while avoiding obstacles. |
|
|
| --- |
|
|
| ## Model Description |
|
|
| This model uses a **MultiInputPolicy** with a hybrid perception backbone: |
|
|
| - **Visual stream** — RGB camera frames processed by a CNN (NatureCNN) |
| - **Sensor stream** — LiDAR range scan (180 beams) + navigation state processed by an MLP |
|
|
| Both streams are fused and fed into the SAC actor/critic networks for end-to-end policy learning. |
|
|
| | Property | Value | |
| |---|---| |
| | Algorithm | Soft Actor-Critic (SAC) | |
| | Policy | `MultiInputPolicy` | |
| | Observation | `Dict` — image `(64×64×3)` + sensor vector `(184,)` | |
| | Action Space | `Box([-1, -1], [1, 1])` — speed & steering | |
| | Simulator | Gazebo (Ignition Fortress / Harmonic) via ROS 2 | |
| | Framework | Stable-Baselines3 | |
|
|
| --- |
|
|
| ## Environments |
|
|
| Two training environments are available: |
|
|
| ### `RcCarTargetEnv` |
| The robot spawns at a random position and must navigate to a randomly placed target (red sphere marker). No dynamic obstacles. |
|
|
| ### `RcCarComplexEnv` |
| Same goal-reaching task but with **6 randomly placed box obstacles** that are reshuffled every episode, requiring active collision avoidance. |
|
|
| --- |
|
|
| ## Observation Space |
|
|
| ```python |
| spaces.Dict({ |
| "image": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8), |
| "sensor": spaces.Box(low=0.0, high=1.0, shape=(184,), dtype=np.float32) |
| }) |
| ``` |
|
|
| The `sensor` vector contains: |
| - **[0:180]** — Normalised LiDAR ranges (180 beams, max range 10 m) |
| - **[180]** — Normalised linear speed |
| - **[181]** — Normalised steering angle |
| - **[182]** — Normalised distance to target (clipped at 10 m) |
| - **[183]** — Normalised relative angle to target |
|
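The index layout above can be unpacked with plain slicing. This is a minimal sketch: `split_sensor` is a hypothetical helper, not part of the repository, and it assumes the LiDAR and target-distance entries were normalised by dividing by 10 m (the card gives the 10 m limits but not the exact formula). The speed, steering and angle entries are left in normalised form because their scaling is not documented.

```python
from typing import Dict, List, Sequence, Union

LIDAR_BEAMS = 180    # indices 0..179, per the layout above
MAX_RANGE_M = 10.0   # LiDAR max range and target-distance clip from the card


def split_sensor(sensor: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Split the flat 184-element sensor vector into named fields (hypothetical helper)."""
    if len(sensor) != LIDAR_BEAMS + 4:
        raise ValueError(f"expected 184 values, got {len(sensor)}")
    return {
        # Assumption: normalised range = metres / 10, so multiply back.
        "lidar_m": [r * MAX_RANGE_M for r in sensor[:LIDAR_BEAMS]],
        "speed_norm": sensor[180],
        "steering_norm": sensor[181],
        # Assumption: same divide-by-10 normalisation as the LiDAR ranges.
        "target_distance_m": sensor[182] * MAX_RANGE_M,
        "target_angle_norm": sensor[183],
    }


fields = split_sensor([0.5] * 180 + [0.2, 0.5, 0.3, 0.75])
print(len(fields["lidar_m"]), fields["target_distance_m"])
```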
|
| --- |
|
|
| ## Action Space |
|
|
| ```python |
| spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32) |
| ``` |
|
|
| | Index | Meaning | Scale | |
| |---|---|---| |
| | `action[0]` | Linear speed | × 1.0 m/s | |
| | `action[1]` | Steering angle | × 0.6 rad/s | |
|
|
| Steering is smoothed with a low-pass filter: `steer = 0.6 × prev + 0.4 × target`. |
|
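The scaling and smoothing described above can be sketched in a few lines. `apply_action` is a hypothetical name, and applying the filter after scaling is an assumption. Because the filter is linear, filtering before scaling would give the same result.

```python
from typing import Sequence, Tuple

MAX_SPEED = 1.0   # scale for action[0], from the table above
MAX_STEER = 0.6   # scale for action[1], from the table above
ALPHA = 0.4       # weight of the new target in the low-pass filter


def apply_action(action: Sequence[float], prev_steer: float) -> Tuple[float, float]:
    """Map a policy action in [-1, 1]^2 to (speed, smoothed steering)."""
    # Clamp defensively in case a caller passes values outside the action bounds.
    a_speed = max(-1.0, min(1.0, action[0]))
    a_steer = max(-1.0, min(1.0, action[1]))
    speed = a_speed * MAX_SPEED
    target_steer = a_steer * MAX_STEER
    # steer = 0.6 * prev + 0.4 * target
    steer = (1.0 - ALPHA) * prev_steer + ALPHA * target_steer
    return speed, steer


steer = 0.0
for _ in range(3):
    speed, steer = apply_action([0.5, 1.0], steer)
    print(round(speed, 3), round(steer, 4))
```

With a constant full-lock command the smoothed steering approaches the 0.6 limit geometrically, closing 40 % of the remaining gap on every step.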
|
| --- |
|
|
| ## Reward Function |
|
|
| ### `RcCarTargetEnv` |
| | Event | Reward | |
| |---|---| |
| | Progress toward target | `(prev_distance − distance) × 40.0` | |
| | Reached target (< 0.6 m) | `+100.0` | |
| | Collision (LiDAR < 0.22 m) | `−50.0` | |
| | Per-step penalty | `−0.05` | |
|
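One way the `RcCarTargetEnv` terms could combine into a single step reward is sketched below. Several points are assumptions rather than facts from the card: progress is taken as previous distance minus current distance, the terms are summed, collision is checked before goal arrival, and both events end the episode. `target_reward` is a hypothetical name.

```python
from typing import Tuple


def target_reward(prev_dist: float, dist: float, min_lidar: float) -> Tuple[float, bool]:
    """Return (reward, terminated) for one step of the goal-reaching task."""
    reward = (prev_dist - dist) * 40.0   # positive when the car gets closer
    reward -= 0.05                       # per-step penalty
    if min_lidar < 0.22:                 # collision threshold from the table
        return reward - 50.0, True
    if dist < 0.6:                       # goal radius from the table
        return reward + 100.0, True
    return reward, False


print(target_reward(prev_dist=2.0, dist=1.9, min_lidar=1.0))
```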
|
| ### `RcCarComplexEnv` |
| | Event | Reward | |
| |---|---| |
| | Progress toward target | `(prev_distance − distance) × 40.0` | |
| | Forward speed bonus (on progress) | `+speed × 0.5` | |
| | Proximity warning (LiDAR < 0.5 m) | `−0.5` | |
| | Collision | `−50.0` | |
| | Reached target | `+100.0` | |
| | Per-step penalty | `−0.1` | |
|
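The extra shaping terms in `RcCarComplexEnv` might be layered on the progress reward as sketched here. The card does not give collision or goal thresholds for this environment, so the `RcCarTargetEnv` values (0.22 m and 0.6 m) are reused as an assumption. Granting the speed bonus only when the step made progress is one reading of "on progress". `complex_reward` is a hypothetical name.

```python
from typing import Tuple

COLLISION_M = 0.22   # assumption: same threshold as RcCarTargetEnv
GOAL_M = 0.6         # assumption: same goal radius as RcCarTargetEnv
WARNING_M = 0.5      # proximity-warning distance from the table


def complex_reward(prev_dist: float, dist: float, speed: float,
                   min_lidar: float) -> Tuple[float, bool]:
    """Return (reward, terminated) for one step with obstacles present."""
    progress = prev_dist - dist
    reward = progress * 40.0 - 0.1            # progress term and per-step penalty
    if progress > 0.0 and speed > 0.0:
        reward += speed * 0.5                 # forward speed bonus, only on progress
    if min_lidar < COLLISION_M:
        return reward - 50.0, True
    if dist < GOAL_M:
        return reward + 100.0, True
    if min_lidar < WARNING_M:
        reward -= 0.5                         # close to an obstacle, not yet colliding
    return reward, False


print(complex_reward(prev_dist=2.0, dist=1.9, speed=0.8, min_lidar=0.4))
```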
|
| --- |
|
|
| ## Training Setup |
|
|
| ```python |
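| from stable_baselines3 import SAC |
| |
| # `env` is the vectorised environment described in the bullets below |
| model = SAC( |
|     "MultiInputPolicy", |
|     env, |
|     learning_rate=3e-4, |
|     buffer_size=50_000,  # replay buffer capacity in transitions |
|     policy_kwargs=dict( |
|         net_arch=dict(pi=[256, 256], qf=[256, 256])  # actor / critic hidden layers |
|     ), |
|     device="auto", |
| ) |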
| ``` |
|
|
| - **Action repeat:** 4 steps per agent decision |
| - **Frame stacking:** configurable via Hydra config (`n_stack`) |
| - **Vectorised env:** `DummyVecEnv` + `VecFrameStack` (channels_order=`"last"`) |
| - **Experiment tracking:** Weights & Biases (W&B) with SB3 callback |
| |
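Action repeat means that each agent decision is held for several low-level simulator steps. The sketch below shows the idea against any Gymnasium-style `step` function. Summing the intermediate rewards and stopping early when the episode ends are assumptions, since the card does not say how the repeated steps are aggregated. `step_with_repeat` and `CountdownEnv` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Tuple

StepResult = Tuple[Any, float, bool, bool, Dict[str, Any]]


def step_with_repeat(step_fn: Callable[[Any], StepResult], action: Any,
                     repeats: int = 4) -> StepResult:
    """Apply `action` up to `repeats` times, summing rewards; stop early at episode end."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    total = 0.0
    for _ in range(repeats):
        obs, reward, terminated, truncated, info = step_fn(action)
        total += reward
        if terminated or truncated:
            break
    return obs, total, terminated, truncated, info


class CountdownEnv:
    """Toy stand-in for the real environment: terminates after `n` low-level steps."""

    def __init__(self, n: int) -> None:
        self.left = n

    def step(self, action: Any) -> StepResult:
        self.left -= 1
        return self.left, 1.0, self.left == 0, False, {}


env = CountdownEnv(6)
print(step_with_repeat(env.step, action=None))  # four low-level steps
print(step_with_repeat(env.step, action=None))  # stops after two, episode over
```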
| --- |
| |
| ## Hardware & Software Requirements |
| |
| | Component | Requirement | |
| |---|---| |
| | ROS 2 | Humble or newer | |
| | Gazebo | Ignition Fortress / Harmonic | |
| | Python | 3.10+ | |
| | PyTorch | 2.0+ | |
| | stable-baselines3 | ≥ 2.0 | |
| | gymnasium | ≥ 0.29 | |
| | opencv-python | any recent | |
| | cv_bridge | ROS 2 package | |
|
|
| --- |
|
|
| ## How to Use |
|
|
| ### 1. Install dependencies |
| ```bash |
| pip install stable-baselines3 wandb hydra-core gymnasium opencv-python |
| ``` |
|
|
| ### 2. Launch the simulator |
| ```bash |
| ros2 launch my_bot_pkg sim.launch.py |
| ``` |
|
|
| ### 3. Run training |
| ```bash |
| python train.py experiment.mode=target experiment.total_timesteps=500000 |
| ``` |
|
|
| ### 4. Load and run inference |
| ```python |
| from stable_baselines3 import SAC |
| from rc_car_envs_camera import RcCarTargetEnv |
| |
| env = RcCarTargetEnv() |
| model = SAC.load("sac_target_camera_final", env=env) |
| |
| obs, _ = env.reset() |
| while True: |
|     action, _ = model.predict(obs, deterministic=True) |
|     obs, reward, terminated, truncated, info = env.step(action) |
|     if terminated or truncated: |
|         obs, _ = env.reset() |
| ``` |
|
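The loop above runs until interrupted. For a bounded evaluation, a helper along these lines may be convenient. It is a sketch, not part of the repository: `run_episodes` and `ToyEnv` are hypothetical names, and the helper relies only on the Gymnasium `reset`/`step` signatures used above.

```python
from typing import Any, Callable, List


def run_episodes(env: Any, predict: Callable[[Any], Any],
                 n_episodes: int = 10, max_steps: int = 1000) -> List[float]:
    """Roll out `predict` for a fixed number of episodes; return each episode's total reward."""
    returns = []
    for _ in range(n_episodes):
        obs, _info = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = predict(obs)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += reward
            if terminated or truncated:
                break
        returns.append(total)
    return returns


class ToyEnv:
    """Stand-in with the Gymnasium reset/step signature; ends after 3 steps."""

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, False, {}


print(run_episodes(ToyEnv(), predict=lambda obs: 0, n_episodes=2))
```

With the `env` and `model` objects from step 4, the call would look like `run_episodes(env, lambda obs: model.predict(obs, deterministic=True)[0], n_episodes=5)`.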
|
| --- |
|
|
| ## Project Structure |
|
|
| ``` |
| ├── rc_car_envs_camera.py # Gym environments (Base, Target, Complex) |
| ├── train.py # Hydra-based training entry point |
| ├── configs/ |
| │ └── config.yaml # Hydra config (mode, timesteps, wandb, etc.) |
| └── models/ # Saved checkpoints (W&B) |
| ``` |
|
|
| --- |
|
|
| ## Limitations & Known Issues |
|
|
| - Training requires a live ROS 2 + Gazebo session; no offline/headless mode currently. |
| - Training currently wraps a single environment in `DummyVecEnv`; collecting experience in parallel would require `SubprocVecEnv` with careful ROS node naming. |
| - Camera latency under heavy load may cause the `scan_received` / `cam_received` wait loop to time out, potentially delivering stale observations. |
| - The collision threshold (0.22 m) is tuned for the specific robot mesh; adjust for different URDF geometries. |
|
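The stale-observation issue can be made explicit by having the wait loop report whether fresh data actually arrived. The sketch below is generic Python, not the repository's code. `wait_for_sensors` is a hypothetical helper, and `fresh` stands for whatever check the environment performs on its `scan_received` and `cam_received` flags. In a real ROS 2 node the sleep would be replaced by spinning the node so that callbacks can run.

```python
import time
from typing import Callable


def wait_for_sensors(fresh: Callable[[], bool], timeout_s: float = 0.5,
                     poll_s: float = 0.005) -> bool:
    """Poll until `fresh()` is true or the timeout expires.

    Returns False when the timeout was hit, i.e. the observation may be stale.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if fresh():
            return True
        time.sleep(poll_s)
    return fresh()  # one last check after the deadline


print(wait_for_sensors(lambda: True))                    # data already available
print(wait_for_sensors(lambda: False, timeout_s=0.05))   # times out
```

A caller could use the returned flag to log the event, skip the step, or truncate the episode instead of silently training on an outdated frame.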
|
| --- |
|
|
| ## Citation |
|
|
| If you use this environment or training code in your research, please cite: |
|
|
| ```bibtex |
| @misc{rccar_sac_nav, |
| title = {RC Car Autonomous Navigation with SAC (Camera + LiDAR)}, |
| year = {2025}, |
| url = {https://huggingface.co/Hajorda/SAC_Complex_Camera} |
| } |
| ``` |
|
|
| --- |
|
|
| ## License |
|
|
| MIT License |