DDPG Panda Reach Model

This is a DDPG (Deep Deterministic Policy Gradient) model trained to control a Franka Emika Panda robot arm in a reaching task using dense rewards. The model was trained using Stable-Baselines3 with Hindsight Experience Replay (HER).

Task Description

In this task, a 7-DOF Panda robotic arm must reach a randomly positioned target in 3D space. The environment provides dense rewards based on the distance between the end-effector and the target position. The task is considered successful when the end-effector reaches within a small threshold distance of the target.
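The reward convention can be sketched in plain NumPy. In panda-gym's dense-reward variants, the reward is the negative Euclidean distance between the end-effector (achieved goal) and the target (desired goal); the 0.05 m success threshold below is the panda-gym default, assumed here rather than read from the environment:

```python
import numpy as np

def dense_reward(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    """Dense reward: negative Euclidean distance to the target."""
    return -float(np.linalg.norm(achieved_goal - desired_goal))

def is_success(achieved_goal: np.ndarray, desired_goal: np.ndarray,
               threshold: float = 0.05) -> bool:
    """Success when the end-effector is within `threshold` metres of the target."""
    return bool(np.linalg.norm(achieved_goal - desired_goal) < threshold)

ee = np.array([0.10, 0.00, 0.20])      # end-effector position
target = np.array([0.10, 0.03, 0.20])  # sampled target position
print(dense_reward(ee, target))        # ≈ -0.03
print(is_success(ee, target))          # True (0.03 m < 0.05 m)
```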

Training Details

  • Environment: PandaReachJointsDense-v3 from panda-gym
  • Algorithm: DDPG with HER
  • Policy: MultiInputPolicy
  • Training Steps: 100,000
  • Framework: Stable-Baselines3
  • Training Monitoring: Weights & Biases

Hyperparameters

{
    "policy": "MultiInputPolicy",
    "replay_buffer_class": "HerReplayBuffer",
    "tensorboard_log": true,
    "verbose": 1,
    "total_timesteps": 100000
}
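These hyperparameters correspond roughly to the following Stable-Baselines3 setup. This is a sketch, not the original training script: the HER goal-selection settings and the log directory shown here are assumptions based on SB3 defaults, not values taken from the run.

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaReachJointsDense-v3")

model = DDPG(
    policy="MultiInputPolicy",
    env=env,
    replay_buffer_class=HerReplayBuffer,
    # HER settings below are SB3 defaults, not taken from the original run
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    tensorboard_log="./logs",  # the card lists `true`; SB3 expects a log directory path
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ddpg-panda-reach-100")
```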

Usage

import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG

# Create environment
env = gym.make("PandaReachJointsDense-v3", render_mode="human")

# Load the trained model
model = DDPG.load("StevanLS/ddpg-panda-reach-100")

# Run the model
obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()

Limitations

  • The model is trained specifically for the reaching task and may not generalize to other manipulation tasks
  • Performance may vary depending on the random target positions
  • The model uses dense rewards, which might not be available in real-world scenarios

Author

  • StevanLS

Citations

@article{raffin2021stable,
    title={Stable-baselines3: Reliable reinforcement learning implementations},
    author={Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah},
    journal={Journal of Machine Learning Research},
    year={2021}
}

@article{gallouedec2021pandagym,
    title={panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning},
    author={Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
    journal={arXiv preprint arXiv:2106.13687},
    year={2021}
}

@misc{gymnasium2023,
    author={Farama Foundation},
    title={Gymnasium},
    year={2023},
    publisher={GitHub},
    note={GitHub repository},
    url={https://github.com/Farama-Foundation/Gymnasium}
}

Evaluation results

  • mean_reward on PandaReachJointsDense-v3 (self-reported): REPLACE_WITH_ACTUAL_MEAN
  • std_reward on PandaReachJointsDense-v3 (self-reported): REPLACE_WITH_ACTUAL_STD
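The metrics above would typically be computed with SB3's `evaluate_policy` helper. A sketch, assuming the model loads from a local file of that name (the number of evaluation episodes is an illustrative choice, not the one used for the reported figures):

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("PandaReachJointsDense-v3")
model = DDPG.load("ddpg-panda-reach-100")

# Average episodic return over 50 deterministic rollouts
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=50, deterministic=True
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```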