DDPG Panda Reach Model

This is a DDPG (Deep Deterministic Policy Gradient) model trained to control a Franka Emika Panda robot arm in a reaching task using dense rewards. The model was trained using Stable-Baselines3 with Hindsight Experience Replay (HER).

Task Description

In this task, a 7-DOF Panda robotic arm must reach a randomly positioned target in 3D space. The environment provides dense rewards based on the distance between the end-effector and the target position. The task is considered successful when the end-effector reaches within a small threshold distance of the target.
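The reward convention can be sketched in plain NumPy. In panda-gym's dense-reward variants, the reward is the negative Euclidean distance between the end-effector (achieved goal) and the target (desired goal); the 0.05 m success threshold below is the panda-gym default, assumed here rather than read from the environment:

```python
import numpy as np

def dense_reward(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    """Dense reward: negative Euclidean distance to the target."""
    return -float(np.linalg.norm(achieved_goal - desired_goal))

def is_success(achieved_goal: np.ndarray, desired_goal: np.ndarray,
               threshold: float = 0.05) -> bool:
    """Success when the end-effector is within `threshold` metres of the target."""
    return bool(np.linalg.norm(achieved_goal - desired_goal) < threshold)

ee = np.array([0.10, 0.00, 0.20])      # end-effector position
target = np.array([0.10, 0.03, 0.20])  # sampled target position
print(dense_reward(ee, target))        # ≈ -0.03
print(is_success(ee, target))          # True (0.03 m < 0.05 m)
```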

Training Details

  • Environment: PandaReachJointsDense-v3 from panda-gym
  • Algorithm: DDPG with HER
  • Policy: MultiInputPolicy
  • Training Steps: 100,000
  • Framework: Stable-Baselines3
  • Training Monitoring: Weights & Biases

Hyperparameters

{
    "policy": "MultiInputPolicy",
    "replay_buffer_class": "HerReplayBuffer",
    "tensorboard_log": true,
    "verbose": 1,
    "total_timesteps": 100000
}
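These hyperparameters correspond roughly to the following Stable-Baselines3 setup. This is a sketch, not the original training script: the HER goal-selection settings and the log directory shown here are assumptions based on SB3 defaults, not values taken from the run.

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaReachJointsDense-v3")

model = DDPG(
    policy="MultiInputPolicy",
    env=env,
    replay_buffer_class=HerReplayBuffer,
    # HER settings below are SB3 defaults, not taken from the original run
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    tensorboard_log="./logs",  # the card lists `true`; SB3 expects a log directory path
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ddpg-panda-reach-100")
```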

Usage

import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG

# Create environment
env = gym.make("PandaReachJointsDense-v3", render_mode="human")

# Load the trained model
model = DDPG.load("StevanLS/ddpg-panda-reach-100")

# Run the model
obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()

Limitations

  • The model is trained specifically for the reaching task and may not generalize to other manipulation tasks
  • Performance may vary depending on the random target positions
  • The model uses dense rewards, which might not be available in real-world scenarios

Author

  • StevanLS

Citations

@article{raffin2021stable,
    title={Stable-baselines3: Reliable reinforcement learning implementations},
    author={Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah},
    journal={Journal of Machine Learning Research},
    year={2021}
}

@article{gallouedec2021pandagym,
    title={panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning},
    author={Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
    journal={arXiv preprint arXiv:2106.13687},
    year={2021}
}

@misc{gymnasium2023,
    author={Farama Foundation},
    title={Gymnasium},
    year={2023},
    publisher={GitHub},
    note={GitHub repository},
    url={https://github.com/Farama-Foundation/Gymnasium}
}

Evaluation results

  • mean_reward on PandaReachJointsDense-v3 (self-reported): REPLACE_WITH_ACTUAL_MEAN
  • std_reward on PandaReachJointsDense-v3 (self-reported): REPLACE_WITH_ACTUAL_STD
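The metrics above would typically be computed with SB3's `evaluate_policy` helper. A sketch, assuming the model loads from a local file of that name (the number of evaluation episodes is an illustrative choice, not the one used for the reported figures):

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("PandaReachJointsDense-v3")
model = DDPG.load("ddpg-panda-reach-100")

# Average episodic return over 50 deterministic rollouts
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=50, deterministic=True
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```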