# DDPG Panda Reach Model
This is a DDPG (Deep Deterministic Policy Gradient) model trained to control a Franka Emika Panda robot arm in a reaching task with dense rewards. The model was trained with Stable-Baselines3 using Hindsight Experience Replay (HER), which relabels stored transitions with alternative goals so that even unsuccessful episodes provide a useful learning signal.
## Task Description
In this task, a 7-DOF Panda robotic arm must reach a randomly positioned target in 3D space. The environment provides dense rewards based on the distance between the end-effector and the target position. The task is considered successful when the end-effector reaches within a small threshold distance of the target.
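For reference, in panda-gym's dense Reach variant the reward is the negative Euclidean distance between the end-effector position and the goal. A minimal sketch of both the reward and the success check follows; the 0.05 m threshold is panda-gym's default for Reach, stated here as an assumption rather than taken from this card:

```python
import numpy as np

# Dense reward: negative Euclidean distance between the end-effector
# position (achieved_goal) and the target (desired_goal).
def dense_reach_reward(achieved_goal: np.ndarray, desired_goal: np.ndarray) -> float:
    return -float(np.linalg.norm(achieved_goal - desired_goal))

# Success check: end-effector within a small threshold of the target.
# 0.05 m is panda-gym's default for Reach (assumption, not from this card).
def is_success(achieved_goal: np.ndarray, desired_goal: np.ndarray,
               threshold: float = 0.05) -> bool:
    return bool(np.linalg.norm(achieved_goal - desired_goal) < threshold)
```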
## Training Details
- Environment: PandaReachJointsDense-v3 from panda-gym
- Algorithm: DDPG with HER
- Policy: MultiInputPolicy
- Training Steps: 100,000
- Framework: Stable-Baselines3
- Training Monitoring: Weights & Biases
## Hyperparameters

```python
{
    "policy": "MultiInputPolicy",
    "replay_buffer_class": "HerReplayBuffer",
    "tensorboard_log": True,
    "verbose": 1,
    "total_timesteps": 100000,
}
```
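These hyperparameters map directly onto the Stable-Baselines3 API. A minimal training sketch is shown below; the HER goal-sampling settings and the TensorBoard log path are illustrative assumptions, since the card only lists the values shown above:

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaReachJointsDense-v3")

# HER goal-sampling settings below are illustrative SB3 defaults,
# not reported in this card.
model = DDPG(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
    ),
    tensorboard_log="./tb_logs",  # path is an assumption; the card only notes TensorBoard logging
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ddpg-panda-reach-100")
```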
## Usage

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG

# Create the environment
env = gym.make("PandaReachJointsDense-v3", render_mode="human")

# Load the trained model (DDPG.load expects a local checkpoint path,
# e.g. the model zip downloaded from the Hub)
model = DDPG.load("StevanLS/ddpg-panda-reach-100")

# Run the model
obs, _ = env.reset()
while True:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```
## Limitations
- The model is trained specifically for the reaching task and may not generalize to other manipulation tasks
- Performance may vary depending on the random target positions
- The model uses dense rewards, which might not be available in real-world scenarios
## Author
- StevanLS
## Citations

```bibtex
@article{raffin2021stable,
  title   = {Stable-Baselines3: Reliable Reinforcement Learning Implementations},
  author  = {Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah},
  journal = {Journal of Machine Learning Research},
  volume  = {22},
  number  = {268},
  pages   = {1--8},
  year    = {2021}
}

@article{gallouedec2021pandagym,
  title   = {panda-gym: Open-Source Goal-Conditioned Environments for Robotic Learning},
  author  = {Gallou{\'e}dec, Quentin and Cazin, Nicolas and Dellandr{\'e}a, Emmanuel and Chen, Liming},
  journal = {arXiv preprint arXiv:2106.13687},
  year    = {2021}
}

@misc{gymnasium2023,
  author       = {Farama Foundation},
  title        = {Gymnasium},
  year         = {2023},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/Farama-Foundation/Gymnasium}}
}
```
## Evaluation Results

- mean_reward on PandaReachJointsDense-v3 (self-reported): REPLACE_WITH_ACTUAL_MEAN
- std_reward on PandaReachJointsDense-v3 (self-reported): REPLACE_WITH_ACTUAL_STD
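The placeholders above are meant to be filled with measured values. With Stable-Baselines3 they can be computed using `evaluate_policy`, as in this sketch (the episode count is arbitrary):

```python
import gymnasium as gym
import panda_gym
from stable_baselines3 import DDPG
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("PandaReachJointsDense-v3")

# Expects a local checkpoint, e.g. the model zip downloaded from the Hub.
model = DDPG.load("ddpg-panda-reach-100")

# Average return over 20 evaluation episodes (episode count is arbitrary).
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```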