🦾 Diffusion Policy for Push-T (200k Steps)


Summary: This model applies Diffusion Policy to the precision-demanding Push-T task. It was trained with the LeRobot framework as part of a thesis research project benchmarking imitation learning algorithms.

  • 🧩 Task: Push-T (Simulated)
  • 🧠 Algorithm: Diffusion Policy (DDPM)
  • 🔄 Training Steps: 200,000 (Fine-tuned via Resume)
  • 🎓 Author: Graduate Student, UESTC (University of Electronic Science and Technology of China)

🔬 Benchmark Results (vs ACT)

Compared to the ACT baseline, which achieved a 0% success rate in our controlled experiments, this Diffusion Policy model reaches a higher success rate and higher average rewards over the same 50 evaluation episodes, suggesting better control precision and trajectory stability.

📊 Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline | Status |
|---|---|---|---|
| Success Rate | 14.0% | Significant Improvement (ACT: 0%) | 🏆 |
| Avg Max Reward | 0.81 | +58% Higher Precision (ACT: ~0.51) | 📈 |
| Avg Sum Reward | 130.46 | +147% More Stable (ACT: ~52.7) | ✅ |

Note: The Push-T environment counts an episode as successful only when target coverage exceeds 95%. An average max reward of 0.81 suggests the policy usually moves the block close to the target pose, even though it clears the strict success threshold in only 14% of episodes.
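The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below is not part of the original evaluation code. The helper name `relative_gain` is hypothetical, and the ACT values are the approximate ones quoted above, so the resulting percentages are only indicative.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Return the percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


# Values copied from the evaluation table (ACT values are approximate).
n_episodes = 50
success_rate = 0.14

# Number of successful episodes implied by the reported success rate.
successes = round(success_rate * n_episodes)

max_reward_gain = relative_gain(0.81, 0.51)    # Avg Max Reward vs ACT
sum_reward_gain = relative_gain(130.46, 52.7)  # Avg Sum Reward vs ACT

print(f"successful episodes: {successes} / {n_episodes}")
print(f"avg max reward gain: {max_reward_gain:.1f}%")
print(f"avg sum reward gain: {sum_reward_gain:.1f}%")
```

With only 50 episodes, each success moves the success rate by 2 percentage points, so the 14.0% figure is a coarse estimate.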


βš™οΈ Model Details

Parameter Description
Architecture ResNet18 (Vision Backbone) + U-Net (Diffusion Head)
Prediction Horizon 16 steps
Observation History 2 steps
Action Steps 8 steps
  • Training Strategy:
    • Phase 1: Initial training (100,000 steps) -> Model: Lemon-03/DP_PushT_test
    • Phase 2: Resume/Fine-tuning (+100,000 steps) -> Model: Lemon-03/DP_PushT_test_Resume
    • Total: 200,000 steps
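To show how the horizon, observation history, and action steps interact, here is a minimal sketch of receding-horizon execution. It assumes the common Diffusion Policy convention in which the predicted sequence starts at the oldest observation, so the executed actions begin at index `n_obs_steps - 1`. The function names and the 300-step episode length are hypothetical choices for illustration, not values taken from this model card.

```python
HORIZON = 16        # actions predicted per diffusion sample
N_OBS_STEPS = 2     # observations fed to the policy
N_ACTION_STEPS = 8  # actions executed before re-planning


def executed_slice(horizon: int, n_obs_steps: int, n_action_steps: int) -> tuple[int, int]:
    """Return the [start, end) indices of the predicted actions that get executed.

    Assumption: the prediction is aligned with the oldest observation, so the
    first action for the current step sits at index n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the prediction horizon")
    return start, end


def inference_calls(episode_len: int, n_action_steps: int) -> int:
    """Number of diffusion samples needed to cover an episode (ceiling division)."""
    return -(-episode_len // n_action_steps)


start, end = executed_slice(HORIZON, N_OBS_STEPS, N_ACTION_STEPS)
print(f"executed action indices: {start}..{end - 1}")
# Hypothetical episode length of 300 environment steps.
print(f"policy calls per episode: {inference_calls(300, N_ACTION_STEPS)}")
```

Under these assumptions, the policy executes half of each predicted sequence and discards the rest before sampling again.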

🔧 Training Configuration (Reference)

For reproducibility, here are the key parameters used during the training session:

  • Batch Size: 64
  • Optimizer: AdamW (lr=1e-4)
  • Scheduler: Cosine with warmup
  • Vision: ResNet18 with random crop (84x84)
  • Precision: Mixed Precision (AMP) enabled
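For readers unfamiliar with a "cosine with warmup" schedule, the sketch below shows one common form of it: a linear ramp to the base learning rate followed by a half-cosine decay to zero. The 500-step warmup is an assumption, not a value stated in this card, and the function `lr_at` is a hypothetical helper rather than the scheduler code used in training. This card also does not state whether the schedule was restarted for Phase 2.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 1e-4, warmup_steps: int = 500) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay.

    warmup_steps=500 is an assumed value for illustration only.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = 100_000  # steps per training phase, as in the commands in this card
for s in (0, 250, 500, 50_250, 100_000):
    print(f"step {s:>7}: lr = {lr_at(s, total):.2e}")
```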

Original Training Command (My Resume Mode)

python -m lerobot.scripts.lerobot_train \
  --policy.type diffusion \
  --env.type pusht \
  --dataset.repo_id lerobot/pusht \
  --wandb.enable true \
  --eval.batch_size 8 \
  --job_name DP_PushT_Resume \
  --policy.repo_id Lemon-03/DP_PushT_test_Resume \
  --policy.pretrained_path outputs/train/2025-12-02/14-33-35_DP_PushT/checkpoints/last/pretrained_model \
  --steps 100000

🚀 Evaluate (My Evaluation Mode)

Run the following command in your terminal to evaluate the local training checkpoint for 50 episodes and save the visualization videos:

python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path outputs/train/2025-12-04/14-47-37_DP_PushT_Resume/checkpoints/last/pretrained_model \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0

To evaluate the published model directly from the Hugging Face Hub, pass the repository ID as the pretrained path:

python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_PushT_test_Resume \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0

