# 🦾 Diffusion Policy for Push-T (200k Steps)
Summary: This model demonstrates the capabilities of Diffusion Policy on the precision-demanding Push-T task. It was trained using the LeRobot framework as part of a thesis research project benchmarking Imitation Learning algorithms.
- 🧩 Task: Push-T (Simulated)
- 🧠 Algorithm: Diffusion Policy (DDPM)
- 📈 Training Steps: 200,000 (Fine-tuned via Resume)
- 👤 Author: Graduate Student, UESTC (University of Electronic Science and Technology of China)
## 🔬 Benchmark Results (vs ACT)
Compared to the ACT baseline (which achieved a 0% success rate in our controlled experiments), this Diffusion Policy model demonstrates significantly better control precision and trajectory stability.
### 📊 Evaluation Metrics (50 Episodes)
| Metric | Value | Comparison to ACT Baseline | Status |
|---|---|---|---|
| Success Rate | 14.0% | Significant improvement (ACT: 0%) | 🏆 |
| Avg Max Reward | 0.81 | +58% higher precision (ACT: ~0.51) | 📈 |
| Avg Sum Reward | 130.46 | +147% more stable (ACT: ~52.7) | ✅ |
Note: The Push-T environment requires >95% target coverage for success. An average max reward of `0.81` indicates the policy consistently moves the block very close to the target position, demonstrating strong manipulation capability despite the strict success threshold.
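To make the three metrics concrete, here is a minimal sketch of how they relate, using placeholder reward traces; in a real evaluation the success flags would come from the environment's `info["is_success"]` signal:

```python
import numpy as np

# Placeholder per-episode reward traces, shape (n_episodes, n_steps).
# In gym-pusht the per-step reward reflects how much of the goal region
# the T block covers, so the episode maximum tracks the closest approach.
rewards = np.random.rand(50, 300)
successes = np.zeros(50, dtype=bool)  # from env info["is_success"] in practice

avg_max_reward = rewards.max(axis=1).mean()  # "Avg Max Reward" above
avg_sum_reward = rewards.sum(axis=1).mean()  # "Avg Sum Reward" above
success_rate = successes.mean()              # "Success Rate" above

print(f"Success rate:   {success_rate:.1%}")
print(f"Avg max reward: {avg_max_reward:.2f}")
print(f"Avg sum reward: {avg_sum_reward:.2f}")
```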
## ⚙️ Model Details
| Parameter | Value |
|---|---|
| Architecture | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| Prediction Horizon | 16 steps |
| Observation History | 2 steps |
| Action Steps | 8 steps |
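These three horizon parameters form a receding-horizon loop: the policy conditions on the last 2 observations, predicts 16 actions in one diffusion pass, executes the first 8, then re-plans. A minimal sketch of that loop, where `HorizonConfig` and `policy.predict` are illustrative stand-ins rather than LeRobot's actual API:

```python
from dataclasses import dataclass

@dataclass
class HorizonConfig:
    # Illustrative stand-in mirroring the table above
    # (not LeRobot's actual config class).
    n_obs_steps: int = 2     # observation history fed to the policy
    horizon: int = 16        # actions predicted per diffusion pass
    n_action_steps: int = 8  # actions executed before re-planning

def receding_horizon_rollout(policy, env, cfg: HorizonConfig, max_steps: int = 300):
    """Predict `horizon` actions, execute the first `n_action_steps`,
    then re-plan from the freshest observations."""
    obs, _ = env.reset()
    obs_history = [obs] * cfg.n_obs_steps
    for _ in range(max_steps // cfg.n_action_steps):
        actions = policy.predict(obs_history)  # shape: (horizon, action_dim)
        for action in actions[: cfg.n_action_steps]:
            obs, reward, terminated, truncated, info = env.step(action)
            obs_history = obs_history[1:] + [obs]  # slide the history window
            if terminated or truncated:
                return
```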
- Training Strategy:
  - Phase 1: Initial training (100,000 steps) -> Model: `Lemon-03/DP_PushT_test`
  - Phase 2: Resume/fine-tuning (+100,000 steps) -> Model: `Lemon-03/DP_PushT_test_Resume`
  - Total: 200,000 steps
## 🔧 Training Configuration (Reference)
For reproducibility, here are the key parameters used during the training session:
- Batch Size: 64
- Optimizer: AdamW (`lr=1e-4`)
- Scheduler: Cosine with warmup
- Vision: ResNet18 with random crop (84x84)
- Precision: Mixed Precision (AMP) enabled
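A minimal PyTorch sketch of this optimizer/scheduler/AMP setup; the warmup length and the stand-in `policy` module are assumptions for illustration, not values taken from the actual run:

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

device = "cuda" if torch.cuda.is_available() else "cpu"
policy = torch.nn.Linear(8, 2).to(device)  # stand-in for the diffusion policy
optimizer = AdamW(policy.parameters(), lr=1e-4)

warmup_steps, total_steps = 500, 100_000  # warmup length is an assumption

def cosine_with_warmup(step: int) -> float:
    # Linear warmup, then cosine decay of the LR multiplier toward zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, cosine_with_warmup)
scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")  # mixed precision

# One training step under autocast (batch size 64, as listed above):
obs = torch.randn(64, 8, device=device)
target = torch.randn(64, 2, device=device)
with torch.autocast(device_type=device, enabled=device == "cuda"):
    loss = torch.nn.functional.mse_loss(policy(obs), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
scheduler.step()
```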
### Original Training Command (Resume Mode)
```bash
python -m lerobot.scripts.lerobot_train \
    --policy.type diffusion \
    --env.type pusht \
    --dataset.repo_id lerobot/pusht \
    --wandb.enable true \
    --eval.batch_size 8 \
    --job_name DP_PushT_Resume \
    --policy.repo_id Lemon-03/DP_PushT_test_Resume \
    --policy.pretrained_path outputs/train/2025-12-02/14-33-35_DP_PushT/checkpoints/last/pretrained_model \
    --steps 100000
```
## 📊 Evaluation
Run the following command to evaluate a local training checkpoint for 50 episodes and save the visualization videos:
```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type diffusion \
    --policy.pretrained_path outputs/train/2025-12-04/14-47-37_DP_PushT_Resume/checkpoints/last/pretrained_model \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```
To evaluate the published checkpoint directly from the Hugging Face Hub instead, run:
```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type diffusion \
    --policy.pretrained_path Lemon-03/DP_PushT_test_Resume \
    --eval.n_episodes 50 \
    --eval.batch_size 10 \
    --env.type pusht \
    --env.task PushT-v0
```
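As an alternative to the CLI, the checkpoint can also be loaded in Python and rolled out in `gym-pusht` directly. This sketch follows LeRobot's pretrained-policy evaluation example; import paths have moved between LeRobot versions, so adjust them to your install:

```python
import gymnasium as gym
import gym_pusht  # noqa: F401  (registers gym_pusht/PushT-v0)
import torch
# Import path assumes a recent LeRobot layout; may differ by version.
from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained("Lemon-03/DP_PushT_test_Resume")
policy.eval()
policy.reset()  # clear the policy's internal action queue

env = gym.make(
    "gym_pusht/PushT-v0", obs_type="pixels_agent_pos", max_episode_steps=300
)
obs, _ = env.reset(seed=0)

done = False
while not done:
    # Pack the numpy observation into the batch format the policy expects.
    state = torch.from_numpy(obs["agent_pos"]).float().unsqueeze(0)
    image = torch.from_numpy(obs["pixels"]).float().permute(2, 0, 1) / 255
    batch = {"observation.state": state, "observation.image": image.unsqueeze(0)}
    with torch.no_grad():
        action = policy.select_action(batch)
    obs, reward, terminated, truncated, info = env.step(action.squeeze(0).numpy())
    done = terminated or truncated
```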