Qwen3-VL-2B-OSWorld-Distill-Phase1
This model is the result of Phase-1 off-policy distillation for OSWorld GUI automation.
Training Details
- Base Model: Qwen/Qwen3-VL-2B-Instruct
- Training Stage: Phase-1 Off-Policy Distillation (SFT on teacher trajectories)
- Checkpoint: iter_0000049 (best validation loss: 1.50)
- Teacher Model: Qwen3-VL-32B-Instruct
- Training Framework: THUDM/slime (FSDP backend)
- Hardware: 4x NVIDIA H100 PCIe
Training Data
- OSWorld GUI automation trajectories
- Step-grounded format with screenshots and action labels
- 305 distillation samples (teacher rollouts)
Intended Use
This is an intermediate checkpoint for OSWorld VLM pipeline development. Phase-2 on-policy distillation and GSPO training recommended for final model.
Citation
- Downloads last month
- 15
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
🙋
Ask for provider support
Model tree for Jarrodbarnes/Qwen3-VL-2B-OSWorld-Distill-Phase1
Base model
Qwen/Qwen3-VL-2B-Instruct