Qwen3-VL-2B-OSWorld-Distill-Phase1

This model is the result of Phase-1 off-policy distillation for OSWorld GUI automation.

Training Details

Base Model: Qwen/Qwen3-VL-2B-Instruct
Training Stage: Phase-1 Off-Policy Distillation (SFT on teacher trajectories)
Checkpoint: iter_0000049 (best validation loss: 1.50)
Teacher Model: Qwen3-VL-32B-Instruct
Training Framework: THUDM/slime (FSDP backend)
Hardware: 4x NVIDIA H100 PCIe

Training Data

OSWorld GUI automation trajectories
Step-grounded format with screenshots and action labels
305 distillation samples (teacher rollouts)

Intended Use

This is an intermediate checkpoint for OSWorld VLM pipeline development. Phase-2 on-policy distillation and GSPO training recommended for final model.

Citation

Downloads last month: 15

Safetensors

Model size

2B params

Tensor type

F32

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Jarrodbarnes/Qwen3-VL-2B-OSWorld-Distill-Phase1

Base model

Qwen/Qwen3-VL-2B-Instruct

Finetuned

(94)

this model