# Train
## Training Dataset
Prepare GazeFollow and VideoAttentionTarget for training.

- Get GazeFollow.
  - If training with auxiliary regression, use `scripts/gen_gazefollow_head_masks.py` to generate head masks.
- Get VideoAttentionTarget.
Check `ViTGaze/configs/common/dataloader` to modify `DATA_ROOT`.
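The dataset-root setup might look roughly like this; the variable names below are illustrative assumptions, so check the actual config file for the real ones:

```python
import os.path as osp

# Illustrative only: the real variable names live in
# ViTGaze/configs/common/dataloader.
DATA_ROOT = "/path/to/datasets"  # edit this to your dataset location
gazefollow_root = osp.join(DATA_ROOT, "gazefollow")
videoattentiontarget_root = osp.join(DATA_ROOT, "videoattentiontarget")
```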
## Pretrained Model
Get DINOv2 pretrained ViT-S, or download the pretrained weights yourself:

```shell
cd ViTGaze
mkdir pretrained && cd pretrained
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth
```

Then preprocess the model weights with `scripts/convert_pth.py` to fit the Detectron2 format.
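The exact conversion is defined in `scripts/convert_pth.py`; as a rough sketch (the key prefix and helper name below are assumptions, not the script's actual logic), Detectron2 checkpoints typically store weights under a top-level `"model"` key with keys renamed to match the wrapping model's module names:

```python
def to_detectron2_format(state_dict, prefix="backbone."):
    """Sketch: wrap a raw DINOv2 state dict for Detectron2-style loading.

    Detectron2 keeps weights under a top-level "model" key; the
    "backbone." prefix is an assumption about ViTGaze's module names.
    """
    return {"model": {prefix + k: v for k, v in state_dict.items()}}

# In practice you would load/save with torch:
#   sd = torch.load("dinov2_vits14_pretrain.pth", map_location="cpu")
#   torch.save(to_detectron2_format(sd), "dinov2_vits14_pretrain_d2.pth")
```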
## Train ViTGaze
You can modify the configs in `configs/gazefollow.py`, `configs/gazefollow_518.py`, and `configs/videoattentiontarget.py`.
Run:

```shell
bash train.sh
```

to train ViTGaze on the two datasets.
Training output will be saved in `ViTGaze/output/`.
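The script presumably launches one training run per config file. A hypothetical sketch of what it may contain (the entry-point path and flag names are assumptions, loosely following detectron2's lazy-config tooling, and may differ from the repo's actual script):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of train.sh; the actual entry point and flags
# in the ViTGaze repo may differ.
python tools/train.py --config-file configs/gazefollow.py
python tools/train.py --config-file configs/gazefollow_518.py
python tools/train.py --config-file configs/videoattentiontarget.py
```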