| # LLM auto annotation for HICO-DET dataset (Pose from [Halpe](https://github.com/Fang-Haoshu/Halpe-FullBody), Part State from [HAKE](https://github.com/DirtyHarryLYL/HAKE)). |
|
|
| ## Environment |
| The code was developed with Python 3.11.11 on Ubuntu 21.xx, using torch==2.6.0+cu124 and |
| transformers==4.57.3 (with the Qwen3 series). |
|
|
| ## Annotating HICO-Det |
| ### A. Installation |
| 1. Install the required packages and dependencies. |
| 2. Clone this repo; the cloned directory is referred to as ${ROOT} below. |
| 3. Create the necessary directories: |
| ``` |
| mkdir outputs |
| mkdir model_weights |
| ``` |
| 4. Download the LLM weights from Hugging Face into model_weights. |
| |
| |
| ### B. Prepare Dataset |
| 5. Install COCO API: |
| ``` |
| pip install pycocotools |
| ``` |
| 6. Download [dataset](https://huggingface.co/datasets/ayh015/HICO-Det_Halpe_HAKE). |
| 7. Organize the dataset. Your dataset directory tree should look like this (there may be extra files): |
| ``` |
| {DATA_ROOT} |
| |-- Annotation |
| | |--hico-det-instance-level |
| | | |--hico-det-training-set-instance-level.json |
| | `--hico-fullbody-pose |
| | |--halpe_train_v1.json |
| | `--halpe_val_v1.json |
| |-- Configs |
| | |--hico_hoi_list.txt |
| | `--Part_State_76.txt |
| |-- Images |
| | |--images |
| | |--test2015 |
| | | |--HICO_test2015_00000001.jpg |
| | | |--HICO_test2015_00000002.jpg |
| | | ... |
| | `--train2015 |
| | |--HICO_train2015_00000001.jpg |
| | |--HICO_train2015_00000002.jpg |
| | ... |
| `-- Logic_Rules |
| |--gather_rule.pkl |
| `--read_rules.py |
| ``` |
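Before starting a long annotation run, it can help to confirm that the files named in the tree above are actually in place. The following is a minimal sketch, not part of the repo: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the relative paths assume the nesting shown in the tree. The image folders are left out because their exact nesting is not checked here.

```python
from pathlib import Path

# Relative paths taken from the directory tree above (assumed nesting).
EXPECTED_PATHS = [
    "Annotation/hico-det-instance-level/hico-det-training-set-instance-level.json",
    "Annotation/hico-fullbody-pose/halpe_train_v1.json",
    "Annotation/hico-fullbody-pose/halpe_val_v1.json",
    "Configs/hico_hoi_list.txt",
    "Configs/Part_State_76.txt",
    "Logic_Rules/gather_rule.pkl",
    "Logic_Rules/read_rules.py",
]


def missing_paths(data_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_paths("{DATA_ROOT}")` with your own dataset path returns an empty list when every listed file is present; any returned entry points to a file that still needs to be downloaded or moved.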
| |
| ### C. Start annotation |
| #### Set data_path, model_path, and output_dir (default 'outputs') for your configuration in "{ROOT}/scripts/annotate_hico.sh". |
| ``` |
| IDX={YOUR_GPU_IDS} |
| export PYTHONPATH=$PYTHONPATH:./ |
| |
| data_path={DATA_ROOT} |
| model_path={ROOT}/model_weights/{YOUR_MODEL_NAME} |
| output_dir={ROOT}/outputs |
|
|
| if [ -d ${output_dir} ];then |
| echo "dir already exists" |
| else |
| mkdir ${output_dir} |
| fi |
|
|
| CUDA_VISIBLE_DEVICES=$IDX OMP_NUM_THREADS=1 torchrun --nnodes=1 --nproc_per_node={NUM_YOUR_GPUs} --master_port=25005 \ |
| tools/annotate_hico.py \ |
| --model-path ${model_path} \ |
| --data-path ${data_path} \ |
| --output-dir ${output_dir} |
| ``` |
| #### Start auto-annotation |
| ``` |
| bash scripts/annotate_hico.sh |
| ``` |
| |
| ### D. Annotation format |
| The output is a list of dicts, each containing the following keys: |
| ``` |
| { |
| 'file_name': 'HICO_train2015_00009511.jpg', |
| 'image_id': 0, |
| 'keypoints': a 51-element list (17 keypoints x 3 values: x, y, v), |
| 'vis': a 51-element list (17 keypoints, each with 3 visibility flags), |
| 'instance_id': 0, |
| 'action_labels': [{'human_part': part_id, 'partstate': state_id}, ...], |
| 'height': 640, |
| 'width': 480, |
| 'human_bbox': [126, 258, 150, 305], |
| 'object_bbox': [128, 276, 144, 313], |
| 'description': "The person is riding a bicycle, supported by visible evidence of their body interacting with the bike.\n\n- The right hand is holding the right handlebar.\n- The left hand is holding the left handlebar.\n- The right hip is positioned over the seat, indicating the person is sitting on the bicycle.\n- The right foot is on the right pedal.\n- The left foot is on the left pedal." |
| } |
| ``` |
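The flat 51-element `keypoints` list is easier to work with once it is grouped per keypoint. A minimal sketch, assuming the values are interleaved in COCO style as `[x1, y1, v1, x2, y2, v2, ...]` (the format description above does not state the ordering explicitly); `split_keypoints` is a hypothetical helper, not a function from this repo:

```python
def split_keypoints(flat, values_per_point=3):
    """Group a flat list into one tuple per keypoint.

    Assumes an interleaved layout: [x1, y1, v1, x2, y2, v2, ...].
    """
    if len(flat) % values_per_point != 0:
        raise ValueError(
            f"expected a multiple of {values_per_point} values, got {len(flat)}"
        )
    return [
        tuple(flat[i:i + values_per_point])
        for i in range(0, len(flat), values_per_point)
    ]


# Illustrative record with placeholder keypoint values (not real data).
record = {
    "file_name": "HICO_train2015_00009511.jpg",
    "keypoints": [0.0] * 51,
}
points = split_keypoints(record["keypoints"])  # 17 tuples of (x, y, v)
```

The same helper can be applied to the `vis` field, since it has the same length and grouping.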
|
|
|
|
| ## Annotate COCO |
| 1. Download the COCO dataset. |
| 2. Organize the dataset. Your dataset directory tree should look like this (the files inside Configs are copied from HICO-Det): |
| ``` |
| {DATA_ROOT} |
| |-- annotations |
| | |--person_keypoints_train2017.json |
| | `--person_keypoints_val2017.json |
| |-- Configs |
| | |--hico_hoi_list.txt |
| | `--Part_State_76.txt |
| |-- train2017 |
| | |--000000000009.jpg |
| | |--000000000025.jpg |
| | ... |
| `-- val2017 |
| |--000000000139.jpg |
| |--000000000285.jpg |
| ... |
| |
| ``` |
|
|
| ### Start annotation |
| #### Set data_path, model_path, and output_dir (default 'outputs') for your configuration in "{ROOT}/scripts/annotate_coco.sh". |
| ``` |
| IDX={YOUR_GPU_IDS} |
| export PYTHONPATH=$PYTHONPATH:./ |
| |
| data_path={DATA_ROOT} |
| model_path={ROOT}/model_weights/{YOUR_MODEL_NAME} |
| output_dir={ROOT}/outputs |
| |
| if [ -d ${output_dir} ];then |
| echo "dir already exists" |
| else |
| mkdir ${output_dir} |
| fi |
| |
| CUDA_VISIBLE_DEVICES=$IDX OMP_NUM_THREADS=1 torchrun --nnodes=1 --nproc_per_node={NUM_YOUR_GPUs} --master_port=25005 \ |
| tools/annotate_coco.py \ |
| --model-path ${model_path} \ |
| --data-path ${data_path} \ |
| --output-dir ${output_dir} |
| ``` |
| #### Start auto-annotation |
| ``` |
| bash scripts/annotate_coco.sh |
| ``` |
| By default, the annotation script only annotates the COCO train2017 set. To annotate val2017, find the following two lines (lines 167-168) in tools/annotate_coco.py and replace 'train2017' with 'val2017'. |
| |
| ``` |
| dataset = PoseCOCODataset( |
| data_path=os.path.join(args.data_path, 'annotations', 'person_keypoints_train2017.json'), # <- Line 167 |
| multimodal_cfg=dict(image_folder=os.path.join(args.data_path, 'train2017'), # <- Line 168 |
| data_augmentation=False, |
| image_size=336,),) |
| ``` |
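Rather than editing both lines by hand for each split, the split name could be factored out into one place. A minimal sketch, assuming the COCO directory layout shown earlier; `coco_split_paths` is a hypothetical helper and is not part of tools/annotate_coco.py:

```python
import os


def coco_split_paths(data_path, split="train2017"):
    """Return (annotation_file, image_folder) for a COCO keypoint split."""
    if split not in ("train2017", "val2017"):
        raise ValueError(f"unsupported split: {split!r}")
    annotation_file = os.path.join(
        data_path, "annotations", f"person_keypoints_{split}.json"
    )
    image_folder = os.path.join(data_path, split)
    return annotation_file, image_folder
```

The two returned values correspond to the `data_path` and `image_folder` arguments passed to `PoseCOCODataset` in the snippet above, so switching to val2017 becomes a single argument change.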
| |
|
|
| ## Annotation format |
| The output is a list of dicts, each containing the following keys: |
| ``` |
| { |
| 'file_name': '000000000009.jpg', |
| 'image_id': 9, |
| 'keypoints': a 51-element list (17 keypoints x 3 values: x, y, v), |
| 'vis': a 51-element list (17 keypoints, each with 3 visibility flags), |
| 'height': 640, |
| 'width': 480, |
| 'human_bbox': [126, 258, 150, 305], |
| 'description': "The person is riding a bicycle, supported by visible evidence of their body interacting with the bike.\n\n- The right hand is holding the right handlebar.\n- The left hand is holding the left handlebar.\n- The right hip is positioned over the seat, indicating the person is sitting on the bicycle.\n- The right foot is on the right pedal.\n- The left foot is on the left pedal." |
| } |
| ``` |
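In the example above, the `description` field consists of a summary sentence followed by a bulleted list of body-part evidence. If downstream code needs these separately, something like the following could work. This is a rough sketch that assumes every description follows the same "summary, blank line, `- ` bullets" shape as the example; LLM output may deviate from it, and `parse_description` is a hypothetical helper:

```python
def parse_description(description):
    """Split a description into (summary, list of bullet statements)."""
    summary_lines = []
    evidence = []
    for line in description.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            # Bullet line: keep the text after the "- " marker.
            evidence.append(stripped[2:].strip())
        elif stripped:
            # Any other non-empty line is treated as part of the summary.
            summary_lines.append(stripped)
    return " ".join(summary_lines), evidence


example = (
    "The person is riding a bicycle.\n\n"
    "- The right hand is holding the right handlebar.\n"
    "- The left foot is on the left pedal."
)
summary, evidence = parse_description(example)
```

Descriptions without any bullets simply yield an empty evidence list, so the helper degrades gracefully when the model output is less structured.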