nielsr (HF Staff) committed on
Commit 568bd12 · verified · 1 Parent(s): d22450d

Improve model card: Add pipeline tag, library name, and official links


This PR enhances the model card by:

- Adding the `pipeline_tag: image-text-to-text` to better categorize the model's functionality (processing images and text to generate text) and improve discoverability on the Hugging Face Hub.
- Specifying `library_name: transformers` to enable direct usage via the Hugging Face `transformers` library and showcase an automated "how to use" widget on the model page.
- Updating the paper link to point to the official Hugging Face Papers page: [Mobile-Agent-v3: Foundamental Agents for GUI Automation](https://huggingface.co/papers/2508.15144).
- Adding a link to the project page (`https://osatlas.github.io/`) for additional context.

These changes will improve the model's discoverability and usability on the Hugging Face Hub.
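For context, the combination of `pipeline_tag: image-text-to-text` and `library_name: transformers` lets Hub users drive the model through the `transformers` `pipeline` API. The sketch below shows the chat-style message payload such pipelines accept; the repo id, image path, and prompt are illustrative placeholders, not values taken from this model card.

```python
# Sketch of the chat-style payload consumed by transformers'
# "image-text-to-text" pipelines. Repo id, image path, and prompt
# are illustrative placeholders only.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "screenshot.png"},  # a GUI screenshot
            {"type": "text", "text": "Locate the Settings icon."},
        ],
    }
]

# With transformers installed, usage would look roughly like:
#   from transformers import pipeline
#   pipe = pipeline("image-text-to-text", model="<this-repo-id>")
#   out = pipe(text=messages, max_new_tokens=64)

# Sanity check on the payload shape
assert messages[0]["role"] == "user"
assert [c["type"] for c in messages[0]["content"]] == ["image", "text"]
```

The `pipeline_tag` also drives the Hub's task filter, so the model surfaces when users browse image-text-to-text models.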

Files changed (1)
  1. README.md +14 -12
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
-license: mit
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-VL-32B-Instruct
+language:
+- en
+license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
 # GUI-Owl
@@ -14,9 +16,10 @@ base_model:
 
 GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. Furthermore, it can be instantiated as various specialized agents within the Mobile-Agent-V3 multi-agent framework to accomplish more complex tasks.
 
-* **Paper**: [Paper Link](https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/MobileAgentV3_Tech.pdf)
-* **GitHub Repository**: https://github.com/X-PLUG/MobileAgent
-* **Online Demo**: Comming soon
+* **Paper**: [Mobile-Agent-v3: Foundamental Agents for GUI Automation](https://huggingface.co/papers/2508.15144)
+* **Project Page**: https://osatlas.github.io/
+* **GitHub Repository**: https://github.com/X-PLUG/MobileAgent
+* **Online Demo**: Coming soon
 
 ## Performance
 
@@ -70,10 +73,10 @@ MM_KWARGS=(
 --limit-mm-per-prompt $IMAGE_LIMIT_ARGS
 )
 
-vllm serve $CKPT \
---max-model-len 32768 ${MM_KWARGS[@]} \
---tensor-parallel-size $MP_SIZE \
---allowed-local-media-path '/' \
+vllm serve $CKPT \
+--max-model-len 32768 ${MM_KWARGS[@]} \
+--tensor-parallel-size $MP_SIZE \
+--allowed-local-media-path '/' \
 --port 4243
 ```
@@ -89,5 +92,4 @@ If you find our paper and model useful in your research, feel free to give us a
 primaryClass={cs.AI},
 url={https://arxiv.org/abs/2508.15144},
 }
-```
-
+```
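Once the `vllm serve` command shown in the diff is running, the server exposes an OpenAI-compatible chat endpoint on the chosen port. Below is a minimal sketch of a request body for that endpoint; the served model name and image path are placeholders, and the `file://` URL is assumed to work only because of the `--allowed-local-media-path '/'` flag above.

```python
import json

# Hedged sketch: request body for the OpenAI-compatible endpoint that
# `vllm serve ... --port 4243` exposes. The model name and image path
# are placeholders, not values taken from this model card.
payload = {
    "model": "<served-model-name>",
    "messages": [
        {
            "role": "user",
            "content": [
                # Local file:// URLs require --allowed-local-media-path
                {"type": "image_url",
                 "image_url": {"url": "file:///tmp/screen.png"}},
                {"type": "text", "text": "Where is the search bar?"},
            ],
        }
    ],
    "max_tokens": 64,
}

body = json.dumps(payload)
# Send with, e.g.:
#   curl http://localhost:4243/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$body"
```

Note that `--port 4243` in the command determines the port in the URL above.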