eric1993 committed on
Commit f278997 · verified · 1 Parent(s): c2a3f4f

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -27,12 +27,12 @@ Our research reveals that issue lies not only with the models but with the bench
  On this enhanced benchmark, state-of-the-art models achieve success rates nearing 80% on complex tasks, reflecting that on-device GUI agents are actually closer to practical deployment than previously thought. We also trained our new SOTA model, **Magma-R1**, on just 2,400 curated samples, which matches the performance of previous models trained on over 31,000 samples.
 
  <div align="center">
- <img src="static/images/method_1021_1355-compress.png" width="90%" alt="Method Overview">
  <p><i>Overview of our integrated pipeline for Magma-R1 training and AndroidControl-Curated creation.</i></p>
  </div>
 
  ## 🔥 News
- - 🔥 ***`2025/10/21`*** Our paper "[AndroidControl-Curated: Revealing the True Potential of GUI Agents through Benchmark Purification](YOUR_ARXIV_PAPER_LINK)" released.
 
  ## 🚀 Updates
  - ***`2025/10/21`*** The source code for `AndroidControl-Curated` and `Magma-R1` has been released.
@@ -74,8 +74,8 @@ On this enhanced benchmark, state-of-the-art models achieve success rates nearin
 
  1. **Clone the repository:**
  ```bash
- git clone https://github.com/YourUsername/YourRepoName.git
- cd YourRepoName
  ```
 
  2. **Install dependencies:**
@@ -89,7 +89,7 @@ On this enhanced benchmark, state-of-the-art models achieve success rates nearin
  To reproduce the results on `AndroidControl-Curated`:
 
  1. **Download the benchmark data:**
- Download the processed test set from [Hugging Face](YOUR_HUGGINGFACE_DATASET_LINK) and place it in the `benchmark_resource/` directory. The directory should contain the following files:
  - `android_control_high_bbox.json`
  - `android_control_high_point.json`
  - `android_control_low_bbox.json`
@@ -97,7 +97,7 @@ To reproduce the results on `AndroidControl-Curated`:
  - `android_control_high_task-improved.json`
 
  2. **Download the model:**
- Download the `Magma-R1` model weights from [Hugging Face](YOUR_HUGGINGFACE_MODEL_LINK) and place them in your desired location.
 
  3. **Run the evaluation script:**
  Execute the following command, making sure to update the paths to your model and the benchmark image directory.
 
  On this enhanced benchmark, state-of-the-art models achieve success rates nearing 80% on complex tasks, reflecting that on-device GUI agents are actually closer to practical deployment than previously thought. We also trained our new SOTA model, **Magma-R1**, on just 2,400 curated samples, which matches the performance of previous models trained on over 31,000 samples.
 
  <div align="center">
+ <img src="static/images/method_1013_1355-compress.png" width="90%" alt="Method Overview">
  <p><i>Overview of our integrated pipeline for Magma-R1 training and AndroidControl-Curated creation.</i></p>
  </div>
 
  ## 🔥 News
+ - 🔥 ***`2025/10/21`*** Our paper "[AndroidControl-Curated: Revealing the True Potential of GUI Agents through Benchmark Purification](https://arxiv.org/abs/2510.18488)" released.
 
  ## 🚀 Updates
  - ***`2025/10/21`*** The source code for `AndroidControl-Curated` and `Magma-R1` has been released.
 
 
  1. **Clone the repository:**
  ```bash
+ git clone https://github.com/batechworks/AndroidControl_Curated.git
+ cd AndroidControl_Curated
  ```
 
  2. **Install dependencies:**
 
  To reproduce the results on `AndroidControl-Curated`:
 
  1. **Download the benchmark data:**
+ Download the processed test set from [Hugging Face](https://huggingface.co/datasets/batwBMW/AndroidControl_Curated) and place it in the `benchmark_resource/` directory. The directory should contain the following files:
  - `android_control_high_bbox.json`
  - `android_control_high_point.json`
  - `android_control_low_bbox.json`
 
  - `android_control_high_task-improved.json`
 
  2. **Download the model:**
+ Download the `Magma-R1` model weights from [Hugging Face](https://huggingface.co/batwBMW/Magma-R1) and place them in your desired location.
 
  3. **Run the evaluation script:**
  Execute the following command, making sure to update the paths to your model and the benchmark image directory.
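
The benchmark download step in this diff names several files that should end up in `benchmark_resource/`. Below is a minimal sketch for checking them before running the evaluation. `check_benchmark_files` is a hypothetical helper, not part of the repository, and it only knows the four file names visible in this diff; the README may list others that fall outside the shown hunks. The commented download line assumes the `huggingface-cli` tool from `huggingface_hub` is installed and that the dataset stores these files at its top level, neither of which the diff confirms.

```bash
#!/usr/bin/env bash
# Sketch only: confirm the benchmark files named in the README diff exist.

# Optional download step (assumes huggingface-cli is installed):
#   huggingface-cli download batwBMW/AndroidControl_Curated \
#       --repo-type dataset --local-dir benchmark_resource

# check_benchmark_files [DIR]
# Prints one "missing: ..." line per absent file.
# Returns 1 if any file is absent, 0 otherwise.
# DIR defaults to benchmark_resource.
check_benchmark_files() {
    local dir="${1:-benchmark_resource}"
    local missing=0
    local name
    # Only the names visible in the diff; extend as needed.
    for name in \
        android_control_high_bbox.json \
        android_control_high_point.json \
        android_control_low_bbox.json \
        android_control_high_task-improved.json
    do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use from the repository root would be `check_benchmark_files benchmark_resource && echo "benchmark files present"`.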