Update README.md
README.md
CHANGED
|
@@ -27,12 +27,12 @@ Our research reveals that issue lies not only with the models but with the bench
|
|
| 27 |
On this enhanced benchmark, state-of-the-art models achieve success rates nearing 80% on complex tasks, reflecting that on-device GUI agents are actually closer to practical deployment than previously thought. We also trained our new SOTA model, **Magma-R1**, on just 2,400 curated samples, which matches the performance of previous models trained on over 31,000 samples.
|
| 28 |
|
| 29 |
<div align="center">
|
| 30 |
-
<img src="static/images/
|
| 31 |
<p><i>Overview of our integrated pipeline for Magma-R1 training and AndroidControl-Curated creation.</i></p>
|
| 32 |
</div>
|
| 33 |
|
| 34 |
## 🔥 News
|
| 35 |
-
- 🔥 ***`2025/10/21`*** Our paper "[AndroidControl-Curated: Revealing the True Potential of GUI Agents through Benchmark Purification](
|
| 36 |
|
| 37 |
## Updates
|
| 38 |
- ***`2025/10/21`*** The source code for `AndroidControl-Curated` and `Magma-R1` has been released.
|
|
@@ -74,8 +74,8 @@ On this enhanced benchmark, state-of-the-art models achieve success rates nearin
|
|
| 74 |
|
| 75 |
1. **Clone the repository:**
|
| 76 |
```bash
|
| 77 |
-
git clone https://github.com/
|
| 78 |
-
cd
|
| 79 |
```
|
| 80 |
|
| 81 |
2. **Install dependencies:**
|
|
@@ -89,7 +89,7 @@ On this enhanced benchmark, state-of-the-art models achieve success rates nearin
|
|
| 89 |
To reproduce the results on `AndroidControl-Curated`:
|
| 90 |
|
| 91 |
1. **Download the benchmark data:**
|
| 92 |
-
Download the processed test set from [Hugging Face](
|
| 93 |
- `android_control_high_bbox.json`
|
| 94 |
- `android_control_high_point.json`
|
| 95 |
- `android_control_low_bbox.json`
|
|
@@ -97,7 +97,7 @@ To reproduce the results on `AndroidControl-Curated`:
|
|
| 97 |
- `android_control_high_task-improved.json`
|
| 98 |
|
| 99 |
2. **Download the model:**
|
| 100 |
-
Download the `Magma-R1` model weights from [Hugging Face](
|
| 101 |
|
| 102 |
3. **Run the evaluation script:**
|
| 103 |
Execute the following command, making sure to update the paths to your model and the benchmark image directory.
|
|
|
|
| 27 |
On this enhanced benchmark, state-of-the-art models achieve success rates nearing 80% on complex tasks, reflecting that on-device GUI agents are actually closer to practical deployment than previously thought. We also trained our new SOTA model, **Magma-R1**, on just 2,400 curated samples, which matches the performance of previous models trained on over 31,000 samples.
|
| 28 |
|
| 29 |
<div align="center">
|
| 30 |
+
<img src="static/images/method_1013_1355-compress.png" width="90%" alt="Method Overview">
|
| 31 |
<p><i>Overview of our integrated pipeline for Magma-R1 training and AndroidControl-Curated creation.</i></p>
|
| 32 |
</div>
|
| 33 |
|
| 34 |
## 🔥 News
|
| 35 |
+
- 🔥 ***`2025/10/21`*** Our paper "[AndroidControl-Curated: Revealing the True Potential of GUI Agents through Benchmark Purification](https://arxiv.org/abs/2510.18488)" has been released.
|
| 36 |
|
| 37 |
## Updates
|
| 38 |
- ***`2025/10/21`*** The source code for `AndroidControl-Curated` and `Magma-R1` has been released.
|
|
|
|
| 74 |
|
| 75 |
1. **Clone the repository:**
|
| 76 |
```bash
|
| 77 |
+
git clone https://github.com/batechworks/AndroidControl_Curated.git
|
| 78 |
+
cd AndroidControl_Curated
|
| 79 |
```
|
| 80 |
|
| 81 |
2. **Install dependencies:**
|
|
|
|
| 89 |
To reproduce the results on `AndroidControl-Curated`:
|
| 90 |
|
| 91 |
1. **Download the benchmark data:**
|
| 92 |
+
Download the processed test set from [Hugging Face](https://huggingface.co/datasets/batwBMW/AndroidControl_Curated) and place it in the `benchmark_resource/` directory. The directory should contain the following files:
|
| 93 |
- `android_control_high_bbox.json`
|
| 94 |
- `android_control_high_point.json`
|
| 95 |
- `android_control_low_bbox.json`
|
|
|
|
| 97 |
- `android_control_high_task-improved.json`
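
Before running the evaluation, it can help to confirm that the files listed above are actually in place. The following is a minimal sketch and not part of the repository: `missing_benchmark_files` is a hypothetical helper name, and `EXPECTED_FILES` contains only the file names visible in this excerpt, so the complete list in the README may be longer.

```python
from pathlib import Path

# Only the file names visible in this excerpt; the full README list may
# contain additional entries (assumption: extend this tuple as needed).
EXPECTED_FILES = (
    "android_control_high_bbox.json",
    "android_control_high_point.json",
    "android_control_low_bbox.json",
    "android_control_high_task-improved.json",
)


def missing_benchmark_files(directory="benchmark_resource", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_benchmark_files()
    if missing:
        print("Missing benchmark files:", ", ".join(missing))
    else:
        print("All listed benchmark files are present.")
```

An empty return value means every listed file was found; any names returned still need to be downloaded or moved into `benchmark_resource/`.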
|
| 98 |
|
| 99 |
2. **Download the model:**
|
| 100 |
+
Download the `Magma-R1` model weights from [Hugging Face](https://huggingface.co/batwBMW/Magma-R1) and place them in your desired location.
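
Both downloads can also be scripted. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The repository ids come from the Hugging Face links above, while `fetch_all`, `DOWNLOADS`, and the `checkpoints/Magma-R1` target directory are illustrative names, not something the repository prescribes. It is also an assumption that the dataset repository stores the JSON files at its top level.

```python
# Repository ids are taken from the Hugging Face links in this README.
# The model's local directory is a placeholder (assumption).
DOWNLOADS = [
    {
        "repo_id": "batwBMW/AndroidControl_Curated",
        "repo_type": "dataset",
        "local_dir": "benchmark_resource",
    },
    {
        "repo_id": "batwBMW/Magma-R1",
        "repo_type": "model",
        "local_dir": "checkpoints/Magma-R1",
    },
]


def fetch_all(downloads=DOWNLOADS):
    """Download each repository snapshot and return the local paths."""
    # Imported lazily so the module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return [snapshot_download(**spec) for spec in downloads]
```

Calling `fetch_all()` performs the downloads and returns the local paths, which can then be passed to the evaluation command as the model and benchmark locations.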
|
| 101 |
|
| 102 |
3. **Run the evaluation script:**
|
| 103 |
Execute the following command, making sure to update the paths to your model and the benchmark image directory.
|