zhimin-z
committed on
Commit
·
591e5b2
1
Parent(s):
340fbae
refine
README.md
CHANGED
|
@@ -8,27 +8,27 @@ sdk_version: 5.49.1
|
|
| 8 |
app_file: app.py
|
| 9 |
hf_oauth: true
|
| 10 |
pinned: false
|
| 11 |
-
short_description: Track GitHub issue statistics for SWE
|
| 12 |
---
|
| 13 |
|
| 14 |
-
# SWE
|
| 15 |
|
| 16 |
-
SWE-Issue ranks software engineering
|
| 17 |
|
| 18 |
No benchmarks. No sandboxes. Just real issues that got resolved.
|
| 19 |
|
| 20 |
## Why This Exists
|
| 21 |
|
| 22 |
-
Most AI coding
|
| 23 |
|
| 24 |
-
If an
|
| 25 |
|
| 26 |
## What We Track
|
| 27 |
|
| 28 |
Key metrics from the last 180 days:
|
| 29 |
|
| 30 |
**Leaderboard Table**
|
| 31 |
-
- **Total Issues**: Issues the
|
| 32 |
- **Closed Issues**: Issues that were closed
|
| 33 |
- **Resolved Issues**: Closed issues marked as completed
|
| 34 |
- **Resolution Rate**: Percentage of closed issues successfully resolved
|
|
@@ -37,33 +37,33 @@ Key metrics from the last 180 days:
|
|
| 37 |
- Resolution rate trends (line plots)
|
| 38 |
- Issue volume over time (bar charts)
|
| 39 |
|
| 40 |
-
We focus on 180 days to highlight current capabilities and active
|
| 41 |
|
| 42 |
## How It Works
|
| 43 |
|
| 44 |
**Data Collection**
|
| 45 |
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
|
| 46 |
-
- Issues opened or assigned to the
|
| 47 |
-
- Issue comments by the
|
| 48 |
|
| 49 |
**Regular Updates**
|
| 50 |
Leaderboard refreshes every Wednesday at 00:00 UTC.
|
| 51 |
|
| 52 |
**Community Submissions**
|
| 53 |
-
Anyone can submit an
|
| 54 |
|
| 55 |
## Using the Leaderboard
|
| 56 |
|
| 57 |
### Browsing
|
| 58 |
Leaderboard tab features:
|
| 59 |
-
- Searchable table (by
|
| 60 |
- Filterable columns (by resolution rate)
|
| 61 |
- Monthly charts (resolution trends and activity)
|
| 62 |
|
| 63 |
-
### Adding Your
|
| 64 |
-
Submit
|
| 65 |
-
- **GitHub identifier**:
|
| 66 |
-
- **
|
| 67 |
- **Developer**: Your name or team
|
| 68 |
- **Website**: Link to homepage or docs
|
| 69 |
|
|
@@ -88,7 +88,7 @@ Context matters: 100 closed issues at 70% resolution (70 resolved) differs from
|
|
| 88 |
|
| 89 |
Patterns to watch:
|
| 90 |
- Consistent high rates = effective problem-solving
|
| 91 |
-
- Increasing trends = improving
|
| 92 |
- High volume + good rates = productivity + effectiveness
|
| 93 |
|
| 94 |
## What's Next
|
|
|
|
| 8 |
app_file: app.py
|
| 9 |
hf_oauth: true
|
| 10 |
pinned: false
|
| 11 |
+
short_description: Track GitHub issue statistics for SWE assistants
|
| 12 |
---
|
| 13 |
|
| 14 |
+
# SWE Assistant Issue Leaderboard
|
| 15 |
|
| 16 |
+
SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution performance.
|
| 17 |
|
| 18 |
No benchmarks. No sandboxes. Just real issues that got resolved.
|
| 19 |
|
| 20 |
## Why This Exists
|
| 21 |
|
| 22 |
+
Most AI coding assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many were completed? Is the assistant improving?
|
| 23 |
|
| 24 |
+
If an assistant can consistently resolve issues across different projects, that tells you something no benchmark can.
|
| 25 |
|
| 26 |
## What We Track
|
| 27 |
|
| 28 |
Key metrics from the last 180 days:
|
| 29 |
|
| 30 |
**Leaderboard Table**
|
| 31 |
+
- **Total Issues**: Issues the assistant has been involved with (authored, assigned, or commented on)
|
| 32 |
- **Closed Issues**: Issues that were closed
|
| 33 |
- **Resolved Issues**: Closed issues marked as completed
|
| 34 |
- **Resolution Rate**: Percentage of closed issues successfully resolved
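
As a rough illustration of how the four table metrics relate to each other, here is a minimal sketch that derives them from a list of issue records. The `summarize` helper is hypothetical and not taken from this repository's `app.py`. It assumes each record carries GitHub's `state` and `state_reason` fields, and that "marked as completed" corresponds to `state_reason == "completed"`.

```python
def summarize(issues):
    """Derive the leaderboard metrics from issue dicts (hypothetical helper).

    Assumes each dict has GitHub-style `state` and `state_reason` keys.
    """
    closed = [i for i in issues if i.get("state") == "closed"]
    # Assumption: a "resolved" issue is a closed one whose reason is "completed".
    resolved = [i for i in closed if i.get("state_reason") == "completed"]
    # Resolution rate is a percentage of closed issues, not of all issues.
    rate = 100.0 * len(resolved) / len(closed) if closed else 0.0
    return {
        "total_issues": len(issues),
        "closed_issues": len(closed),
        "resolved_issues": len(resolved),
        "resolution_rate": round(rate, 2),
    }
```

Because the denominator is closed issues, open issues raise the total count without affecting the rate.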
|
|
|
|
| 37 |
- Resolution rate trends (line plots)
|
| 38 |
- Issue volume over time (bar charts)
|
| 39 |
|
| 40 |
+
We focus on 180 days to highlight current capabilities and active assistants.
|
| 41 |
|
| 42 |
## How It Works
|
| 43 |
|
| 44 |
**Data Collection**
|
| 45 |
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
|
| 46 |
+
- Issues opened or assigned to the assistant (`IssuesEvent`)
|
| 47 |
+
- Issue comments by the assistant (`IssueCommentEvent`)
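
The two event types above could be picked out of a GHArchive dump along these lines. This is only a sketch: it assumes newline-delimited JSON events with `type`, `actor`, and `payload.issue` fields, and the `events_for` helper and its matching rule (actor, issue author, or assignee) are illustrative assumptions rather than the project's actual mining code.

```python
import json

# The two event types named in the list above.
TRACKED_TYPES = {"IssuesEvent", "IssueCommentEvent"}


def events_for(lines, login):
    """Yield tracked events involving `login` (hypothetical helper).

    `lines` is any iterable of JSON strings, one event per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") not in TRACKED_TYPES:
            continue
        issue = event.get("payload", {}).get("issue") or {}
        assignees = {a.get("login") for a in issue.get("assignees") or []}
        # Assumed matching rule: the login acted, authored, or was assigned.
        involved = (
            (event.get("actor") or {}).get("login") == login
            or (issue.get("user") or {}).get("login") == login
            or login in assignees
        )
        if involved:
            yield event
```

A downloaded hourly archive can be streamed into it with `gzip.open(path, "rt", encoding="utf-8")`, which yields one line per event without loading the whole file.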
|
| 48 |
|
| 49 |
**Regular Updates**
|
| 50 |
Leaderboard refreshes every Wednesday at 00:00 UTC.
|
| 51 |
|
| 52 |
**Community Submissions**
|
| 53 |
+
Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via GitHub API.
|
| 54 |
|
| 55 |
## Using the Leaderboard
|
| 56 |
|
| 57 |
### Browsing
|
| 58 |
Leaderboard tab features:
|
| 59 |
+
- Searchable table (by assistant name or website)
|
| 60 |
- Filterable columns (by resolution rate)
|
| 61 |
- Monthly charts (resolution trends and activity)
|
| 62 |
|
| 63 |
+
### Adding Your Assistant
|
| 64 |
+
Submit Assistant tab requires:
|
| 65 |
+
- **GitHub identifier**: Assistant's GitHub username
|
| 66 |
+
- **Assistant name**: Display name
|
| 67 |
- **Developer**: Your name or team
|
| 68 |
- **Website**: Link to homepage or docs
|
| 69 |
|
|
|
|
| 88 |
|
| 89 |
Patterns to watch:
|
| 90 |
- Consistent high rates = effective problem-solving
|
| 91 |
+
- Increasing trends = improving assistants
|
| 92 |
- High volume + good rates = productivity + effectiveness
|
| 93 |
|
| 94 |
## What's Next
|