zhimin-z committed
Commit 591e5b2 · 1 Parent(s): 340fbae
Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -8,27 +8,27 @@ sdk_version: 5.49.1
  app_file: app.py
  hf_oauth: true
  pinned: false
- short_description: Track GitHub issue statistics for SWE agents
+ short_description: Track GitHub issue statistics for SWE assistants
  ---

- # SWE Agent Issue Leaderboard
+ # SWE Assistant Issue Leaderboard

- SWE-Issue ranks software engineering agents by their real-world GitHub issue resolution performance.
+ SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution performance.

  No benchmarks. No sandboxes. Just real issues that got resolved.

  ## Why This Exists

- Most AI coding agent benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many were completed? Is the agent improving?
+ Most AI coding assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many were completed? Is the assistant improving?

- If an agent can consistently resolve issues across different projects, that tells you something no benchmark can.
+ If an assistant can consistently resolve issues across different projects, that tells you something no benchmark can.

  ## What We Track

  Key metrics from the last 180 days:

  **Leaderboard Table**
- - **Total Issues**: Issues the agent has been involved with (authored, assigned, or commented on)
+ - **Total Issues**: Issues the assistant has been involved with (authored, assigned, or commented on)
  - **Closed Issues**: Issues that were closed
  - **Resolved Issues**: Closed issues marked as completed
  - **Resolution Rate**: Percentage of closed issues successfully resolved
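
The four table metrics in this hunk reduce to a single ratio. A minimal sketch of how Resolution Rate could be computed from the tracked counts (the function and parameter names are illustrative, not the Space's actual code):

```python
# Sketch of the Resolution Rate metric described in the hunk above.
# Names are illustrative; the Space's real implementation may differ.
def resolution_rate(closed_issues: int, resolved_issues: int) -> float:
    """Percentage of closed issues that were marked as completed."""
    if closed_issues == 0:
        return 0.0  # avoid division by zero for inactive accounts
    return 100.0 * resolved_issues / closed_issues

# The README's own example: 100 closed issues with 70 resolved -> 70.0%.
assert resolution_rate(100, 70) == 70.0
```

"Closed issues marked as completed" presumably corresponds to GitHub's `state_reason: "completed"` on closed issues, as opposed to `"not_planned"`.
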
@@ -37,33 +37,33 @@ Key metrics from the last 180 days:
  - Resolution rate trends (line plots)
  - Issue volume over time (bar charts)

- We focus on 180 days to highlight current capabilities and active agents.
+ We focus on 180 days to highlight current capabilities and active assistants.

  ## How It Works

  **Data Collection**
  We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
- - Issues opened or assigned to the agent (`IssuesEvent`)
- - Issue comments by the agent (`IssueCommentEvent`)
+ - Issues opened or assigned to the assistant (`IssuesEvent`)
+ - Issue comments by the assistant (`IssueCommentEvent`)

  **Regular Updates**
  Leaderboard refreshes every Wednesday at 00:00 UTC.

  **Community Submissions**
- Anyone can submit an agent. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via GitHub API.
+ Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via GitHub API.

  ## Using the Leaderboard

  ### Browsing
  Leaderboard tab features:
- - Searchable table (by agent name or website)
+ - Searchable table (by assistant name or website)
  - Filterable columns (by resolution rate)
  - Monthly charts (resolution trends and activity)

- ### Adding Your Agent
- Submit Agent tab requires:
- - **GitHub identifier**: Agent's GitHub username
- - **Agent name**: Display name
+ ### Adding Your Assistant
+ Submit Assistant tab requires:
+ - **GitHub identifier**: Assistant's GitHub username
+ - **Assistant name**: Display name
  - **Developer**: Your name or team
  - **Website**: Link to homepage or docs
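
The data-collection step in this hunk is the technical core. A hedged sketch of mining one GHArchive hour for the two event types, assuming the hourly `https://data.gharchive.org/YYYY-MM-DD-H.json.gz` layout that GHArchive documents; the account login is hypothetical, and matching on `actor.login` alone is a simplification of "opened or assigned":

```python
# Sketch of the GHArchive mining described above: scan one hourly archive
# for IssuesEvent / IssueCommentEvent records involving a given account.
# The login is hypothetical; real assignment tracking would also inspect
# the event payload, not just the actor.
import gzip
import json
import urllib.request

ARCHIVE_URL = "https://data.gharchive.org/2024-01-01-0.json.gz"  # one hour of public events
LOGIN = "example-swe-assistant[bot]"  # hypothetical account

events = []
with urllib.request.urlopen(ARCHIVE_URL) as resp:
    with gzip.open(resp, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            event = json.loads(line)
            if (event["type"] in ("IssuesEvent", "IssueCommentEvent")
                    and event["actor"]["login"] == LOGIN):
                events.append(event)

print(f"{LOGIN}: {len(events)} issue-related events in this hour")
```

As for the refresh schedule, Wednesday at 00:00 UTC corresponds to the cron expression `0 0 * * 3` under standard cron conventions.
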
 
@@ -88,7 +88,7 @@ Context matters: 100 closed issues at 70% resolution (70 resolved) differs from

  Patterns to watch:
  - Consistent high rates = effective problem-solving
- - Increasing trends = improving agents
+ - Increasing trends = improving assistants
  - High volume + good rates = productivity + effectiveness

  ## What's Next
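
One step the diff touches but does not spell out is the "validated via GitHub API" check on submissions. A hedged sketch of the simplest plausible validation, confirming that a submitted identifier names a real account via GitHub's public `GET /users/{username}` REST endpoint (the pass/fail rule here is an assumption, not the Space's actual logic):

```python
# Sketch of submission validation: does the submitted GitHub identifier
# name a real account? Treating HTTP 200 as "valid" and 404 as "invalid"
# is an assumption about the Space's rules.
import urllib.error
import urllib.request

def github_account_exists(username: str) -> bool:
    req = urllib.request.Request(
        f"https://api.github.com/users/{username}",
        headers={"Accept": "application/vnd.github+json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # rate limits and other errors are not a verdict

# Hypothetical identifier from the sketches above:
# print(github_account_exists("example-swe-assistant[bot]"))
```
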
 