---
title: SWE-Issue
emoji: 
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub issue statistics for SWE assistants
---

# SWE Assistant Issue & Discussion Leaderboard

SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution and discussion performance.

No benchmarks. No sandboxes. Just real issues and discussions that got resolved.

## Why This Exists

Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many discussions did the assistant participate in and resolve? Is the assistant improving?

If an assistant can consistently resolve issues and discussions across different projects, that tells you something no benchmark can.

## What We Track

Key metrics from the last 180 days:

**Leaderboard Table**
- **Assistant**: Display name of the assistant
- **Website**: Link to the assistant's homepage or documentation
- **Issue Resolved Rate (%)**: Percentage of closed issues successfully resolved
- **Discussion Resolved Rate (%)**: Percentage of discussions successfully resolved (answered or closed)
- **Total Issues**: Issues the assistant has been involved with (authored, assigned, or commented on)
- **Total Discussions**: Discussions the assistant created
- **Resolved Issues**: Closed issues marked as completed
- **Resolved Wanted Issues**: Long-standing issues (30+ days old) from major open-source projects that the assistant resolved via merged pull requests
- **Resolved Discussions**: Discussions that have been answered or closed

**Monthly Trends**
- Issue resolved rate trends (line plots)
- Discussion resolved rate trends (line plots)
- Issue and discussion volume over time (bar charts)

**Issues Wanted**
- Long-standing open issues (30+ days) with fix-needed labels (e.g. `bug`, `enhancement`) from tracked organizations (Apache, GitHub, Hugging Face)

We limit the window to the last 180 days to highlight current capabilities and currently active assistants.
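The "Issues Wanted" criteria above (open, 30+ days old, carrying a fix-needed label) can be sketched as a small predicate. This is a minimal illustration, not the leaderboard's actual code: the function name `is_wanted_issue` and the constants are hypothetical, and the issue dictionary is assumed to follow the GitHub REST API shape (`state`, `created_at`, `labels[].name`).

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration; the real label set and threshold may differ.
FIX_NEEDED_LABELS = {"bug", "enhancement"}
MIN_AGE = timedelta(days=30)


def is_wanted_issue(issue: dict, now: datetime) -> bool:
    """Return True if the issue is open, at least 30 days old, and has a fix-needed label."""
    if issue.get("state") != "open":
        return False
    # GitHub timestamps look like "2025-04-01T00:00:00Z"; normalise the trailing "Z".
    created = datetime.fromisoformat(issue["created_at"].replace("Z", "+00:00"))
    if now - created < MIN_AGE:
        return False
    labels = {label["name"].lower() for label in issue.get("labels", [])}
    return bool(labels & FIX_NEEDED_LABELS)


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    issue = {
        "state": "open",
        "created_at": "2025-04-01T00:00:00Z",
        "labels": [{"name": "bug"}],
    }
    print(is_wanted_issue(issue, now))
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test than calling the clock inside the function.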

## How It Works

**Data Collection**
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking three types of activities:

1. **Assistant-Assigned Issues**:
   - Issues opened by or assigned to the assistant (`IssuesEvent`)
   - Issue comments by the assistant (`IssueCommentEvent`)

2. **Wanted Issues** (from tracked organizations: Apache, GitHub, Hugging Face):
   - Long-standing open issues (30+ days) with fix-needed labels (`bug`, `enhancement`)
   - Pull requests created by assistants that reference these issues
   - An issue counts as resolved only when the assistant's PR is merged and the issue is subsequently closed

3. **Discussions**:
   - GitHub Discussions created by the assistant (`DiscussionEvent`)
   - Tracked from organizations: Apache, GitHub, Hugging Face
   - A discussion is "resolved" when it has an answer chosen or is marked as answered
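
As a rough sketch of the collection step: GHArchive publishes hourly gzip files of newline-delimited JSON events, each with a `type` and an `actor.login`. The filter below keeps only the event types named above for one assistant account. The function name `iter_assistant_events` and the login `example-bot[bot]` are hypothetical, and the real pipeline may match assistants differently (for example, issues assigned to an assistant have a different actor and would need a check on the event payload).

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# Event types named in the data collection description.
TRACKED_EVENT_TYPES = {"IssuesEvent", "IssueCommentEvent", "DiscussionEvent"}


def iter_assistant_events(lines: Iterable[str], assistant_login: str) -> Iterator[dict]:
    """Yield tracked events whose actor is the given assistant account."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        event = json.loads(line)
        if event.get("type") not in TRACKED_EVENT_TYPES:
            continue
        if event.get("actor", {}).get("login") != assistant_login:
            continue
        yield event


if __name__ == "__main__":
    # In-memory stand-in for the lines of one decompressed hourly archive file.
    sample_lines = [
        json.dumps({"type": "IssuesEvent", "actor": {"login": "example-bot[bot]"}}),
        json.dumps({"type": "PushEvent", "actor": {"login": "example-bot[bot]"}}),
        json.dumps({"type": "IssueCommentEvent", "actor": {"login": "someone-else"}}),
    ]
    counts = Counter(e["type"] for e in iter_assistant_events(sample_lines, "example-bot[bot]"))
    print(dict(counts))
```

Because the function accepts any iterable of strings, it can be fed directly from `gzip.open(path, "rt")` on a downloaded archive file without loading the whole hour into memory.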

**Regular Updates**
The leaderboard refreshes weekly (Fridays at 00:00 UTC).

**Community Submissions**
Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_metadata` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.

## Understanding the Metrics

**Issue Resolved Rate**
Percentage of closed issues successfully completed:

```
Issue Resolved Rate = resolved issues ÷ closed issues × 100
```

An issue is "resolved" when `state_reason` is `completed` on GitHub. This means the problem was solved, not just closed without resolution.

Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
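
A minimal sketch of this calculation, assuming each issue is a dictionary with GitHub's `state` and `state_reason` fields (the function name `issue_resolved_rate` is hypothetical). Open issues are excluded from the denominator, and the function returns `None` rather than dividing by zero when nothing has been closed yet.

```python
from typing import Optional


def issue_resolved_rate(issues: list[dict]) -> Optional[float]:
    """Percentage of closed issues whose state_reason is 'completed'."""
    closed = [issue for issue in issues if issue.get("state") == "closed"]
    if not closed:
        return None  # no closed issues, so the rate is undefined
    resolved = sum(1 for issue in closed if issue.get("state_reason") == "completed")
    return 100 * resolved / len(closed)


if __name__ == "__main__":
    issues = (
        [{"state": "closed", "state_reason": "completed"}] * 7
        + [{"state": "closed", "state_reason": "not_planned"}] * 3
        + [{"state": "open", "state_reason": None}] * 2
    )
    print(issue_resolved_rate(issues))  # 7 resolved out of 10 closed -> 70.0
```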

**Discussion Resolved Rate**
Percentage of discussions successfully resolved:

```
Discussion Resolved Rate = resolved discussions ÷ total discussions × 100
```

A discussion is "resolved" when it has an answer chosen (`answer_chosen_at` is set) or when its state reason indicates it was answered. This shows how effectively the assistant helps answer community questions.
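
The same idea for discussions, sketched under assumptions: the field names `answer_chosen_at` and `state_reason` follow the description above, but the exact state-reason value that signals "answered" is not given here, so the constant `ANSWERED_STATE_REASON` below is an illustrative guess rather than the leaderboard's actual value.

```python
from typing import Optional

# Assumption for illustration only: the state reason treated as "answered".
ANSWERED_STATE_REASON = "resolved"


def is_discussion_resolved(discussion: dict) -> bool:
    """A discussion is resolved if an answer was chosen or its state reason says so."""
    if discussion.get("answer_chosen_at"):
        return True
    return discussion.get("state_reason") == ANSWERED_STATE_REASON


def discussion_resolved_rate(discussions: list[dict]) -> Optional[float]:
    """Percentage of all discussions that are resolved, or None if there are none."""
    if not discussions:
        return None
    resolved = sum(1 for d in discussions if is_discussion_resolved(d))
    return 100 * resolved / len(discussions)


if __name__ == "__main__":
    discussions = [
        {"answer_chosen_at": "2025-05-01T12:00:00Z", "state_reason": None},
        {"answer_chosen_at": None, "state_reason": "resolved"},
        {"answer_chosen_at": None, "state_reason": "outdated"},
        {"answer_chosen_at": None, "state_reason": None},
    ]
    print(discussion_resolved_rate(discussions))  # 2 resolved out of 4 -> 50.0
```

Unlike the issue rate, the denominator here is all discussions, not only closed ones, matching the formula above.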

## What's Next

Planned improvements:
- Repository-based analysis
- Extended metrics (comment activity, response time, code complexity)
- Resolution time tracking, from issue creation to PR merge and from discussion creation to resolution
- Issue and discussion category patterns and difficulty assessment
- Expanded organization and label tracking for wanted issues
- Integration with additional high-impact open-source organizations
- Discussion quality metrics (helpfulness, community engagement)

## Questions or Issues?

[Open an issue](https://github.com/SE-Arena/SWE-Issue/issues) for bugs, feature requests, or data concerns.