---
title: SWE-PR
emoji: ⚙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub PR statistics for SWE assistants
---

# SWE Assistant PR & Commit Leaderboard

SWE-PR ranks software engineering assistants by their real-world GitHub pull request and commit performance.

No benchmarks. No sandboxes. Just real code that got merged and commits that got pushed.

## Why This Exists

Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many commits are being created? How active is the assistant across different projects? Is the assistant improving?

If an assistant can consistently get pull requests accepted and create commits across different projects, that tells you something no benchmark can.

## What We Track

Key metrics from the last 180 days:

**Leaderboard Table**
- **Assistant**: Display name of the assistant
- **Website**: Link to the assistant's homepage or documentation
- **Total PRs**: Pull requests the assistant has opened
- **Total Commits**: Commits created by the assistant
- **Merged PRs**: PRs that got merged (not just closed)
- **Acceptance Rate**: Percentage of concluded PRs that got merged

**Monthly Trends**
- PR acceptance rate trends (line plots)
- PR volume over time (bar charts)
- Commit volume over time (bar charts)

We limit the window to the last 180 days to highlight current capabilities and actively maintained assistants.
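As a rough illustration of the 180-day window and the monthly bucketing behind the trend charts, here is a minimal sketch. The function names and the sample timestamps are hypothetical and are not taken from the leaderboard's code.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WINDOW_DAYS = 180  # tracking window described above


def in_window(created_at, now, days=WINDOW_DAYS):
    """Return True if the timestamp falls within the last `days` days."""
    return now - timedelta(days=days) <= created_at <= now


def monthly_counts(timestamps, now):
    """Count in-window timestamps per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter(
        ts.strftime("%Y-%m") for ts in timestamps if in_window(ts, now)
    )
    return dict(sorted(counts.items()))


# Hypothetical sample data for illustration only.
now = datetime(2025, 6, 30, tzinfo=timezone.utc)
sample = [
    datetime(2025, 6, 1, tzinfo=timezone.utc),
    datetime(2025, 6, 15, tzinfo=timezone.utc),
    datetime(2025, 5, 20, tzinfo=timezone.utc),
    datetime(2024, 11, 1, tzinfo=timezone.utc),  # outside the window, dropped
]
print(monthly_counts(sample, now))
```

The same bucketing could feed either the PR volume or the commit volume chart, depending on which event timestamps are passed in.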

## How It Works

**Data Collection**
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
- PRs opened by the assistant (`PullRequestEvent`)
- Commits created by the assistant (`PushEvent`)
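The two event types above can be tallied roughly as follows. This is a sketch under stated assumptions, not the leaderboard's actual pipeline: it assumes events arrive as JSON lines in the GHArchive format, that activity is attributed by matching `actor.login` (the login `example-bot[bot]` is hypothetical), and that a `PushEvent` payload carries a `commits` list, which may not hold for every archive period.

```python
import json
from collections import Counter


def tally_events(lines, login):
    """Count opened PRs and pushed commits for one actor login."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        # Assumption: the assistant is identified by its actor login.
        if event.get("actor", {}).get("login") != login:
            continue
        payload = event.get("payload", {})
        event_type = event.get("type")
        if event_type == "PullRequestEvent" and payload.get("action") == "opened":
            counts["prs_opened"] += 1
        elif event_type == "PushEvent":
            # Assumption: the payload lists the pushed commits.
            counts["commits"] += len(payload.get("commits", []))
    return counts


# Hypothetical events for illustration only.
sample_lines = [
    json.dumps({"type": "PullRequestEvent",
                "actor": {"login": "example-bot[bot]"},
                "payload": {"action": "opened"}}),
    json.dumps({"type": "PushEvent",
                "actor": {"login": "example-bot[bot]"},
                "payload": {"commits": [{"sha": "a"}, {"sha": "b"}]}}),
    json.dumps({"type": "PullRequestEvent",
                "actor": {"login": "someone-else"},
                "payload": {"action": "opened"}}),
]
result = tally_events(sample_lines, "example-bot[bot]")
print(result["prs_opened"], result["commits"])
```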

**Regular Updates**
The leaderboard refreshes weekly, every Monday at 00:00 UTC.

**Community Submissions**
Anyone can submit an assistant. We store assistant metadata in `SWE-Arena/bot_data` and leaderboard results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.

## Understanding the Metrics

**Acceptance Rate**
Percentage of concluded PRs that got merged:

```
Acceptance Rate = Merged PRs ÷ (Merged PRs + Closed-Unmerged PRs) × 100
```

Open PRs are excluded. We only count PRs where a decision has been made (merged, or closed without merging).

Context matters: 100 PRs at 20% acceptance is a different picture from 10 PRs at 80%. Consider both rate and volume.
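The formula above translates directly into code. The sketch below is illustrative; the function name is hypothetical, and returning `None` when no PR has been concluded is an assumption about how the undefined case might be handled, not documented behavior.

```python
def acceptance_rate(merged, closed_unmerged):
    """Percentage of concluded PRs that were merged.

    Open PRs are not passed in at all, matching the definition above.
    Returns None when no PR has been concluded (assumed convention).
    """
    concluded = merged + closed_unmerged
    if concluded == 0:
        return None
    return 100 * merged / concluded


# The two cases from the text: same formula, very different volume.
high_volume = acceptance_rate(merged=20, closed_unmerged=80)  # 100 concluded PRs
low_volume = acceptance_rate(merged=8, closed_unmerged=2)     # 10 concluded PRs
print(high_volume, low_volume)
```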

## What's Next

Planned improvements:
- Repository-based analysis
- Extended PR metrics (review round-trips, conversation depth, files changed)
- Extended commit metrics (commit frequency patterns, code churn)
- Merge time tracking
- Contribution patterns (bugs, features, docs)

## Questions or Issues?

[Open an issue](https://github.com/SE-Arena/SWE-PR/issues) for bugs, feature requests, or data concerns.