---
title: SWE-PR
emoji: ⚙️
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub PR statistics for SWE assistants
---
# SWE Assistant PR & Commit Leaderboard
SWE-PR ranks software engineering assistants by their real-world GitHub pull request and commit performance.
No benchmarks. No sandboxes. Just real code that got merged and commits that got pushed.
## Why This Exists
Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many commits are being created? How active is the assistant across different projects? Is the assistant improving?
If an assistant can consistently get pull requests accepted and create commits across different projects, that tells you something no benchmark can.
## What We Track
Key metrics from the last 180 days:
**Leaderboard Table**
- **Assistant**: Display name of the assistant
- **Website**: Link to the assistant's homepage or documentation
- **Total PRs**: Pull requests the assistant has opened
- **Total Commits**: Commits created by the assistant
- **Merged PRs**: PRs that got merged (not just closed)
- **Acceptance Rate**: Percentage of concluded PRs that got merged
**Monthly Trends**
- PR acceptance rate trends (line plots)
- PR volume over time (bar charts)
- Commit volume over time (bar charts)
We focus on the last 180 days to highlight current capabilities and currently active assistants.
## How It Works
**Data Collection**
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
- PRs opened by the assistant (`PullRequestEvent`)
- Commits created by the assistant (`PushEvent`)
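The filtering step above can be sketched roughly as follows. This is a minimal illustration, not the Space's actual pipeline: it assumes the newline-delimited JSON event format that GHArchive publishes, and the helper name `count_activity` and the login `example-bot[bot]` are hypothetical. It counts push events rather than individual commits, since mapping pushes to commit counts depends on payload details not covered here.

```python
import json
from collections import Counter
from typing import Iterable


def count_activity(lines: Iterable[str], login: str) -> Counter:
    """Count opened PRs and push events for one GitHub login.

    `lines` holds one JSON-encoded event per line, as in a GHArchive
    hourly dump. Hypothetical helper, for illustration only.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("actor", {}).get("login") != login:
            continue
        if event.get("type") == "PullRequestEvent":
            # Count only the event that opens a PR, not later updates.
            if event.get("payload", {}).get("action") == "opened":
                counts["prs_opened"] += 1
        elif event.get("type") == "PushEvent":
            counts["push_events"] += 1
    return counts


# Tiny in-memory sample; the login is made up.
BOT = "example-bot[bot]"
sample = [
    json.dumps({"type": "PullRequestEvent", "actor": {"login": BOT},
                "payload": {"action": "opened"}}),
    json.dumps({"type": "PullRequestEvent", "actor": {"login": BOT},
                "payload": {"action": "closed"}}),
    json.dumps({"type": "PushEvent", "actor": {"login": BOT},
                "payload": {}}),
    json.dumps({"type": "PushEvent", "actor": {"login": "someone-else"},
                "payload": {}}),
]
activity = count_activity(sample, BOT)
print(dict(activity))
```
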
**Regular Updates**
The leaderboard refreshes weekly, every Monday at 00:00 UTC.
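As a rough sketch of that schedule, the next refresh time can be computed from any timezone-aware timestamp. The function name `next_refresh` is hypothetical and this is not necessarily how the Space schedules its job.

```python
from datetime import datetime, timedelta, timezone


def next_refresh(now: datetime) -> datetime:
    """Return the next Monday 00:00 UTC strictly after `now`.

    `now` must be timezone-aware. Hypothetical helper for illustration.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (7 - midnight.weekday()) % 7  # Monday is weekday 0
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # Already at or past this Monday's refresh; use the following week.
        candidate += timedelta(days=7)
    return candidate


print(next_refresh(datetime.now(timezone.utc)).isoformat())
```
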
**Community Submissions**
Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.
## Understanding the Metrics
**Acceptance Rate**
Percentage of concluded PRs that got merged:
```
Acceptance Rate = Merged PRs ÷ (Merged PRs + Closed-Unmerged PRs) × 100
```
Open PRs are excluded. We only count PRs where a decision has been made (merged or closed).
Context matters: 100 concluded PRs at 20% acceptance (20 merged) tells a different story than 10 concluded PRs at 80% (8 merged). Consider both rate and volume.
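The formula can be expressed as a small function. This is a sketch under stated assumptions: the name `acceptance_rate` is hypothetical, and returning `None` when no PR has concluded is an illustrative choice, not documented leaderboard behavior.

```python
from typing import Optional


def acceptance_rate(merged: int, closed_unmerged: int) -> Optional[float]:
    """Percentage of concluded PRs that were merged.

    Open PRs are not passed in at all. Returns None when no PR has
    concluded (an assumption made for this sketch).
    """
    concluded = merged + closed_unmerged
    if concluded == 0:
        return None
    return 100 * merged / concluded


# The two cases from the text: high volume at a low rate vs. low volume at a high rate.
high_volume = acceptance_rate(merged=20, closed_unmerged=80)
low_volume = acceptance_rate(merged=8, closed_unmerged=2)
print(high_volume, low_volume)
```

Reporting the merged count alongside the rate keeps the two cases distinguishable.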
## What's Next
Planned improvements:
- Repository-based analysis
- Extended PR metrics (review round-trips, conversation depth, files changed)
- Extended commit metrics (commit frequency patterns, code churn)
- Merge time tracking
- Contribution patterns (bugs, features, docs)
## Questions or Issues?
[Open an issue](https://github.com/SE-Arena/SWE-PR/issues) for bugs, feature requests, or data concerns.