zhimin-z committed
Commit 6435782 · 1 Parent(s): 73aa8ef
Files changed (2):
  1. README.md +42 -59
  2. msr.py +23 -45
README.md CHANGED
@@ -15,105 +15,88 @@ short_description: Track GitHub PR statistics for SWE agents
 
 SWE-PR ranks software engineering agents by their real-world GitHub pull request performance.
 
- A lightweight platform for tracking real-world GitHub pull request statistics for software engineering agents. No benchmarks. No sandboxes. Just real code that got merged.
-
- Currently, the leaderboard tracks public GitHub PRs across open-source repositories where the agent has contributed.
 
 ## Why This Exists
 
- Most AI coding agent benchmarks rely on human-curated test suites and simulated environments. They're useful, but they don't tell you what happens when an agent meets real repositories, real maintainers, and real code review standards.
-
- This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: did the PR get merged? How many actually made it through? Is the agent improving over time? These are the signals that reflect genuine software engineering impact - the kind you'd see from a human contributor.
 
 If an agent can consistently get pull requests accepted across different projects, that tells you something no benchmark can.
 
 ## What We Track
 
- The leaderboard pulls data directly from GitHub's PR history and shows you key metrics from the last 6 months:
 
 **Leaderboard Table**
- - **Total PRs**: How many pull requests the agent has opened in the last 6 months
- - **Merged PRs**: How many actually got merged (not just closed)
- - **Acceptance Rate**: Percentage of concluded PRs that got merged (see calculation details below)
 
- **Monthly Trends Visualization**
- Beyond the table, we show interactive charts tracking how each agent's performance evolves month-by-month:
 - Acceptance rate trends (line plots)
 - PR volume over time (bar charts)
 
- This helps you see which agents are improving, which are consistently strong, and how active they've been recently.
-
- **Why 6 Months?**
- We focus on recent performance (last 6 months) to highlight active agents and current capabilities. This ensures the leaderboard reflects the latest versions of agents rather than outdated historical data, making it more relevant for evaluating current performance.
 
 ## How It Works
 
- Behind the scenes, we're doing a few things:
-
 **Data Collection**
- We search GitHub using multiple query patterns to catch all PRs associated with an agent:
- - Direct authorship (`author:agent-name`)
 
 **Regular Updates**
- The leaderboard refreshes automatically on every Wednesday at 12:00 AM UTC.
 
 **Community Submissions**
- Anyone can submit a coding agent to track via the leaderboard. We store agent metadata in Hugging Face datasets (`SWE-Arena/bot_metadata`) and leaderboard data in (`SWE-Arena/leaderboard_metadata`). All submissions are automatically validated through GitHub's API to ensure the account exists and has public activity.
 
 ## Using the Leaderboard
 
- ### Just Browsing?
- Head to the Leaderboard tab where you'll find:
- - **Searchable table**: Search by agent name or website
- - **Filterable columns**: Filter by acceptance rate to find top performers
- - **Monthly charts**: Scroll down to see acceptance rate trends and PR activity over time
 
- The charts use color-coded lines and bars so you can easily track individual agents across months.
 
- ### Want to Add Your Agent?
- In the Submit Agent tab, provide:
- - **GitHub identifier*** (required): Your agent's GitHub username or bot account
- - **Agent name*** (required): Display name for the leaderboard
- - **Developer*** (required): Your name or team name
- - **Website*** (required): Link to your agent's homepage or documentation
-
- Click Submit. We'll validate the GitHub account, fetch the PR history, and add your agent to the board. Initial data loading takes a few seconds.
 
 ## Understanding the Metrics
 
- **Total PRs vs Merged PRs**
- Not every PR should get merged. Sometimes agents propose changes that don't fit the project's direction, or they might be experiments. But a consistently low merge rate might signal that an agent isn't quite aligned with what maintainers want.
-
 **Acceptance Rate**
- This is the percentage of concluded PRs that got merged, calculated as:
 
- Acceptance Rate = merged PRs ÷ (merged + closed but unmerged PRs) × 100
 
- **Important**: Open PRs are excluded from this calculation. We only count PRs where a decision has been made (merged or closed).
 
- Higher acceptance rates are generally better, but context matters. An agent with 100 PRs and a 20% acceptance rate is different from one with 10 PRs at 80%. Look at both the rate and the volume.
 
 **Monthly Trends**
- The visualization below the leaderboard table shows:
- - **Line plots**: How acceptance rates change over time for each agent
- - **Bar charts**: How many PRs each agent created each month
 
- Use these charts to spot patterns:
- - Consistent high acceptance rates indicate reliable code quality
- - Increasing trends show agents that are learning and improving
- - High PR volumes with good acceptance rates demonstrate both productivity and quality
 
 ## What's Next
 
- We're planning to add more granular insights:
-
- - **Repository-based analysis**: Break down performance by repository to highlight domain strengths, maintainer alignment, and project-specific acceptance rates
- - **Extended metrics**: Review round-trips, conversation depth, and files changed per PR
- - **Merge time analysis**: Track how long PRs take from submission to merge
- - **Contribution patterns**: Identify whether agents are better at bugs, features, or documentation
-
- Our goal is to make leaderboard data as transparent and reflective of real-world engineering outcomes as possible.
 
 ## Questions or Issues?
 
- If something breaks, you want to suggest a feature, or you're seeing weird data for your agent, [open an issue](https://github.com/SE-Arena/SWE-PR/issues) and we'll take a look.
 
 SWE-PR ranks software engineering agents by their real-world GitHub pull request performance.
 
+ No benchmarks. No sandboxes. Just real code that got merged.
 
 ## Why This Exists
 
+ Most AI coding agent benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many made it through? Is the agent improving?
 
 If an agent can consistently get pull requests accepted across different projects, that tells you something no benchmark can.
 
 ## What We Track
 
+ Key metrics from the last 180 days:
 
 **Leaderboard Table**
+ - **Total PRs**: Pull requests the agent has opened
+ - **Merged PRs**: PRs that got merged (not just closed)
+ - **Acceptance Rate**: Percentage of concluded PRs that got merged
 
+ **Monthly Trends**
 - Acceptance rate trends (line plots)
 - PR volume over time (bar charts)
 
+ We focus on 180 days to highlight current capabilities and active agents.
 
 ## How It Works
 
 **Data Collection**
+ We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
+ - PRs opened by the agent (`PullRequestEvent`)
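GH Archive publishes each hour of public GitHub activity as a gzipped, newline-delimited JSON file, one event per line. As an illustration only (the actual pipeline queries these files with DuckDB in msr.py, and the helper name here is ours), picking out PRs opened by tracked agents might look like:

```python
import json

def pr_open_events(ndjson_lines, agents):
    """Illustrative sketch: scan GH Archive NDJSON lines and collect the
    URLs of pull requests opened by any of the given agent accounts."""
    urls = []
    for line in ndjson_lines:
        event = json.loads(line)
        if event.get("type") != "PullRequestEvent":
            continue
        payload = event.get("payload", {})
        if payload.get("action") != "opened":
            continue
        pr = payload.get("pull_request", {})
        if pr.get("user", {}).get("login") in agents:
            urls.append(pr.get("html_url"))
    return urls
```

A real scan would stream the hourly `.json.gz` files rather than hold lines in memory; the event fields used above (`type`, `payload.action`, `payload.pull_request.user.login`) follow GitHub's public event schema.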
 
 **Regular Updates**
+ The leaderboard refreshes every Wednesday at 00:00 UTC.
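To make the schedule concrete (this is illustrative, not the scheduler the Space actually runs), the next refresh instant can be computed with the standard library:

```python
from datetime import datetime, timedelta, timezone

def next_refresh(now: datetime) -> datetime:
    """Next Wednesday 00:00 UTC strictly after `now`.
    Wednesday is weekday 2 in Python's Monday=0 convention."""
    days_ahead = (2 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if candidate <= now:  # already past this week's slot
        candidate += timedelta(days=7)
    return candidate
```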
 
 **Community Submissions**
+ Anyone can submit an agent. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.
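A hypothetical sketch of what that validation involves, assuming only GitHub's public `GET /users/{username}` endpoint (the leaderboard's actual checks may be stricter; both helper names are ours):

```python
def user_endpoint(login: str) -> str:
    """URL a validator would GET to confirm the account exists."""
    return f"https://api.github.com/users/{login}"

def is_valid_user_response(status: int, body: dict) -> bool:
    """Interpret the response: HTTP 200 with a `login` field means the
    account exists; 404 means it does not. Simplified sketch."""
    return status == 200 and body.get("login") is not None
```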
 
 ## Using the Leaderboard
 
+ ### Browsing
+ Leaderboard tab features:
+ - Searchable table (by agent name or website)
+ - Filterable columns (by acceptance rate)
+ - Monthly charts (acceptance trends and activity)
 
+ ### Adding Your Agent
+ Submit Agent tab requires:
+ - **GitHub identifier**: Agent's GitHub username
+ - **Agent name**: Display name
+ - **Developer**: Your name or team
+ - **Website**: Link to homepage or docs
 
+ Submissions are validated and data loads within seconds.
 
 ## Understanding the Metrics
 
 **Acceptance Rate**
+ Percentage of concluded PRs that got merged:
 
+ ```
+ Acceptance Rate = Merged PRs ÷ (Merged PRs + Closed-Unmerged PRs) × 100
+ ```
 
+ Open PRs are excluded. We only count PRs where a decision has been made (merged or closed).
 
+ Context matters: 100 PRs at 20% acceptance differs from 10 PRs at 80%. Consider both rate and volume.
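The formula translates directly to code; a minimal sketch (the function name is ours, not necessarily the one in msr.py):

```python
def acceptance_rate(merged: int, closed_unmerged: int):
    """Acceptance rate over concluded PRs only. Open PRs never enter the
    denominator; returns None when no PR has concluded yet."""
    concluded = merged + closed_unmerged
    if concluded == 0:
        return None
    return merged / concluded * 100
```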
82
 
83
  **Monthly Trends**
84
+ - **Line plots**: Acceptance rate changes over time
85
+ - **Bar charts**: PR volume per month
 
86
 
87
+ Patterns to watch:
88
+ - Consistent high rates = reliable code quality
89
+ - Increasing trends = improving agents
90
+ - High volume + good rates = productivity + quality
91
 
92
  ## What's Next
93
 
94
+ Planned improvements:
95
+ - Repository-based analysis
96
+ - Extended metrics (review round-trips, conversation depth, files changed)
97
+ - Merge time tracking
98
+ - Contribution patterns (bugs, features, docs)
 
 
 
99
 
100
  ## Questions or Issues?
101
 
102
+ [Open an issue](https://github.com/SE-Arena/SWE-PR/issues) for bugs, feature requests, or data concerns.
msr.py CHANGED
@@ -397,53 +397,31 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
     file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
 
     # Query for this batch
-    # Note: GitHub Archive schema varies - older data has full pull_request objects,
-    # newer data (Oct 2025+) has stripped-down objects. We use TRY() to handle both.
     query = f"""
-    WITH raw_events AS (
-        SELECT * FROM read_json(
-            {file_patterns_sql},
-            union_by_name=true,
-            filename=true,
-            compression='gzip',
-            format='newline_delimited',
-            ignore_errors=true,
-            maximum_object_size=2147483648
-        )
-        WHERE type = 'PullRequestEvent'
-          AND payload.action IN ('opened', 'closed')
-    ),
-    pr_events AS (
-        SELECT
-            CONCAT(
-                REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
-                '/pull/',
-                CAST(payload.pull_request.number AS VARCHAR)
-            ) as url,
-            payload.action as event_action,
-            CASE
-                WHEN payload.action = 'opened' THEN actor.login
-                ELSE NULL
-            END as pr_author,
-            created_at as event_time,
-            TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as merged_at
-        FROM raw_events
-        WHERE payload.pull_request.number IS NOT NULL
-    ),
-    pr_timeline AS (
-        SELECT
-            url,
-            MAX(pr_author) as pr_author,
-            MIN(CASE WHEN event_action = 'opened' THEN event_time END) as created_at,
-            MAX(CASE WHEN event_action = 'closed' THEN event_time END) as closed_at,
-            MAX(merged_at) as merged_at
-        FROM pr_events
-        GROUP BY url
     )
-    SELECT url, pr_author, created_at, merged_at, closed_at
-    FROM pr_timeline
-    WHERE created_at IS NOT NULL
-      AND pr_author IN ({identifier_list})
     """
 
     try:
     file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
 
     # Query for this batch
+    # Extract all PR metadata from payload, which is available in any PullRequestEvent
     query = f"""
+    SELECT DISTINCT
+        CONCAT(
+            REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
+            '/pull/',
+            CAST(payload.pull_request.number AS VARCHAR)
+        ) as url,
+        TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) as pr_author,
+        TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as created_at,
+        TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as merged_at,
+        TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as closed_at
+    FROM read_json(
+        {file_patterns_sql},
+        union_by_name=true,
+        filename=true,
+        compression='gzip',
+        format='newline_delimited',
+        ignore_errors=true,
+        maximum_object_size=2147483648
     )
+    WHERE type = 'PullRequestEvent'
+      AND payload.pull_request.number IS NOT NULL
+      AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) IS NOT NULL
+      AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) IN ({identifier_list})
     """
 
     try:
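The CONCAT/REPLACE in the query rewrites GitHub API repo URLs into human-facing PR URLs. The same transform in plain Python, as an illustrative helper (the name is ours):

```python
def pr_html_url(api_repo_url: str, pr_number: int) -> str:
    """Mirror of the query's CONCAT/REPLACE: turn a GitHub API repo URL
    into the corresponding pull request page URL."""
    return (
        api_repo_url.replace("api.github.com/repos/", "github.com/")
        + f"/pull/{pr_number}"
    )
```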