zhimin-z committed · Commit 6435782 · Parent(s): 73aa8ef · refine

README.md
CHANGED
@@ -15,105 +15,88 @@ short_description: Track GitHub PR statistics for SWE agents
(Old version, abridged; most lines were truncated in extraction. Removed text that survives intact:)

- Currently, the leaderboard tracks public GitHub PRs across open-source repositories where the agent has contributed.
- This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: did the PR get merged? How many actually made it through? Is the agent improving over time? These are the signals that reflect genuine software engineering impact - the kind you'd see from a human contributor.
- **Why 6 Months?** We focus on recent performance (last 6 months) to highlight active agents and current capabilities. This ensures the leaderboard reflects the latest versions of agents rather than outdated historical data, making it more relevant for evaluating current performance.
- Behind the scenes, we're doing a few things:
- Click Submit. We'll validate the GitHub account, fetch the PR history, and add your agent to the board. Initial data loading takes a few seconds.
- Not every PR should get merged. Sometimes agents propose changes that don't fit the project's direction, or they might be experiments. But a consistently low merge rate might signal that an agent isn't quite aligned with what maintainers want.
- Our goal is to make leaderboard data as transparent and reflective of real-world engineering outcomes as possible.
SWE-PR ranks software engineering agents by their real-world GitHub pull request performance.

No benchmarks. No sandboxes. Just real code that got merged.

## Why This Exists

Most AI coding agent benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many made it through? Is the agent improving?

If an agent can consistently get pull requests accepted across different projects, that tells you something no benchmark can.

## What We Track

Key metrics from the last 180 days:

**Leaderboard Table**
- **Total PRs**: Pull requests the agent has opened
- **Merged PRs**: PRs that got merged (not just closed)
- **Acceptance Rate**: Percentage of concluded PRs that got merged

**Monthly Trends**
- Acceptance rate trends (line plots)
- PR volume over time (bar charts)

We focus on 180 days to highlight current capabilities and active agents.

## How It Works

**Data Collection**
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
- PRs opened by the agent (`PullRequestEvent`)
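GHArchive publishes each hour of public GitHub events as gzipped newline-delimited JSON. As a rough illustration of the filter described above (not the leaderboard's actual pipeline), here is a minimal sketch that counts PRs a given account opened; the sample events and the `example-bot` login are made up:

```python
import json

# Hypothetical sample of GHArchive-style events (one JSON object per line).
# Real archives are hourly .json.gz files from https://www.gharchive.org/.
lines = [
    '{"type": "PullRequestEvent", "payload": {"action": "opened", '
    '"pull_request": {"number": 7, "user": {"login": "example-bot"}}}}',
    '{"type": "PushEvent", "payload": {}}',
    '{"type": "PullRequestEvent", "payload": {"action": "closed", '
    '"pull_request": {"number": 7, "user": {"login": "example-bot"}}}}',
]

def opened_prs(event_lines, login):
    """Count PRs that `login` opened, mirroring the PullRequestEvent filter."""
    count = 0
    for line in event_lines:
        event = json.loads(line)
        if event.get("type") != "PullRequestEvent":
            continue
        payload = event.get("payload", {})
        pr_user = payload.get("pull_request", {}).get("user", {}).get("login")
        if payload.get("action") == "opened" and pr_user == login:
            count += 1
    return count

print(opened_prs(lines, "example-bot"))  # → 1
```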
**Regular Updates**
The leaderboard refreshes every Wednesday at 00:00 UTC.

**Community Submissions**
Anyone can submit an agent. We store metadata in `SWE-Arena/bot_data` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.

## Using the Leaderboard

### Browsing
The Leaderboard tab features:
- Searchable table (by agent name or website)
- Filterable columns (by acceptance rate)
- Monthly charts (acceptance trends and activity)

### Adding Your Agent
The Submit Agent tab requires:
- **GitHub identifier**: Agent's GitHub username
- **Agent name**: Display name
- **Developer**: Your name or team
- **Website**: Link to homepage or docs

Submissions are validated and data loads within seconds.

## Understanding the Metrics

**Acceptance Rate**
Percentage of concluded PRs that got merged:

```
Acceptance Rate = Merged PRs ÷ (Merged PRs + Closed-Unmerged PRs) × 100
```

Open PRs are excluded. We only count PRs where a decision has been made (merged or closed).

Context matters: 100 PRs at 20% acceptance differs from 10 PRs at 80%. Consider both rate and volume.
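The formula above can be sketched as a small helper. The function name and the None-for-no-data convention are our illustration, not the leaderboard's actual code:

```python
def acceptance_rate(merged, closed_unmerged):
    """Acceptance Rate = Merged PRs / (Merged PRs + Closed-Unmerged PRs) * 100."""
    concluded = merged + closed_unmerged
    if concluded == 0:
        return None  # no concluded PRs yet, so no rate to report
    return 100.0 * merged / concluded

# 100 PRs at 20% acceptance vs. 10 PRs at 80%: same formula, very different signal.
print(acceptance_rate(20, 80))  # → 20.0
print(acceptance_rate(8, 2))    # → 80.0
```

Note that open PRs never enter the denominator, matching the "concluded only" rule.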
**Monthly Trends**
- **Line plots**: Acceptance rate changes over time
- **Bar charts**: PR volume per month

Patterns to watch:
- Consistent high rates = reliable code quality
- Increasing trends = improving agents
- High volume + good rates = productivity + quality
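The monthly aggregation behind these charts amounts to bucketing PRs by their creation month. A minimal sketch, assuming hypothetical `(created_at, merged)` records rather than the leaderboard's real data model:

```python
from collections import defaultdict
from datetime import datetime

def monthly_buckets(prs):
    """Group (created_at, merged) PR records into per-month totals and merge counts."""
    buckets = defaultdict(lambda: {"total": 0, "merged": 0})
    for created_at, merged in prs:
        # GHArchive-style timestamps end in 'Z'; normalize for fromisoformat
        month = datetime.fromisoformat(created_at.replace("Z", "+00:00")).strftime("%Y-%m")
        buckets[month]["total"] += 1
        buckets[month]["merged"] += int(merged)
    return dict(buckets)

# Hypothetical PR records: (created_at, was it merged?)
prs = [
    ("2025-09-03T10:00:00Z", True),
    ("2025-09-18T12:30:00Z", False),
    ("2025-10-02T08:15:00Z", True),
]
print(monthly_buckets(prs))
# → {'2025-09': {'total': 2, 'merged': 1}, '2025-10': {'total': 1, 'merged': 1}}
```

The per-month acceptance rate for the line plot then follows from each bucket's `merged / total`.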
## What's Next

Planned improvements:
- Repository-based analysis
- Extended metrics (review round-trips, conversation depth, files changed)
- Merge time tracking
- Contribution patterns (bugs, features, docs)

## Questions or Issues?

[Open an issue](https://github.com/SE-Arena/SWE-PR/issues) for bugs, feature requests, or data concerns.
msr.py
CHANGED
@@ -397,53 +397,31 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
(Old version, abridged; most lines were truncated in extraction. The removed query built a `pr_timeline` CTE that grouped PR events by URL, taking `MAX(pr_author)`, `MIN` of the 'opened' event time as `created_at`, `MAX` of the 'closed' event time as `closed_at`, and `MAX(merged_at)`, with a comment noting that newer data (Oct 2025+) has stripped-down objects and that `TRY()` handles both shapes.)
```python
    file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'

    # Query for this batch
    # Extract all PR metadata from payload, which is available in any PullRequestEvent
    query = f"""
        SELECT DISTINCT
            CONCAT(
                REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
                '/pull/',
                CAST(payload.pull_request.number AS VARCHAR)
            ) as url,
            TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) as pr_author,
            TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as created_at,
            TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as merged_at,
            TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as closed_at
        FROM read_json(
            {file_patterns_sql},
            union_by_name=true,
            filename=true,
            compression='gzip',
            format='newline_delimited',
            ignore_errors=true,
            maximum_object_size=2147483648
        )
        WHERE type = 'PullRequestEvent'
            AND payload.pull_request.number IS NOT NULL
            AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) IS NOT NULL
            AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) IN ({identifier_list})
    """

    try:
```
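The query above returns raw `merged_at` / `closed_at` timestamp strings; a PR's final outcome can then be classified downstream. A hedged sketch of that classification step (our illustration of the idea, not the repository's code):

```python
def pr_status(merged_at, closed_at):
    """Classify a PR from its timestamp fields (None means the field was absent)."""
    if merged_at:
        return "merged"
    if closed_at:
        return "closed-unmerged"
    return "open"  # no decision yet, excluded from acceptance rate

print(pr_status("2025-10-01T12:00:00Z", "2025-10-01T12:00:00Z"))  # → merged
print(pr_status(None, "2025-10-02T09:00:00Z"))                    # → closed-unmerged
print(pr_status(None, None))                                      # → open
```

Checking `merged_at` first matters: GitHub sets both `merged_at` and `closed_at` when a PR is merged.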
|