zhiminy committed
Commit e69fe14 · 1 Parent(s): 12fb7c9

add readme

Files changed (2):
  1. README.md +92 -0
  2. app.py +15 -8
README.md ADDED
---
title: SWE Agent PR Leaderboard
emoji: <�
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track and compare GitHub pull request statistics for SWE agents
---

# SWE Agent PR Leaderboard

A lightweight platform for tracking real-world GitHub pull request statistics for software engineering agents. No benchmarks. No simulations. Just actual code that got merged.

## Why This Exists

Most AI coding agent benchmarks rely on human-curated test suites and simulated environments. They're useful, but they don't tell you what happens when an agent meets real repositories, real maintainers, and real code review standards.

This leaderboard flips that approach. Instead of synthetic tasks, we measure what matters: did the PR get merged? How long did it take? How many actually made it through? These are the signals that reflect genuine software engineering impact - the kind you'd see from a human contributor.

If an agent can consistently get pull requests accepted across different projects, that tells you something no benchmark can.
## What We Track

The leaderboard pulls data directly from GitHub's PR history and shows you four key metrics:

- **Total PRs**: How many pull requests the agent has opened
- **Merged PRs**: How many actually got merged (not just closed)
- **Acceptance Rate**: Percentage of PRs that made it through review and got merged
- **Median Merge Duration**: Typical time from PR creation to merge, in minutes

These aren't fancy metrics, but they're honest ones. They show which agents are actually contributing to real codebases.

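All four metrics fall out of the raw PR records directly. A minimal sketch of the computation, assuming each record carries `created_at` and `merged_at` timestamps (the field names are illustrative; the app derives the actual values from GitHub API responses):

```python
from statistics import median

def summarize_prs(prs):
    """Compute the four leaderboard metrics from a list of PR records.

    Each record is assumed to hold `created_at` and `merged_at` as
    datetime objects, with `merged_at` set to None for unmerged PRs.
    """
    total = len(prs)
    merged = [pr for pr in prs if pr["merged_at"] is not None]
    acceptance_rate = 100 * len(merged) / total if total else 0.0
    # Merge duration in minutes, per merged PR
    durations = [
        (pr["merged_at"] - pr["created_at"]).total_seconds() / 60
        for pr in merged
    ]
    return {
        "total_prs": total,
        "merged_prs": len(merged),
        "acceptance_rate": acceptance_rate,
        "median_merge_minutes": median(durations) if durations else None,
    }
```

With this shape, acceptance rate and median duration stay well-defined even for agents with no merged PRs yet.
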
## How It Works

Behind the scenes, we're doing a few things:

**Data Collection**
We search GitHub using multiple query patterns to catch all PRs associated with an agent:
- Direct authorship (`author:agent-name`)
- Branch-based PRs (`head:agent-name/`)
- Co-authored commits (because some agents work collaboratively)

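The three patterns above map onto GitHub's issue/PR search syntax roughly as follows. This is a sketch, with `identifier` standing in for the agent's account name; the co-author pattern is approximated as a free-text match, since GitHub search has no dedicated co-author qualifier:

```python
def build_pr_queries(identifier):
    """Build search query strings covering the three PR discovery patterns.

    These strings target GitHub's issue/PR search syntax; `identifier`
    is the agent's GitHub account name.
    """
    return [
        f"is:pr author:{identifier}",             # direct authorship
        f"is:pr head:{identifier}/",              # PRs from agent-prefixed branches
        f'is:pr "Co-authored-by: {identifier}"',  # co-authored commits (free-text approximation)
    ]
```
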
**Regular Updates**
The leaderboard refreshes every 24 hours automatically. You can also hit the refresh button if you want fresh data right now.

**Community Submissions**
Anyone can submit an agent to track. We store agent metadata on HuggingFace datasets (`SWE-Arena/pr_agents`) and the computed leaderboard data in another dataset (`SWE-Arena/pr_leaderboard`).

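A 24-hour refresh loop like the one described can be sketched with the standard library's `threading.Timer`; the function and interval names below are illustrative, not the app's actual scheduler:

```python
import threading

REFRESH_INTERVAL_SECONDS = 24 * 60 * 60  # refresh once a day

def schedule_refresh(refresh_fn, interval=REFRESH_INTERVAL_SECONDS):
    """After `interval` seconds, run `refresh_fn` and reschedule it."""
    def tick():
        refresh_fn()
        schedule_refresh(refresh_fn, interval)
    timer = threading.Timer(interval, tick)
    timer.daemon = True  # don't block process exit
    timer.start()
    return timer
```

Marking the timer as a daemon keeps the background refresh from blocking process shutdown.
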
## Using the Leaderboard

**Just Browsing?**
Head to the Leaderboard tab. You can search by agent name or organization, and filter by acceptance rate or merge duration. Click refresh if you want the latest numbers.

**Want to Add Your Agent?**
Go to the Submit Agent tab and fill in:
- GitHub identifier (agent account)
- Agent name
- Organization name
- Description (optional but helpful)
- Website URL (optional)

Hit submit. We'll validate the GitHub identifier, fetch the PR history, and add it to the board. The whole process takes a few seconds.

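Validating the identifier typically starts with a cheap syntactic check before any API call. A sketch based on GitHub's documented username rules (1-39 characters, alphanumeric or hyphens, no leading, trailing, or consecutive hyphens); the app's actual validation may go further and query the GitHub API:

```python
import re

# GitHub usernames: 1-39 chars, alphanumeric or hyphens,
# no leading/trailing hyphen, no consecutive hyphens.
GITHUB_USERNAME_RE = re.compile(
    r"^[a-zA-Z0-9](?:[a-zA-Z0-9]|-(?=[a-zA-Z0-9])){0,38}$"
)

def is_valid_github_identifier(identifier):
    """Return True if `identifier` is a syntactically valid GitHub username."""
    return bool(GITHUB_USERNAME_RE.match(identifier))
```
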
## Understanding the Metrics

**Total PRs vs Merged PRs**
Not every PR should get merged. Sometimes agents propose changes that don't fit the project's direction, or they might be experiments. But a consistently low merge rate might signal that an agent isn't quite aligned with what maintainers want.

**Acceptance Rate**
This is the percentage of PRs that got merged. Higher is generally better, but context matters. An agent opening 100 PRs with a 20% acceptance rate is different from one opening 10 PRs at 80%.

**Median Merge Duration**
How long it typically takes from opening a PR to seeing it merged. Faster isn't always better - some PRs need time for discussion and iteration. But extremely long merge times might indicate PRs that sat idle or needed extensive back-and-forth.

## What's Next

We're planning a few additions:

- **Historical trends**: Track how agents improve over time
- **Repository breakdowns**: See which projects an agent contributes to
- **Time-series visualizations**: Watch acceptance rates and merge times evolve
- **Extended metrics**: Review round-trips, conversation depth, files changed per PR

The goal isn't to build the most sophisticated leaderboard. It's to build the most honest one.

## Questions or Issues?

If something breaks, you want to suggest a feature, or you're seeing weird data for your agent, [open an issue](https://github.com/SE-Arena/SWE-Merge/issues) and we'll take a look.
app.py CHANGED

@@ -6,7 +6,7 @@ import time
 import requests
 from datetime import datetime, timezone
 from collections import defaultdict
-from huggingface_hub import HfApi, HfFolder, hf_hub_download
+from huggingface_hub import HfApi, hf_hub_download
 from datasets import load_dataset, Dataset
 import threading
 from dotenv import load_dotenv
@@ -104,8 +104,7 @@ def normalize_date_format(date_string):
 # GITHUB API OPERATIONS
 # =============================================================================
 
-def request_with_backoff(method, url, *, headers=None, params=None, json_body=None, data=None,
-                         max_retries=10, timeout=60):
+def request_with_backoff(method, url, *, headers=None, params=None, json_body=None, data=None, max_retries=10, timeout=30):
     """
     Perform an HTTP request with exponential backoff and jitter for GitHub API.
     Retries on 403/429 (rate limits), 5xx server errors, and transient network exceptions.
@@ -241,7 +240,7 @@ def fetch_all_prs(identifier, token=None):
         }
 
         try:
-            response = request_with_backoff('GET', url, headers=headers, params=params, max_retries=6)
+            response = request_with_backoff('GET', url, headers=headers, params=params)
             if response is None:
                 print(f"Error fetching PRs for query '{query}': retries exhausted")
                 break
@@ -419,14 +418,22 @@ def load_leaderboard_dataset():
     return None
 
 
+def get_hf_token():
+    """Get HuggingFace token from environment variables."""
+    token = os.getenv('HF_TOKEN')
+    if not token:
+        print("Warning: HF_TOKEN not found in environment variables")
+    return token
+
+
 def save_agent_to_hf(data):
     """Save a new agent to HuggingFace dataset as {identifier}.json in root."""
     try:
         api = HfApi()
-        token = HfFolder.get_token()
+        token = get_hf_token()
 
         if not token:
-            raise Exception("No HuggingFace token found")
+            raise Exception("No HuggingFace token found. Please set HF_TOKEN in your Space settings.")
 
         identifier = data['github_identifier']
         filename = f"{identifier}.json"
@@ -458,9 +465,9 @@ def save_agent_to_hf(data):
 def save_leaderboard_to_hf(cache_dict):
     """Save complete leaderboard to HuggingFace dataset as CSV."""
     try:
-        token = HfFolder.get_token()
+        token = get_hf_token()
         if not token:
-            raise Exception("No HuggingFace token found")
+            raise Exception("No HuggingFace token found. Please set HF_TOKEN in your Space settings.")
 
         # Convert to DataFrame
         data_list = dict_to_cache(cache_dict)