Commit 1b10ccd by zhimin-z
Parent(s): be99280

merge commit with pr
README.md
CHANGED
@@ -11,17 +11,17 @@ pinned: false
 short_description: Track GitHub PR statistics for SWE assistants
 ---
 
-# SWE Assistant PR Leaderboard
+# SWE Assistant PR & Commit Leaderboard
 
-SWE-PR ranks software engineering assistants by their real-world GitHub pull request performance.
+SWE-PR ranks software engineering assistants by their real-world GitHub pull request and commit performance.
 
-No benchmarks. No sandboxes. Just real code that got merged.
+No benchmarks. No sandboxes. Just real code that got merged and commits that got pushed.
 
 ## Why This Exists
 
-Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many
+Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many commits are being created? How active is the assistant across different projects? Is the assistant improving?
 
-If an assistant can consistently get pull requests accepted across different projects, that tells you something no benchmark can.
+If an assistant can consistently get pull requests accepted and create commits across different projects, that tells you something no benchmark can.
 
 ## What We Track

@@ -31,12 +31,14 @@ Key metrics from the last 180 days:
 - **Assistant**: Display name of the assistant
 - **Website**: Link to the assistant's homepage or documentation
 - **Total PRs**: Pull requests the assistant has opened
+- **Total Commits**: Commits created by the assistant
 - **Merged PRs**: PRs that got merged (not just closed)
 - **Acceptance Rate**: Percentage of concluded PRs that got merged
 
 **Monthly Trends**
+- PR acceptance rate trends (line plots)
 - PR volume over time (bar charts)
+- Commit volume over time (bar charts)
 
 We focus on 180 days to highlight current capabilities and active assistants.

@@ -45,6 +47,7 @@ We focus on 180 days to highlight current capabilities and active assistants.
 **Data Collection**
 We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
 - PRs opened by the assistant (`PullRequestEvent`)
+- Commits created by the assistant (`PushEvent`)
 
 **Regular Updates**
 Leaderboard refreshes weekly (Monday at 00:00 UTC).

@@ -69,7 +72,8 @@ Context matters: 100 PRs at 20% acceptance differs from 10 PRs at 80%. Consider
 
 Planned improvements:
 - Repository-based analysis
-- Extended metrics (review round-trips, conversation depth, files changed)
+- Extended PR metrics (review round-trips, conversation depth, files changed)
+- Extended commit metrics (commit frequency patterns, code churn)
 - Merge time tracking
 - Contribution patterns (bugs, features, docs)
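
The table's Acceptance Rate column follows directly from the per-PR metadata mined below. A minimal illustrative sketch of that definition (not the repository's exact code), assuming each PR record carries the `merged_at`/`closed_at` timestamps that msr.py collects:

```python
# Illustrative sketch only: "Acceptance Rate" as the share of concluded PRs
# (merged or closed) that were merged, per the README definition above.
def acceptance_rate(prs):
    merged = sum(1 for pr in prs if pr.get('merged_at'))
    concluded = sum(1 for pr in prs if pr.get('merged_at') or pr.get('closed_at'))
    return round(100.0 * merged / concluded, 2) if concluded else 0.0

sample_prs = [
    {'merged_at': '2025-06-10T12:00:00Z', 'closed_at': '2025-06-10T12:00:00Z'},  # merged
    {'merged_at': None, 'closed_at': '2025-06-12T09:30:00Z'},                    # closed, not merged
    {'merged_at': None, 'closed_at': None},                                      # still open
]
print(acceptance_rate(sample_prs))  # 50.0
```

A PR that is still open counts toward Total PRs but not toward the concluded set, so it does not move the acceptance rate either way.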
app.py
CHANGED
@@ -31,6 +31,7 @@ LEADERBOARD_COLUMNS = [
     ("Assistant", "string"),
     ("Website", "string"),
     ("Total PRs", "number"),
+    ("Total Commits", "number"),
     ("Merged PRs", "number"),
     ("Acceptance Rate (%)", "number"),
 ]
@@ -269,25 +270,42 @@ def load_leaderboard_data_from_hf():
 # UI FUNCTIONS
 # =============================================================================
 
-def create_monthly_metrics_plot(top_n=5):
+def create_monthly_metrics_plot(type="pr", top_n=5):
     """
-    Create a Plotly figure
+    Create a Plotly figure showing monthly metrics.
+    - For PRs: Acceptance Rate (%) as line curves, Total PRs as bar charts
+    - For Commits: Total Commits as bar charts
 
     Each assistant gets a unique color for both their line and bars.
 
     Args:
+        type: Type of metrics to display - "pr" or "commit" (default: "pr")
        top_n: Number of top assistants to show (default: 5)
     """
+    # Determine metrics key and field names based on type
+    if type == "commit":
+        metrics_key = 'commit_monthly_metrics'
+        total_field = 'total_commits'
+        no_data_msg = "No commit data available for visualization"
+        total_label = "Total Commits"
+        print_msg = "commit"
+        has_rate = False
+    else:  # default to "pr"
+        metrics_key = 'pr_monthly_metrics'
+        total_field = 'total_prs'
+        no_data_msg = "No PR data available for visualization"
+        total_label = "Total PRs"
+        print_msg = "PR"
+        has_rate = True
+
     # Load from saved dataset
     saved_data = load_leaderboard_data_from_hf()
 
-    if not saved_data or
+    if not saved_data or metrics_key not in saved_data:
         # Return an empty figure with a message
         fig = go.Figure()
         fig.add_annotation(
-            text=
+            text=no_data_msg,
             xref="paper", yref="paper",
             x=0.5, y=0.5, showarrow=False,
             font=dict(size=16)
@@ -299,19 +317,19 @@ def create_monthly_metrics_plot(top_n=5):
         )
         return fig
 
-    metrics = saved_data[
-    print(f"Loaded monthly metrics from saved dataset")
+    metrics = saved_data[metrics_key]
+    print(f"Loaded {print_msg} monthly metrics from saved dataset")
 
     # Apply top_n filter if specified
     if top_n is not None and top_n > 0 and metrics.get('assistants'):
-        # Calculate total
+        # Calculate total count for each assistant
         agent_totals = []
         for agent_name in metrics['assistants']:
             agent_data = metrics['data'].get(agent_name, {})
-            agent_totals.append((agent_name,
+            total_count = sum(agent_data.get(total_field, []))
+            agent_totals.append((agent_name, total_count))
 
-        # Sort by total
+        # Sort by total count and take top N
         agent_totals.sort(key=lambda x: x[1], reverse=True)
         top_agents = [agent_name for agent_name, _ in agent_totals[:top_n]]
 
@@ -338,8 +356,11 @@ def create_monthly_metrics_plot(top_n=5):
         )
         return fig
 
-    # Create figure with secondary y-axis
+    # Create figure with secondary y-axis (for PRs) or single axis (for commits)
+    if has_rate:
+        fig = make_subplots(specs=[[{"secondary_y": True}]])
+    else:
+        fig = go.Figure()
 
     # Generate unique colors for many assistants using HSL color space
     def generate_color(index, total):
@@ -361,70 +382,79 @@ def create_monthly_metrics_plot(top_n=5):
         color = agent_colors[agent_name]
         agent_data = data[agent_name]
 
+        if has_rate:
+            # Add line trace for acceptance rate (left y-axis) - PR only
+            acceptance_rates = agent_data['acceptance_rates']
+            # Filter out None values for plotting
+            x_acceptance = [month for month, rate in zip(months, acceptance_rates) if rate is not None]
+            y_acceptance = [rate for rate in acceptance_rates if rate is not None]
+
+            if x_acceptance and y_acceptance:  # Only add trace if there's data
+                fig.add_trace(
+                    go.Scatter(
+                        x=x_acceptance,
+                        y=y_acceptance,
+                        name=agent_name,
+                        mode='lines+markers',
+                        line=dict(color=color, width=2),
+                        marker=dict(size=8),
+                        legendgroup=agent_name,
+                        showlegend=(top_n is not None and top_n <= 10),  # Show legend for top N assistants
+                        hovertemplate='<b>Assistant: %{fullData.name}</b><br>' +
+                                      'Month: %{x}<br>' +
+                                      'Acceptance Rate: %{y:.2f}%<br>' +
+                                      '<extra></extra>'
+                    ),
+                    secondary_y=False
+                )
 
-        # Add bar trace for total
-        # Only show bars for months where assistant has
+        # Add bar trace for total count (right y-axis for PRs, single axis for commits)
+        # Only show bars for months where assistant has data
         x_bars = []
         y_bars = []
-        for month, count in zip(months, agent_data[
-            if count > 0: # Only include months with
+        for month, count in zip(months, agent_data[total_field]):
+            if count > 0:  # Only include months with data
                 x_bars.append(month)
                 y_bars.append(count)
 
         if x_bars and y_bars:  # Only add trace if there's data
+            trace_args = {
+                'x': x_bars,
+                'y': y_bars,
+                'name': agent_name,
+                'marker': dict(color=color, opacity=0.7 if type == "commit" else 0.6),
+                'legendgroup': agent_name,
+                'showlegend': False if has_rate else (top_n is not None and top_n <= 10),
+                'hovertemplate': f'<b>Assistant: %{{fullData.name}}</b><br>' +
+                                 f'Month: %{{x}}<br>' +
+                                 f'{total_label}: %{{y}}<br>' +
+                                 '<extra></extra>',
+                'offsetgroup': agent_name
+            }
+
+            if has_rate:
+                fig.add_trace(go.Bar(**trace_args), secondary_y=True)
+            else:
+                fig.add_trace(go.Bar(**trace_args))
 
     # Update axes labels
     fig.update_xaxes(title_text=None)
+
+    if has_rate:
+        # For PRs: dual y-axes
+        fig.update_yaxes(
+            title_text="<b>Acceptance Rate (%)</b>",
+            range=[0, 100],
+            secondary_y=False,
+            showticklabels=True,
+            tickmode='linear',
+            dtick=10,
+            showgrid=True
+        )
+        fig.update_yaxes(title_text=f"<b>{total_label}</b>", secondary_y=True)
+    else:
+        # For commits: single y-axis
+        fig.update_yaxes(title_text=f"<b>{total_label}</b>")
 
     # Update layout
     show_legend = (top_n is not None and top_n <= 10)
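
For orientation, here is a hedged sketch of the saved-dataset slice that `create_monthly_metrics_plot()` reads, inferred from the code above (`saved_data[metrics_key]` holding `assistants`, `months`, and per-assistant lists aligned to `months`); the assistant names and all numbers are invented:

```python
# Hedged illustration of the expected input shape; values are made up.
sample_saved_data = {
    'pr_monthly_metrics': {
        'assistants': ['agent-a', 'agent-b'],
        'months': ['2025-05', '2025-06'],
        'data': {
            'agent-a': {'acceptance_rates': [62.5, 70.0], 'total_prs': [8, 10]},
            'agent-b': {'acceptance_rates': [None, 40.0], 'total_prs': [0, 5]},
        },
    },
    'commit_monthly_metrics': {
        'assistants': ['agent-a'],
        'months': ['2025-05', '2025-06'],
        'data': {'agent-a': {'total_commits': [120, 95]}},
    },
}
```

With `has_rate=True`, the `acceptance_rates` list drives the Scatter traces on the left axis and `total_prs` drives the Bar traces on the secondary axis; `None` entries (months with no concluded PRs) are filtered out before plotting.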
@@ -481,6 +511,7 @@ def get_leaderboard_dataframe():
             data.get('name', 'Unknown'),
             data.get('website', 'N/A'),
             total_prs,
+            data.get('total_commits', 0),
             data.get('merged_prs', 0),
             data.get('acceptance_rate', 0.0),
         ])
@@ -493,7 +524,7 @@ def get_leaderboard_dataframe():
     df = pd.DataFrame(rows, columns=column_names)
 
     # Ensure numeric types
-    numeric_cols = ["Total PRs", "Merged PRs", "Acceptance Rate (%)"]
+    numeric_cols = ["Total PRs", "Total Commits", "Merged PRs", "Acceptance Rate (%)"]
     for col in numeric_cols:
         if col in df.columns:
             df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
@@ -610,15 +641,15 @@ print(f"On startup: Loads cached data from HuggingFace on demand")
 print(f"{'='*80}\n")
 
 # Create Gradio interface
-with gr.Blocks(title="SWE Assistant PR Leaderboard", theme=gr.themes.Soft()) as app:
-    gr.Markdown("# SWE Assistant PR Leaderboard")
-    gr.Markdown(f"Track and compare GitHub pull request statistics for SWE assistants")
+with gr.Blocks(title="SWE Assistant PR & Commit Leaderboard", theme=gr.themes.Soft()) as app:
+    gr.Markdown("# SWE Assistant PR & Commit Leaderboard")
+    gr.Markdown(f"Track and compare GitHub pull request and commit statistics for SWE assistants")
 
     with gr.Tabs():
 
         # Leaderboard Tab
         with gr.Tab("Leaderboard"):
-            gr.Markdown("*Statistics are based on assistant PR activity tracked by the system*")
+            gr.Markdown("*Statistics are based on assistant PR and commit activity tracked by the system*")
             leaderboard_table = Leaderboard(
                 value=pd.DataFrame(columns=[col[0] for col in LEADERBOARD_COLUMNS]),  # Empty initially
                 datatype=LEADERBOARD_COLUMNS,
@@ -642,18 +673,32 @@ with gr.Blocks(title="SWE Assistant PR Leaderboard", theme=gr.themes.Soft()) as
                 outputs=[leaderboard_table]
             )
 
-            # Monthly Metrics Section
+            # PR Monthly Metrics Section
             gr.Markdown("---")  # Divider
             with gr.Group():
-                gr.Markdown("### Monthly Performance - Top 5 Assistants")
+                gr.Markdown("### PR Monthly Performance - Top 5 Assistants")
                 gr.Markdown("*Shows acceptance rate trends and PR volumes for the most active assistants*")
+                pr_monthly_metrics_plot = gr.Plot(label="PR Monthly Metrics")
+
+            # Load PR monthly metrics when app starts
+            app.load(
+                fn=lambda: create_monthly_metrics_plot(type="pr"),
+                inputs=[],
+                outputs=[pr_monthly_metrics_plot]
+            )
+
+            # Commit Monthly Metrics Section
+            gr.Markdown("---")  # Divider
+            with gr.Group():
+                gr.Markdown("### Commit Monthly Performance - Top 5 Assistants")
+                gr.Markdown("*Shows commit volumes for the most active assistants*")
+                commit_monthly_metrics_plot = gr.Plot(label="Commit Monthly Metrics")
 
-            # Load monthly metrics when app starts
+            # Load commit monthly metrics when app starts
             app.load(
-                fn=lambda: create_monthly_metrics_plot(),
+                fn=lambda: create_monthly_metrics_plot(type="commit"),
                 inputs=[],
-                outputs=[
+                outputs=[commit_monthly_metrics_plot]
             )
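
The dual-axis layout behind the PR chart reduces to a small Plotly pattern. A minimal standalone sketch (sample months and numbers are made up) for previewing the idea outside the Gradio app:

```python
# Minimal sketch of the secondary-y pattern used by create_monthly_metrics_plot:
# a rate line on the left axis and count bars on the right axis.
import plotly.graph_objects as go
from plotly.subplots import make_subplots

months = ['2025-05', '2025-06', '2025-07']
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Scatter(x=months, y=[55.0, 62.5, 70.0], name='Acceptance Rate',
                         mode='lines+markers'), secondary_y=False)
fig.add_trace(go.Bar(x=months, y=[9, 8, 12], name='Total PRs', opacity=0.6),
              secondary_y=True)
fig.update_yaxes(title_text="Acceptance Rate (%)", range=[0, 100], secondary_y=False)
fig.update_yaxes(title_text="Total PRs", secondary_y=True)
# fig.show()  # uncomment to preview locally
```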
msr.py
CHANGED
@@ -344,39 +344,54 @@ def generate_file_path_patterns(start_date, end_date, data_dir=GHARCHIVE_DATA_LO
 # STREAMING BATCH PROCESSING
 # =============================================================================
 
-def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
+def fetch_all_metadata_streaming(conn, identifiers, start_date, end_date):
     """
+    UNIFIED QUERY: Fetch both commit and PR metadata using streaming batch processing.
+
     Processes GHArchive files in BATCH_SIZE_DAYS chunks to limit memory usage.
     Instead of loading 180 days (4,344 files) at once, processes 7 days at a time.
+
     This prevents OOM errors by:
     1. Only keeping ~168 hourly files in memory per batch (vs 4,344)
     2. Incrementally building the results dictionary
     3. Allowing DuckDB to garbage collect after each batch
+
+    Fetches both:
+    - PushEvent (for commit tracking)
+    - PullRequestEvent (for PR tracking)
+
+    Then post-processes in Python to separate commits and PRs.
+
     Args:
         conn: DuckDB connection instance
         identifiers: List of GitHub usernames/bot identifiers (~1500)
         start_date: Start datetime (timezone-aware)
         end_date: End datetime (timezone-aware)
+
     Returns:
-        Dictionary
+        Dictionary with two keys:
+        - 'commits': {author: [commit_metadata]} for commit tracking
+        - 'prs': {author: [pr_metadata]} for PR tracking
     """
     identifier_list = ', '.join([f"'{id}'" for id in identifiers])
+    identifier_set = set(identifiers)
+
+    # Storage for commits
+    commits_by_agent = defaultdict(list)
+
+    # Storage for PRs
+    prs_by_agent = defaultdict(list)
 
     # Calculate total batches
     total_days = (end_date - start_date).days
     total_batches = (total_days // BATCH_SIZE_DAYS) + 1
+
     # Process in configurable batches
     current_date = start_date
     batch_num = 0
+    total_commits = 0
     total_prs = 0
+
     print(f"  Streaming {total_batches} batches of {BATCH_SIZE_DAYS}-day intervals...")
 
     while current_date <= end_date:
@@ -396,23 +411,27 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
 
         # Build file patterns SQL for THIS BATCH
         file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
+
+        # UNIFIED QUERY: Fetch both commits (PushEvent) and PRs (PullRequestEvent)
+        # Post-process in Python to separate them
+        unified_query = f"""
+            SELECT
+                type,
+                -- Commit fields (from PushEvent)
+                TRY_CAST(json_extract_string(to_json(actor), '$.login') AS VARCHAR) as author,
+                TRY_CAST(json_extract_string(to_json(payload), '$.head') AS VARCHAR) as commit_sha,
+                -- PR fields (from PullRequestEvent)
                 CONCAT(
                     REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
                     '/pull/',
                     CAST(payload.pull_request.number AS VARCHAR)
-                ) as
+                ) as pr_url,
                 TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) as action,
                 TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) as pr_author,
-                TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as
-                TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as
-                TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as
+                TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as pr_created_at,
+                TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as pr_merged_at,
+                TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as pr_closed_at,
+                created_at
             FROM read_json(
                 {file_patterns_sql},
                 union_by_name=true,
@@ -422,38 +441,83 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
                 ignore_errors=true,
                 maximum_object_size=2147483648
             )
             WHERE
+                -- PushEvent: Commits by assistants
+                (type = 'PushEvent'
+                 AND TRY_CAST(json_extract_string(to_json(payload), '$.head') AS VARCHAR) IS NOT NULL
+                 AND TRY_CAST(json_extract_string(to_json(actor), '$.login') AS VARCHAR) IN ({identifier_list})
+                )
+                OR
+                -- PullRequestEvent: PRs by assistants
+                (type = 'PullRequestEvent'
+                 AND payload.pull_request.number IS NOT NULL
+                 AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) IS NOT NULL
+                 AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) IN ({identifier_list})
+                 AND TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) IN ('opened', 'closed')
+                )
         """
 
         try:
+            all_results = conn.execute(unified_query).fetchall()
 
+            # Post-process results to separate commits and PRs
+            # Row structure: [type, author, commit_sha, pr_url, action, pr_author,
+            #                 pr_created_at, pr_merged_at, pr_closed_at, created_at]
+
+            commit_events = []
+            pr_events_list = []
+
+            for row in all_results:
+                event_type = row[0]
+
+                if event_type == 'PushEvent':
+                    commit_events.append(row)
+                elif event_type == 'PullRequestEvent':
+                    pr_events_list.append(row)
+
+            # Process commits
+            batch_commits = 0
+            for row in commit_events:
+                author = row[1]
+                sha = row[2]
+                created_at = normalize_date_format(row[9]) if row[9] else None
+
+                if not author or not sha:
+                    continue
+
+                # Build commit metadata
+                commit_metadata = {
+                    'sha': sha,
+                    'created_at': created_at,
+                }
+
+                commits_by_agent[author].append(commit_metadata)
+                batch_commits += 1
+                total_commits += 1
+
+            # Process PRs
             # Group events by PR URL to merge opened and closed events
             pr_events = defaultdict(lambda: {'opened': None, 'closed': None})
 
-            for row in
-                action = row[
-                pr_author = row[
+            for row in pr_events_list:
+                pr_url = row[3]
+                action = row[4]
+                pr_author = row[5]
+                pr_created_at = normalize_date_format(row[6]) if row[6] else None
+                pr_merged_at = normalize_date_format(row[7]) if row[7] else None
+                pr_closed_at = normalize_date_format(row[8]) if row[8] else None
 
-                if not
+                if not pr_url or not action:
                     continue
 
                 event_data = {
                     'pr_author': pr_author,
-                    'created_at':
-                    'merged_at':
-                    'closed_at':
+                    'created_at': pr_created_at,
+                    'merged_at': pr_merged_at,
+                    'closed_at': pr_closed_at,
                 }
 
-                pr_events[
+                pr_events[pr_url][action] = event_data
 
             # Only include PRs that have an 'opened' event
             # Use closed event data (if available) to get merged_at and closed_at
@@ -480,11 +544,11 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
                     'closed_at': closed_event['closed_at'] if closed_event else None,
                 }
 
+                prs_by_agent[pr_author].append(pr_metadata)
                 batch_prs += 1
                 total_prs += 1
 
-        print(f"✓ {batch_prs} PRs found")
+        print(f"✓ {batch_commits} commits, {batch_prs} PRs found")
 
         except Exception as e:
             print(f"\n    ✗ Batch {batch_num} error: {str(e)}")
@@ -492,12 +556,17 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
 
         # Move to next batch
         current_date = batch_end + timedelta(days=1)
+
     # Final summary
+    agents_with_commits = sum(1 for commits in commits_by_agent.values() if commits)
+    agents_with_prs = sum(1 for prs in prs_by_agent.values() if prs)
+    print(f"\n  ✓ Complete: {total_commits} commits for {agents_with_commits}/{len(identifiers)} assistants")
+    print(f"  ✓ Complete: {total_prs} PRs for {agents_with_prs}/{len(identifiers)} assistants")
+
+    return {
+        'commits': dict(commits_by_agent),
+        'prs': dict(prs_by_agent)
+    }
 
 
 def sync_agents_repo():
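
The docstring above puts numbers on the batching: GHArchive publishes one file per UTC hour, so a 7-day batch touches about 168 files versus roughly 4,344 for the full 180-day window. A hedged sketch of that windowing follows; `BATCH_SIZE_DAYS` mirrors the repo constant, while the hourly file naming and the batch-end arithmetic are assumptions here, not the repository's `generate_file_path_patterns` implementation:

```python
# Hedged sketch of the 7-day batch windowing described in the docstring above.
from datetime import datetime, timedelta, timezone

BATCH_SIZE_DAYS = 7  # mirrors the repo constant

def hourly_files(day):
    # One gzipped JSON file per UTC hour (standard GHArchive naming; assumed here).
    return [f"{day:%Y-%m-%d}-{hour}.json.gz" for hour in range(24)]

def batch_windows(start_date, end_date):
    current = start_date
    while current <= end_date:
        batch_end = min(current + timedelta(days=BATCH_SIZE_DAYS - 1), end_date)
        files = []
        day = current
        while day <= batch_end:
            files.extend(hourly_files(day))
            day += timedelta(days=1)
        yield current, batch_end, files  # ~168 files for a full 7-day batch
        current = batch_end + timedelta(days=1)

start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 20, tzinfo=timezone.utc)
for s, e, files in batch_windows(start, end):
    print(f"{s:%Y-%m-%d} .. {e:%Y-%m-%d}: {len(files)} hourly files")
```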
@@ -609,6 +678,15 @@ def load_agents_from_hf():
     return assistants
 
 
+def calculate_commit_stats_from_metadata(metadata_list):
+    """Calculate statistics from a list of commit metadata."""
+    total_commits = len(metadata_list)
+
+    return {
+        'total_commits': total_commits,
+    }
+
+
 def calculate_pr_stats_from_metadata(metadata_list):
     """Calculate statistics from a list of PR metadata."""
     total_prs = len(metadata_list)
@@ -626,8 +704,62 @@ def calculate_pr_stats_from_metadata(metadata_list):
     }
 
 
-def calculate_monthly_metrics_by_agent(all_metadata_dict, assistants):
-    """Calculate monthly metrics for all assistants for visualization."""
+def calculate_monthly_metrics_by_agent_commits(all_metadata_dict, assistants):
+    """Calculate monthly metrics for commits for all assistants for visualization."""
+    identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}
+
+    if not all_metadata_dict:
+        return {'assistants': [], 'months': [], 'data': {}}
+
+    agent_month_data = defaultdict(lambda: defaultdict(list))
+
+    for agent_identifier, metadata_list in all_metadata_dict.items():
+        for commit_meta in metadata_list:
+            created_at = commit_meta.get('created_at')
+
+            if not created_at:
+                continue
+
+            agent_name = identifier_to_name.get(agent_identifier, agent_identifier)
+
+            try:
+                dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
+                month_key = f"{dt.year}-{dt.month:02d}"
+                agent_month_data[agent_name][month_key].append(commit_meta)
+            except Exception as e:
+                print(f"Warning: Could not parse date '{created_at}': {e}")
+                continue
+
+    all_months = set()
+    for agent_data in agent_month_data.values():
+        all_months.update(agent_data.keys())
+    months = sorted(list(all_months))
+
+    result_data = {}
+    for agent_name, month_dict in agent_month_data.items():
+        total_commits_list = []
+
+        for month in months:
+            commits_in_month = month_dict.get(month, [])
+            total_count = len(commits_in_month)
+
+            total_commits_list.append(total_count)
+
+        result_data[agent_name] = {
+            'total_commits': total_commits_list,
+        }
+
+    agents_list = sorted(list(agent_month_data.keys()))
+
+    return {
+        'assistants': agents_list,
+        'months': months,
+        'data': result_data
+    }
+
+
+def calculate_monthly_metrics_by_agent_prs(all_metadata_dict, assistants):
+    """Calculate monthly metrics for PRs for all assistants for visualization."""
     identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}
 
     if not all_metadata_dict:
@@ -696,8 +828,17 @@ def calculate_monthly_metrics_by_agent(all_metadata_dict, assistants):
     }
 
 
-def construct_leaderboard_from_metadata(all_metadata_dict, assistants):
-    """Construct leaderboard from in-memory PR metadata.
+def construct_leaderboard_from_metadata(commit_metadata_dict, pr_metadata_dict, assistants):
+    """Construct leaderboard from in-memory commit and PR metadata.
+
+    Args:
+        commit_metadata_dict: Dictionary mapping assistant ID to list of commit metadata
+        pr_metadata_dict: Dictionary mapping assistant ID to list of PR metadata
+        assistants: List of assistant metadata
+
+    Returns:
+        Dictionary with leaderboard data including both commit and PR statistics
+    """
     if not assistants:
         print("Error: No assistants found")
         return {}
@@ -708,21 +849,27 @@ def construct_leaderboard_from_metadata(all_metadata_dict, assistants):
         identifier = assistant.get('github_identifier')
         agent_name = assistant.get('name', 'Unknown')
 
+        # Get commit and PR metadata
+        commit_metadata = commit_metadata_dict.get(identifier, [])
+        pr_metadata = pr_metadata_dict.get(identifier, [])
+
+        # Calculate statistics
+        commit_stats = calculate_commit_stats_from_metadata(commit_metadata)
+        pr_stats = calculate_pr_stats_from_metadata(pr_metadata)
 
         cache_dict[identifier] = {
             'name': agent_name,
             'website': assistant.get('website', 'N/A'),
             'github_identifier': identifier,
-            **
+            **commit_stats,
+            **pr_stats
         }
 
     return cache_dict
 
 
-def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
-    """Save leaderboard data and monthly metrics to HuggingFace dataset."""
+def save_leaderboard_data_to_hf(leaderboard_dict, commit_monthly_metrics, pr_monthly_metrics):
+    """Save leaderboard data, commit monthly metrics, and PR monthly metrics to HuggingFace dataset."""
     try:
         token = get_hf_token()
         if not token:
@@ -731,12 +878,13 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
         api = HfApi(token=token)
 
         combined_data = {
-            'last_updated': datetime.now(timezone.utc).isoformat(),
-            'leaderboard': leaderboard_dict,
-            'monthly_metrics': monthly_metrics,
             'metadata': {
+                'last_updated': datetime.now(timezone.utc).isoformat(),
                 'leaderboard_time_frame_days': LEADERBOARD_TIME_FRAME_DAYS
-            }
+            },
+            'leaderboard': leaderboard_dict,
+            'commit_monthly_metrics': commit_monthly_metrics,
+            'pr_monthly_metrics': pr_monthly_metrics
         }
 
         with open(LEADERBOARD_FILENAME, 'w') as f:
@@ -767,8 +915,8 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
 
 def mine_all_agents():
     """
-    Mine PR metadata for all assistants using STREAMING batch processing.
-    Downloads GHArchive data, then uses BATCH-based DuckDB queries.
+    Mine commit and PR metadata for all assistants using STREAMING batch processing.
+    Downloads GHArchive data, then uses UNIFIED BATCH-based DuckDB queries.
     """
     print(f"\n[1/4] Downloading GHArchive data...")
@@ -787,7 +935,7 @@ def mine_all_agents():
         print("Error: No valid assistant identifiers found")
         return
 
-    print(f"\n[3/4] Mining PR metadata ({len(identifiers)} assistants, {LEADERBOARD_TIME_FRAME_DAYS} days)...")
+    print(f"\n[3/4] Mining commit and PR metadata ({len(identifiers)} assistants, {LEADERBOARD_TIME_FRAME_DAYS} days)...")
 
     try:
         conn = get_duckdb_connection()
@@ -800,11 +948,15 @@ def mine_all_agents():
     start_date = end_date - timedelta(days=LEADERBOARD_TIME_FRAME_DAYS)
 
     try:
-        # USE STREAMING FUNCTION
+        # USE UNIFIED STREAMING FUNCTION for both commits and PRs
+        results = fetch_all_metadata_streaming(
             conn, identifiers, start_date, end_date
         )
 
+        # Separate commits and PRs
+        commit_metadata = results['commits']
+        pr_metadata = results['prs']
+
     except Exception as e:
         print(f"Error during DuckDB fetch: {str(e)}")
         traceback.print_exc()
@@ -815,9 +967,23 @@ def mine_all_agents():
     print(f"\n[4/4] Saving leaderboard...")
 
     try:
+        # Construct leaderboard with both commit and PR data
+        leaderboard_dict = construct_leaderboard_from_metadata(
+            commit_metadata, pr_metadata, assistants
+        )
+
+        # Calculate monthly metrics for both commits and PRs
+        commit_monthly_metrics = calculate_monthly_metrics_by_agent_commits(
+            commit_metadata, assistants
+        )
+        pr_monthly_metrics = calculate_monthly_metrics_by_agent_prs(
+            pr_metadata, assistants
+        )
+
+        # Save everything
+        save_leaderboard_data_to_hf(
+            leaderboard_dict, commit_monthly_metrics, pr_monthly_metrics
+        )
 
     except Exception as e:
         print(f"Error saving leaderboard: {str(e)}")
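
Taken together, the file written by `save_leaderboard_data_to_hf()` and read back by app.py's `load_leaderboard_data_from_hf()` now nests everything under four top-level keys. A hedged sketch of that structure; the field names mirror the code above, and all sample values are invented:

```python
# Hedged illustration of the saved leaderboard dataset; values are made up.
combined_data = {
    'metadata': {
        'last_updated': '2025-07-07T00:00:00+00:00',
        'leaderboard_time_frame_days': 180,
    },
    'leaderboard': {
        'agent-a[bot]': {
            'name': 'Agent A',
            'website': 'https://example.com',        # placeholder
            'github_identifier': 'agent-a[bot]',
            'total_commits': 215,                     # from calculate_commit_stats_from_metadata
            'total_prs': 18,                          # from calculate_pr_stats_from_metadata
            'merged_prs': 12,
            'acceptance_rate': 75.0,
        },
    },
    'commit_monthly_metrics': {
        'assistants': ['Agent A'], 'months': ['2025-06'],
        'data': {'Agent A': {'total_commits': [215]}},
    },
    'pr_monthly_metrics': {
        'assistants': ['Agent A'], 'months': ['2025-06'],
        'data': {'Agent A': {'acceptance_rates': [75.0], 'total_prs': [18]}},
    },
}
```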