zhimin-z committed on
Commit 1b10ccd · 1 Parent(s): be99280

merge commit with pr

Files changed (3)
  1. README.md +11 -7
  2. app.py +124 -79
  3. msr.py +233 -67
README.md CHANGED

@@ -11,17 +11,17 @@ pinned: false
 short_description: Track GitHub PR statistics for SWE assistants
 ---
 
-# SWE Assistant PR Leaderboard
+# SWE Assistant PR & Commit Leaderboard
 
-SWE-PR ranks software engineering assistants by their real-world GitHub pull request performance.
+SWE-PR ranks software engineering assistants by their real-world GitHub pull request and commit performance.
 
-No benchmarks. No sandboxes. Just real code that got merged.
+No benchmarks. No sandboxes. Just real code that got merged and commits that got pushed.
 
 ## Why This Exists
 
-Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many made it through? Is the assistant improving?
+Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many commits are being created? How active is the assistant across different projects? Is the assistant improving?
 
-If an assistant can consistently get pull requests accepted across different projects, that tells you something no benchmark can.
+If an assistant can consistently get pull requests accepted and create commits across different projects, that tells you something no benchmark can.
 
 ## What We Track
 
@@ -31,12 +31,14 @@ Key metrics from the last 180 days:
 - **Assistant**: Display name of the assistant
 - **Website**: Link to the assistant's homepage or documentation
 - **Total PRs**: Pull requests the assistant has opened
+- **Total Commits**: Commits created by the assistant
 - **Merged PRs**: PRs that got merged (not just closed)
 - **Acceptance Rate**: Percentage of concluded PRs that got merged
 
 **Monthly Trends**
-- Acceptance rate trends (line plots)
+- PR acceptance rate trends (line plots)
 - PR volume over time (bar charts)
+- Commit volume over time (bar charts)
 
 We focus on 180 days to highlight current capabilities and active assistants.
 
@@ -45,6 +47,7 @@ We focus on 180 days to highlight current capabilities and active assistants.
 **Data Collection**
 We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
 - PRs opened by the assistant (`PullRequestEvent`)
+- Commits created by the assistant (`PushEvent`)
 
 **Regular Updates**
 Leaderboard refreshes weekly (Monday at 00:00 UTC).
@@ -69,7 +72,8 @@ Context matters: 100 PRs at 20% acceptance differs from 10 PRs at 80%. Consider
 
 Planned improvements:
 - Repository-based analysis
-- Extended metrics (review round-trips, conversation depth, files changed)
+- Extended PR metrics (review round-trips, conversation depth, files changed)
+- Extended commit metrics (commit frequency patterns, code churn)
 - Merge time tracking
 - Contribution patterns (bugs, features, docs)
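The README defines Acceptance Rate as the percentage of concluded PRs that got merged. A minimal sketch of that computation, assuming hypothetical PR-metadata dicts with `merged_at`/`closed_at` fields like those mined in msr.py (the repository's actual `calculate_pr_stats_from_metadata` body is not shown in full in this diff):

```python
def acceptance_rate(prs):
    """Acceptance rate = merged PRs / concluded PRs (merged or closed), in percent."""
    merged = sum(1 for pr in prs if pr.get('merged_at'))
    concluded = sum(1 for pr in prs if pr.get('merged_at') or pr.get('closed_at'))
    return round(100.0 * merged / concluded, 2) if concluded else 0.0

prs = [
    {'merged_at': '2025-01-03', 'closed_at': '2025-01-03'},  # merged
    {'merged_at': None, 'closed_at': '2025-01-05'},          # closed without merging
    {'merged_at': None, 'closed_at': None},                  # still open: not concluded
]
print(acceptance_rate(prs))  # 50.0
```

Note the denominator counts only concluded PRs, so still-open PRs do not drag the rate down.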
app.py CHANGED

@@ -31,6 +31,7 @@ LEADERBOARD_COLUMNS = [
     ("Assistant", "string"),
     ("Website", "string"),
     ("Total PRs", "number"),
+    ("Total Commits", "number"),
     ("Merged PRs", "number"),
     ("Acceptance Rate (%)", "number"),
 ]
@@ -269,25 +270,42 @@ def load_leaderboard_data_from_hf():
 # UI FUNCTIONS
 # =============================================================================
 
-def create_monthly_metrics_plot(top_n=5):
+def create_monthly_metrics_plot(type="pr", top_n=5):
     """
-    Create a Plotly figure with dual y-axes showing:
-    - Left y-axis: Acceptance Rate (%) as line curves
-    - Right y-axis: Total PRs created as bar charts
+    Create a Plotly figure showing monthly metrics.
+    - For PRs: Acceptance Rate (%) as line curves, Total PRs as bar charts
+    - For Commits: Total Commits as bar charts
 
     Each assistant gets a unique color for both their line and bars.
 
     Args:
+        type: Type of metrics to display - "pr" or "commit" (default: "pr")
         top_n: Number of top assistants to show (default: 5)
     """
+    # Determine metrics key and field names based on type
+    if type == "commit":
+        metrics_key = 'commit_monthly_metrics'
+        total_field = 'total_commits'
+        no_data_msg = "No commit data available for visualization"
+        total_label = "Total Commits"
+        print_msg = "commit"
+        has_rate = False
+    else:  # default to "pr"
+        metrics_key = 'pr_monthly_metrics'
+        total_field = 'total_prs'
+        no_data_msg = "No PR data available for visualization"
+        total_label = "Total PRs"
+        print_msg = "PR"
+        has_rate = True
+
     # Load from saved dataset
     saved_data = load_leaderboard_data_from_hf()
 
-    if not saved_data or 'monthly_metrics' not in saved_data:
+    if not saved_data or metrics_key not in saved_data:
         # Return an empty figure with a message
         fig = go.Figure()
         fig.add_annotation(
-            text="No data available for visualization",
+            text=no_data_msg,
             xref="paper", yref="paper",
             x=0.5, y=0.5, showarrow=False,
             font=dict(size=16)
@@ -299,19 +317,19 @@ def create_monthly_metrics_plot(top_n=5):
         )
         return fig
 
-    metrics = saved_data['monthly_metrics']
-    print(f"Loaded monthly metrics from saved dataset")
+    metrics = saved_data[metrics_key]
+    print(f"Loaded {print_msg} monthly metrics from saved dataset")
 
     # Apply top_n filter if specified
     if top_n is not None and top_n > 0 and metrics.get('assistants'):
-        # Calculate total PRs for each assistant
+        # Calculate total count for each assistant
         agent_totals = []
         for agent_name in metrics['assistants']:
             agent_data = metrics['data'].get(agent_name, {})
-            total_prs = sum(agent_data.get('total_prs', []))
-            agent_totals.append((agent_name, total_prs))
+            total_count = sum(agent_data.get(total_field, []))
+            agent_totals.append((agent_name, total_count))
 
-        # Sort by total PRs and take top N
+        # Sort by total count and take top N
         agent_totals.sort(key=lambda x: x[1], reverse=True)
         top_agents = [agent_name for agent_name, _ in agent_totals[:top_n]]
 
@@ -338,8 +356,11 @@ def create_monthly_metrics_plot(top_n=5):
         )
         return fig
 
-    # Create figure with secondary y-axis
-    fig = make_subplots(specs=[[{"secondary_y": True}]])
+    # Create figure with secondary y-axis (for PRs) or single axis (for commits)
+    if has_rate:
+        fig = make_subplots(specs=[[{"secondary_y": True}]])
+    else:
+        fig = go.Figure()
 
     # Generate unique colors for many assistants using HSL color space
     def generate_color(index, total):
@@ -361,70 +382,79 @@ def create_monthly_metrics_plot(top_n=5):
         color = agent_colors[agent_name]
         agent_data = data[agent_name]
 
-        # Add line trace for acceptance rate (left y-axis)
-        acceptance_rates = agent_data['acceptance_rates']
-        # Filter out None values for plotting
-        x_acceptance = [month for month, rate in zip(months, acceptance_rates) if rate is not None]
-        y_acceptance = [rate for rate in acceptance_rates if rate is not None]
-
-        if x_acceptance and y_acceptance:  # Only add trace if there's data
-            fig.add_trace(
-                go.Scatter(
-                    x=x_acceptance,
-                    y=y_acceptance,
-                    name=agent_name,
-                    mode='lines+markers',
-                    line=dict(color=color, width=2),
-                    marker=dict(size=8),
-                    legendgroup=agent_name,
-                    showlegend=(top_n is not None and top_n <= 10),  # Show legend for top N assistants
-                    hovertemplate='<b>Assistant: %{fullData.name}</b><br>' +
-                                  'Month: %{x}<br>' +
-                                  'Acceptance Rate: %{y:.2f}%<br>' +
-                                  '<extra></extra>'
-                ),
-                secondary_y=False
-            )
+        if has_rate:
+            # Add line trace for acceptance rate (left y-axis) - PR only
+            acceptance_rates = agent_data['acceptance_rates']
+            # Filter out None values for plotting
+            x_acceptance = [month for month, rate in zip(months, acceptance_rates) if rate is not None]
+            y_acceptance = [rate for rate in acceptance_rates if rate is not None]
+
+            if x_acceptance and y_acceptance:  # Only add trace if there's data
+                fig.add_trace(
+                    go.Scatter(
+                        x=x_acceptance,
+                        y=y_acceptance,
+                        name=agent_name,
+                        mode='lines+markers',
+                        line=dict(color=color, width=2),
+                        marker=dict(size=8),
+                        legendgroup=agent_name,
+                        showlegend=(top_n is not None and top_n <= 10),  # Show legend for top N assistants
+                        hovertemplate='<b>Assistant: %{fullData.name}</b><br>' +
+                                      'Month: %{x}<br>' +
+                                      'Acceptance Rate: %{y:.2f}%<br>' +
+                                      '<extra></extra>'
                    ),
+                    secondary_y=False
+                )
 
-        # Add bar trace for total PRs (right y-axis)
-        # Only show bars for months where assistant has PRs
+        # Add bar trace for total count (right y-axis for PRs, single axis for commits)
+        # Only show bars for months where assistant has data
         x_bars = []
         y_bars = []
-        for month, count in zip(months, agent_data['total_prs']):
-            if count > 0:  # Only include months with PRs
+        for month, count in zip(months, agent_data[total_field]):
+            if count > 0:  # Only include months with data
                 x_bars.append(month)
                 y_bars.append(count)
 
         if x_bars and y_bars:  # Only add trace if there's data
-            fig.add_trace(
-                go.Bar(
-                    x=x_bars,
-                    y=y_bars,
-                    name=agent_name,
-                    marker=dict(color=color, opacity=0.6),
-                    legendgroup=agent_name,
-                    showlegend=False,  # Hide duplicate legend entry (already shown in Scatter)
-                    hovertemplate='<b>Assistant: %{fullData.name}</b><br>' +
-                                  'Month: %{x}<br>' +
-                                  'Total PRs: %{y}<br>' +
-                                  '<extra></extra>',
-                    offsetgroup=agent_name  # Group bars by assistant for proper spacing
-                ),
-                secondary_y=True
-            )
+            trace_args = {
+                'x': x_bars,
+                'y': y_bars,
+                'name': agent_name,
+                'marker': dict(color=color, opacity=0.7 if type == "commit" else 0.6),
+                'legendgroup': agent_name,
+                'showlegend': False if has_rate else (top_n is not None and top_n <= 10),
+                'hovertemplate': f'<b>Assistant: %{{fullData.name}}</b><br>' +
+                                 f'Month: %{{x}}<br>' +
+                                 f'{total_label}: %{{y}}<br>' +
+                                 '<extra></extra>',
+                'offsetgroup': agent_name
+            }
+
+            if has_rate:
+                fig.add_trace(go.Bar(**trace_args), secondary_y=True)
+            else:
+                fig.add_trace(go.Bar(**trace_args))
 
     # Update axes labels
     fig.update_xaxes(title_text=None)
-    fig.update_yaxes(
-        title_text="<b>Acceptance Rate (%)</b>",
-        range=[0, 100],
-        secondary_y=False,
-        showticklabels=True,
-        tickmode='linear',
-        dtick=10,
-        showgrid=True
-    )
-    fig.update_yaxes(title_text="<b>Total PRs</b>", secondary_y=True)
+
+    if has_rate:
+        # For PRs: dual y-axes
+        fig.update_yaxes(
+            title_text="<b>Acceptance Rate (%)</b>",
+            range=[0, 100],
+            secondary_y=False,
+            showticklabels=True,
+            tickmode='linear',
+            dtick=10,
+            showgrid=True
+        )
+        fig.update_yaxes(title_text=f"<b>{total_label}</b>", secondary_y=True)
+    else:
+        # For commits: single y-axis
+        fig.update_yaxes(title_text=f"<b>{total_label}</b>")
 
     # Update layout
     show_legend = (top_n is not None and top_n <= 10)
@@ -481,6 +511,7 @@ def get_leaderboard_dataframe():
             data.get('name', 'Unknown'),
             data.get('website', 'N/A'),
             total_prs,
+            data.get('total_commits', 0),
             data.get('merged_prs', 0),
             data.get('acceptance_rate', 0.0),
         ])
@@ -493,7 +524,7 @@ def get_leaderboard_dataframe():
     df = pd.DataFrame(rows, columns=column_names)
 
     # Ensure numeric types
-    numeric_cols = ["Total PRs", "Merged PRs", "Acceptance Rate (%)"]
+    numeric_cols = ["Total PRs", "Total Commits", "Merged PRs", "Acceptance Rate (%)"]
     for col in numeric_cols:
         if col in df.columns:
             df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
@@ -610,15 +641,15 @@ print(f"On startup: Loads cached data from HuggingFace on demand")
 print(f"{'='*80}\n")
 
 # Create Gradio interface
-with gr.Blocks(title="SWE Assistant PR Leaderboard", theme=gr.themes.Soft()) as app:
-    gr.Markdown("# SWE Assistant PR Leaderboard")
-    gr.Markdown(f"Track and compare GitHub pull request statistics for SWE assistants")
+with gr.Blocks(title="SWE Assistant PR & Commit Leaderboard", theme=gr.themes.Soft()) as app:
+    gr.Markdown("# SWE Assistant PR & Commit Leaderboard")
+    gr.Markdown(f"Track and compare GitHub pull request and commit statistics for SWE assistants")
 
     with gr.Tabs():
 
         # Leaderboard Tab
         with gr.Tab("Leaderboard"):
-            gr.Markdown("*Statistics are based on assistant PR activity tracked by the system*")
+            gr.Markdown("*Statistics are based on assistant PR and commit activity tracked by the system*")
             leaderboard_table = Leaderboard(
                 value=pd.DataFrame(columns=[col[0] for col in LEADERBOARD_COLUMNS]),  # Empty initially
                 datatype=LEADERBOARD_COLUMNS,
@@ -642,18 +673,32 @@ with gr.Blocks(title="SWE Assistant PR Leaderboard", theme=gr.themes.Soft()) as
                 outputs=[leaderboard_table]
             )
 
-            # Monthly Metrics Section
+            # PR Monthly Metrics Section
             gr.Markdown("---")  # Divider
             with gr.Group():
-                gr.Markdown("### Monthly Performance - Top 5 Assistants")
+                gr.Markdown("### PR Monthly Performance - Top 5 Assistants")
                 gr.Markdown("*Shows acceptance rate trends and PR volumes for the most active assistants*")
-                monthly_metrics_plot = gr.Plot(label="Monthly Metrics")
+                pr_monthly_metrics_plot = gr.Plot(label="PR Monthly Metrics")
+
+            # Load PR monthly metrics when app starts
+            app.load(
+                fn=lambda: create_monthly_metrics_plot(type="pr"),
+                inputs=[],
+                outputs=[pr_monthly_metrics_plot]
+            )
+
+            # Commit Monthly Metrics Section
+            gr.Markdown("---")  # Divider
+            with gr.Group():
+                gr.Markdown("### Commit Monthly Performance - Top 5 Assistants")
+                gr.Markdown("*Shows commit volumes for the most active assistants*")
+                commit_monthly_metrics_plot = gr.Plot(label="Commit Monthly Metrics")
 
-            # Load monthly metrics when app starts
+            # Load commit monthly metrics when app starts
             app.load(
-                fn=lambda: create_monthly_metrics_plot(),
+                fn=lambda: create_monthly_metrics_plot(type="commit"),
                 inputs=[],
-                outputs=[monthly_metrics_plot]
+                outputs=[commit_monthly_metrics_plot]
             )
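The new `type` parameter on `create_monthly_metrics_plot` selects the dataset key, the per-month field, the axis label, and whether a secondary acceptance-rate axis is drawn, all via a branch at the top of the function. That dispatch can be sketched as a standalone helper (`metric_config` is a hypothetical name; the keys and values mirror the diff):

```python
def metric_config(type="pr"):
    """Map a metric type ("pr" or "commit") to the config the plotting code uses.

    'has_rate' marks whether a secondary acceptance-rate axis is drawn;
    any unknown type falls back to "pr", as in the diff's if/else.
    """
    if type == "commit":
        return {
            'metrics_key': 'commit_monthly_metrics',
            'total_field': 'total_commits',
            'total_label': 'Total Commits',
            'has_rate': False,
        }
    # default to "pr"
    return {
        'metrics_key': 'pr_monthly_metrics',
        'total_field': 'total_prs',
        'total_label': 'Total PRs',
        'has_rate': True,
    }

cfg = metric_config("commit")
print(cfg['total_field'], cfg['has_rate'])  # total_commits False
```

Returning a dict (rather than assigning six locals) would also make it easy to add a third metric type later without touching the plotting body.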
msr.py CHANGED
@@ -344,39 +344,54 @@ def generate_file_path_patterns(start_date, end_date, data_dir=GHARCHIVE_DATA_LO
344
  # STREAMING BATCH PROCESSING
345
  # =============================================================================
346
 
347
- def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
348
  """
349
- OPTIMIZED: Fetch PR metadata using streaming batch processing.
350
-
351
  Processes GHArchive files in BATCH_SIZE_DAYS chunks to limit memory usage.
352
  Instead of loading 180 days (4,344 files) at once, processes 7 days at a time.
353
-
354
  This prevents OOM errors by:
355
  1. Only keeping ~168 hourly files in memory per batch (vs 4,344)
356
  2. Incrementally building the results dictionary
357
  3. Allowing DuckDB to garbage collect after each batch
358
-
 
 
 
 
 
 
359
  Args:
360
  conn: DuckDB connection instance
361
  identifiers: List of GitHub usernames/bot identifiers (~1500)
362
  start_date: Start datetime (timezone-aware)
363
  end_date: End datetime (timezone-aware)
364
-
365
  Returns:
366
- Dictionary mapping assistant identifier to list of PR metadata
 
 
367
  """
368
  identifier_list = ', '.join([f"'{id}'" for id in identifiers])
369
- metadata_by_agent = defaultdict(list)
 
 
 
 
 
 
370
 
371
  # Calculate total batches
372
  total_days = (end_date - start_date).days
373
  total_batches = (total_days // BATCH_SIZE_DAYS) + 1
374
-
375
  # Process in configurable batches
376
  current_date = start_date
377
  batch_num = 0
 
378
  total_prs = 0
379
-
380
  print(f" Streaming {total_batches} batches of {BATCH_SIZE_DAYS}-day intervals...")
381
 
382
  while current_date <= end_date:
@@ -396,23 +411,27 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
396
 
397
  # Build file patterns SQL for THIS BATCH
398
  file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
399
-
400
- # Query for this batch
401
- # We need both opened and closed events:
402
- # - opened events: to identify PRs created within the time frame
403
- # - closed events: to determine if/when those PRs were merged
404
- query = f"""
405
- SELECT DISTINCT
 
 
 
406
  CONCAT(
407
  REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
408
  '/pull/',
409
  CAST(payload.pull_request.number AS VARCHAR)
410
- ) as url,
411
  TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) as action,
412
  TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) as pr_author,
413
- TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as created_at,
414
- TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as merged_at,
415
- TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as closed_at
 
416
  FROM read_json(
417
  {file_patterns_sql},
418
  union_by_name=true,
@@ -422,38 +441,83 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
422
  ignore_errors=true,
423
  maximum_object_size=2147483648
424
  )
425
- WHERE type = 'PullRequestEvent'
426
- AND payload.pull_request.number IS NOT NULL
427
- AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) IS NOT NULL
428
- AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) IN ({identifier_list})
429
- AND TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) IN ('opened', 'closed')
 
 
 
 
 
 
 
 
 
430
  """
431
 
432
  try:
433
- results = conn.execute(query).fetchall()
434
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
435
  # Group events by PR URL to merge opened and closed events
436
  pr_events = defaultdict(lambda: {'opened': None, 'closed': None})
437
 
438
- for row in results:
439
- url = row[0]
440
- action = row[1]
441
- pr_author = row[2]
442
- created_at = normalize_date_format(row[3]) if row[3] else None
443
- merged_at = normalize_date_format(row[4]) if row[4] else None
444
- closed_at = normalize_date_format(row[5]) if row[5] else None
445
 
446
- if not url or not action:
447
  continue
448
 
449
  event_data = {
450
  'pr_author': pr_author,
451
- 'created_at': created_at,
452
- 'merged_at': merged_at,
453
- 'closed_at': closed_at,
454
  }
455
 
456
- pr_events[url][action] = event_data
457
 
458
  # Only include PRs that have an 'opened' event
459
  # Use closed event data (if available) to get merged_at and closed_at
@@ -480,11 +544,11 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
480
  'closed_at': closed_event['closed_at'] if closed_event else None,
481
  }
482
 
483
- metadata_by_agent[pr_author].append(pr_metadata)
484
  batch_prs += 1
485
  total_prs += 1
486
 
487
- print(f"✓ {batch_prs} PRs found")
488
 
489
  except Exception as e:
490
  print(f"\n ✗ Batch {batch_num} error: {str(e)}")
@@ -492,12 +556,17 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
492
 
493
  # Move to next batch
494
  current_date = batch_end + timedelta(days=1)
495
-
496
  # Final summary
497
- agents_with_data = sum(1 for prs in metadata_by_agent.values() if prs)
498
- print(f"\n ✓ Complete: {total_prs} PRs found for {agents_with_data}/{len(identifiers)} assistants")
499
-
500
- return dict(metadata_by_agent)
 
 
 
 
 
501
 
502
 
503
  def sync_agents_repo():
@@ -609,6 +678,15 @@ def load_agents_from_hf():
609
  return assistants
610
 
611
 
 
 
 
 
 
 
 
 
 
612
  def calculate_pr_stats_from_metadata(metadata_list):
613
  """Calculate statistics from a list of PR metadata."""
614
  total_prs = len(metadata_list)
@@ -626,8 +704,62 @@ def calculate_pr_stats_from_metadata(metadata_list):
626
  }
627
 
628
 
629
- def calculate_monthly_metrics_by_agent(all_metadata_dict, assistants):
630
- """Calculate monthly metrics for all assistants for visualization."""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
631
  identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}
632
 
633
  if not all_metadata_dict:
@@ -696,8 +828,17 @@ def calculate_monthly_metrics_by_agent(all_metadata_dict, assistants):
696
  }
697
 
698
 
699
- def construct_leaderboard_from_metadata(all_metadata_dict, assistants):
700
- """Construct leaderboard from in-memory PR metadata."""
 
 
 
 
 
 
 
 
 
701
  if not assistants:
702
  print("Error: No assistants found")
703
  return {}
@@ -708,21 +849,27 @@ def construct_leaderboard_from_metadata(all_metadata_dict, assistants):
708
  identifier = assistant.get('github_identifier')
709
  agent_name = assistant.get('name', 'Unknown')
710
 
711
- bot_metadata = all_metadata_dict.get(identifier, [])
712
- stats = calculate_pr_stats_from_metadata(bot_metadata)
 
 
 
 
 
713
 
714
  cache_dict[identifier] = {
715
  'name': agent_name,
716
  'website': assistant.get('website', 'N/A'),
717
  'github_identifier': identifier,
718
- **stats
 
719
  }
720
 
721
  return cache_dict
722
 
723
 
724
- def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
725
- """Save leaderboard data and monthly metrics to HuggingFace dataset."""
726
  try:
727
  token = get_hf_token()
728
  if not token:
@@ -731,12 +878,13 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
731
  api = HfApi(token=token)
732
 
733
  combined_data = {
734
- 'last_updated': datetime.now(timezone.utc).isoformat(),
735
- 'leaderboard': leaderboard_dict,
736
- 'monthly_metrics': monthly_metrics,
737
  'metadata': {
 
738
  'leaderboard_time_frame_days': LEADERBOARD_TIME_FRAME_DAYS
739
- }
 
 
 
740
  }
741
 
742
  with open(LEADERBOARD_FILENAME, 'w') as f:
@@ -767,8 +915,8 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
767
 
768
  def mine_all_agents():
769
  """
770
- Mine PR metadata for all assistants using STREAMING batch processing.
771
- Downloads GHArchive data, then uses BATCH-based DuckDB queries.
772
  """
773
  print(f"\n[1/4] Downloading GHArchive data...")
774
 
@@ -787,7 +935,7 @@ def mine_all_agents():
787
  print("Error: No valid assistant identifiers found")
788
  return
789
 
790
- print(f"\n[3/4] Mining PR metadata ({len(identifiers)} assistants, {LEADERBOARD_TIME_FRAME_DAYS} days)...")
791
 
792
  try:
793
  conn = get_duckdb_connection()
@@ -800,11 +948,15 @@ def mine_all_agents():
800
  start_date = end_date - timedelta(days=LEADERBOARD_TIME_FRAME_DAYS)
801
 
802
  try:
803
- # USE STREAMING FUNCTION INSTEAD
804
- all_metadata = fetch_all_pr_metadata_streaming(
805
  conn, identifiers, start_date, end_date
806
  )
807
 
 
 
 
 
808
  except Exception as e:
809
  print(f"Error during DuckDB fetch: {str(e)}")
810
  traceback.print_exc()
@@ -815,9 +967,23 @@ def mine_all_agents():
815
  print(f"\n[4/4] Saving leaderboard...")
816
 
817
  try:
818
- leaderboard_dict = construct_leaderboard_from_metadata(all_metadata, assistants)
819
- monthly_metrics = calculate_monthly_metrics_by_agent(all_metadata, assistants)
820
- save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
821
 
822
  except Exception as e:
823
  print(f"Error saving leaderboard: {str(e)}")
 
344
  # STREAMING BATCH PROCESSING
345
  # =============================================================================
346
 
347
+ def fetch_all_metadata_streaming(conn, identifiers, start_date, end_date):
348
  """
349
+ UNIFIED QUERY: Fetch both commit and PR metadata using streaming batch processing.
350
+
351
  Processes GHArchive files in BATCH_SIZE_DAYS chunks to limit memory usage.
352
  Instead of loading 180 days (4,344 files) at once, processes 7 days at a time.
353
+
354
  This prevents OOM errors by:
355
  1. Only keeping ~168 hourly files in memory per batch (vs 4,344)
356
  2. Incrementally building the results dictionary
357
  3. Allowing DuckDB to garbage collect after each batch
358
+
359
+ Fetches both:
360
+ - PushEvent (for commit tracking)
361
+ - PullRequestEvent (for PR tracking)
362
+
363
+ Then post-processes in Python to separate commits and PRs.
364
+
365
  Args:
366
  conn: DuckDB connection instance
367
  identifiers: List of GitHub usernames/bot identifiers (~1500)
368
  start_date: Start datetime (timezone-aware)
369
  end_date: End datetime (timezone-aware)
370
+
371
  Returns:
372
+ Dictionary with two keys:
373
+ - 'commits': {author: [commit_metadata]} for commit tracking
374
+ - 'prs': {author: [pr_metadata]} for PR tracking
375
  """
376
  identifier_list = ', '.join([f"'{id}'" for id in identifiers])
377
+ identifier_set = set(identifiers)
378
+
379
+ # Storage for commits
380
+ commits_by_agent = defaultdict(list)
381
+
382
+ # Storage for PRs
383
+ prs_by_agent = defaultdict(list)
384
 
385
  # Calculate total batches
386
  total_days = (end_date - start_date).days
387
  total_batches = (total_days // BATCH_SIZE_DAYS) + 1
388
+
389
  # Process in configurable batches
390
  current_date = start_date
391
  batch_num = 0
392
+ total_commits = 0
393
  total_prs = 0
394
+
395
  print(f" Streaming {total_batches} batches of {BATCH_SIZE_DAYS}-day intervals...")
396
 
397
  while current_date <= end_date:
 
411
 
412
  # Build file patterns SQL for THIS BATCH
413
  file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
414
+
415
+ # UNIFIED QUERY: Fetch both commits (PushEvent) and PRs (PullRequestEvent)
416
+ # Post-process in Python to separate them
417
+ unified_query = f"""
418
+ SELECT
419
+ type,
420
+ -- Commit fields (from PushEvent)
421
+ TRY_CAST(json_extract_string(to_json(actor), '$.login') AS VARCHAR) as author,
422
+ TRY_CAST(json_extract_string(to_json(payload), '$.head') AS VARCHAR) as commit_sha,
423
+ -- PR fields (from PullRequestEvent)
424
  CONCAT(
425
  REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
426
  '/pull/',
427
  CAST(payload.pull_request.number AS VARCHAR)
428
+ ) as pr_url,
429
  TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) as action,
430
  TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) as pr_author,
431
+ TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as pr_created_at,
432
+ TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as pr_merged_at,
433
+ TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as pr_closed_at,
434
+ created_at
435
  FROM read_json(
436
  {file_patterns_sql},
437
  union_by_name=true,
 
  ignore_errors=true,
  maximum_object_size=2147483648
  )
+ WHERE
+ -- PushEvent: Commits by assistants
+ (type = 'PushEvent'
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.head') AS VARCHAR) IS NOT NULL
+ AND TRY_CAST(json_extract_string(to_json(actor), '$.login') AS VARCHAR) IN ({identifier_list})
+ )
+ OR
+ -- PullRequestEvent: PRs by assistants
+ (type = 'PullRequestEvent'
+ AND payload.pull_request.number IS NOT NULL
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) IS NOT NULL
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) IN ({identifier_list})
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) IN ('opened', 'closed')
+ )
  """

  try:
+ all_results = conn.execute(unified_query).fetchall()

+ # Post-process results to separate commits and PRs
+ # Row structure: [type, author, commit_sha, pr_url, action, pr_author,
+ # pr_created_at, pr_merged_at, pr_closed_at, created_at]
+
+ commit_events = []
+ pr_events_list = []
+
+ for row in all_results:
+ event_type = row[0]
+
+ if event_type == 'PushEvent':
+ commit_events.append(row)
+ elif event_type == 'PullRequestEvent':
+ pr_events_list.append(row)
+
+ # Process commits
+ batch_commits = 0
+ for row in commit_events:
+ author = row[1]
+ sha = row[2]
+ created_at = normalize_date_format(row[9]) if row[9] else None
+
+ if not author or not sha:
+ continue
+
+ # Build commit metadata
+ commit_metadata = {
+ 'sha': sha,
+ 'created_at': created_at,
+ }
+
+ commits_by_agent[author].append(commit_metadata)
+ batch_commits += 1
+ total_commits += 1
+
+ # Process PRs
  # Group events by PR URL to merge opened and closed events
  pr_events = defaultdict(lambda: {'opened': None, 'closed': None})

+ for row in pr_events_list:
+ pr_url = row[3]
+ action = row[4]
+ pr_author = row[5]
+ pr_created_at = normalize_date_format(row[6]) if row[6] else None
+ pr_merged_at = normalize_date_format(row[7]) if row[7] else None
+ pr_closed_at = normalize_date_format(row[8]) if row[8] else None

+ if not pr_url or not action:
  continue

  event_data = {
  'pr_author': pr_author,
+ 'created_at': pr_created_at,
+ 'merged_at': pr_merged_at,
+ 'closed_at': pr_closed_at,
  }

+ pr_events[pr_url][action] = event_data

  # Only include PRs that have an 'opened' event
  # Use closed event data (if available) to get merged_at and closed_at
 
  'closed_at': closed_event['closed_at'] if closed_event else None,
  }

+ prs_by_agent[pr_author].append(pr_metadata)
  batch_prs += 1
  total_prs += 1

+ print(f"✓ {batch_commits} commits, {batch_prs} PRs found")

  except Exception as e:
  print(f"\n ✗ Batch {batch_num} error: {str(e)}")
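Review note: the hunk above buffers `opened` and `closed` PullRequestEvent rows per PR URL, then resolves each PR's final state from the closed event when one exists. A minimal standalone sketch of that merge step, using made-up rows (the tuple shape and sample values are hypothetical, not the miner's actual row layout):

```python
from collections import defaultdict

# Hypothetical simplified rows: (pr_url, action, author, merged_at)
rows = [
    ("github.com/org/repo/pull/1", "opened", "bot", None),
    ("github.com/org/repo/pull/1", "closed", "bot", "2024-05-01T00:00:00Z"),
    ("github.com/org/repo/pull/2", "opened", "bot", None),
]

# Group events by PR URL so opened and closed events land on one record
pr_events = defaultdict(lambda: {"opened": None, "closed": None})
for url, action, author, merged_at in rows:
    pr_events[url][action] = {"pr_author": author, "merged_at": merged_at}

# Keep only PRs with an 'opened' event; take merged_at from the closed
# event when one exists, otherwise the PR is still open
prs = []
for url, ev in pr_events.items():
    if ev["opened"] is None:
        continue
    closed = ev["closed"]
    prs.append({
        "url": url,
        "pr_author": ev["opened"]["pr_author"],
        "merged_at": closed["merged_at"] if closed else None,
    })
```

Grouping by URL means the opened and closed events for a PR can arrive in any order within a batch without changing the result.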
 

  # Move to next batch
  current_date = batch_end + timedelta(days=1)
+
  # Final summary
+ agents_with_commits = sum(1 for commits in commits_by_agent.values() if commits)
+ agents_with_prs = sum(1 for prs in prs_by_agent.values() if prs)
+ print(f"\n ✓ Complete: {total_commits} commits for {agents_with_commits}/{len(identifiers)} assistants")
+ print(f" ✓ Complete: {total_prs} PRs for {agents_with_prs}/{len(identifiers)} assistants")
+
+ return {
+ 'commits': dict(commits_by_agent),
+ 'prs': dict(prs_by_agent)
+ }


  def sync_agents_repo():
 
  return assistants


+ def calculate_commit_stats_from_metadata(metadata_list):
+ """Calculate statistics from a list of commit metadata."""
+ total_commits = len(metadata_list)
+
+ return {
+ 'total_commits': total_commits,
+ }
+
+
  def calculate_pr_stats_from_metadata(metadata_list):
  """Calculate statistics from a list of PR metadata."""
  total_prs = len(metadata_list)
 
  }


+ def calculate_monthly_metrics_by_agent_commits(all_metadata_dict, assistants):
+ """Calculate monthly metrics for commits for all assistants for visualization."""
+ identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}
+
+ if not all_metadata_dict:
+ return {'assistants': [], 'months': [], 'data': {}}
+
+ agent_month_data = defaultdict(lambda: defaultdict(list))
+
+ for agent_identifier, metadata_list in all_metadata_dict.items():
+ for commit_meta in metadata_list:
+ created_at = commit_meta.get('created_at')
+
+ if not created_at:
+ continue
+
+ agent_name = identifier_to_name.get(agent_identifier, agent_identifier)
+
+ try:
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
+ month_key = f"{dt.year}-{dt.month:02d}"
+ agent_month_data[agent_name][month_key].append(commit_meta)
+ except Exception as e:
+ print(f"Warning: Could not parse date '{created_at}': {e}")
+ continue
+
+ all_months = set()
+ for agent_data in agent_month_data.values():
+ all_months.update(agent_data.keys())
+ months = sorted(list(all_months))
+
+ result_data = {}
+ for agent_name, month_dict in agent_month_data.items():
+ total_commits_list = []
+
+ for month in months:
+ commits_in_month = month_dict.get(month, [])
+ total_count = len(commits_in_month)
+
+ total_commits_list.append(total_count)
+
+ result_data[agent_name] = {
+ 'total_commits': total_commits_list,
+ }
+
+ agents_list = sorted(list(agent_month_data.keys()))
+
+ return {
+ 'assistants': agents_list,
+ 'months': months,
+ 'data': result_data
+ }
+
+
+ def calculate_monthly_metrics_by_agent_prs(all_metadata_dict, assistants):
+ """Calculate monthly metrics for PRs for all assistants for visualization."""
  identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}

  if not all_metadata_dict:
 
  }


+ def construct_leaderboard_from_metadata(commit_metadata_dict, pr_metadata_dict, assistants):
+ """Construct leaderboard from in-memory commit and PR metadata.
+
+ Args:
+ commit_metadata_dict: Dictionary mapping assistant ID to list of commit metadata
+ pr_metadata_dict: Dictionary mapping assistant ID to list of PR metadata
+ assistants: List of assistant metadata
+
+ Returns:
+ Dictionary with leaderboard data including both commit and PR statistics
+ """
  if not assistants:
  print("Error: No assistants found")
  return {}
 
  identifier = assistant.get('github_identifier')
  agent_name = assistant.get('name', 'Unknown')

+ # Get commit and PR metadata
+ commit_metadata = commit_metadata_dict.get(identifier, [])
+ pr_metadata = pr_metadata_dict.get(identifier, [])
+
+ # Calculate statistics
+ commit_stats = calculate_commit_stats_from_metadata(commit_metadata)
+ pr_stats = calculate_pr_stats_from_metadata(pr_metadata)

  cache_dict[identifier] = {
  'name': agent_name,
  'website': assistant.get('website', 'N/A'),
  'github_identifier': identifier,
+ **commit_stats,
+ **pr_stats
  }

  return cache_dict


+ def save_leaderboard_data_to_hf(leaderboard_dict, commit_monthly_metrics, pr_monthly_metrics):
+ """Save leaderboard data, commit monthly metrics, and PR monthly metrics to HuggingFace dataset."""
  try:
  token = get_hf_token()
  if not token:
 
  api = HfApi(token=token)

  combined_data = {
  'metadata': {
+ 'last_updated': datetime.now(timezone.utc).isoformat(),
  'leaderboard_time_frame_days': LEADERBOARD_TIME_FRAME_DAYS
+ },
+ 'leaderboard': leaderboard_dict,
+ 'commit_monthly_metrics': commit_monthly_metrics,
+ 'pr_monthly_metrics': pr_monthly_metrics
  }

  with open(LEADERBOARD_FILENAME, 'w') as f:
 

  def mine_all_agents():
  """
+ Mine commit and PR metadata for all assistants using STREAMING batch processing.
+ Downloads GHArchive data, then uses UNIFIED BATCH-based DuckDB queries.
  """
  print(f"\n[1/4] Downloading GHArchive data...")

 
  print("Error: No valid assistant identifiers found")
  return

+ print(f"\n[3/4] Mining commit and PR metadata ({len(identifiers)} assistants, {LEADERBOARD_TIME_FRAME_DAYS} days)...")

  try:
  conn = get_duckdb_connection()
 
  start_date = end_date - timedelta(days=LEADERBOARD_TIME_FRAME_DAYS)

  try:
+ # USE UNIFIED STREAMING FUNCTION for both commits and PRs
+ results = fetch_all_metadata_streaming(
  conn, identifiers, start_date, end_date
  )

+ # Separate commits and PRs
+ commit_metadata = results['commits']
+ pr_metadata = results['prs']
+
  except Exception as e:
  print(f"Error during DuckDB fetch: {str(e)}")
  traceback.print_exc()
 
  print(f"\n[4/4] Saving leaderboard...")

  try:
+ # Construct leaderboard with both commit and PR data
+ leaderboard_dict = construct_leaderboard_from_metadata(
+ commit_metadata, pr_metadata, assistants
+ )
+
+ # Calculate monthly metrics for both commits and PRs
+ commit_monthly_metrics = calculate_monthly_metrics_by_agent_commits(
+ commit_metadata, assistants
+ )
+ pr_monthly_metrics = calculate_monthly_metrics_by_agent_prs(
+ pr_metadata, assistants
+ )
+
+ # Save everything
+ save_leaderboard_data_to_hf(
+ leaderboard_dict, commit_monthly_metrics, pr_monthly_metrics
+ )

  except Exception as e:
  print(f"Error saving leaderboard: {str(e)}")
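Review note: both monthly-metrics functions reduce each record to a `YYYY-MM` bucket parsed from its `created_at` timestamp. A minimal sketch of that bucketing, with fabricated commit metadata (the `sha` values and dates are made up):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical commit metadata, in the same shape the miner produces
commits = [
    {"sha": "a1", "created_at": "2024-04-03T12:00:00Z"},
    {"sha": "b2", "created_at": "2024-04-28T09:30:00Z"},
    {"sha": "c3", "created_at": "2024-05-02T17:45:00Z"},
]

# Bucket by "YYYY-MM"; fromisoformat needs the trailing Z rewritten
# as an explicit UTC offset
by_month = defaultdict(int)
for meta in commits:
    dt = datetime.fromisoformat(meta["created_at"].replace("Z", "+00:00"))
    by_month[f"{dt.year}-{dt.month:02d}"] += 1

# Sorted month keys give per-month count lists aligned across assistants
months = sorted(by_month)
counts = [by_month[m] for m in months]
```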