zhimin-z committed on
Commit 1b10ccd · 1 Parent(s): be99280

merge commit with pr

Files changed (3)
  1. README.md +11 -7
  2. app.py +124 -79
  3. msr.py +233 -67
README.md CHANGED

@@ -11,17 +11,17 @@ pinned: false
 short_description: Track GitHub PR statistics for SWE assistants
 ---
 
-# SWE Assistant PR Leaderboard
+# SWE Assistant PR & Commit Leaderboard
 
-SWE-PR ranks software engineering assistants by their real-world GitHub pull request performance.
+SWE-PR ranks software engineering assistants by their real-world GitHub pull request and commit performance.
 
-No benchmarks. No sandboxes. Just real code that got merged.
+No benchmarks. No sandboxes. Just real code that got merged and commits that got pushed.
 
 ## Why This Exists
 
-Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many made it through? Is the assistant improving?
+Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the PR get merged? How many commits are being created? How active is the assistant across different projects? Is the assistant improving?
 
-If an assistant can consistently get pull requests accepted across different projects, that tells you something no benchmark can.
+If an assistant can consistently get pull requests accepted and create commits across different projects, that tells you something no benchmark can.
 
 ## What We Track
 
@@ -31,12 +31,14 @@ Key metrics from the last 180 days:
 - **Assistant**: Display name of the assistant
 - **Website**: Link to the assistant's homepage or documentation
 - **Total PRs**: Pull requests the assistant has opened
+- **Total Commits**: Commits created by the assistant
 - **Merged PRs**: PRs that got merged (not just closed)
 - **Acceptance Rate**: Percentage of concluded PRs that got merged
 
 **Monthly Trends**
-- Acceptance rate trends (line plots)
+- PR acceptance rate trends (line plots)
 - PR volume over time (bar charts)
+- Commit volume over time (bar charts)
 
 We focus on 180 days to highlight current capabilities and active assistants.
 
@@ -45,6 +47,7 @@ We focus on 180 days to highlight current capabilities and active assistants.
 **Data Collection**
 We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking:
 - PRs opened by the assistant (`PullRequestEvent`)
+- Commits created by the assistant (`PushEvent`)
 
 **Regular Updates**
 Leaderboard refreshes weekly (Monday at 00:00 UTC).
@@ -69,7 +72,8 @@ Context matters: 100 PRs at 20% acceptance differs from 10 PRs at 80%. Consider
 
 Planned improvements:
 - Repository-based analysis
-- Extended metrics (review round-trips, conversation depth, files changed)
+- Extended PR metrics (review round-trips, conversation depth, files changed)
+- Extended commit metrics (commit frequency patterns, code churn)
 - Merge time tracking
 - Contribution patterns (bugs, features, docs)
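The README defines Acceptance Rate as the percentage of concluded PRs that got merged. A minimal sketch of that computation, assuming hypothetical PR-metadata dicts with `merged_at`/`closed_at` fields like those mined in msr.py (the repository's actual `calculate_pr_stats_from_metadata` body is not shown in full in this diff):

```python
def acceptance_rate(prs):
    """Acceptance rate = merged PRs / concluded PRs (merged or closed), in percent."""
    merged = sum(1 for pr in prs if pr.get('merged_at'))
    concluded = sum(1 for pr in prs if pr.get('merged_at') or pr.get('closed_at'))
    return round(100.0 * merged / concluded, 2) if concluded else 0.0

prs = [
    {'merged_at': '2025-01-03', 'closed_at': '2025-01-03'},  # merged
    {'merged_at': None, 'closed_at': '2025-01-05'},          # closed without merging
    {'merged_at': None, 'closed_at': None},                  # still open: not concluded
]
print(acceptance_rate(prs))  # 50.0
```

Note the denominator counts only concluded PRs, so still-open PRs do not drag the rate down.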
app.py CHANGED

@@ -31,6 +31,7 @@ LEADERBOARD_COLUMNS = [
     ("Assistant", "string"),
     ("Website", "string"),
     ("Total PRs", "number"),
+    ("Total Commits", "number"),
     ("Merged PRs", "number"),
     ("Acceptance Rate (%)", "number"),
 ]
@@ -269,25 +270,42 @@ def load_leaderboard_data_from_hf():
 # UI FUNCTIONS
 # =============================================================================
 
-def create_monthly_metrics_plot(top_n=5):
+def create_monthly_metrics_plot(type="pr", top_n=5):
     """
-    Create a Plotly figure with dual y-axes showing:
-    - Left y-axis: Acceptance Rate (%) as line curves
-    - Right y-axis: Total PRs created as bar charts
+    Create a Plotly figure showing monthly metrics.
+    - For PRs: Acceptance Rate (%) as line curves, Total PRs as bar charts
+    - For Commits: Total Commits as bar charts
 
     Each assistant gets a unique color for both their line and bars.
 
     Args:
+        type: Type of metrics to display - "pr" or "commit" (default: "pr")
         top_n: Number of top assistants to show (default: 5)
     """
+    # Determine metrics key and field names based on type
+    if type == "commit":
+        metrics_key = 'commit_monthly_metrics'
+        total_field = 'total_commits'
+        no_data_msg = "No commit data available for visualization"
+        total_label = "Total Commits"
+        print_msg = "commit"
+        has_rate = False
+    else:  # default to "pr"
+        metrics_key = 'pr_monthly_metrics'
+        total_field = 'total_prs'
+        no_data_msg = "No PR data available for visualization"
+        total_label = "Total PRs"
+        print_msg = "PR"
+        has_rate = True
+
     # Load from saved dataset
     saved_data = load_leaderboard_data_from_hf()
 
-    if not saved_data or 'monthly_metrics' not in saved_data:
+    if not saved_data or metrics_key not in saved_data:
         # Return an empty figure with a message
         fig = go.Figure()
         fig.add_annotation(
-            text="No data available for visualization",
+            text=no_data_msg,
             xref="paper", yref="paper",
             x=0.5, y=0.5, showarrow=False,
             font=dict(size=16)
@@ -299,19 +317,19 @@ def create_monthly_metrics_plot(top_n=5):
         )
         return fig
 
-    metrics = saved_data['monthly_metrics']
-    print(f"Loaded monthly metrics from saved dataset")
+    metrics = saved_data[metrics_key]
+    print(f"Loaded {print_msg} monthly metrics from saved dataset")
 
     # Apply top_n filter if specified
     if top_n is not None and top_n > 0 and metrics.get('assistants'):
-        # Calculate total PRs for each assistant
+        # Calculate total count for each assistant
         agent_totals = []
         for agent_name in metrics['assistants']:
             agent_data = metrics['data'].get(agent_name, {})
-            total_prs = sum(agent_data.get('total_prs', []))
-            agent_totals.append((agent_name, total_prs))
+            total_count = sum(agent_data.get(total_field, []))
+            agent_totals.append((agent_name, total_count))
 
-        # Sort by total PRs and take top N
+        # Sort by total count and take top N
         agent_totals.sort(key=lambda x: x[1], reverse=True)
         top_agents = [agent_name for agent_name, _ in agent_totals[:top_n]]
 
@@ -338,8 +356,11 @@ def create_monthly_metrics_plot(top_n=5):
         )
         return fig
 
-    # Create figure with secondary y-axis
-    fig = make_subplots(specs=[[{"secondary_y": True}]])
+    # Create figure with secondary y-axis (for PRs) or single axis (for commits)
+    if has_rate:
+        fig = make_subplots(specs=[[{"secondary_y": True}]])
+    else:
+        fig = go.Figure()
 
     # Generate unique colors for many assistants using HSL color space
     def generate_color(index, total):
@@ -361,70 +382,79 @@ def create_monthly_metrics_plot(top_n=5):
         color = agent_colors[agent_name]
         agent_data = data[agent_name]
 
-        # Add line trace for acceptance rate (left y-axis)
-        acceptance_rates = agent_data['acceptance_rates']
-        # Filter out None values for plotting
-        x_acceptance = [month for month, rate in zip(months, acceptance_rates) if rate is not None]
-        y_acceptance = [rate for rate in acceptance_rates if rate is not None]
-
-        if x_acceptance and y_acceptance:  # Only add trace if there's data
-            fig.add_trace(
-                go.Scatter(
-                    x=x_acceptance,
-                    y=y_acceptance,
-                    name=agent_name,
-                    mode='lines+markers',
-                    line=dict(color=color, width=2),
-                    marker=dict(size=8),
-                    legendgroup=agent_name,
-                    showlegend=(top_n is not None and top_n <= 10),  # Show legend for top N assistants
-                    hovertemplate='<b>Assistant: %{fullData.name}</b><br>' +
-                                  'Month: %{x}<br>' +
-                                  'Acceptance Rate: %{y:.2f}%<br>' +
-                                  '<extra></extra>'
-                ),
-                secondary_y=False
-            )
+        if has_rate:
+            # Add line trace for acceptance rate (left y-axis) - PR only
+            acceptance_rates = agent_data['acceptance_rates']
+            # Filter out None values for plotting
+            x_acceptance = [month for month, rate in zip(months, acceptance_rates) if rate is not None]
+            y_acceptance = [rate for rate in acceptance_rates if rate is not None]
+
+            if x_acceptance and y_acceptance:  # Only add trace if there's data
+                fig.add_trace(
+                    go.Scatter(
+                        x=x_acceptance,
+                        y=y_acceptance,
+                        name=agent_name,
+                        mode='lines+markers',
+                        line=dict(color=color, width=2),
+                        marker=dict(size=8),
+                        legendgroup=agent_name,
+                        showlegend=(top_n is not None and top_n <= 10),  # Show legend for top N assistants
+                        hovertemplate='<b>Assistant: %{fullData.name}</b><br>' +
+                                      'Month: %{x}<br>' +
+                                      'Acceptance Rate: %{y:.2f}%<br>' +
+                                      '<extra></extra>'
                    ),
+                    secondary_y=False
+                )
 
-        # Add bar trace for total PRs (right y-axis)
-        # Only show bars for months where assistant has PRs
+        # Add bar trace for total count (right y-axis for PRs, single axis for commits)
+        # Only show bars for months where assistant has data
         x_bars = []
         y_bars = []
-        for month, count in zip(months, agent_data['total_prs']):
-            if count > 0:  # Only include months with PRs
+        for month, count in zip(months, agent_data[total_field]):
+            if count > 0:  # Only include months with data
                 x_bars.append(month)
                 y_bars.append(count)
 
         if x_bars and y_bars:  # Only add trace if there's data
-            fig.add_trace(
-                go.Bar(
-                    x=x_bars,
-                    y=y_bars,
-                    name=agent_name,
-                    marker=dict(color=color, opacity=0.6),
-                    legendgroup=agent_name,
-                    showlegend=False,  # Hide duplicate legend entry (already shown in Scatter)
-                    hovertemplate='<b>Assistant: %{fullData.name}</b><br>' +
-                                  'Month: %{x}<br>' +
-                                  'Total PRs: %{y}<br>' +
-                                  '<extra></extra>',
-                    offsetgroup=agent_name  # Group bars by assistant for proper spacing
-                ),
-                secondary_y=True
-            )
+            trace_args = {
+                'x': x_bars,
+                'y': y_bars,
+                'name': agent_name,
+                'marker': dict(color=color, opacity=0.7 if type == "commit" else 0.6),
+                'legendgroup': agent_name,
+                'showlegend': False if has_rate else (top_n is not None and top_n <= 10),
+                'hovertemplate': f'<b>Assistant: %{{fullData.name}}</b><br>' +
+                                 f'Month: %{{x}}<br>' +
+                                 f'{total_label}: %{{y}}<br>' +
+                                 '<extra></extra>',
+                'offsetgroup': agent_name
+            }
+
+            if has_rate:
+                fig.add_trace(go.Bar(**trace_args), secondary_y=True)
+            else:
+                fig.add_trace(go.Bar(**trace_args))
 
     # Update axes labels
     fig.update_xaxes(title_text=None)
-    fig.update_yaxes(
-        title_text="<b>Acceptance Rate (%)</b>",
-        range=[0, 100],
-        secondary_y=False,
-        showticklabels=True,
-        tickmode='linear',
-        dtick=10,
-        showgrid=True
-    )
-    fig.update_yaxes(title_text="<b>Total PRs</b>", secondary_y=True)
+
+    if has_rate:
+        # For PRs: dual y-axes
+        fig.update_yaxes(
+            title_text="<b>Acceptance Rate (%)</b>",
+            range=[0, 100],
+            secondary_y=False,
+            showticklabels=True,
+            tickmode='linear',
+            dtick=10,
+            showgrid=True
+        )
+        fig.update_yaxes(title_text=f"<b>{total_label}</b>", secondary_y=True)
+    else:
+        # For commits: single y-axis
+        fig.update_yaxes(title_text=f"<b>{total_label}</b>")
 
     # Update layout
     show_legend = (top_n is not None and top_n <= 10)
@@ -481,6 +511,7 @@ def get_leaderboard_dataframe():
             data.get('name', 'Unknown'),
             data.get('website', 'N/A'),
             total_prs,
+            data.get('total_commits', 0),
             data.get('merged_prs', 0),
             data.get('acceptance_rate', 0.0),
         ])
@@ -493,7 +524,7 @@ def get_leaderboard_dataframe():
     df = pd.DataFrame(rows, columns=column_names)
 
     # Ensure numeric types
-    numeric_cols = ["Total PRs", "Merged PRs", "Acceptance Rate (%)"]
+    numeric_cols = ["Total PRs", "Total Commits", "Merged PRs", "Acceptance Rate (%)"]
     for col in numeric_cols:
         if col in df.columns:
             df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
@@ -610,15 +641,15 @@ print(f"On startup: Loads cached data from HuggingFace on demand")
 print(f"{'='*80}\n")
 
 # Create Gradio interface
-with gr.Blocks(title="SWE Assistant PR Leaderboard", theme=gr.themes.Soft()) as app:
-    gr.Markdown("# SWE Assistant PR Leaderboard")
-    gr.Markdown(f"Track and compare GitHub pull request statistics for SWE assistants")
+with gr.Blocks(title="SWE Assistant PR & Commit Leaderboard", theme=gr.themes.Soft()) as app:
+    gr.Markdown("# SWE Assistant PR & Commit Leaderboard")
+    gr.Markdown(f"Track and compare GitHub pull request and commit statistics for SWE assistants")
 
     with gr.Tabs():
 
         # Leaderboard Tab
         with gr.Tab("Leaderboard"):
-            gr.Markdown("*Statistics are based on assistant PR activity tracked by the system*")
+            gr.Markdown("*Statistics are based on assistant PR and commit activity tracked by the system*")
             leaderboard_table = Leaderboard(
                 value=pd.DataFrame(columns=[col[0] for col in LEADERBOARD_COLUMNS]),  # Empty initially
                 datatype=LEADERBOARD_COLUMNS,
@@ -642,18 +673,32 @@ with gr.Blocks(title="SWE Assistant PR Leaderboard", theme=gr.themes.Soft()) as
                 outputs=[leaderboard_table]
             )
 
-            # Monthly Metrics Section
+            # PR Monthly Metrics Section
             gr.Markdown("---")  # Divider
             with gr.Group():
-                gr.Markdown("### Monthly Performance - Top 5 Assistants")
+                gr.Markdown("### PR Monthly Performance - Top 5 Assistants")
                 gr.Markdown("*Shows acceptance rate trends and PR volumes for the most active assistants*")
-                monthly_metrics_plot = gr.Plot(label="Monthly Metrics")
+                pr_monthly_metrics_plot = gr.Plot(label="PR Monthly Metrics")
+
+            # Load PR monthly metrics when app starts
+            app.load(
+                fn=lambda: create_monthly_metrics_plot(type="pr"),
+                inputs=[],
+                outputs=[pr_monthly_metrics_plot]
+            )
+
+            # Commit Monthly Metrics Section
+            gr.Markdown("---")  # Divider
+            with gr.Group():
+                gr.Markdown("### Commit Monthly Performance - Top 5 Assistants")
+                gr.Markdown("*Shows commit volumes for the most active assistants*")
+                commit_monthly_metrics_plot = gr.Plot(label="Commit Monthly Metrics")
 
-            # Load monthly metrics when app starts
+            # Load commit monthly metrics when app starts
             app.load(
-                fn=lambda: create_monthly_metrics_plot(),
+                fn=lambda: create_monthly_metrics_plot(type="commit"),
                 inputs=[],
-                outputs=[monthly_metrics_plot]
+                outputs=[commit_monthly_metrics_plot]
             )
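The new `type` parameter on `create_monthly_metrics_plot` selects the dataset key, the per-month field, the axis label, and whether a secondary acceptance-rate axis is drawn, all via a branch at the top of the function. That dispatch can be sketched as a standalone helper (`metric_config` is a hypothetical name; the keys and values mirror the diff):

```python
def metric_config(type="pr"):
    """Map a metric type ("pr" or "commit") to the config the plotting code uses.

    'has_rate' marks whether a secondary acceptance-rate axis is drawn;
    any unknown type falls back to "pr", as in the diff's if/else.
    """
    if type == "commit":
        return {
            'metrics_key': 'commit_monthly_metrics',
            'total_field': 'total_commits',
            'total_label': 'Total Commits',
            'has_rate': False,
        }
    # default to "pr"
    return {
        'metrics_key': 'pr_monthly_metrics',
        'total_field': 'total_prs',
        'total_label': 'Total PRs',
        'has_rate': True,
    }

cfg = metric_config("commit")
print(cfg['total_field'], cfg['has_rate'])  # total_commits False
```

Returning a dict (rather than assigning six locals) would also make it easy to add a third metric type later without touching the plotting body.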
msr.py CHANGED
@@ -344,39 +344,54 @@ def generate_file_path_patterns(start_date, end_date, data_dir=GHARCHIVE_DATA_LO
344
  # STREAMING BATCH PROCESSING
345
  # =============================================================================
346
 
347
- def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
348
  """
349
- OPTIMIZED: Fetch PR metadata using streaming batch processing.
350
-
351
  Processes GHArchive files in BATCH_SIZE_DAYS chunks to limit memory usage.
352
  Instead of loading 180 days (4,344 files) at once, processes 7 days at a time.
353
-
354
  This prevents OOM errors by:
355
  1. Only keeping ~168 hourly files in memory per batch (vs 4,344)
356
  2. Incrementally building the results dictionary
357
  3. Allowing DuckDB to garbage collect after each batch
358
-
 
 
 
 
 
 
359
  Args:
360
  conn: DuckDB connection instance
361
  identifiers: List of GitHub usernames/bot identifiers (~1500)
362
  start_date: Start datetime (timezone-aware)
363
  end_date: End datetime (timezone-aware)
364
-
365
  Returns:
366
- Dictionary mapping assistant identifier to list of PR metadata
 
 
367
  """
368
  identifier_list = ', '.join([f"'{id}'" for id in identifiers])
369
- metadata_by_agent = defaultdict(list)
 
 
 
 
 
 
370
 
371
  # Calculate total batches
372
  total_days = (end_date - start_date).days
373
  total_batches = (total_days // BATCH_SIZE_DAYS) + 1
374
-
375
  # Process in configurable batches
376
  current_date = start_date
377
  batch_num = 0
 
378
  total_prs = 0
379
-
380
  print(f" Streaming {total_batches} batches of {BATCH_SIZE_DAYS}-day intervals...")
381
 
382
  while current_date <= end_date:
@@ -396,23 +411,27 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
396
 
397
  # Build file patterns SQL for THIS BATCH
398
  file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
399
-
400
- # Query for this batch
401
- # We need both opened and closed events:
402
- # - opened events: to identify PRs created within the time frame
403
- # - closed events: to determine if/when those PRs were merged
404
- query = f"""
405
- SELECT DISTINCT
 
 
 
406
  CONCAT(
407
  REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
408
  '/pull/',
409
  CAST(payload.pull_request.number AS VARCHAR)
410
- ) as url,
411
  TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) as action,
412
  TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) as pr_author,
413
- TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as created_at,
414
- TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as merged_at,
415
- TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as closed_at
 
416
  FROM read_json(
417
  {file_patterns_sql},
418
  union_by_name=true,
@@ -422,38 +441,83 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
422
  ignore_errors=true,
423
  maximum_object_size=2147483648
424
  )
425
- WHERE type = 'PullRequestEvent'
426
- AND payload.pull_request.number IS NOT NULL
427
- AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) IS NOT NULL
428
- AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) IN ({identifier_list})
429
- AND TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) IN ('opened', 'closed')
 
 
 
 
 
 
 
 
 
430
  """
431
 
432
  try:
433
- results = conn.execute(query).fetchall()
434
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
435
  # Group events by PR URL to merge opened and closed events
436
  pr_events = defaultdict(lambda: {'opened': None, 'closed': None})
437
 
438
- for row in results:
439
- url = row[0]
440
- action = row[1]
441
- pr_author = row[2]
442
- created_at = normalize_date_format(row[3]) if row[3] else None
443
- merged_at = normalize_date_format(row[4]) if row[4] else None
444
- closed_at = normalize_date_format(row[5]) if row[5] else None
445
 
446
- if not url or not action:
447
  continue
448
 
449
  event_data = {
450
  'pr_author': pr_author,
451
- 'created_at': created_at,
452
- 'merged_at': merged_at,
453
- 'closed_at': closed_at,
454
  }
455
 
456
- pr_events[url][action] = event_data
457
 
458
  # Only include PRs that have an 'opened' event
459
  # Use closed event data (if available) to get merged_at and closed_at
@@ -480,11 +544,11 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
480
  'closed_at': closed_event['closed_at'] if closed_event else None,
481
  }
482
 
483
- metadata_by_agent[pr_author].append(pr_metadata)
484
  batch_prs += 1
485
  total_prs += 1
486
 
487
- print(f"✓ {batch_prs} PRs found")
488
 
489
  except Exception as e:
490
  print(f"\n ✗ Batch {batch_num} error: {str(e)}")
@@ -492,12 +556,17 @@ def fetch_all_pr_metadata_streaming(conn, identifiers, start_date, end_date):
492
 
493
  # Move to next batch
494
  current_date = batch_end + timedelta(days=1)
495
-
496
  # Final summary
497
- agents_with_data = sum(1 for prs in metadata_by_agent.values() if prs)
498
- print(f"\n ✓ Complete: {total_prs} PRs found for {agents_with_data}/{len(identifiers)} assistants")
499
-
500
- return dict(metadata_by_agent)
 
 
 
 
 
501
 
502
 
503
  def sync_agents_repo():
@@ -609,6 +678,15 @@ def load_agents_from_hf():
609
  return assistants
610
 
611
 
 
 
 
 
 
 
 
 
 
612
  def calculate_pr_stats_from_metadata(metadata_list):
613
  """Calculate statistics from a list of PR metadata."""
614
  total_prs = len(metadata_list)
@@ -626,8 +704,62 @@ def calculate_pr_stats_from_metadata(metadata_list):
626
  }
627
 
628
 
629
- def calculate_monthly_metrics_by_agent(all_metadata_dict, assistants):
630
- """Calculate monthly metrics for all assistants for visualization."""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
631
  identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}
632
 
633
  if not all_metadata_dict:
@@ -696,8 +828,17 @@ def calculate_monthly_metrics_by_agent(all_metadata_dict, assistants):
696
  }
697
 
698
 
699
- def construct_leaderboard_from_metadata(all_metadata_dict, assistants):
700
- """Construct leaderboard from in-memory PR metadata."""
 
 
 
 
 
 
 
 
 
701
  if not assistants:
702
  print("Error: No assistants found")
703
  return {}
@@ -708,21 +849,27 @@ def construct_leaderboard_from_metadata(all_metadata_dict, assistants):
708
  identifier = assistant.get('github_identifier')
709
  agent_name = assistant.get('name', 'Unknown')
710
 
711
- bot_metadata = all_metadata_dict.get(identifier, [])
712
- stats = calculate_pr_stats_from_metadata(bot_metadata)
 
 
 
 
 
713
 
714
  cache_dict[identifier] = {
715
  'name': agent_name,
716
  'website': assistant.get('website', 'N/A'),
717
  'github_identifier': identifier,
718
- **stats
 
719
  }
720
 
721
  return cache_dict
722
 
723
 
724
- def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
725
- """Save leaderboard data and monthly metrics to HuggingFace dataset."""
726
  try:
727
  token = get_hf_token()
728
  if not token:
@@ -731,12 +878,13 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
731
  api = HfApi(token=token)
732
 
733
  combined_data = {
734
- 'last_updated': datetime.now(timezone.utc).isoformat(),
735
- 'leaderboard': leaderboard_dict,
736
- 'monthly_metrics': monthly_metrics,
737
  'metadata': {
 
738
  'leaderboard_time_frame_days': LEADERBOARD_TIME_FRAME_DAYS
739
- }
 
 
 
740
  }
741
 
742
  with open(LEADERBOARD_FILENAME, 'w') as f:
@@ -767,8 +915,8 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics):
767
 
768
  def mine_all_agents():
769
  """
770
- Mine PR metadata for all assistants using STREAMING batch processing.
771
- Downloads GHArchive data, then uses BATCH-based DuckDB queries.
772
  """
773
  print(f"\n[1/4] Downloading GHArchive data...")
774
 
@@ -787,7 +935,7 @@ def mine_all_agents():
787
  print("Error: No valid assistant identifiers found")
788
  return
789
 
790
- print(f"\n[3/4] Mining PR metadata ({len(identifiers)} assistants, {LEADERBOARD_TIME_FRAME_DAYS} days)...")
791
 
792
  try:
793
  conn = get_duckdb_connection()
@@ -800,11 +948,15 @@ def mine_all_agents():
800
  start_date = end_date - timedelta(days=LEADERBOARD_TIME_FRAME_DAYS)
801
 
802
  try:
803
- # USE STREAMING FUNCTION INSTEAD
804
- all_metadata = fetch_all_pr_metadata_streaming(
805
  conn, identifiers, start_date, end_date
806
  )
807
 
 
 
 
 
808
  except Exception as e:
809
  print(f"Error during DuckDB fetch: {str(e)}")
810
  traceback.print_exc()
@@ -815,9 +967,23 @@ def mine_all_agents():
815
  print(f"\n[4/4] Saving leaderboard...")
816
 
817
  try:
818
- leaderboard_dict = construct_leaderboard_from_metadata(all_metadata, assistants)
819
- monthly_metrics = calculate_monthly_metrics_by_agent(all_metadata, assistants)
820
- save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
821
 
822
  except Exception as e:
823
  print(f"Error saving leaderboard: {str(e)}")
 
344
  # STREAMING BATCH PROCESSING
345
  # =============================================================================
346
 
347
+ def fetch_all_metadata_streaming(conn, identifiers, start_date, end_date):
348
  """
349
+ UNIFIED QUERY: Fetch both commit and PR metadata using streaming batch processing.
350
+
351
  Processes GHArchive files in BATCH_SIZE_DAYS chunks to limit memory usage.
352
  Instead of loading 180 days (4,344 files) at once, processes 7 days at a time.
353
+
354
  This prevents OOM errors by:
355
  1. Only keeping ~168 hourly files in memory per batch (vs 4,344)
356
  2. Incrementally building the results dictionary
357
  3. Allowing DuckDB to garbage collect after each batch
358
+
359
+ Fetches both:
360
+ - PushEvent (for commit tracking)
361
+ - PullRequestEvent (for PR tracking)
362
+
363
+ Then post-processes in Python to separate commits and PRs.
364
+
365
  Args:
366
  conn: DuckDB connection instance
367
  identifiers: List of GitHub usernames/bot identifiers (~1500)
368
  start_date: Start datetime (timezone-aware)
369
  end_date: End datetime (timezone-aware)
370
+
371
  Returns:
372
+ Dictionary with two keys:
373
+ - 'commits': {author: [commit_metadata]} for commit tracking
374
+ - 'prs': {author: [pr_metadata]} for PR tracking
375
  """
376
  identifier_list = ', '.join([f"'{id}'" for id in identifiers])
377
+ identifier_set = set(identifiers)
378
+
379
+ # Storage for commits
380
+ commits_by_agent = defaultdict(list)
381
+
382
+ # Storage for PRs
383
+ prs_by_agent = defaultdict(list)
384
 
385
  # Calculate total batches
386
  total_days = (end_date - start_date).days
387
  total_batches = (total_days // BATCH_SIZE_DAYS) + 1
388
+
389
  # Process in configurable batches
390
  current_date = start_date
391
  batch_num = 0
392
+ total_commits = 0
393
  total_prs = 0
394
+
395
  print(f" Streaming {total_batches} batches of {BATCH_SIZE_DAYS}-day intervals...")
396
 
397
  while current_date <= end_date:
 
411
 
412
  # Build file patterns SQL for THIS BATCH
413
  file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
414
+
415
+ # UNIFIED QUERY: Fetch both commits (PushEvent) and PRs (PullRequestEvent)
416
+ # Post-process in Python to separate them
417
+ unified_query = f"""
418
+ SELECT
419
+ type,
420
+ -- Commit fields (from PushEvent)
421
+ TRY_CAST(json_extract_string(to_json(actor), '$.login') AS VARCHAR) as author,
422
+ TRY_CAST(json_extract_string(to_json(payload), '$.head') AS VARCHAR) as commit_sha,
423
+ -- PR fields (from PullRequestEvent)
424
  CONCAT(
425
  REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
426
  '/pull/',
427
  CAST(payload.pull_request.number AS VARCHAR)
428
+ ) as pr_url,
429
  TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) as action,
430
  TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) as pr_author,
431
+ TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) as pr_created_at,
432
+ TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.merged_at') AS VARCHAR) as pr_merged_at,
433
+ TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.closed_at') AS VARCHAR) as pr_closed_at,
434
+ created_at
435
  FROM read_json(
436
  {file_patterns_sql},
437
  union_by_name=true,
 
  ignore_errors=true,
  maximum_object_size=2147483648
  )
+ WHERE
+ -- PushEvent: Commits by assistants
+ (type = 'PushEvent'
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.head') AS VARCHAR) IS NOT NULL
+ AND TRY_CAST(json_extract_string(to_json(actor), '$.login') AS VARCHAR) IN ({identifier_list})
+ )
+ OR
+ -- PullRequestEvent: PRs by assistants
+ (type = 'PullRequestEvent'
+ AND payload.pull_request.number IS NOT NULL
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.created_at') AS VARCHAR) IS NOT NULL
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.pull_request.user.login') AS VARCHAR) IN ({identifier_list})
+ AND TRY_CAST(json_extract_string(to_json(payload), '$.action') AS VARCHAR) IN ('opened', 'closed')
+ )
  """

  try:
+ all_results = conn.execute(unified_query).fetchall()

+ # Post-process results to separate commits and PRs
+ # Row structure: [type, author, commit_sha, pr_url, action, pr_author,
+ # pr_created_at, pr_merged_at, pr_closed_at, created_at]
+
+ commit_events = []
+ pr_events_list = []
+
+ for row in all_results:
+ event_type = row[0]
+
+ if event_type == 'PushEvent':
+ commit_events.append(row)
+ elif event_type == 'PullRequestEvent':
+ pr_events_list.append(row)
+
+ # Process commits
+ batch_commits = 0
+ for row in commit_events:
+ author = row[1]
+ sha = row[2]
+ created_at = normalize_date_format(row[9]) if row[9] else None
+
+ if not author or not sha:
+ continue
+
+ # Build commit metadata
+ commit_metadata = {
+ 'sha': sha,
+ 'created_at': created_at,
+ }
+
+ commits_by_agent[author].append(commit_metadata)
+ batch_commits += 1
+ total_commits += 1
+
+ # Process PRs
  # Group events by PR URL to merge opened and closed events
  pr_events = defaultdict(lambda: {'opened': None, 'closed': None})

+ for row in pr_events_list:
+ pr_url = row[3]
+ action = row[4]
+ pr_author = row[5]
+ pr_created_at = normalize_date_format(row[6]) if row[6] else None
+ pr_merged_at = normalize_date_format(row[7]) if row[7] else None
+ pr_closed_at = normalize_date_format(row[8]) if row[8] else None

+ if not pr_url or not action:
  continue

  event_data = {
  'pr_author': pr_author,
+ 'created_at': pr_created_at,
+ 'merged_at': pr_merged_at,
+ 'closed_at': pr_closed_at,
  }

+ pr_events[pr_url][action] = event_data

  # Only include PRs that have an 'opened' event
  # Use closed event data (if available) to get merged_at and closed_at
 
  'closed_at': closed_event['closed_at'] if closed_event else None,
  }

+ prs_by_agent[pr_author].append(pr_metadata)
  batch_prs += 1
  total_prs += 1

+ print(f"✓ {batch_commits} commits, {batch_prs} PRs found")

  except Exception as e:
  print(f"\n ✗ Batch {batch_num} error: {str(e)}")
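Review note: the hunk above buffers `opened` and `closed` PullRequestEvent rows per PR URL, then resolves each PR's final state from the closed event when one exists. A minimal standalone sketch of that merge step, using made-up rows (the tuple shape and sample values are hypothetical, not the miner's actual row layout):

```python
from collections import defaultdict

# Hypothetical simplified rows: (pr_url, action, author, merged_at)
rows = [
    ("github.com/org/repo/pull/1", "opened", "bot", None),
    ("github.com/org/repo/pull/1", "closed", "bot", "2024-05-01T00:00:00Z"),
    ("github.com/org/repo/pull/2", "opened", "bot", None),
]

# Group events by PR URL so opened and closed events land on one record
pr_events = defaultdict(lambda: {"opened": None, "closed": None})
for url, action, author, merged_at in rows:
    pr_events[url][action] = {"pr_author": author, "merged_at": merged_at}

# Keep only PRs with an 'opened' event; take merged_at from the closed
# event when one exists, otherwise the PR is still open
prs = []
for url, ev in pr_events.items():
    if ev["opened"] is None:
        continue
    closed = ev["closed"]
    prs.append({
        "url": url,
        "pr_author": ev["opened"]["pr_author"],
        "merged_at": closed["merged_at"] if closed else None,
    })
```

Grouping by URL means the opened and closed events for a PR can arrive in any order within a batch without changing the result.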
 

  # Move to next batch
  current_date = batch_end + timedelta(days=1)
+
  # Final summary
+ agents_with_commits = sum(1 for commits in commits_by_agent.values() if commits)
+ agents_with_prs = sum(1 for prs in prs_by_agent.values() if prs)
+ print(f"\n ✓ Complete: {total_commits} commits for {agents_with_commits}/{len(identifiers)} assistants")
+ print(f" ✓ Complete: {total_prs} PRs for {agents_with_prs}/{len(identifiers)} assistants")
+
+ return {
+ 'commits': dict(commits_by_agent),
+ 'prs': dict(prs_by_agent)
+ }


  def sync_agents_repo():
 
  return assistants


+ def calculate_commit_stats_from_metadata(metadata_list):
+ """Calculate statistics from a list of commit metadata."""
+ total_commits = len(metadata_list)
+
+ return {
+ 'total_commits': total_commits,
+ }
+
+
  def calculate_pr_stats_from_metadata(metadata_list):
  """Calculate statistics from a list of PR metadata."""
  total_prs = len(metadata_list)
 
  }


+ def calculate_monthly_metrics_by_agent_commits(all_metadata_dict, assistants):
+ """Calculate monthly metrics for commits for all assistants for visualization."""
+ identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}
+
+ if not all_metadata_dict:
+ return {'assistants': [], 'months': [], 'data': {}}
+
+ agent_month_data = defaultdict(lambda: defaultdict(list))
+
+ for agent_identifier, metadata_list in all_metadata_dict.items():
+ for commit_meta in metadata_list:
+ created_at = commit_meta.get('created_at')
+
+ if not created_at:
+ continue
+
+ agent_name = identifier_to_name.get(agent_identifier, agent_identifier)
+
+ try:
+ dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
+ month_key = f"{dt.year}-{dt.month:02d}"
+ agent_month_data[agent_name][month_key].append(commit_meta)
+ except Exception as e:
+ print(f"Warning: Could not parse date '{created_at}': {e}")
+ continue
+
+ all_months = set()
+ for agent_data in agent_month_data.values():
+ all_months.update(agent_data.keys())
+ months = sorted(list(all_months))
+
+ result_data = {}
+ for agent_name, month_dict in agent_month_data.items():
+ total_commits_list = []
+
+ for month in months:
+ commits_in_month = month_dict.get(month, [])
+ total_count = len(commits_in_month)
+
+ total_commits_list.append(total_count)
+
+ result_data[agent_name] = {
+ 'total_commits': total_commits_list,
+ }
+
+ agents_list = sorted(list(agent_month_data.keys()))
+
+ return {
+ 'assistants': agents_list,
+ 'months': months,
+ 'data': result_data
+ }
+
+
+ def calculate_monthly_metrics_by_agent_prs(all_metadata_dict, assistants):
+ """Calculate monthly metrics for PRs for all assistants for visualization."""
  identifier_to_name = {assistant.get('github_identifier'): assistant.get('name') for assistant in assistants if assistant.get('github_identifier')}

  if not all_metadata_dict:
 
  }


+ def construct_leaderboard_from_metadata(commit_metadata_dict, pr_metadata_dict, assistants):
+ """Construct leaderboard from in-memory commit and PR metadata.
+
+ Args:
+ commit_metadata_dict: Dictionary mapping assistant ID to list of commit metadata
+ pr_metadata_dict: Dictionary mapping assistant ID to list of PR metadata
+ assistants: List of assistant metadata
+
+ Returns:
+ Dictionary with leaderboard data including both commit and PR statistics
+ """
  if not assistants:
  print("Error: No assistants found")
  return {}
 
  identifier = assistant.get('github_identifier')
  agent_name = assistant.get('name', 'Unknown')

+ # Get commit and PR metadata
+ commit_metadata = commit_metadata_dict.get(identifier, [])
+ pr_metadata = pr_metadata_dict.get(identifier, [])
+
+ # Calculate statistics
+ commit_stats = calculate_commit_stats_from_metadata(commit_metadata)
+ pr_stats = calculate_pr_stats_from_metadata(pr_metadata)

  cache_dict[identifier] = {
  'name': agent_name,
  'website': assistant.get('website', 'N/A'),
  'github_identifier': identifier,
+ **commit_stats,
+ **pr_stats
  }

  return cache_dict


+ def save_leaderboard_data_to_hf(leaderboard_dict, commit_monthly_metrics, pr_monthly_metrics):
+ """Save leaderboard data, commit monthly metrics, and PR monthly metrics to HuggingFace dataset."""
  try:
  token = get_hf_token()
  if not token:
 
  api = HfApi(token=token)

  combined_data = {
  'metadata': {
+ 'last_updated': datetime.now(timezone.utc).isoformat(),
  'leaderboard_time_frame_days': LEADERBOARD_TIME_FRAME_DAYS
+ },
+ 'leaderboard': leaderboard_dict,
+ 'commit_monthly_metrics': commit_monthly_metrics,
+ 'pr_monthly_metrics': pr_monthly_metrics
  }

  with open(LEADERBOARD_FILENAME, 'w') as f:
 

  def mine_all_agents():
  """
+ Mine commit and PR metadata for all assistants using STREAMING batch processing.
+ Downloads GHArchive data, then uses UNIFIED BATCH-based DuckDB queries.
  """
  print(f"\n[1/4] Downloading GHArchive data...")

 
  print("Error: No valid assistant identifiers found")
  return

+ print(f"\n[3/4] Mining commit and PR metadata ({len(identifiers)} assistants, {LEADERBOARD_TIME_FRAME_DAYS} days)...")

  try:
  conn = get_duckdb_connection()
 
  start_date = end_date - timedelta(days=LEADERBOARD_TIME_FRAME_DAYS)

  try:
+ # USE UNIFIED STREAMING FUNCTION for both commits and PRs
+ results = fetch_all_metadata_streaming(
  conn, identifiers, start_date, end_date
  )

+ # Separate commits and PRs
+ commit_metadata = results['commits']
+ pr_metadata = results['prs']
+
  except Exception as e:
  print(f"Error during DuckDB fetch: {str(e)}")
  traceback.print_exc()
 
  print(f"\n[4/4] Saving leaderboard...")

  try:
+ # Construct leaderboard with both commit and PR data
+ leaderboard_dict = construct_leaderboard_from_metadata(
+ commit_metadata, pr_metadata, assistants
+ )
+
+ # Calculate monthly metrics for both commits and PRs
+ commit_monthly_metrics = calculate_monthly_metrics_by_agent_commits(
+ commit_metadata, assistants
+ )
+ pr_monthly_metrics = calculate_monthly_metrics_by_agent_prs(
+ pr_metadata, assistants
+ )
+
+ # Save everything
+ save_leaderboard_data_to_hf(
+ leaderboard_dict, commit_monthly_metrics, pr_monthly_metrics
+ )

  except Exception as e:
  print(f"Error saving leaderboard: {str(e)}")
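Review note: both monthly-metrics functions reduce each record to a `YYYY-MM` bucket parsed from its `created_at` timestamp. A minimal sketch of that bucketing, with fabricated commit metadata (the `sha` values and dates are made up):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical commit metadata, in the same shape the miner produces
commits = [
    {"sha": "a1", "created_at": "2024-04-03T12:00:00Z"},
    {"sha": "b2", "created_at": "2024-04-28T09:30:00Z"},
    {"sha": "c3", "created_at": "2024-05-02T17:45:00Z"},
]

# Bucket by "YYYY-MM"; fromisoformat needs the trailing Z rewritten
# as an explicit UTC offset
by_month = defaultdict(int)
for meta in commits:
    dt = datetime.fromisoformat(meta["created_at"].replace("Z", "+00:00"))
    by_month[f"{dt.year}-{dt.month:02d}"] += 1

# Sorted month keys give per-month count lists aligned across assistants
months = sorted(by_month)
counts = [by_month[m] for m in months]
```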