KarthikMuraliM committed on
Commit 4fda5be · 1 Parent(s): 1a838e3

Fix: Use a more forceful prompt to prevent the LLM from re-scraping (more debugging, yay)

Files changed (2)
  1. app.py +8 -4
  2. tools.py +7 -2
app.py CHANGED
@@ -78,13 +78,17 @@ async def analyze_data(
     system_prompt = """
     You are an AI data analyst. Your ONLY task is to write a Python script that operates on a pre-existing pandas DataFrame named `df`.
 
-    **URGENT AND CRITICAL INSTRUCTION:**
-    DO NOT write any code to read or load data (e.g., from a URL or file). The DataFrame `df` is ALREADY in memory. Start your script as if `df` is already defined.
+    **URGENT AND CRITICAL INSTRUCTIONS:**
+    - The pandas DataFrame `df` is ALREADY in memory.
+    - The pandas library is ALREADY imported as `pd`.
+    - The regex library is ALREADY imported as `re`.
+    - DO NOT include any `import` statements in your code.
+    - DO NOT write any code to read or load data.
+    - Your entire output must be ONLY the raw Python code. No markdown, no comments, no explanations.
 
     **Your script MUST:**
-    1. Perform data cleaning on the `df` DataFrame. Columns that look like numbers may be strings with '$' or ',' symbols.
+    1. Perform data cleaning on the `df` DataFrame first.
     2. For EACH question the user asks, you MUST `print()` the final answer.
-    3. Your entire output must be ONLY the raw Python code. No markdown, no comments, no explanations.
     """
     user_prompt = f"{df_info}\n\nPlease write a Python script to answer the following questions:\n\n{questions_text}"
 
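The prompt rules above (no `import` statements, no data loading, raw Python only) are only requests to the model. A minimal sketch of how the returned script could be checked against them before it is run, assuming a hypothetical helper named `find_prompt_violations` that is not part of this commit. Treating any `read_*` method call as data loading is also an assumption, chosen to catch calls such as `pd.read_csv` or `pd.read_html`:

```python
import ast


def find_prompt_violations(python_code: str) -> list[str]:
    """Return reasons the generated script breaks the prompt rules (hypothetical helper)."""
    try:
        # Markdown fences or surrounding prose make the text fail to parse.
        tree = ast.parse(python_code.strip())
    except SyntaxError as exc:
        return [f"not valid Python: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"import statement on line {node.lineno}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # Assumption: any method named read_* is treated as data loading.
            if node.func.attr.startswith("read_"):
                problems.append(
                    f"data-loading call `{node.func.attr}` on line {node.lineno}"
                )
    return problems


if __name__ == "__main__":
    generated = "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.shape)"
    for problem in find_prompt_violations(generated):
        print(problem)
```

An empty list means the script passed these checks. A non-empty one could be fed back to the model as a retry message, though that loop is outside what this diff shows.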
tools.py CHANGED
@@ -4,7 +4,8 @@ from playwright.async_api import async_playwright
 from bs4 import BeautifulSoup
 import json
 import openai
-
+import pandas as pd
+import re
 import io
 import sys
 from contextlib import redirect_stdout
@@ -96,7 +97,11 @@ def run_python_code_on_dataframe(df: pd.DataFrame, python_code: str) -> str:
     output_stream = io.StringIO()
 
     # Create a local scope for the exec to run in, with 'df' pre-populated
-    local_scope = {'df': df}
+    local_scope = {
+        'df': df,
+        'pd': pd,
+        're': re
+    }
 
     try:
         # Redirect stdout to our stream
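The hunk above ends before the `exec` call, so the rest of `run_python_code_on_dataframe` is not visible here. The following is a minimal sketch of the pattern the change points to: run a generated script with `df`, `pd`, and `re` already in scope, and capture what it prints. The function name `run_code_with_scope`, the error-message format, and the use of a single scope dict are assumptions, not the commit's actual code:

```python
import io
import re
from contextlib import redirect_stdout

import pandas as pd


def run_code_with_scope(df: pd.DataFrame, python_code: str) -> str:
    """Execute generated code with df/pd/re pre-populated; return captured stdout."""
    output_stream = io.StringIO()
    # The same names the system prompt promises are "ALREADY" available.
    scope = {"df": df, "pd": pd, "re": re}
    try:
        with redirect_stdout(output_stream):
            # Assumption: one dict is passed as globals, so lambdas and
            # functions defined by the generated code can also see pd and re.
            exec(python_code, scope)
    except Exception as exc:
        # Assumption: errors are returned as text rather than raised.
        return f"Error executing code: {exc}"
    return output_stream.getvalue()


if __name__ == "__main__":
    frame = pd.DataFrame({"price": ["$1,200", "$350"]})
    script = (
        "df['price'] = df['price'].apply(lambda s: float(re.sub(r'[$,]', '', s)))\n"
        "print(df['price'].sum())"
    )
    print(run_code_with_scope(frame, script))  # prints 1550.0
```

Because the script contains no `import` lines, it only works when the executor supplies `pd` and `re`. That is why the prompt change and the scope change land in the same commit.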