KarthikMuraliM committed on
Commit 1a838e3 · 1 Parent(s): e9b9efe

Fix: Use a more forceful prompt to prevent LLM from re-scraping

Files changed (1)
  1. app.py +9 -27
app.py CHANGED
@@ -76,33 +76,15 @@ async def analyze_data(
  df_info = f"Here is the head of the pandas DataFrame, named 'df':\n{df_head}"
 
  system_prompt = """
- You are a world-class Python data analyst. You will be given the head of a pandas DataFrame named 'df' and a set of questions.
- Your ONLY job is to write a Python script to answer the questions.
-
- **CRITICAL RULES:**
- 1. The DataFrame `df` is already loaded in memory. Do NOT load the data.
- 2. You MUST perform data cleaning first. The data is messy. Columns with numbers might be strings with symbols like '$' or ','. Use `df['col'].replace(...)` and `pd.to_numeric(..., errors='coerce')`.
- 3. For EACH question, you MUST write code to calculate the answer and then immediately print the answer to the console using the `print()` function.
- 4. Each print statement MUST be clear and self-contained.
- 5. Your final output MUST ONLY BE THE PYTHON CODE, with no explanations, comments, or markdown.
-
- **EXAMPLE OF A PERFECT SCRIPT:**
- ```python
- import pandas as pd
-
- # Data Cleaning
- df['Worldwide gross'] = df['Worldwide gross'].replace({r'\\$': '', r',': ''}, regex=True)
- df['Worldwide gross'] = pd.to_numeric(df['Worldwide gross'], errors='coerce')
- df['Year'] = pd.to_numeric(df['Year'], errors='coerce')
-
- # Question 1: How many movies grossed over $2.5B?
- movies_over_2_5bn = df[df['Worldwide gross'] > 2500000000].shape[0]
- print(f"Movies over $2.5B: {movies_over_2_5bn}")
-
- # Question 2: What is the average gross of movies released in 2019?
- avg_gross_2019 = df[df['Year'] == 2019]['Worldwide gross'].mean()
- print(f"Average gross for 2019 movies: ${avg_gross_2019:,.2f}")
- ```
+ You are an AI data analyst. Your ONLY task is to write a Python script that operates on a pre-existing pandas DataFrame named `df`.
+
+ **URGENT AND CRITICAL INSTRUCTION:**
+ DO NOT write any code to read or load data (e.g., from a URL or file). The DataFrame `df` is ALREADY in memory. Start your script as if `df` is already defined.
+
+ **Your script MUST:**
+ 1. Perform data cleaning on the `df` DataFrame. Columns that look like numbers may be strings with '$' or ',' symbols.
+ 2. For EACH question the user asks, you MUST `print()` the final answer.
+ 3. Your entire output must be ONLY the raw Python code. No markdown, no comments, no explanations.
  """
  user_prompt = f"{df_info}\n\nPlease write a Python script to answer the following questions:\n\n{questions_text}"
 