KarthikMuraliM committed on
Commit f757484 · 1 Parent(s): 98933b3

Final attempt: Use minimalist, aggressive prompt

Files changed (1)
  1. app.py +21 -40
app.py CHANGED
@@ -75,47 +75,28 @@ async def analyze_data(
  df_head = df.head().to_string()
  df_info = f"Here is the head of the pandas DataFrame, named 'df':\n{df_head}"
 
- system_prompt ="""
- You are an AI data analyst. Your ONLY task is to write a Python script that operates on a pre-existing pandas DataFrame named `df`.
-
- **AVAILABLE LIBRARIES:**
- The following libraries are ALREADY IMPORTED and available for you to use:
- - `pandas` as `pd`
- - `re`
- - `matplotlib.pyplot` as `plt`
- - `seaborn` as `sns`
- - `numpy` as `np`
- - `io`
- - `base64`
- - `sklearn.linear_model.LinearRegression`
-
- **CRITICAL INSTRUCTIONS:**
- - DO NOT include any `import` statements.
- - The DataFrame `df` is ALREADY in memory. DO NOT load data.
- - Your entire output MUST BE ONLY raw Python code. No markdown or explanations.
-
- **YOUR SCRIPT MUST:**
- 1. First, perform data cleaning on the `df` DataFrame.
- 2. For any text-based or calculation questions, `print()` the final answer.
- 3. If asked to draw a plot, you MUST generate the plot and print it as a base64 encoded data URI. DO NOT show the plot. Follow this EXACT recipe:
- ```python
- # --- START PLOT RECIPE ---
- fig, ax = plt.subplots()
- # ... your plotting code using 'ax', e.g., sns.scatterplot(ax=ax, ...) ...
-
- # Save the plot to an in-memory buffer
- buf = io.BytesIO()
- fig.savefig(buf, format='png', bbox_inches='tight')
- buf.seek(0)
-
- # Encode the buffer to a base64 string
- image_base64 = base64.b64encode(buf.read()).decode('utf-8')
-
- # Print the data URI
- print(f"data:image/png;base64,{image_base64}")
- # --- END PLOT RECIPE ---
- ```
+ # system_prompt = """
+ # You are an AI data analyst. Your ONLY task is to write a Python script that operates on a pre-existing pandas DataFrame named `df`.
+
+ # **URGENT AND CRITICAL INSTRUCTION:**
+ # DO NOT write any code to read or load data (e.g., from a URL or file). The DataFrame `df` is ALREADY in memory. Start your script as if `df` is already defined.
+
+ # **Your script MUST:**
+ # 1. Perform data cleaning on the `df` DataFrame. Columns that look like numbers may be strings with '$' or ',' symbols.
+ # 2. For EACH question the user asks, you MUST `print()` the final answer.
+ # 3. Your entire output must be ONLY the raw Python code. No markdown, no comments, no explanations.
+ # """
+ system_prompt = """
+ You are a Python script generator. Your only output is code.
+ A pandas DataFrame named `df` and the following libraries are pre-loaded: `pd`, `re`, `plt`, `sns`, `np`, `io`, `base64`, `LinearRegression`.
+
+ **CRITICAL:**
+ - DO NOT import any libraries.
+ - DO NOT load any data.
+ - Write a script that cleans the `df` DataFrame and then prints the answers to the user's questions.
+ - For plots, print a base64 data URI using the provided recipe.
  """
+
  user_prompt = f"{df_info}\n\nPlease write a Python script to answer the following questions:\n\n{questions_text}"
 
  try:
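
The new prompt asks for "a base64 data URI using the provided recipe", while the recipe text itself appears only in the removed lines. As a reminder of what that recipe does, here is a minimal sketch of its encoding step using only the standard library. The helper name `to_data_uri` and the placeholder bytes are assumptions for illustration, not part of `app.py`; in the original recipe the buffer is filled by `fig.savefig(buf, format='png', bbox_inches='tight')`.

```python
import base64
import io


def to_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a base64 data URI (hypothetical helper)."""
    # The in-memory buffer stands in for the one fig.savefig() writes to.
    buf = io.BytesIO(png_bytes)
    buf.seek(0)

    # Encode the buffer contents to a base64 string.
    image_base64 = base64.b64encode(buf.read()).decode("utf-8")

    # Build the data URI in the same shape the removed recipe printed.
    return f"data:image/png;base64,{image_base64}"


# Placeholder bytes, NOT a real image; a real call would pass the PNG
# bytes produced by matplotlib.
uri = to_data_uri(b"abc")
print(uri)
```

Whether the model reproduces these steps without seeing them in the prompt is not something this diff shows; restoring the recipe text to the prompt would be one way to remove the dangling reference.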