Tom Claude committed on
Commit
2a10e9c
·
1 Parent(s): df042c8

Add Datawrapper chart generation mode with clean iframe display

- Add Chart Generation Mode with CSV upload and AI-powered chart creation
- Integrate Datawrapper API via custom MCP handlers for create, publish, and retrieve operations
- Implement RAG-powered chart type selection and configuration
- Display charts as embedded iframes with reasoning and edit button
- Clean up debug output for production-ready UI
- Update README with dual-mode functionality

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

.env.example CHANGED
@@ -21,6 +21,13 @@ HF_TOKEN=hf_your_token_here
21
  # This is used for Jina-CLIP-v2 embeddings
22
  JINA_API_KEY=jina_your_token_here
23
24
  # =============================================================================
25
  # OPTIONAL: LLM Configuration
26
  # =============================================================================
 
21
  # This is used for Jina-CLIP-v2 embeddings
22
  JINA_API_KEY=jina_your_token_here
23
 
24
+ # =============================================================================
25
+ # REQUIRED: Datawrapper API Token
26
+ # =============================================================================
27
+ # Get your token from: https://app.datawrapper.de/account/api-tokens
28
+ # This is used for creating and publishing charts via Datawrapper API
29
+ DATAWRAPPER_ACCESS_TOKEN=your_datawrapper_token_here
30
+
31
  # =============================================================================
32
  # OPTIONAL: LLM Configuration
33
  # =============================================================================
README.md CHANGED
@@ -7,278 +7,66 @@ sdk: gradio
7
  sdk_version: 5.49.1
8
  app_file: app.py
9
  pinned: false
10
- short_description: AI assistant for visualization guidance and design
11
  license: mit
12
  ---
13
 
14
- # 📊 Graphics Guide / Design Assistant
15
 
16
- A RAG-powered AI assistant that helps users select appropriate visualizations and provides technical implementation guidance for creating effective information graphics. Built with Supabase PGVector and Hugging Face Inference Providers, powered by a knowledge base of graphics research and design principles.
17
 
18
- ## Features
 
19
 
20
- - **🎯 Design Recommendations**: Get tailored visualization suggestions based on your intent and data characteristics
21
- - **📚 Research-Backed Guidance**: Access insights from academic papers and design best practices
22
- - **🔍 Context-Aware Retrieval**: Semantic search finds the most relevant examples and knowledge for your needs
23
- - **🚀 API Access**: Built-in REST API for integration with external applications
24
- - **💬 Chat Interface**: User-friendly conversational interface
25
- - **⚡ Technical Implementation**: Practical guidance on tools, techniques, and code examples
26
 
27
- ## 🏗️ Architecture
28
 
29
- ```
30
- ┌──────────────────────────────────────┐
31
- │ Gradio UI + API Endpoints │
32
- └──────────────┬───────────────────────┘
33
-
34
- ┌──────────────▼───────────────────────┐
35
- │ RAG Pipeline │
36
- │ • Query Understanding │
37
- │ • Document Retrieval (PGVector) │
38
- │ • Response Generation (LLM) │
39
- └──────────────┬───────────────────────┘
40
-
41
- ┌──────────┴──────────┐
42
- │ │
43
- ┌───▼───────────┐ ┌─────▼────────────┐
44
- │ Supabase │ │ HF Inference │
45
- │ PGVector DB │ │ Providers │
46
- │ (198 docs) │ │ (Llama 3.1) │
47
- └───────────────┘ └──────────────────┘
48
- ```
49
 
50
- ## 🚀 Quick Start
51
 
52
- ### Local Development
53
-
54
- 1. **Clone the repository**
55
- ```bash
56
- git clone <your-repo-url>
57
- cd graphics-llm
58
- ```
59
-
60
- 2. **Install dependencies**
61
  ```bash
62
  pip install -r requirements.txt
63
  ```
64
 
65
- 3. **Set up environment variables**
66
  ```bash
67
  cp .env.example .env
68
- # Edit .env with your credentials
69
  ```
70
 
71
- Required variables:
72
- - `SUPABASE_URL`: Your Supabase project URL
73
- - `SUPABASE_KEY`: Your Supabase anon key
74
- - `HF_TOKEN`: Your Hugging Face API token (for LLM generation)
75
- - `JINA_API_KEY`: Your Jina AI API token (for embeddings)
76
 
77
- 4. **Run the application**
78
  ```bash
79
  python app.py
80
  ```
81
 
82
- The app will be available at `http://localhost:7860`
83
-
84
- ### Hugging Face Spaces Deployment
85
-
86
- 1. **Create a new Space** on Hugging Face
87
- 2. **Push this repository** to your Space
88
- 3. **Set environment variables** in Space settings:
89
- - `SUPABASE_URL`
90
- - `SUPABASE_KEY`
91
- - `HF_TOKEN`
92
- - `JINA_API_KEY`
93
- 4. **Deploy** - The Space will automatically build and launch
94
-
95
- ## 📚 Usage
96
-
97
- ### Chat Interface
98
-
99
- Simply ask your design questions:
100
-
101
- ```
102
- "What's the best chart type for showing trends over time?"
103
- "How do I create an effective infographic for complex data?"
104
- "What are best practices for data visualization accessibility?"
105
- ```
106
-
107
- The assistant will provide:
108
- 1. Design recommendations based on your intent
109
- 2. WHY each visualization type is suitable
110
- 3. HOW to implement it (tools, techniques, code)
111
- 4. Best practices from research and examples
112
- 5. Accessibility and effectiveness considerations
113
-
114
- ### API Access
115
-
116
- This app automatically exposes REST API endpoints for external integration.
117
-
118
- **Python Client:**
119
-
120
- ```python
121
- from gradio_client import Client
122
-
123
- client = Client("your-space-url")
124
- result = client.predict(
125
- "What's the best chart for time series?",
126
- api_name="/recommend"
127
- )
128
- print(result)
129
- ```
130
-
131
- **JavaScript Client:**
132
-
133
- ```javascript
134
- import { Client } from "@gradio/client";
135
-
136
- const client = await Client.connect("your-space-url");
137
- const result = await client.predict("/recommend", {
138
- message: "What's the best chart for time series?"
139
- });
140
- console.log(result.data);
141
- ```
142
-
143
- **cURL:**
144
-
145
- ```bash
146
- curl -X POST "https://your-space.hf.space/call/recommend" \
147
- -H "Content-Type: application/json" \
148
- -d '{"data": ["What's the best chart for time series?"]}'
149
- ```
150
-
151
- **Available Endpoints:**
152
- - `/call/recommend` - Main design recommendation assistant
153
- - `/gradio_api/openapi.json` - OpenAPI specification
154
-
155
- ## 🗄️ Database
156
-
157
- The app uses Supabase with PGVector extension to store and retrieve document chunks from graphics research and examples.
158
-
159
- **Database Schema:**
160
- ```sql
161
- CREATE TABLE document_embeddings (
162
- id BIGINT PRIMARY KEY,
163
- source_type TEXT, -- pdf, url, or image
164
- source_id TEXT, -- filename or URL
165
- title TEXT,
166
- content_type TEXT, -- text or image
167
- chunk_index INTEGER,
168
- chunk_text TEXT,
169
- page_number INTEGER,
170
- embedding VECTOR(1024), -- 1024-dimensional vectors
171
- metadata JSONB,
172
- word_count INTEGER,
173
- image_metadata JSONB,
174
- created_at TIMESTAMPTZ
175
- );
176
- ```
177
-
178
- **Knowledge Base Content:**
179
- - Research papers on data visualization
180
- - Design principles and best practices
181
- - Visual narrative techniques
182
- - Accessibility guidelines
183
- - Chart type selection guidance
184
- - Real-world examples and case studies
185
-
186
- ## 🛠️ Technology Stack
187
-
188
- - **UI/API**: [Gradio](https://gradio.app/) - Automatic API generation
189
- - **Vector Database**: [Supabase](https://supabase.com/) with PGVector extension
190
- - **Embeddings**: Jina-CLIP-v2 (1024-dimensional)
191
- - **LLM**: [Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers/) - Llama 3.1
192
- - **Language**: Python 3.9+
193
-
194
- ## 📁 Project Structure
195
-
196
- ```
197
- graphics-llm/
198
- ├── app.py # Main Gradio application
199
- ├── requirements.txt # Python dependencies
200
- ├── .env.example # Environment variables template
201
- ├── README.md # This file
202
- └── src/
203
- ├── __init__.py
204
- ├── vectorstore.py # Supabase PGVector connection
205
- ├── rag_pipeline.py # RAG pipeline logic
206
- ├── llm_client.py # Inference Provider client
207
- └── prompts.py # Design recommendation prompt templates
208
- ```
209
-
210
- ## ⚙️ Configuration
211
-
212
- ### Environment Variables
213
-
214
- See `.env.example` for all available configuration options.
215
-
216
- **Required:**
217
- - `SUPABASE_URL` - Supabase project URL
218
- - `SUPABASE_KEY` - Supabase anon key
219
- - `HF_TOKEN` - Hugging Face API token (for LLM generation)
220
- - `JINA_API_KEY` - Jina AI API token (for Jina-CLIP-v2 embeddings)
221
-
222
- **Optional:**
223
- - `LLM_MODEL` - Model to use (default: meta-llama/Llama-3.1-8B-Instruct)
224
- - `LLM_TEMPERATURE` - Generation temperature (default: 0.2)
225
- - `LLM_MAX_TOKENS` - Max tokens to generate (default: 2000)
226
- - `RETRIEVAL_K` - Number of documents to retrieve (default: 5)
227
- - `EMBEDDING_MODEL` - Embedding model (default: jina-clip-v2)
228
-
229
- ### Supported LLM Models
230
-
231
- - `meta-llama/Llama-3.1-8B-Instruct` (recommended)
232
- - `meta-llama/Meta-Llama-3-8B-Instruct`
233
- - `Qwen/Qwen2.5-72B-Instruct`
234
- - `mistralai/Mistral-7B-Instruct-v0.3`
235
-
236
- ## 💰 Cost Considerations
237
-
238
- ### Hugging Face Inference Providers
239
- - Free tier: $0.10/month credits
240
- - PRO tier: $2.00/month credits + pay-as-you-go
241
- - Typical cost: ~$0.001-0.01 per query
242
- - Recommended budget: $10-50/month for moderate usage
243
-
244
- ### Supabase
245
- - Free tier sufficient for most use cases
246
- - PGVector operations are standard database queries
247
-
248
- ### Hugging Face Spaces
249
- - Free CPU hosting available
250
- - GPU upgrade: ~$0.60/hour (optional, not required)
251
-
252
- ## 🔮 Future Enhancements
253
-
254
- - [ ] Multi-turn conversation with memory
255
- - [ ] Code generation for visualization implementations
256
- - [ ] Interactive visualization previews
257
- - [ ] User-uploaded data analysis
258
- - [ ] Export recommendations as PDF/markdown
259
- - [ ] Community-contributed examples
260
- - [ ] Support for more design domains (UI/UX, print graphics)
261
-
262
- ## 🤝 Contributing
263
-
264
- Contributions are welcome! Please feel free to submit issues or pull requests.
265
-
266
- ## 📄 License
267
-
268
- MIT License - See LICENSE file for details
269
-
270
- ## 🙏 Acknowledgments
271
 
272
- - Knowledge base includes research papers on data visualization and information design
273
- - Built to support designers, journalists, and data practitioners
 
 
 
274
 
275
- ## 📞 Support
276
 
277
- For issues or questions:
278
- - Open an issue on GitHub
279
- - Check the [Hugging Face Spaces documentation](https://huggingface.co/docs/hub/spaces)
280
- - Review the [Gradio documentation](https://gradio.app/docs/)
281
 
282
  ---
283
 
284
- Built with ❤️ for the design and visualization community
 
7
  sdk_version: 5.49.1
8
  app_file: app.py
9
  pinned: false
10
+ short_description: AI assistant for visualization guidance and chart generation
11
  license: mit
12
  ---
13
 
14
+ # 📊 Viz LLM
15
 
16
+ AI-powered data visualization assistant with two modes:
17
 
18
+ - **💡 Ideation Mode**: Get design recommendations based on research and best practices
19
+ - **📊 Chart Generation Mode**: Upload CSV data and automatically generate publication-ready charts
20
 
21
+ ## Features
22
 
23
+ **Ideation Mode:**
24
+ - Research-backed visualization guidance
25
+ - Chart type recommendations
26
+ - Design best practices and accessibility advice
27
+ - Powered by RAG with Jina-CLIP-v2 embeddings
28
 
29
+ **Chart Generation Mode:**
30
+ - Upload CSV data
31
+ - AI analyzes your data and selects optimal chart type
32
+ - Automatic chart creation via Datawrapper API
33
+ - Publication-ready visualizations with one click
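
Under the hood, Chart Generation Mode roughly follows the flow sketched below (based on `generate_chart_from_csv` in `app.py`; the file name and result keys are illustrative assumptions drawn from that code):

```python
import asyncio
import pandas as pd

from src.datawrapper_client import create_and_publish_chart, get_iframe_html
from src.rag_pipeline import create_pipeline

# Read the uploaded CSV, let the RAG pipeline + LLM pick a chart type,
# then create and publish the chart via the Datawrapper API.
df = pd.read_csv("sales.csv")  # hypothetical example file
pipeline = create_pipeline()
result = asyncio.run(create_and_publish_chart(df, "Show sales trends over time", pipeline))

if result.get("success"):
    print(get_iframe_html(result["public_url"], height=500))  # embeddable iframe
    print("Why this chart:", result["reasoning"])
    print("Edit in Datawrapper:", result["edit_url"])
```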
34
 
35
+ ## Quick Start
36
 
37
+ 1. **Install dependencies:**
38
  ```bash
39
  pip install -r requirements.txt
40
  ```
41
 
42
+ 2. **Set up environment variables:**
43
  ```bash
44
  cp .env.example .env
 
45
  ```
46
 
47
+ Required:
48
+ - `SUPABASE_URL` - Your Supabase project URL
49
+ - `SUPABASE_KEY` - Your Supabase anon key
50
+ - `HF_TOKEN` - Hugging Face API token
51
+ - `DATAWRAPPER_ACCESS_TOKEN` - Datawrapper API token
52
 
53
+ 3. **Run the app:**
54
  ```bash
55
  python app.py
56
  ```
57
 
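Once the app is running, Ideation Mode is also exposed as a Gradio API endpoint (`api_name="recommend"` in `app.py`). A minimal sketch of calling it with the Gradio Python client (the URL is a placeholder for your local instance or Space):

```python
from gradio_client import Client

client = Client("http://localhost:7860")  # or your Hugging Face Space URL
result = client.predict(
    "What's the best chart type for showing trends over time?",
    api_name="/recommend",
)
print(result)
```
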
58
+ ## Technology Stack
59
 
60
+ - **UI**: Gradio
61
+ - **Vector Database**: Supabase PGVector
62
+ - **Embeddings**: Jina-CLIP-v2
63
+ - **LLM**: Llama 3.1 via Hugging Face Inference Providers
64
+ - **Charts**: Datawrapper API
65
 
66
+ ## License
67
 
68
+ MIT License
 
 
 
69
 
70
  ---
71
 
72
+ Built for the data visualization community
app.py CHANGED
@@ -3,12 +3,17 @@ Viz LLM - Gradio App
3
 
4
  A RAG-powered assistant for data visualization guidance, powered by Jina-CLIP-v2
5
  embeddings and research from the field of information graphics.
 
 
6
  """
7
 
8
  import os
 
 
9
  import gradio as gr
10
  from dotenv import load_dotenv
11
  from src.rag_pipeline import create_pipeline
 
12
  from datetime import datetime, timedelta
13
  from collections import defaultdict
14
 
@@ -90,7 +95,94 @@ def recommend_stream(message: str, history: list, request: gr.Request):
90
  yield f"Error generating response: {str(e)}\n\nPlease check your environment variables (HF_TOKEN, SUPABASE_URL, SUPABASE_KEY) and try again."
91
 
92
 
93
- # Minimal CSS to fix UI artifacts
94
  custom_css = """
95
  /* Hide retry/undo buttons that appear as artifacts */
96
  .chatbot button[aria-label="Retry"],
@@ -102,9 +194,16 @@ custom_css = """
102
  textarea[data-testid="textbox"] {
103
  overflow-y: hidden !important;
104
  }
105
  """
106
 
107
- # Create Gradio interface
108
  with gr.Blocks(
109
  title="Viz LLM",
110
  css=custom_css
@@ -112,29 +211,95 @@ with gr.Blocks(
112
  gr.Markdown("""
113
  # 📊 Viz LLM
114
 
115
- Get design recommendations for creating effective data visualizations based on research and best practices.
116
  """)
117
 
118
- # Main chat interface
119
- chatbot = gr.ChatInterface(
120
- fn=recommend_stream,
121
- type="messages",
122
- examples=[
123
- "What's the best chart type for showing trends over time?",
124
- "How do I create an effective infographic for complex data?",
125
- "What are best practices for data visualization accessibility?",
126
- "How should I design a dashboard for storytelling?",
127
- "What visualization works best for comparing categories?"
128
- ],
129
- cache_examples=False,
130
- api_name="recommend"
131
  )
132
 
133
- # Knowledge base section (below chat interface)
134
  gr.Markdown("""
135
- ### Knowledge Base
 
 
136
 
137
- This assistant draws on research papers, design principles, and examples from the field of information graphics and data visualization.
138
 
139
  **Credits:** Special thanks to the researchers whose work informed this model: Robert Kosara, Edward Segel, Jeffrey Heer, Matthew Conlen, John Maeda, Kennedy Elliott, Scott McCloud, and many others.
140
 
@@ -143,19 +308,21 @@ with gr.Blocks(
143
  **Usage Limits:** This service is limited to 20 queries per day per user to manage costs. Responses are optimized for English.
144
 
145
  <div style="text-align: center; margin-top: 20px; opacity: 0.6; font-size: 0.9em;">
146
- Embeddings: Jina-CLIP-v2
147
  </div>
148
  """)
149
 
150
  # Launch configuration
151
  if __name__ == "__main__":
152
  # Check for required environment variables
153
- required_vars = ["SUPABASE_URL", "SUPABASE_KEY", "HF_TOKEN"]
154
  missing_vars = [var for var in required_vars if not os.getenv(var)]
155
 
156
  if missing_vars:
157
  print(f"⚠️ Warning: Missing environment variables: {', '.join(missing_vars)}")
158
  print("Please set these in your .env file or as environment variables")
 
 
159
 
160
  # Launch the app
161
  demo.launch(
 
3
 
4
  A RAG-powered assistant for data visualization guidance, powered by Jina-CLIP-v2
5
  embeddings and research from the field of information graphics.
6
+
7
+ Now with Datawrapper integration for chart generation!
8
  """
9
 
10
  import os
11
+ import asyncio
12
+ import pandas as pd
13
  import gradio as gr
14
  from dotenv import load_dotenv
15
  from src.rag_pipeline import create_pipeline
16
+ from src.datawrapper_client import create_and_publish_chart, get_iframe_html
17
  from datetime import datetime, timedelta
18
  from collections import defaultdict
19
 
 
95
  yield f"Error generating response: {str(e)}\n\nPlease check your environment variables (HF_TOKEN, SUPABASE_URL, SUPABASE_KEY) and try again."
96
 
97
 
98
+ def generate_chart_from_csv(csv_file, user_prompt):
99
+ """
100
+ Generate a Datawrapper chart from uploaded CSV and user prompt.
101
+
102
+ Args:
103
+ csv_file: Uploaded CSV file
104
+ user_prompt: User's description of the chart
105
+
106
+ Returns:
107
+ HTML string with iframe or error message
108
+ """
109
+ if not csv_file:
110
+ return "<div style='padding: 50px; text-align: center;'>Please upload a CSV file to generate a chart.</div>"
111
+
112
+ if not user_prompt or user_prompt.strip() == "":
113
+ return "<div style='padding: 50px; text-align: center;'>Please describe what chart you want to create.</div>"
114
+
115
+ try:
116
+ # Show loading message
117
+ loading_html = """
118
+ <div style='padding: 100px; text-align: center;'>
119
+ <h3>🎨 Creating your chart...</h3>
120
+ <p>Analyzing your data and selecting the best visualization...</p>
121
+ </div>
122
+ """
123
+
124
+ # Read CSV file
125
+ df = pd.read_csv(csv_file)
126
+
127
+ # Create and publish chart (async function, need to run in event loop)
128
+ loop = asyncio.new_event_loop()
129
+ asyncio.set_event_loop(loop)
130
+ result = loop.run_until_complete(
131
+ create_and_publish_chart(df, user_prompt, pipeline)
132
+ )
133
+ loop.close()
134
+
135
+ if result.get("success"):
136
+ # Get the iframe HTML
137
+ iframe_html = get_iframe_html(result.get('public_url'), height=500)
138
+
139
+ # Create HTML with iframe, reasoning, and edit button
140
+ chart_html = f"""
141
+ <div style='padding: 20px;'>
142
+ <!-- Chart iframe -->
143
+ <div style='margin-bottom: 20px;'>
144
+ {iframe_html}
145
+ </div>
146
+
147
+ <!-- Why this chart? -->
148
+ <div style='background: #f9f9f9; padding: 15px; border-radius: 5px; margin-bottom: 15px;'>
149
+ <strong>Why this chart?</strong><br>
150
+ <p style='margin: 10px 0 0 0;'>{result['reasoning']}</p>
151
+ </div>
152
+
153
+ <!-- Edit button -->
154
+ <div>
155
+ <a href="{result['edit_url']}" target="_blank"
156
+ style="display: inline-block; padding: 12px 24px; background: #1976d2; color: white;
157
+ text-decoration: none; border-radius: 5px; font-weight: bold;">
158
+ ✏️ Open in Datawrapper
159
+ </a>
160
+ </div>
161
+ </div>
162
+ """
163
+
164
+ return chart_html
165
+ else:
166
+ error_msg = result.get("error", "Unknown error")
167
+ return f"""
168
+ <div style='padding: 50px; text-align: center; color: red;'>
169
+ <h3>❌ Chart Generation Failed</h3>
170
+ <p>{error_msg}</p>
171
+ <p style='font-size: 0.9em; color: #666;'>Please check your CSV format and try again.</p>
172
+ </div>
173
+ """
174
+
175
+ except Exception as e:
176
+ return f"""
177
+ <div style='padding: 50px; text-align: center; color: red;'>
178
+ <h3>❌ Error</h3>
179
+ <p>{str(e)}</p>
180
+ <p style='font-size: 0.9em; color: #666;'>Please ensure your CSV is properly formatted and try again.</p>
181
+ </div>
182
+ """
183
+
184
+
185
+ # Minimal CSS to fix UI artifacts and style the mode selector
186
  custom_css = """
187
  /* Hide retry/undo buttons that appear as artifacts */
188
  .chatbot button[aria-label="Retry"],
 
194
  textarea[data-testid="textbox"] {
195
  overflow-y: hidden !important;
196
  }
197
+
198
+ /* Mode selector buttons */
199
+ .mode-button {
200
+ font-size: 1.1em;
201
+ padding: 12px 24px;
202
+ margin: 5px;
203
+ }
204
  """
205
 
206
+ # Create Gradio interface with dual-mode layout
207
  with gr.Blocks(
208
  title="Viz LLM",
209
  css=custom_css
 
211
  gr.Markdown("""
212
  # 📊 Viz LLM
213
 
214
+ Get design recommendations or generate charts with AI-powered data visualization assistance.
215
  """)
216
 
217
+ # Mode selector buttons
218
+ with gr.Row():
219
+ ideation_btn = gr.Button("💡 Ideation Mode", variant="primary", elem_classes="mode-button")
220
+ chart_gen_btn = gr.Button("📊 Chart Generation Mode", variant="secondary", elem_classes="mode-button")
221
+
222
+ # Ideation Mode: Chat interface (shown by default, wrapped in Column)
223
+ with gr.Column(visible=True) as ideation_container:
224
+ ideation_interface = gr.ChatInterface(
225
+ fn=recommend_stream,
226
+ type="messages",
227
+ examples=[
228
+ "What's the best chart type for showing trends over time?",
229
+ "How do I create an effective infographic for complex data?",
230
+ "What are best practices for data visualization accessibility?",
231
+ "How should I design a dashboard for storytelling?",
232
+ "What visualization works best for comparing categories?"
233
+ ],
234
+ cache_examples=False,
235
+ api_name="recommend"
236
+ )
237
+
238
+ # Chart Generation Mode: Chart controls and output (hidden by default)
239
+ with gr.Column(visible=False) as chart_gen_container:
240
+ csv_upload = gr.File(
241
+ label="📁 Upload CSV File",
242
+ file_types=[".csv"],
243
+ type="filepath"
244
+ )
245
+
246
+ chart_prompt_input = gr.Textbox(
247
+ label="Describe your chart",
248
+ placeholder="E.g., 'Show sales trends over time' or 'Compare revenue by category'",
249
+ lines=2
250
+ )
251
+
252
+ generate_chart_btn = gr.Button("Generate Chart", variant="primary", size="lg")
253
+
254
+ chart_output = gr.HTML(
255
+ value="<div style='text-align:center; padding:100px; color: #666;'>Upload a CSV file and describe your visualization above, then click Generate Chart.</div>",
256
+ label="Generated Chart"
257
+ )
258
+
259
+ # Mode switching functions
260
+ def switch_to_ideation():
261
+ return [
262
+ gr.update(variant="primary"), # ideation_btn
263
+ gr.update(variant="secondary"), # chart_gen_btn
264
+ gr.update(visible=True), # ideation_container
265
+ gr.update(visible=False), # chart_gen_container
266
+ ]
267
+
268
+ def switch_to_chart_gen():
269
+ return [
270
+ gr.update(variant="secondary"), # ideation_btn
271
+ gr.update(variant="primary"), # chart_gen_btn
272
+ gr.update(visible=False), # ideation_container
273
+ gr.update(visible=True), # chart_gen_container
274
+ ]
275
+
276
+ # Wire up mode switching
277
+ ideation_btn.click(
278
+ fn=switch_to_ideation,
279
+ inputs=[],
280
+ outputs=[ideation_btn, chart_gen_btn, ideation_container, chart_gen_container]
281
  )
282
 
283
+ chart_gen_btn.click(
284
+ fn=switch_to_chart_gen,
285
+ inputs=[],
286
+ outputs=[ideation_btn, chart_gen_btn, ideation_container, chart_gen_container]
287
+ )
288
+
289
+ # Generate chart when button is clicked
290
+ generate_chart_btn.click(
291
+ fn=generate_chart_from_csv,
292
+ inputs=[csv_upload, chart_prompt_input],
293
+ outputs=[chart_output]
294
+ )
295
+
296
+ # Knowledge base section (below both interfaces)
297
  gr.Markdown("""
298
+ ### About Viz LLM
299
+
300
+ **Ideation Mode:** Get design recommendations based on research papers, design principles, and examples from the field of information graphics and data visualization.
301
 
302
+ **Chart Generation Mode:** Upload your CSV data and describe your visualization goal. The AI will analyze your data, select the optimal chart type, and generate a publication-ready chart using Datawrapper.
303
 
304
  **Credits:** Special thanks to the researchers whose work informed this model: Robert Kosara, Edward Segel, Jeffrey Heer, Matthew Conlen, John Maeda, Kennedy Elliott, Scott McCloud, and many others.
305
 
 
308
  **Usage Limits:** This service is limited to 20 queries per day per user to manage costs. Responses are optimized for English.
309
 
310
  <div style="text-align: center; margin-top: 20px; opacity: 0.6; font-size: 0.9em;">
311
+ Embeddings: Jina-CLIP-v2 | Charts: Datawrapper API
312
  </div>
313
  """)
314
 
315
  # Launch configuration
316
  if __name__ == "__main__":
317
  # Check for required environment variables
318
+ required_vars = ["SUPABASE_URL", "SUPABASE_KEY", "HF_TOKEN", "DATAWRAPPER_ACCESS_TOKEN"]
319
  missing_vars = [var for var in required_vars if not os.getenv(var)]
320
 
321
  if missing_vars:
322
  print(f"⚠️ Warning: Missing environment variables: {', '.join(missing_vars)}")
323
  print("Please set these in your .env file or as environment variables")
324
+ if "DATAWRAPPER_ACCESS_TOKEN" in missing_vars:
325
+ print("Note: DATAWRAPPER_ACCESS_TOKEN is required for chart generation mode")
326
 
327
  # Launch the app
328
  demo.launch(
datawrapper_mcp/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """A Model Context Protocol server for creating Datawrapper charts."""
datawrapper_mcp/config.py ADDED
@@ -0,0 +1,24 @@
1
+ """Configuration and constants for the Datawrapper MCP server."""
2
+
3
+ from datawrapper import (
4
+ AreaChart,
5
+ ArrowChart,
6
+ BarChart,
7
+ ColumnChart,
8
+ LineChart,
9
+ MultipleColumnChart,
10
+ ScatterPlot,
11
+ StackedBarChart,
12
+ )
13
+
14
+ # Map of chart type names to their Pydantic classes
15
+ CHART_CLASSES = {
16
+ "bar": BarChart,
17
+ "line": LineChart,
18
+ "area": AreaChart,
19
+ "arrow": ArrowChart,
20
+ "column": ColumnChart,
21
+ "multiple_column": MultipleColumnChart,
22
+ "scatter": ScatterPlot,
23
+ "stacked_bar": StackedBarChart,
24
+ }
datawrapper_mcp/handlers/__init__.py ADDED
@@ -0,0 +1,19 @@
1
+ """Handler functions for MCP tool implementations."""
2
+
3
+ from .create import create_chart
4
+ from .delete import delete_chart
5
+ from .export import export_chart_png
6
+ from .publish import publish_chart
7
+ from .retrieve import get_chart_info
8
+ from .schema import get_chart_schema
9
+ from .update import update_chart
10
+
11
+ __all__ = [
12
+ "create_chart",
13
+ "delete_chart",
14
+ "export_chart_png",
15
+ "get_chart_info",
16
+ "get_chart_schema",
17
+ "publish_chart",
18
+ "update_chart",
19
+ ]
datawrapper_mcp/handlers/create.py ADDED
@@ -0,0 +1,52 @@
1
+ """Handler for creating Datawrapper charts."""
2
+
3
+ import json
4
+
5
+ from mcp.types import TextContent
6
+
7
+ from ..config import CHART_CLASSES
8
+ from ..utils import get_api_token, json_to_dataframe
9
+
10
+
11
+ async def create_chart(arguments: dict) -> list[TextContent]:
12
+ """Create a chart with full Pydantic model configuration."""
13
+ api_token = get_api_token()
14
+
15
+ # Convert data to DataFrame
16
+ df = json_to_dataframe(arguments["data"])
17
+
18
+ # Get chart class and validate config
19
+ chart_type = arguments["chart_type"]
20
+ chart_class = CHART_CLASSES[chart_type]
21
+
22
+ # Validate and create chart using Pydantic model
23
+ try:
24
+ chart = chart_class.model_validate(arguments["chart_config"])
25
+ except Exception as e:
26
+ return [
27
+ TextContent(
28
+ type="text",
29
+ text=f"Invalid chart configuration: {str(e)}\n\n"
30
+ f"Use get_chart_schema with chart_type '{chart_type}' "
31
+ f"to see the valid schema.",
32
+ )
33
+ ]
34
+
35
+ # Set data on chart instance
36
+ chart.data = df
37
+
38
+ # Create chart using Pydantic instance method
39
+ chart.create(access_token=api_token)
40
+
41
+ result = {
42
+ "chart_id": chart.chart_id,
43
+ "chart_type": chart_type,
44
+ "title": chart.title,
45
+ "edit_url": chart.get_editor_url(),
46
+ "message": (
47
+ f"Chart created successfully! Edit it at: {chart.get_editor_url()}\n"
48
+ f"Use publish_chart with chart_id '{chart.chart_id}' to make it public."
49
+ ),
50
+ }
51
+
52
+ return [TextContent(type="text", text=json.dumps(result, indent=2))]
datawrapper_mcp/handlers/delete.py ADDED
@@ -0,0 +1,25 @@
1
+ """Handler for deleting Datawrapper charts."""
2
+
3
+ import json
4
+
5
+ from datawrapper import get_chart
6
+ from mcp.types import TextContent
7
+
8
+ from ..utils import get_api_token
9
+
10
+
11
+ async def delete_chart(arguments: dict) -> list[TextContent]:
12
+ """Delete a chart permanently."""
13
+ api_token = get_api_token()
14
+ chart_id = arguments["chart_id"]
15
+
16
+ # Get chart and delete using Pydantic instance method
17
+ chart = get_chart(chart_id, access_token=api_token)
18
+ chart.delete(access_token=api_token)
19
+
20
+ result = {
21
+ "chart_id": chart_id,
22
+ "message": "Chart deleted successfully!",
23
+ }
24
+
25
+ return [TextContent(type="text", text=json.dumps(result, indent=2))]
datawrapper_mcp/handlers/export.py ADDED
@@ -0,0 +1,48 @@
1
+ """Handler for exporting Datawrapper charts."""
2
+
3
+ import base64
4
+
5
+ from datawrapper import get_chart
6
+ from mcp.types import ImageContent
7
+
8
+ from ..utils import get_api_token
9
+
10
+
11
+ async def export_chart_png(arguments: dict) -> list[ImageContent]:
12
+ """Export a chart as PNG and return it as inline image."""
13
+ api_token = get_api_token()
14
+ chart_id = arguments["chart_id"]
15
+
16
+ # Get chart using factory function
17
+ chart = get_chart(chart_id, access_token=api_token)
18
+
19
+ # Build export parameters
20
+ export_params = {}
21
+ if "width" in arguments:
22
+ export_params["width"] = arguments["width"]
23
+ if "height" in arguments:
24
+ export_params["height"] = arguments["height"]
25
+ if "plain" in arguments:
26
+ export_params["plain"] = arguments["plain"]
27
+ if "zoom" in arguments:
28
+ export_params["zoom"] = arguments["zoom"]
29
+ if "transparent" in arguments:
30
+ export_params["transparent"] = arguments["transparent"]
31
+ if "border_width" in arguments:
32
+ export_params["borderWidth"] = arguments["border_width"]
33
+ if "border_color" in arguments:
34
+ export_params["borderColor"] = arguments["border_color"]
35
+
36
+ # Export PNG using Pydantic instance method
37
+ png_bytes = chart.export_png(access_token=api_token, **export_params)
38
+
39
+ # Encode to base64
40
+ base64_data = base64.b64encode(png_bytes).decode("utf-8")
41
+
42
+ return [
43
+ ImageContent(
44
+ type="image",
45
+ data=base64_data,
46
+ mimeType="image/png",
47
+ )
48
+ ]
datawrapper_mcp/handlers/publish.py ADDED
@@ -0,0 +1,26 @@
1
+ """Handler for publishing Datawrapper charts."""
2
+
3
+ import json
4
+
5
+ from datawrapper import get_chart
6
+ from mcp.types import TextContent
7
+
8
+ from ..utils import get_api_token
9
+
10
+
11
+ async def publish_chart(arguments: dict) -> list[TextContent]:
12
+ """Publish a chart to make it publicly accessible."""
13
+ api_token = get_api_token()
14
+ chart_id = arguments["chart_id"]
15
+
16
+ # Get chart and publish using Pydantic instance method
17
+ chart = get_chart(chart_id, access_token=api_token)
18
+ chart.publish(access_token=api_token)
19
+
20
+ result = {
21
+ "chart_id": chart_id,
22
+ "public_url": chart.get_public_url(),
23
+ "message": "Chart published successfully!",
24
+ }
25
+
26
+ return [TextContent(type="text", text=json.dumps(result, indent=2))]
datawrapper_mcp/handlers/retrieve.py ADDED
@@ -0,0 +1,27 @@
1
+ """Handler for retrieving chart information."""
2
+
3
+ import json
4
+
5
+ from datawrapper import get_chart
6
+ from mcp.types import TextContent
7
+
8
+ from ..utils import get_api_token
9
+
10
+
11
+ async def get_chart_info(arguments: dict) -> list[TextContent]:
12
+ """Get information about an existing chart."""
13
+ api_token = get_api_token()
14
+ chart_id = arguments["chart_id"]
15
+
16
+ # Get chart using factory function
17
+ chart = get_chart(chart_id, access_token=api_token)
18
+
19
+ result = {
20
+ "chart_id": chart.chart_id,
21
+ "title": chart.title,
22
+ "type": chart.chart_type,
23
+ "public_url": chart.get_public_url(),
24
+ "edit_url": chart.get_editor_url(),
25
+ }
26
+
27
+ return [TextContent(type="text", text=json.dumps(result, indent=2))]
datawrapper_mcp/handlers/schema.py ADDED
@@ -0,0 +1,31 @@
1
+ """Handler for retrieving chart schemas."""
2
+
3
+ import json
4
+
5
+ from mcp.types import TextContent
6
+
7
+ from ..config import CHART_CLASSES
8
+
9
+
10
+ async def get_chart_schema(arguments: dict) -> list[TextContent]:
11
+ """Get the Pydantic schema for a chart type."""
12
+ chart_type = arguments["chart_type"]
13
+ chart_class = CHART_CLASSES[chart_type]
14
+
15
+ schema = chart_class.model_json_schema()
16
+
17
+ # Remove examples that contain DataFrames (not JSON serializable)
18
+ if "examples" in schema:
19
+ del schema["examples"]
20
+
21
+ result = {
22
+ "chart_type": chart_type,
23
+ "class_name": chart_class.__name__,
24
+ "schema": schema,
25
+ "usage": (
26
+ "Use this schema to construct a chart_config dict for create_chart_advanced. "
27
+ "The schema shows all available properties, their types, and descriptions."
28
+ ),
29
+ }
30
+
31
+ return [TextContent(type="text", text=json.dumps(result, indent=2))]
datawrapper_mcp/handlers/update.py ADDED
@@ -0,0 +1,61 @@
1
+ """Handler for updating Datawrapper charts."""
2
+
3
+ import json
4
+
5
+ from datawrapper import get_chart
6
+ from mcp.types import TextContent
7
+
8
+ from ..utils import get_api_token, json_to_dataframe
9
+
10
+
11
+ async def update_chart(arguments: dict) -> list[TextContent]:
12
+ """Update an existing chart's data or configuration."""
13
+ api_token = get_api_token()
14
+ chart_id = arguments["chart_id"]
15
+
16
+ # Get chart using factory function - returns correct Pydantic class instance
17
+ chart = get_chart(chart_id, access_token=api_token)
18
+
19
+ # Update data if provided
20
+ if "data" in arguments:
21
+ df = json_to_dataframe(arguments["data"])
22
+ chart.data = df
23
+
24
+ # Update config if provided
25
+ if "chart_config" in arguments:
26
+ # Directly set attributes on the chart instance
27
+ # Pydantic will validate each assignment automatically due to validate_assignment=True
28
+ try:
29
+ # Build a mapping of aliases to field names
30
+ alias_to_field = {}
31
+ for field_name, field_info in chart.model_fields.items():
32
+ # Add the field name itself
33
+ alias_to_field[field_name] = field_name
34
+ # Add any aliases
35
+ if field_info.alias:
36
+ alias_to_field[field_info.alias] = field_name
37
+
38
+ for key, value in arguments["chart_config"].items():
39
+ # Convert alias to field name if needed
40
+ field_name = alias_to_field.get(key, key)
41
+ setattr(chart, field_name, value)
42
+ except Exception as e:
43
+ return [
44
+ TextContent(
45
+ type="text",
46
+ text=f"Invalid chart configuration: {str(e)}\n\n"
47
+ f"Use get_chart_schema to see the valid schema for this chart type. "
48
+ f"Only high-level Pydantic fields are accepted.",
49
+ )
50
+ ]
51
+
52
+ # Update using Pydantic instance method
53
+ chart.update(access_token=api_token)
54
+
55
+ result = {
56
+ "chart_id": chart.chart_id,
57
+ "message": "Chart updated successfully!",
58
+ "edit_url": chart.get_editor_url(),
59
+ }
60
+
61
+ return [TextContent(type="text", text=json.dumps(result, indent=2))]
datawrapper_mcp/server.py ADDED
@@ -0,0 +1,101 @@
1
+ """Main MCP server implementation for Datawrapper chart creation."""
2
+
3
+ import json
4
+ from typing import Any, Sequence
5
+
6
+ from mcp.server import Server
7
+ from mcp.types import ImageContent, Resource, TextContent
8
+ from pydantic import AnyUrl
9
+
10
+ from .config import CHART_CLASSES
11
+ from .handlers import (
12
+ create_chart,
13
+ delete_chart,
14
+ export_chart_png,
15
+ get_chart_info,
16
+ get_chart_schema,
17
+ publish_chart,
18
+ update_chart,
19
+ )
20
+ from .tools import list_tools as get_tool_list
21
+
22
+ # Initialize the MCP server
23
+ app = Server("datawrapper-mcp")
24
+
25
+
26
+ @app.list_resources()
27
+ async def list_resources() -> list[Resource]:
28
+ """List available resources."""
29
+ return [
30
+ Resource(
31
+ uri=AnyUrl("datawrapper://chart-types"),
32
+ name="Available Chart Types",
33
+ mimeType="application/json",
34
+ description="List of available Datawrapper chart types and their Pydantic schemas",
35
+ )
36
+ ]
37
+
38
+
39
+ @app.read_resource()
40
+ async def read_resource(uri: AnyUrl) -> str:
41
+ """Read a resource by URI."""
42
+ if str(uri) == "datawrapper://chart-types":
43
+ chart_info = {}
44
+ for name, chart_class in CHART_CLASSES.items():
45
+ chart_info[name] = {
46
+ "class_name": chart_class.__name__,
47
+ "schema": chart_class.model_json_schema(),
48
+ }
49
+ return json.dumps(chart_info, indent=2)
50
+
51
+ raise ValueError(f"Unknown resource URI: {uri}")
52
+
53
+
54
+ @app.list_tools()
55
+ async def list_tools():
56
+ """List available tools."""
57
+ return await get_tool_list()
58
+
59
+
60
+ @app.call_tool()
61
+ async def call_tool(name: str, arguments: Any) -> Sequence[TextContent | ImageContent]:
62
+ """Handle tool calls."""
63
+ try:
64
+ if name == "create_chart":
65
+ return await create_chart(arguments)
66
+ elif name == "get_chart_schema":
67
+ return await get_chart_schema(arguments)
68
+ elif name == "publish_chart":
69
+ return await publish_chart(arguments)
70
+ elif name == "get_chart":
71
+ return await get_chart_info(arguments)
72
+ elif name == "update_chart":
73
+ return await update_chart(arguments)
74
+ elif name == "delete_chart":
75
+ return await delete_chart(arguments)
76
+ elif name == "export_chart_png":
77
+ return await export_chart_png(arguments)
78
+ else:
79
+ raise ValueError(f"Unknown tool: {name}")
80
+ except Exception as e:
81
+ return [TextContent(type="text", text=f"Error: {str(e)}")]
82
+
83
+
84
+ def main():
85
+ """Run the MCP server."""
86
+ import asyncio
87
+ from mcp.server.stdio import stdio_server
88
+
89
+ async def run():
90
+ async with stdio_server() as (read_stream, write_stream):
91
+ await app.run(
92
+ read_stream,
93
+ write_stream,
94
+ app.create_initialization_options(),
95
+ )
96
+
97
+ asyncio.run(run())
98
+
99
+
100
+ if __name__ == "__main__":
101
+ main()
datawrapper_mcp/tools.py ADDED
@@ -0,0 +1,286 @@
1
+ """Tool definitions for the Datawrapper MCP server."""
2
+
3
+ from mcp.types import Tool
4
+
5
+ from .config import CHART_CLASSES
6
+
7
+
8
+ async def list_tools() -> list[Tool]:
9
+ """List available tools."""
10
+ return [
11
+ Tool(
12
+ name="create_chart",
13
+ description=(
14
+ "⚠️ THIS IS THE DATAWRAPPER INTEGRATION ⚠️\n"
15
+ "Use this MCP tool for ALL Datawrapper chart creation.\n\n"
16
+ "DO NOT:\n"
17
+ "❌ Install the 'datawrapper' Python package\n"
18
+ "❌ Use the Datawrapper API directly\n"
19
+ "❌ Import 'from datawrapper import ...'\n"
20
+ "❌ Run pip install datawrapper\n\n"
21
+ "This MCP server IS the complete Datawrapper integration. All Datawrapper operations "
22
+ "should use the MCP tools provided by this server.\n\n"
23
+ "---\n\n"
24
+ "Create a Datawrapper chart with full control using Pydantic models. "
25
+ "This allows you to specify all chart properties including title, description, "
26
+ "visualization settings, axes, colors, and more. The chart_config should "
27
+ "be a complete Pydantic model dict matching the schema for the chosen chart type.\n\n"
28
+ "STYLING WORKFLOW:\n"
29
+ "1. Use get_chart_schema to explore all available options for your chart type\n"
30
+ "2. Refer to https://datawrapper.readthedocs.io/en/latest/ for detailed examples\n"
31
+ "3. Build your chart_config with the desired styling properties\n\n"
32
+ "Common styling patterns:\n"
33
+ '- Colors: {"color_category": {"sales": "#1d81a2", "profit": "#15607a"}}\n'
34
+ '- Line styling: {"lines": [{"column": "sales", "width": "style1", "interpolation": "curved"}]}\n'
35
+ '- Axis ranges: {"custom_range_y": [0, 100], "custom_range_x": [2020, 2024]}\n'
36
+ '- Grid formatting: {"y_grid_format": "0", "x_grid": "on", "y_grid": "on"}\n'
37
+ '- Tooltips: {"tooltip_number_format": "00.00", "tooltip_x_format": "YYYY"}\n'
38
+ '- Annotations: {"text_annotations": [{"x": "2023", "y": 50, "text": "Peak"}]}\n\n'
39
+ "See the documentation for chart-type specific examples and advanced patterns.\n\n"
40
+ 'Example data format: [{"date": "2024-01", "value": 100}, {"date": "2024-02", "value": 150}]'
41
+ ),
42
+ inputSchema={
43
+ "type": "object",
44
+ "properties": {
45
+ "data": {
46
+ "type": ["string", "array", "object"],
47
+ "description": (
48
+ "Chart data. RECOMMENDED: Pass data inline as a list or dict.\n\n"
49
+ "PREFERRED FORMATS (use these first):\n\n"
50
+ "1. List of records (RECOMMENDED):\n"
51
+ ' [{"year": 2020, "sales": 100}, {"year": 2021, "sales": 150}]\n\n'
52
+ "2. Dict of arrays:\n"
53
+ ' {"year": [2020, 2021], "sales": [100, 150]}\n\n'
54
+ "3. JSON string of format 1 or 2:\n"
55
+ ' \'[{"year": 2020, "sales": 100}]\'\n\n'
56
+ "ALTERNATIVE (only for extremely large datasets where inline data is impractical):\n\n"
57
+ "4. File path to CSV or JSON:\n"
58
+ ' "/path/to/data.csv" or "/path/to/data.json"\n'
59
+ " - Use only when inline data would be too large to pass directly\n"
60
+ " - CSV files are read directly\n"
61
+ " - JSON files must contain list of dicts or dict of arrays"
62
+ ),
63
+ },
64
+ "chart_type": {
65
+ "type": "string",
66
+ "enum": list(CHART_CLASSES.keys()),
67
+ "description": "Type of chart to create",
68
+ },
69
+ "chart_config": {
70
+ "type": "object",
71
+ "description": (
72
+ "Complete chart configuration as a Pydantic model dict. "
73
+ "Must match the schema for the chosen chart_type. "
74
+ "Use get_chart_schema to see the full schema."
75
+ ),
76
+ },
77
+ },
78
+ "required": ["data", "chart_type", "chart_config"],
79
+ },
80
+ ),
81
+ Tool(
82
+ name="get_chart_schema",
83
+ description=(
84
+ "⚠️ DATAWRAPPER MCP TOOL ⚠️\n"
85
+ "This is part of the Datawrapper MCP server integration.\n\n"
86
+ "---\n\n"
87
+ "Get the Pydantic JSON schema for a specific chart type. This is your primary tool "
88
+ "for discovering styling and configuration options.\n\n"
89
+ "The schema shows:\n"
90
+ "- All available properties and their types\n"
91
+ "- Enum values (e.g., line widths, interpolation methods)\n"
92
+ "- Default values\n"
93
+ "- Detailed descriptions for each property\n\n"
94
+ "WORKFLOW: Use this tool first to explore options, then refer to "
95
+ "https://datawrapper.readthedocs.io/en/latest/ for detailed examples and patterns "
96
+ "showing how to use these properties in practice."
97
+ ),
98
+ inputSchema={
99
+ "type": "object",
100
+ "properties": {
101
+ "chart_type": {
102
+ "type": "string",
103
+ "enum": list(CHART_CLASSES.keys()),
104
+ "description": "Chart type to get schema for",
105
+ },
106
+ },
107
+ "required": ["chart_type"],
108
+ },
109
+ ),
110
+ Tool(
111
+ name="publish_chart",
112
+ description=(
113
+ "⚠️ DATAWRAPPER MCP TOOL ⚠️\n"
114
+ "This is part of the Datawrapper MCP server integration.\n\n"
115
+ "---\n\n"
116
+ "Publish a Datawrapper chart to make it publicly accessible. "
117
+ "Returns the public URL of the published chart. "
118
+ "IMPORTANT: Only use this tool when the user explicitly requests to publish the chart. "
119
+ "Do not automatically publish charts after creation unless specifically asked."
120
+ ),
121
+ inputSchema={
122
+ "type": "object",
123
+ "properties": {
124
+ "chart_id": {
125
+ "type": "string",
126
+ "description": "ID of the chart to publish",
127
+ },
128
+ },
129
+ "required": ["chart_id"],
130
+ },
131
+ ),
132
+ Tool(
133
+ name="get_chart",
134
+ description=(
135
+ "⚠️ DATAWRAPPER MCP TOOL ⚠️\n"
136
+ "This is part of the Datawrapper MCP server integration.\n\n"
137
+ "---\n\n"
138
+ "Get information about an existing Datawrapper chart, "
139
+ "including its metadata, data, and public URL if published."
140
+ ),
141
+ inputSchema={
142
+ "type": "object",
143
+ "properties": {
144
+ "chart_id": {
145
+ "type": "string",
146
+ "description": "ID of the chart to retrieve",
147
+ },
148
+ },
149
+ "required": ["chart_id"],
150
+ },
151
+ ),
152
+ Tool(
153
+ name="update_chart",
154
+ description=(
155
+ "⚠️ DATAWRAPPER MCP TOOL ⚠️\n"
156
+ "This is part of the Datawrapper MCP server integration.\n\n"
157
+ "---\n\n"
158
+ "Update an existing Datawrapper chart's data or configuration using Pydantic models. "
159
+ "IMPORTANT: The chart_config must use high-level Pydantic fields only (title, intro, "
160
+ "byline, source_name, source_url, etc.). Do NOT use low-level serialized structures "
161
+ "like 'metadata', 'visualize', or other internal API fields.\n\n"
162
+ "STYLING UPDATES:\n"
163
+ "Use get_chart_schema to see available fields, then apply styling changes:\n"
164
+ '- Colors: {"color_category": {"sales": "#ff0000"}}\n'
165
+ '- Line properties: {"lines": [{"column": "sales", "width": "style2"}]}\n'
166
+ '- Axis settings: {"custom_range_y": [0, 200], "y_grid_format": "0,0"}\n'
167
+ '- Tooltips: {"tooltip_number_format": "0.0"}\n\n'
168
+ "See https://datawrapper.readthedocs.io/en/latest/ for detailed examples. "
169
+ "The provided config will be validated through Pydantic and merged with the existing "
170
+ "chart configuration."
171
+ ),
172
+ inputSchema={
173
+ "type": "object",
174
+ "properties": {
175
+ "chart_id": {
176
+ "type": "string",
177
+ "description": "ID of the chart to update",
178
+ },
179
+ "data": {
180
+ "type": ["string", "array", "object"],
181
+ "description": (
182
+ "Chart data. RECOMMENDED: Pass data inline as a list or dict.\n\n"
183
+ "PREFERRED FORMATS (use these first):\n\n"
184
+ "1. List of records (RECOMMENDED):\n"
185
+ ' [{"year": 2020, "sales": 100}, {"year": 2021, "sales": 150}]\n\n'
186
+ "2. Dict of arrays:\n"
187
+ ' {"year": [2020, 2021], "sales": [100, 150]}\n\n'
188
+ "3. JSON string of format 1 or 2:\n"
189
+ ' \'[{"year": 2020, "sales": 100}]\'\n\n'
190
+ "ALTERNATIVE (only for extremely large datasets where inline data is impractical):\n\n"
191
+ "4. File path to CSV or JSON:\n"
192
+ ' "/path/to/data.csv" or "/path/to/data.json"\n'
193
+ " - Use only when inline data would be too large to pass directly\n"
194
+ " - CSV files are read directly\n"
195
+ " - JSON files must contain list of dicts or dict of arrays"
196
+ ),
197
+ },
198
+ "chart_config": {
199
+ "type": "object",
200
+ "description": (
201
+ "Updated chart configuration using high-level Pydantic fields (optional). "
202
+ "Must use Pydantic model fields like 'title', 'intro', 'byline', etc. "
203
+ "Do NOT use raw API structures like 'metadata' or 'visualize'. "
204
+ "Use get_chart_schema to see valid fields. Will be validated and merged "
205
+ "with existing config."
206
+ ),
207
+ },
208
+ },
209
+ "required": ["chart_id"],
210
+ },
211
+ ),
212
+ Tool(
213
+ name="delete_chart",
214
+ description=(
215
+ "⚠️ DATAWRAPPER MCP TOOL ⚠️\n"
216
+ "This is part of the Datawrapper MCP server integration.\n\n"
217
+ "---\n\n"
218
+ "Delete a Datawrapper chart permanently."
219
+ ),
220
+ inputSchema={
221
+ "type": "object",
222
+ "properties": {
223
+ "chart_id": {
224
+ "type": "string",
225
+ "description": "ID of the chart to delete",
226
+ },
227
+ },
228
+ "required": ["chart_id"],
229
+ },
230
+ ),
231
+ Tool(
232
+ name="export_chart_png",
233
+ description=(
234
+ "⚠️ DATAWRAPPER MCP TOOL ⚠️\n"
235
+ "This is part of the Datawrapper MCP server integration.\n\n"
236
+ "---\n\n"
237
+ "Export a Datawrapper chart as PNG and display it inline. "
238
+ "The chart must be created first using create_chart. "
239
+ "Supports high-resolution output via the zoom parameter. "
240
+ "IMPORTANT: Only use this tool when the user explicitly requests to see the chart image "
241
+ "or export it as PNG. Do not automatically export charts after creation unless specifically asked."
242
+ ),
243
+ inputSchema={
244
+ "type": "object",
245
+ "properties": {
246
+ "chart_id": {
247
+ "type": "string",
248
+ "description": "ID of the chart to export",
249
+ },
250
+ "width": {
251
+ "type": "integer",
252
+ "description": "Width of the image in pixels (optional, uses chart width if not specified)",
253
+ },
254
+ "height": {
255
+ "type": "integer",
256
+ "description": "Height of the image in pixels (optional, uses chart height if not specified)",
257
+ },
258
+ "plain": {
259
+ "type": "boolean",
260
+ "description": "If true, exports only the visualization without header/footer (default: false)",
261
+ "default": False,
262
+ },
263
+ "zoom": {
264
+ "type": "integer",
265
+ "description": "Scale multiplier for resolution, e.g., 2 = 2x resolution (default: 2)",
266
+ "default": 2,
267
+ },
268
+ "transparent": {
269
+ "type": "boolean",
270
+ "description": "If true, exports with transparent background (default: false)",
271
+ "default": False,
272
+ },
273
+ "border_width": {
274
+ "type": "integer",
275
+ "description": "Margin around visualization in pixels (default: 0)",
276
+ "default": 0,
277
+ },
278
+ "border_color": {
279
+ "type": "string",
280
+ "description": "Color of the border, e.g., '#FFFFFF' (optional, uses chart background if not specified)",
281
+ },
282
+ },
283
+ "required": ["chart_id"],
284
+ },
285
+ ),
286
+ ]
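
For reference, a hypothetical `create_chart` tool call following the input schema above might look like the sketch below (the `chart_config` keys are taken from the styling patterns quoted in the tool description and depend on the installed `datawrapper` package version):

```python
# Example arguments an MCP client might pass to the create_chart handler.
arguments = {
    "data": [
        {"year": 2020, "sales": 100},
        {"year": 2021, "sales": 150},
    ],
    "chart_type": "line",
    "chart_config": {
        "title": "Sales over time",
        "color_category": {"sales": "#1d81a2"},
        "custom_range_y": [0, 200],
    },
}
# result = await create_chart(arguments)
# -> list[TextContent] whose JSON payload includes chart_id and edit_url
```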
datawrapper_mcp/utils.py ADDED
@@ -0,0 +1,118 @@
1
+ """Utility functions for the Datawrapper MCP server."""
2
+
3
+ import json
4
+ import os
5
+
6
+ import pandas as pd
7
+
8
+
9
+ def get_api_token() -> str:
10
+ """Get the Datawrapper API token from environment."""
11
+ api_token = os.environ.get("DATAWRAPPER_ACCESS_TOKEN")
12
+ if not api_token:
13
+ raise ValueError(
14
+ "DATAWRAPPER_ACCESS_TOKEN environment variable is required. "
15
+ "Get your token from https://app.datawrapper.de/account/api-tokens"
16
+ )
17
+ return api_token
18
+
19
+
20
+ def json_to_dataframe(data: str | list | dict) -> pd.DataFrame:
21
+ """Convert JSON data to a pandas DataFrame.
22
+
23
+ Args:
24
+ data: One of:
25
+ - File path to CSV or JSON file (e.g., "/path/to/data.csv")
26
+ - List of records: [{"col1": val1, "col2": val2}, ...]
27
+ - Dict of arrays: {"col1": [val1, val2], "col2": [val3, val4]}
28
+ - JSON string in either format above
29
+
30
+ Returns:
31
+ pandas DataFrame
32
+
33
+ Examples:
34
+ >>> json_to_dataframe("/tmp/data.csv")
35
+ >>> json_to_dataframe("/tmp/data.json")
36
+ >>> json_to_dataframe([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
37
+ >>> json_to_dataframe({"a": [1, 3], "b": [2, 4]})
38
+ >>> json_to_dataframe('[{"a": 1, "b": 2}]')
39
+ """
40
+ if isinstance(data, str):
41
+ # Check if it's a file path that exists
42
+ if os.path.isfile(data):
43
+ if data.endswith(".csv"):
44
+ return pd.read_csv(data)
45
+ elif data.endswith(".json"):
46
+ with open(data) as f:
47
+ file_data = json.load(f)
48
+ # Recursively process the loaded JSON data
49
+ return json_to_dataframe(file_data)
50
+ else:
51
+ raise ValueError(
52
+ f"Unsupported file type: {data}\n\n"
53
+ "Supported file types:\n"
54
+ " - .csv (CSV files)\n"
55
+ " - .json (JSON files containing list of dicts or dict of arrays)"
56
+ )
57
+
58
+ # Check if it looks like CSV content (not a file path)
59
+ if "\n" in data and "," in data and not data.strip().startswith(("[", "{")):
60
+ raise ValueError(
61
+ "CSV strings are not supported. Please save to a file first.\n\n"
62
+ "Options:\n"
63
+ " 1. Save CSV to a file and pass the file path\n"
64
+ ' 2. Parse CSV to list of dicts: [{"col": val}, ...]\n'
65
+ ' 3. Parse CSV to dict of arrays: {"col": [vals]}\n\n'
66
+ "Example:\n"
67
+ ' data = [{"year": 2020, "value": 100}, {"year": 2021, "value": 150}]'
68
+ )
69
+
70
+ # Try to parse as JSON string
71
+ try:
72
+ data = json.loads(data)
73
+ except json.JSONDecodeError as e:
74
+ raise ValueError(
75
+ f"Invalid JSON string: {e}\n\n"
76
+ "Expected one of:\n"
77
+ " 1. File path: '/path/to/data.csv' or '/path/to/data.json'\n"
78
+ ' 2. JSON string: \'[{"year": 2020, "value": 100}, ...]\'\n'
79
+ ' 3. JSON string: \'{"year": [2020, 2021], "value": [100, 150]}\''
80
+ )
81
+
82
+ if isinstance(data, list):
83
+ if not data:
84
+ raise ValueError(
85
+ "Data list is empty. Please provide at least one row of data."
86
+ )
87
+ if not all(isinstance(item, dict) for item in data):
88
+ raise ValueError(
89
+ "List format must contain dictionaries.\n\n"
90
+ "Expected format:\n"
91
+ ' [{"year": 2020, "value": 100}, {"year": 2021, "value": 150}]\n\n'
92
+ f"Got: {type(data[0]).__name__} in list"
93
+ )
94
+ # List of records: [{"col1": val1, "col2": val2}, ...]
95
+ return pd.DataFrame(data)
96
+ elif isinstance(data, dict):
97
+ if not data:
98
+ raise ValueError(
99
+ "Data dict is empty. Please provide at least one column of data."
100
+ )
101
+ # Check if it's a dict of arrays (all values should be lists)
102
+ if not all(isinstance(v, list) for v in data.values()):
103
+ raise ValueError(
104
+ "Dict format must have lists as values.\n\n"
105
+ "Expected format:\n"
106
+ ' {"year": [2020, 2021], "value": [100, 150]}\n\n'
107
+ f"Got dict with values of type: {[type(v).__name__ for v in data.values()]}"
108
+ )
109
+ # Dict of arrays: {"col1": [val1, val2], "col2": [val3, val4]}
110
+ return pd.DataFrame(data)
111
+ else:
112
+ raise ValueError(
113
+ f"Unsupported data type: {type(data).__name__}\n\n"
114
+ "Data must be one of:\n"
115
+ ' 1. List of dicts: [{"year": 2020, "value": 100}, ...]\n'
116
+ ' 2. Dict of arrays: {"year": [2020, 2021], "value": [100, 150]}\n'
117
+ " 3. JSON string in either format above"
118
+ )
requirements.txt CHANGED
@@ -12,3 +12,8 @@ python-dotenv>=1.0.0
12
 
13
  # Utilities
14
  pydantic>=2.0.0
12
 
13
  # Utilities
14
  pydantic>=2.0.0
15
+
16
+ # Datawrapper chart creation
17
+ datawrapper>=2.0.7
18
+ mcp>=1.20.0
19
+ pandas>=2.0.0
src/datawrapper_client.py ADDED
@@ -0,0 +1,336 @@
1
+ """
2
+ Datawrapper Chart Generation Client
3
+
4
+ Integrates RAG pipeline with Datawrapper API for intelligent chart creation.
5
+ """
6
+
7
+ import json
8
+ import os
9
+ from typing import Optional, Tuple
10
+ import pandas as pd
11
+
12
+ from .prompts import (
13
+ CHART_SELECTION_SYSTEM_PROMPT,
14
+ get_chart_selection_prompt,
15
+ get_chart_styling_prompt
16
+ )
17
+ from .llm_client import create_llm_client
18
+ from .rag_pipeline import GraphicsDesignPipeline
19
+
20
+ # Import Datawrapper MCP handlers directly
21
+ from datawrapper_mcp.handlers.create import create_chart as mcp_create_chart
22
+ from datawrapper_mcp.handlers.publish import publish_chart as mcp_publish_chart
23
+ from datawrapper_mcp.handlers.retrieve import get_chart_info as mcp_get_chart_info
24
+
25
+
26
+ def get_data_summary(df: pd.DataFrame) -> str:
27
+ """
28
+ Generate a summary of the DataFrame structure and content.
29
+
30
+ Args:
31
+ df: Input DataFrame
32
+
33
+ Returns:
34
+ String summary of data characteristics
35
+ """
36
+ summary_parts = []
37
+
38
+ # Basic info
39
+ summary_parts.append(f"Rows: {len(df)}, Columns: {len(df.columns)}")
40
+ summary_parts.append(f"Column names: {', '.join(df.columns.tolist())}")
41
+
42
+ # Column types
43
+ numeric_cols = df.select_dtypes(include=['number']).columns.tolist()
44
+ text_cols = df.select_dtypes(include=['object']).columns.tolist()
45
+ date_cols = df.select_dtypes(include=['datetime']).columns.tolist()
46
+
47
+ if numeric_cols:
48
+ summary_parts.append(f"Numeric columns: {', '.join(numeric_cols)}")
49
+ if text_cols:
50
+ summary_parts.append(f"Text columns: {', '.join(text_cols)}")
51
+ if date_cols:
52
+ summary_parts.append(f"Date columns: {', '.join(date_cols)}")
53
+
54
+ # Data preview (first 3 rows)
55
+ summary_parts.append(f"\nData preview:\n{df.head(3).to_string()}")
56
+
57
+ return "\n".join(summary_parts)
58
+
59
+
60
+ def analyze_csv_for_chart_type(
61
+ df: pd.DataFrame,
62
+ user_prompt: str,
63
+ rag_pipeline: GraphicsDesignPipeline
64
+ ) -> Tuple[str, str]:
65
+ """
66
+ Use RAG and LLM to determine the best chart type for the data.
67
+
68
+ Args:
69
+ df: Input DataFrame
70
+ user_prompt: User's description of what they want to visualize
71
+ rag_pipeline: RAG pipeline for retrieving best practices
72
+
73
+ Returns:
74
+ Tuple of (chart_type, reasoning)
75
+ """
76
+ # Get data summary
77
+ data_summary = get_data_summary(df)
78
+
79
+ # Query RAG for chart selection best practices
80
+ rag_query = f"chart type selection for {user_prompt}"
81
+ relevant_docs = rag_pipeline.retrieve_documents(rag_query, k=3)
82
+ rag_context = rag_pipeline.vectorstore.format_documents_for_context(relevant_docs)
83
+
84
+ # Generate chart type recommendation using LLM
85
+ chart_prompt = get_chart_selection_prompt()
86
+ full_prompt = chart_prompt.format(
87
+ user_prompt=user_prompt,
88
+ data_summary=data_summary,
89
+ rag_context=rag_context
90
+ )
91
+
92
+ llm_client = create_llm_client(
93
+ model=os.getenv("LLM_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
94
+ temperature=0.3, # Lower temperature for more deterministic chart selection
95
+ max_tokens=500
96
+ )
97
+
98
+ response = llm_client.generate(
99
+ prompt=full_prompt,
100
+ system_prompt=CHART_SELECTION_SYSTEM_PROMPT
101
+ )
102
+
103
+ # Parse JSON response
104
+ try:
105
+ # Extract JSON from response (handle markdown code blocks)
106
+ response_clean = response.strip()
107
+ if "```json" in response_clean:
108
+ response_clean = response_clean.split("```json")[1].split("```")[0].strip()
109
+ elif "```" in response_clean:
110
+ response_clean = response_clean.split("```")[1].split("```")[0].strip()
111
+
112
+ result = json.loads(response_clean)
113
+ chart_type = result.get("chart_type", "line")
114
+ reasoning = result.get("reasoning", "")
115
+
116
+ # Validate chart type
117
+ valid_types = ["bar", "line", "area", "scatter", "column", "stacked_bar", "arrow", "multiple_column"]
118
+ if chart_type not in valid_types:
119
+ chart_type = "line" # Default fallback
120
+
121
+ return chart_type, reasoning
122
+ except Exception as e:
123
+ print(f"Error parsing chart type response: {e}")
124
+ print(f"Response was: {response}")
125
+ # Default to line chart
126
+ return "line", "Using default line chart due to parsing error"
127
+
128
+
129
+ def generate_chart_config(
130
+ chart_type: str,
131
+ df: pd.DataFrame,
132
+ user_prompt: str,
133
+ rag_pipeline: GraphicsDesignPipeline
134
+ ) -> dict:
135
+ """
136
+ Generate Datawrapper chart configuration using RAG and LLM.
137
+
138
+ Args:
139
+ chart_type: Type of chart to create
140
+ df: Input DataFrame
141
+ user_prompt: User's visualization request
142
+ rag_pipeline: RAG pipeline for retrieving design best practices
143
+
144
+ Returns:
145
+ Dictionary with chart configuration
146
+ """
147
+ # Get data summary
148
+ data_summary = get_data_summary(df)
149
+
150
+ # Query RAG for styling and design best practices
151
+ rag_query = f"chart design best practices colors accessibility {chart_type}"
152
+ relevant_docs = rag_pipeline.retrieve_documents(rag_query, k=3)
153
+ rag_context = rag_pipeline.vectorstore.format_documents_for_context(relevant_docs)
154
+
155
+ # Generate chart configuration using LLM
156
+ styling_prompt = get_chart_styling_prompt()
157
+ full_prompt = styling_prompt.format(
158
+ chart_type=chart_type,
159
+ user_prompt=user_prompt,
160
+ data_summary=data_summary,
161
+ rag_context=rag_context
162
+ )
163
+
164
+ llm_client = create_llm_client(
165
+ model=os.getenv("LLM_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
166
+ temperature=0.5,
167
+ max_tokens=800
168
+ )
169
+
170
+ response = llm_client.generate(
171
+ prompt=full_prompt,
172
+ system_prompt="You are a data visualization expert. Generate valid JSON configuration for Datawrapper charts."
173
+ )
174
+
175
+ # Parse JSON response
176
+ try:
177
+ # Extract JSON from response
178
+ response_clean = response.strip()
179
+ if "```json" in response_clean:
180
+ response_clean = response_clean.split("```json")[1].split("```")[0].strip()
181
+ elif "```" in response_clean:
182
+ response_clean = response_clean.split("```")[1].split("```")[0].strip()
183
+
184
+ config = json.loads(response_clean)
185
+
186
+ # Ensure basic required fields
187
+ if "title" not in config:
188
+ config["title"] = user_prompt[:100] # Use prompt as fallback title
189
+
190
+ return config
191
+ except Exception as e:
192
+ print(f"Error parsing chart config: {e}")
193
+ print(f"Response was: {response}")
194
+ # Return minimal config
195
+ return {
196
+ "title": user_prompt[:100] if user_prompt else "Data Visualization",
197
+ "source_name": "User Data"
198
+ }
199
+
200
+
201
+ async def create_and_publish_chart(
202
+ df: pd.DataFrame,
203
+ user_prompt: str,
204
+ rag_pipeline: GraphicsDesignPipeline,
205
+ api_token: Optional[str] = None
206
+ ) -> dict:
207
+ """
208
+ Complete workflow: analyze data, select chart type, create and publish chart.
209
+
210
+ Args:
211
+ df: Input DataFrame
212
+ user_prompt: User's visualization request
213
+ rag_pipeline: RAG pipeline instance
214
+ api_token: Datawrapper API token (defaults to env var)
215
+
216
+ Returns:
217
+ Dictionary with chart info including iframe URL
218
+ """
219
+ if api_token is None:
220
+ api_token = os.getenv("DATAWRAPPER_ACCESS_TOKEN")
221
+ if not api_token:
222
+ raise ValueError("DATAWRAPPER_ACCESS_TOKEN not found in environment")
223
+
224
+ try:
225
+ # Step 1: Analyze data and select chart type
226
+ chart_type, reasoning = analyze_csv_for_chart_type(df, user_prompt, rag_pipeline)
227
+
228
+ # Step 2: Generate chart configuration
229
+ chart_config = generate_chart_config(chart_type, df, user_prompt, rag_pipeline)
230
+
231
+ # Step 3: Convert DataFrame to list of dicts for Datawrapper
232
+ data_list = df.to_dict('records')
233
+
234
+ # Step 4: Create chart using MCP handler
235
+ create_args = {
236
+ "data": data_list,
237
+ "chart_type": chart_type,
238
+ "chart_config": chart_config
239
+ }
240
+
241
+ create_result = await mcp_create_chart(create_args)
242
+
243
+ if not create_result or len(create_result) == 0:
244
+ raise ValueError("Empty response from chart creation")
245
+
246
+ result_text = create_result[0].text
247
+
248
+ if not result_text or result_text.strip() == "":
249
+ raise ValueError("Empty text in chart creation response")
250
+
251
+ result_data = json.loads(result_text)
252
+
253
+ chart_id = result_data.get("chart_id")
254
+ if not chart_id:
255
+ raise ValueError(f"Failed to get chart_id from creation response. Response was: {result_data}")
256
+
257
+ # Step 5: Try to publish chart using MCP handler
258
+ publish_success = False
259
+ publish_message = ""
260
+ try:
261
+ publish_args = {"chart_id": chart_id}
262
+ publish_result = await mcp_publish_chart(publish_args)
263
+ publish_text = publish_result[0].text
264
+ publish_data = json.loads(publish_text)
265
+ publish_success = True
266
+ publish_message = publish_data.get("message", "Published successfully")
267
+ except Exception as publish_error:
268
+ publish_message = f"Publish failed: {str(publish_error)}"
269
+
270
+ # Step 6: Get full chart info using MCP handler
271
+ chart_info_args = {"chart_id": chart_id}
272
+ chart_info_result = await mcp_get_chart_info(chart_info_args)
273
+ chart_info_text = chart_info_result[0].text
274
+ chart_info = json.loads(chart_info_text)
275
+
276
+ # Return complete info
277
+ return {
278
+ "success": True,
279
+ "chart_id": chart_id,
280
+ "chart_type": chart_type,
281
+ "reasoning": reasoning,
282
+ "public_url": chart_info.get("public_url"),
283
+ "edit_url": chart_info.get("edit_url"),
284
+ "published": publish_success,
285
+ "publish_message": publish_message,
286
+ "title": chart_config.get("title", "Chart")
287
+ }
288
+
289
+ except json.JSONDecodeError as e:
290
+ error_msg = f"JSON parsing error: {str(e)}"
291
+ print(f"Error in chart creation: {error_msg}")
292
+ print(f"Failed to parse: {result_text if 'result_text' in locals() else 'N/A'}")
293
+ return {
294
+ "success": False,
295
+ "error": error_msg,
296
+ "chart_type": chart_type if 'chart_type' in locals() else None,
297
+ "public_url": None
298
+ }
299
+ except Exception as e:
300
+ error_msg = f"{type(e).__name__}: {str(e)}"
301
+ print(f"Error in chart creation: {error_msg}")
302
+ import traceback
303
+ traceback.print_exc()
304
+ return {
305
+ "success": False,
306
+ "error": error_msg,
307
+ "chart_type": chart_type if 'chart_type' in locals() else None,
308
+ "public_url": None
309
+ }
310
+
311
+
312
+ def get_iframe_html(chart_url: str, height: int = 600) -> str:
313
+ """
314
+ Generate iframe HTML for embedding a Datawrapper chart.
315
+
316
+ Args:
317
+ chart_url: Public URL of the chart
318
+ height: Height of iframe in pixels
319
+
320
+ Returns:
321
+ HTML string with iframe
322
+ """
323
+ if not chart_url:
324
+ return "<div style='padding: 50px; text-align: center;'>No chart available</div>"
325
+
326
+ return f"""
327
+ <div style="width: 100%; height: {height}px;">
328
+ <iframe
329
+ src="{chart_url}"
330
+ style="width: 100%; height: 100%; border: none;"
331
+ frameborder="0"
332
+ scrolling="no"
333
+ aria-label="Chart">
334
+ </iframe>
335
+ </div>
336
+ """
src/prompts.py CHANGED
@@ -126,3 +126,110 @@ def get_followup_prompt() -> SimplePromptTemplate:
 def get_technique_recommendation_prompt() -> SimplePromptTemplate:
     """Get the technique recommendation prompt template"""
     return TECHNIQUE_RECOMMENDATION_PROMPT
+
+
+# =============================================================================
+# CHART GENERATION PROMPTS (for Datawrapper integration)
+# =============================================================================
+
+CHART_SELECTION_SYSTEM_PROMPT = """You are an expert data visualization advisor specialized in selecting the optimal chart type for data storytelling.
+
+Your task is to analyze:
+1. The user's intent and goal (what story they want to tell)
+2. The structure and characteristics of their data
+3. Best practices from visualization research
+
+You must respond with a JSON object containing:
+- "chart_type": one of [bar, line, area, scatter, column, stacked_bar, arrow, multiple_column]
+- "reasoning": brief explanation of why this chart type is best
+- "data_insights": key patterns or features in the data that inform the choice"""
+
+CHART_SELECTION_PROMPT_TEMPLATE = """USER REQUEST: {user_prompt}
+
+DATA STRUCTURE:
+{data_summary}
+
+VISUALIZATION BEST PRACTICES (from knowledge base):
+{rag_context}
+
+Based on the user's request, the data characteristics, and visualization best practices:
+
+1. Analyze the data type:
+   - Time series → line, area charts
+   - Categorical comparisons → bar, column charts
+   - Correlations/relationships → scatter plots
+   - Part-to-whole → stacked bar charts
+   - Change/movement → arrow charts
+   - Multiple categories over time → multiple column charts
+
+2. Consider the user's storytelling goal:
+   - Showing trends over time
+   - Comparing categories
+   - Revealing correlations
+   - Displaying composition
+   - Highlighting change
+
+3. Apply best practices from research:
+   - Accessibility and clarity
+   - Appropriate for data density
+   - Effective for the message
+
+Respond with a JSON object only:
+{{
+    "chart_type": "one of [bar, line, area, scatter, column, stacked_bar, arrow, multiple_column]",
+    "reasoning": "why this chart type is optimal for this data and intent",
+    "data_insights": "key patterns that inform the visualization approach"
+}}"""
+
+CHART_STYLING_PROMPT_TEMPLATE = """You are creating a Datawrapper {chart_type} chart configuration.
+
+USER REQUEST: {user_prompt}
+
+DATA STRUCTURE:
+{data_summary}
+
+DESIGN BEST PRACTICES (from knowledge base):
+{rag_context}
+
+IMPORTANT: You must ONLY include these fields in your JSON response:
+- title (string, required): Clear, descriptive chart title
+- intro (string, optional): Brief explanation
+- byline (string, optional): Author/source attribution
+- source_name (string, optional): Data source name
+- source_url (string, optional): Link to data source
+
+DO NOT include any other fields like:
+- styling, options, data, chart_type, colors, labels, annotations, tooltips
+- metadata, visualize, or any internal fields
+
+These other fields will cause validation errors. Keep it simple with just the 5 fields listed above.
+
+Example valid response:
+{{
+    "title": "Sales Trends 2024",
+    "intro": "Monthly sales showing 30% growth",
+    "source_name": "Company Data",
+    "source_url": "https://example.com"
+}}
+
+Generate a minimal, valid JSON configuration with ONLY the allowed fields above."""
+
+CHART_SELECTION_PROMPT = SimplePromptTemplate(
+    template=CHART_SELECTION_PROMPT_TEMPLATE,
+    input_variables=["user_prompt", "data_summary", "rag_context"]
+)
+
+CHART_STYLING_PROMPT = SimplePromptTemplate(
+    template=CHART_STYLING_PROMPT_TEMPLATE,
+    input_variables=["chart_type", "user_prompt", "data_summary", "rag_context"]
+)
+
+
+def get_chart_selection_prompt() -> SimplePromptTemplate:
+    """Get the chart type selection prompt template"""
+    return CHART_SELECTION_PROMPT
+
+
+def get_chart_styling_prompt() -> SimplePromptTemplate:
+    """Get the chart styling configuration prompt template"""
+    return CHART_STYLING_PROMPT
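For orientation, this is roughly how the selection template is filled in by `analyze_csv_for_chart_type` above; the literal strings are placeholder inputs, not real retrieval output, and the `src.` import path is assumed:

```python
from src.prompts import CHART_SELECTION_SYSTEM_PROMPT, get_chart_selection_prompt

# Fill the template the same way datawrapper_client.py does
prompt = get_chart_selection_prompt().format(
    user_prompt="Compare quarterly revenue across regions",
    data_summary="Rows: 8, Columns: 3\nColumn names: quarter, region, revenue",
    rag_context="(retrieved best-practice passages would be inserted here)",
)

# The pair (CHART_SELECTION_SYSTEM_PROMPT, prompt) is sent to the LLM client,
# which is expected to reply with the JSON object described in the template.
```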
start.sh ADDED
@@ -0,0 +1,31 @@
+#!/bin/bash
+
+# Start script for Viz LLM with Datawrapper integration
+
+echo "🚀 Starting Viz LLM..."
+echo ""
+
+# Check for required environment variables
+if [ ! -f .env ]; then
+    echo "⚠️ Error: .env file not found!"
+    echo "Please create a .env file based on .env.example"
+    exit 1
+fi
+
+# Check if required packages are installed
+echo "📦 Checking dependencies..."
+python -c "import gradio; import datawrapper; import pandas; import mcp" 2>/dev/null
+if [ $? -ne 0 ]; then
+    echo "⚠️ Some dependencies are missing. Installing..."
+    pip install -r requirements.txt
+fi
+
+echo ""
+echo "✓ Dependencies OK"
+echo ""
+echo "Starting Gradio app..."
+echo "Once started, open your browser to: http://localhost:7860"
+echo ""
+
+# Run the app
+python app.py