# Science Storyteller - Learning Guide > **For developers new to async Python, OOP, and MCP protocol** > A step-by-step guide to understanding the Science Storyteller codebase --- ## πŸ“š Table of Contents 0. [Architecture](#architecture) 1. [Learning Philosophy](#learning-philosophy) 2. [Object-Oriented Programming Basics](#object-oriented-programming-basics) 3. [Async/Await Deep Dive](#asyncawait-deep-dive) 4. [Module-by-Module Learning Path](#module-by-module-learning-path) 5. [Hands-On Exercises](#hands-on-exercises) 6. [Common Patterns Explained](#common-patterns-explained) 7. [Debugging Tips](#debugging-tips) 8. [Further Resources](#further-resources) 9. [Testing Strategy](#-testing-strategy) --- ## Architecture This diagram shows how a user request flows through the system. ```mermaid graph TD subgraph User Interface A[Gradio UI] end subgraph Orchestration Layer B(app.py: ScienceStoryteller) end subgraph Agent Layer C[agents/research_agent.py] D[agents/analysis_agent.py] E[agents/audio_agent.py] end subgraph Tool Layer F(mcp_tools/arxiv_tool.py) G(mcp_tools/llm_tool.py) H(ElevenLabs API) end subgraph External Services I[arXiv MCP Server] J[Anthropic Claude API] K[ElevenLabs TTS Service] end A -- User Input (Topic) --> B B -- 1. search(topic) --> C C -- 2. search_papers(query) --> F F -- 3. call_tool --> I I -- 4. Paper Results --> F F -- 5. Papers --> C C -- 6. Papers --> B B -- 7. summarize_and_script(paper) --> D D -- 8. summarize_paper(paper) --> G G -- 9. API Call --> J J -- 10. Summary --> G G -- 11. Summary --> D D -- 12. Script --> B B -- 13. text_to_speech(script) --> E E -- 14. API Call --> H H -- 15. API Call --> K K -- 16. Audio MP3 --> H H -- 17. Audio File Path --> E E -- 18. Audio Path --> B B -- 19. Results (Summary, Audio, etc.) --> A ``` --- ## Python Logging Module ### What is Logging? Logging is Python's built-in system for tracking events, debugging, and monitoring your application. 
It's much better than using `print()` statements for debugging. ### Basic Setup ```python import logging # Create a logger instance specific to this module logger = logging.getLogger(__name__) # Configure logging to display messages logging.basicConfig( level=logging.INFO, # Show INFO and above (INFO, WARNING, ERROR, CRITICAL) format='%(levelname)s - %(name)s - %(message)s' ) # Now you can log messages logger.info("Audio processor functions module loaded.") ``` ### Why Use `__name__` with Logger? **Benefits of `getLogger(__name__)`:** 1. **Hierarchical organization**: If your code is imported as a module (like `utils.audio_processor`), the logger name will be `"utils.audio_processor"` instead of `"__main__"`. This creates a logger hierarchy that helps organize logs from different parts of your app. 2. **Filtering by module**: You can configure different log levels for different parts of your application: ```python logging.getLogger("agents").setLevel(logging.DEBUG) # Verbose for agents logging.getLogger("utils").setLevel(logging.WARNING) # Quiet for utils ``` 3. **Identifies source**: In log output, you can see exactly which module generated each message, making debugging much easier. 4. **Best practice**: Prevents logger name conflicts and follows Python conventions. ### Log Levels From least to most severe: | Level | When to Use | Example | |-------|-------------|---------| | `DEBUG` | Detailed diagnostic information | `logger.debug(f"Variable x = {x}")` | | `INFO` | General informational messages | `logger.info("Processing started")` | | `WARNING` | Something unexpected, but not an error | `logger.warning("Cache miss, fetching from API")` | | `ERROR` | An error occurred, but app can continue | `logger.error(f"Failed to load file: {e}")` | | `CRITICAL` | Serious error, app may crash | `logger.critical("Database connection lost!")` | ### Why Logging Doesn't Show by Default **The problem:** By default, loggers only show messages at WARNING level and above. 
Your `logger.info()` calls are ignored! **The solution:** Configure logging with `basicConfig()` to set the minimum level: ```python logging.basicConfig(level=logging.INFO) # Now INFO messages will appear ``` ### Format String Explained ```python format='%(levelname)s - %(name)s - %(message)s' ``` This creates output like: ``` INFO - __main__ - Audio processor functions module loaded. ``` - `%(levelname)s` β†’ Log level (INFO, ERROR, etc.) - `%(name)s` β†’ Logger name (from `__name__`) - `%(message)s` β†’ Your actual message **Note:** You can add timestamps with `%(asctime)s` if you need them, but for simple learning it's cleaner without. ### Practical Example ```python import logging logger = logging.getLogger(__name__) def process_audio(file_path): logger.debug(f"Starting audio processing for: {file_path}") # Only in DEBUG mode try: # Process the file logger.info(f"Successfully processed: {file_path}") # Normal operation return True except FileNotFoundError: logger.error(f"File not found: {file_path}") # Error, but continue return False except Exception as e: logger.critical(f"Critical error processing {file_path}: {e}") # Serious problem raise ``` ### Why Use Logging Instead of Print? | Feature | `print()` | `logging` | |---------|-----------|-----------| | **Control output** | ❌ Always prints | βœ… Can turn on/off by level | | **Timestamps** | ❌ Manual | βœ… Automatic | | **File output** | ❌ Manual redirection | βœ… Built-in handlers | | **Severity levels** | ❌ No distinction | βœ… DEBUG, INFO, WARNING, etc. 
| | **Production-ready** | ❌ Need to remove/comment | βœ… Just change log level | | **Module identification** | ❌ Manual | βœ… Automatic with `__name__` | ### In Your Science Storyteller Project You'll use logging to track: - Which research papers were retrieved - API call successes/failures - Processing steps (search β†’ summarize β†’ TTS) - Errors during workflow - Performance timing **Example from your project:** ```python logger.info(f"Searching for papers on topic: {topic}") logger.warning("No papers found, trying fallback query") logger.error(f"API call failed: {e}") ``` --- ## Working with File Paths: `pathlib.Path` ### What is `pathlib`? `pathlib` is Python's modern, object-oriented way to work with file system paths. It was introduced in **Python 3.4** (2014) and is now the recommended approach for handling files and directories. ### Why Use `Path` Instead of Strings? **Old way (strings and `os.path`):** ```python import os path = "/home/user/audio.mp3" if os.path.exists(path): dirname = os.path.dirname(path) basename = os.path.basename(path) new_path = os.path.join(dirname, "new_audio.mp3") ``` **New way (`pathlib.Path`):** ```python from pathlib import Path path = Path("/home/user/audio.mp3") if path.exists(): dirname = path.parent basename = path.name new_path = path.parent / "new_audio.mp3" # Use / operator! ``` **Benefits:** - βœ… More readable and intuitive - βœ… Works across Windows/Mac/Linux automatically - βœ… Chainable methods - βœ… Less error-prone than string concatenation - βœ… Object-oriented design ### Creating Path Objects ```python from pathlib import Path # From a string p = Path("/home/user/app/assets/audio/test.mp3") # From current directory p = Path.cwd() # Current working directory. It does not need input path. 
# From home directory p = Path.home() # User's home directory (~) # Relative paths p = Path("./assets/audio") ``` ### Path Properties and Methods ```python from pathlib import Path p = Path("/home/user/app/assets/audio/podcast_123.mp3") # Check existence and type p.exists() # True/False - does it exist? p.is_file() # True/False - is it a file? p.is_dir() # True/False - is it a directory? # Get path components p.name # 'podcast_123.mp3' - filename with extension p.stem # 'podcast_123' - filename without extension p.suffix # '.mp3' - file extension p.parent # Path('/home/user/app/assets/audio') - parent directory p.parts # ('/', 'home', 'user', 'app', 'assets', 'audio', 'podcast_123.mp3') # Path conversion str(p) # Convert Path to string p.absolute() # Get absolute path p.resolve() # Resolve symlinks and make absolute ``` ### Common Operations **1. Check if file exists:** ```python path = Path("myfile.txt") if path.exists(): print("File found!") ``` **2. Create directories:** ```python audio_dir = Path("./assets/audio") audio_dir.mkdir(parents=True, exist_ok=True) # parents=True: creates parent directories if needed # exist_ok=True: doesn't raise error if already exists ``` **3. Join paths (the smart way):** ```python base = Path("./assets") audio_file = base / "audio" / "test.mp3" # Use / operator! # Result: Path('./assets/audio/test.mp3') # Works with strings too! file_path = base / "audio" / f"podcast_{123}.mp3" ``` **4. Find files (glob patterns):** ```python audio_dir = Path("./assets/audio") # All MP3 files in directory mp3_files = list(audio_dir.glob("*.mp3")) # All files recursively all_files = list(audio_dir.glob("**/*")) # Specific pattern podcasts = list(audio_dir.glob("podcast_*.mp3")) ``` **5. Read and write files:** ```python path = Path("data.txt") # Write text path.write_text("Hello, world!") # Read text content = path.read_text() # Write bytes (for binary files) path.write_bytes(b'\x89PNG...') # Read bytes data = path.read_bytes() ``` **6. 
Get file metadata:** ```python path = Path("myfile.txt") stats = path.stat() size_bytes = stats.st_size modified_time = stats.st_mtime ``` ### Real Example from Your Project From `utils/audio_processor.py`: ```python def process_audio_file(audio_path: str) -> Optional[str]: """Validate an audio file using Path.""" # Convert string to Path object path = Path(audio_path) # Check if file exists if not path.exists(): logger.error(f"Audio file not found: {audio_path}") return None # Check file extension if not path.suffix.lower() in ['.mp3', '.wav', '.ogg']: logger.error(f"Invalid audio format: {path.suffix}") return None # Convert back to string for return return str(path) ``` **Why this is better than strings:** - `path.exists()` is clearer than `os.path.exists(audio_path)` - `path.suffix` is simpler than manually parsing the extension - Cross-platform compatible (Windows uses `\`, Unix uses `/`) - Type-safe with IDE autocomplete ### Advanced Example: Cleanup Old Files ```python from pathlib import Path def cleanup_old_files(directory: str, max_files: int = 10): """Remove oldest audio files, keeping only max_files.""" dir_path = Path(directory) if not dir_path.exists(): return # Get all MP3 files sorted by modification time audio_files = sorted( dir_path.glob('*.mp3'), # Find all MP3s key=lambda p: p.stat().st_mtime, # Sort by modified time reverse=True # Newest first ) # Remove oldest files beyond max_files for old_file in audio_files[max_files:]: old_file.unlink() # Delete the file logger.info(f"Removed old file: {old_file}") ``` ### Path Version History - **Python 3.4** (2014): `pathlib` introduced - **Python 3.5** (2015): Bug fixes and improvements - **Python 3.6+** (2016+): Standard library functions accept `Path` objects **Backward compatibility:** If you need to support Python 2.7 or 3.3, use `pathlib2` package. But for modern projects (like yours), just use built-in `pathlib`. 
### Quick Reference Table | Task | Old Way (`os.path`) | New Way (`pathlib.Path`) | |------|---------------------|--------------------------| | Check exists | `os.path.exists(path)` | `Path(path).exists()` | | Get filename | `os.path.basename(path)` | `Path(path).name` | | Get directory | `os.path.dirname(path)` | `Path(path).parent` | | Join paths | `os.path.join(a, b)` | `Path(a) / b` | | Get extension | Manual string split | `Path(path).suffix` | | Create directory | `os.makedirs(path)` | `Path(path).mkdir(parents=True)` | | List files | `os.listdir(path)` | `Path(path).iterdir()` | | Read file | `open(path).read()` | `Path(path).read_text()` | ### When to Convert Between Path and String **Rule of thumb:** - Use `Path` objects internally for all file operations - Convert to `str()` only when: - Passing to APIs that don't accept Path - Displaying to user - Storing in JSON or database ```python # Internal: use Path path = Path("./assets/audio") / "file.mp3" # External API: convert to string audio_url = upload_to_api(str(path)) # Display to user: convert to string print(f"Audio saved to: {path}") # Prints nicely automatically ``` --- ## Python Function Basics Functions are the primary way to group code into reusable blocks. Let's break down a function from our codebase: `utils/audio_processor.py`. ```python def process_audio_file(audio_path: str) -> Optional[str]: """ Process and validate an audio file. Args: audio_path: Path to audio file Returns: Validated path or None if invalid """ # ... function body ... return str(path) ``` ### Anatomy of a Function Let's look at each part of the function definition: 1. **`def` keyword**: This signals the start of a function definition. 2. **Function Name**: `process_audio_file`. This is how you'll call the function later. It should be descriptive and follow the `snake_case` convention (all lowercase with underscores). 3. **Parameters (in `()`)**: `(audio_path: str)`. These are the inputs the function accepts. 
- `audio_path`: The name of the parameter. - `: str`: This is a **type hint**. It tells developers that this function expects `audio_path` to be a string. It helps with code readability and catching errors. 4. **Return Type Hint**: `-> Optional[str]`. This indicates what the function will return. - `Optional[str]` means the function can return either a `str` (string) or `None`. This is very useful for functions that might not always have a valid result to give back. 5. **Docstring**: The triple-quoted string `"""..."""` right after the definition. It explains the function's purpose, arguments (`Args`), and return value (`Returns`). This is essential for documentation. 6. **Function Body**: The indented code block below the definition. This is where the function's logic is implemented. 7. **`return` statement**: This keyword exits the function and passes back a value to whoever called it. ### Why Use Functions? - **Reusability**: Write code once and use it many times. - **Modularity**: Break down complex problems into smaller, manageable pieces. - **Readability**: Well-named functions make code easier to understand. --- ## Learning Philosophy ### Why Learn Module-by-Module? **Bottom-up approach** is recommended for this project: 1. Start with simple utilities (pure Python functions) 2. Progress to MCP tools (understand protocol basics) 3. Study agents (business logic and coordination) 4. 
Finally tackle orchestration (integration) **Benefits:** - βœ… Build confidence with simple concepts first - βœ… Understand dependencies before integration - βœ… Easier to debug when you know each piece - βœ… Can test components independently ### Learning vs Building Trade-off For a hackathon project, you need to balance: - **Deep understanding**: Takes time, prevents bugs - **Quick delivery**: Ship working product by deadline **Recommended approach for this project:** - **Week 1**: Deep dive into 2-3 core modules - **Week 2**: Implement and integrate - **Week 3**: Test, polish, document --- ## Object-Oriented Programming Basics ### What is a Class? A **class** is a blueprint for creating objects. Think of it as a cookie cutter. ```python class ScienceStoryteller: # The blueprint """Main orchestrator for the Science Storyteller workflow.""" ``` ### Creating Objects (Instantiation) ```python # Creating an object from the class storyteller = ScienceStoryteller() # Now you have a specific storyteller object ``` ### The `__init__` Method (Constructor) The `__init__` method is called **automatically** when you create a new object. ```python class ScienceStoryteller: def __init__(self): # Runs when ScienceStoryteller() is called self.research_agent = ResearchAgent() self.analysis_agent = AnalysisAgent() self.audio_agent = AudioAgent() ``` **Purpose:** Set up the initial state of your object. **When it runs:** ```python storyteller = ScienceStoryteller() # __init__ runs here automatically ``` ### Understanding `self` `self` refers to **this particular object instance**. ```python class ScienceStoryteller: def __init__(self): self.research_agent = ResearchAgent() # Attach to THIS object async def process_topic(self, topic: str): papers = await self.research_agent.search(topic) # Use THIS object's agent ``` **Why `self`?** So each object can have its own separate data. 
```python storyteller1 = ScienceStoryteller() # Has its own research_agent storyteller2 = ScienceStoryteller() # Has a different research_agent ``` ### Attributes (Instance Variables) **Attributes** store data that belongs to an object. ```python self.research_agent = ResearchAgent() # This is an attribute self.analysis_agent = AnalysisAgent() # This is an attribute ``` **Accessing attributes:** ```python async def process_topic(self, topic: str): # Use the attributes we created in __init__ papers = await self.research_agent.search(topic) best_paper = await self.analysis_agent.select_best(papers, topic) ``` ### Methods (Functions in a Class) **Methods** define what an object can **do**. ```python class ScienceStoryteller: async def process_topic(self, topic: str): # This is a method """Process a research topic into a podcast.""" # ... implementation ... def _format_paper_info(self, paper: dict) -> str: # Another method """Format paper metadata for display.""" # ... implementation ... ``` **Key points:** - First parameter is always `self` - Called using dot notation: `storyteller.process_topic("AI")` - Can access attributes: `self.research_agent` ### Public vs Private Naming Convention ```python def process_topic(self, topic): # Public - no underscore """Meant to be called from outside the class.""" def _format_paper_info(self, paper): # Private - starts with _ """Internal helper, not meant to be called externally.""" ``` **Convention (not enforced):** - `method_name` β†’ Public, part of the API - `_method_name` β†’ Private, internal use only ### Complete Example ```python class ScienceStoryteller: """Main orchestrator for the Science Storyteller workflow.""" # Constructor - runs when object is created def __init__(self): self.research_agent = ResearchAgent() # Attribute self.analysis_agent = AnalysisAgent() # Attribute self.audio_agent = AudioAgent() # Attribute # Public method - main workflow async def process_topic(self, topic: str): papers = await 
self.research_agent.search(topic) # Use attribute best_paper = await self.analysis_agent.select_best(papers) paper_info = self._format_paper_info(best_paper) # Call private method return paper_info # Private method - internal helper def _format_paper_info(self, paper: dict) -> str: return f"**Title:** {paper.get('title', 'Unknown')}" # Usage storyteller = ScienceStoryteller() # Create object (__init__ runs) result = await storyteller.process_topic("AlphaFold") # Call method ``` ### Quick Reference | Concept | Syntax | Purpose | |---------|--------|---------| | **Class** | `class ClassName:` | Blueprint for objects | | **Object** | `obj = ClassName()` | Instance created from class | | **Constructor** | `def __init__(self):` | Initialize object state | | **Self** | `self.attribute` | Reference to current object | | **Attribute** | `self.name = value` | Data stored in object | | **Method** | `def method(self, args):` | Function belonging to class | | **Public** | `def method(self):` | External API | | **Private** | `def _method(self):` | Internal helper | --- ## Async/Await Deep Dive ### Why Async? The Three Use Cases Based on [RealPython's async guide](https://realpython.com/async-io-python/): 1. **Writing pausable/resumable functions** 2. **Managing I/O-bound tasks** (network, files, databases) 3. 
**Improving performance** (handle multiple tasks concurrently) **Science Storyteller uses all three!** ### The Problem: Blocking I/O **Without async (blocking):** ```python def process_topic_sync(topic): papers = requests.get("arxiv_api") # ⏸️ BLOCKS for 5 seconds summary = requests.post("claude_api") # ⏸️ BLOCKS for 10 seconds audio = requests.post("elevenlabs_api") # ⏸️ BLOCKS for 60 seconds return results # Total: 75 seconds of BLOCKING # During blocking: # ❌ UI freezes # ❌ Progress bar can't update # ❌ Other users can't be served # ❌ Event loop is stuck ``` **With async (non-blocking):** ```python async def process_topic(topic): papers = await arxiv_tool.search() # ⏸️ Yields control for 5 seconds summary = await llm_tool.summarize() # ⏸️ Yields control for 10 seconds audio = await audio_tool.convert() # ⏸️ Yields control for 60 seconds return results # Total: 75 seconds, but non-blocking # During await: # βœ… UI stays responsive # βœ… Progress bar updates # βœ… Other users can be served # βœ… Event loop continues running ``` ### Visualizing Blocking vs. Async **Blocking (Sequential) Execution:** ``` Request 1: [--arxiv--|----claude----|----------------audio----------------|] Request 2: [--arxiv--|----claude----|---... Time -----> 0s 5s 15s 75s 80s 90s ``` - The UI is frozen for the entire 75s duration of Request 1. - Request 2 must wait for Request 1 to completely finish. **Async (Concurrent) Execution:** ``` Request 1: [--arxiv--] ... [----claude----] ... [----------------audio----------------] Request 2: [--arxiv--] ... [----claude----] ... [----------------audio----------------] Time -----> 0s 1s 5s 6s 15s 16s 75s ``` - When Request 1 `await`s `arxiv`, the event loop is free to start Request 2. - Both requests run concurrently, sharing time during I/O waits. The UI remains responsive throughout. 
### How Async Works: The Event Loop ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Python Asyncio Event Loop β”‚ β”‚ (Single thread, multiple tasks) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ↓ ↓ ↓ Task A Task B Task C (User 1 req) (User 2 req) (User 3 req) ``` **When `await` is hit:** 1. Function **pauses** at that line 2. Control **returns** to the event loop 3. Event loop **runs other code** (updates UI, handles requests) 4. When I/O completes, function **resumes** from where it paused ### Single VM, Multiple Users **Key insight:** On Hugging Face Spaces, **all users share one Python process**. ``` Hugging Face Space (Single VM) β”œβ”€ Python Process (port 7860) β”‚ └─ Event Loop β”‚ β”œβ”€ Task: User A (paused at await) β”‚ β”œβ”€ Task: User B (paused at await) β”‚ └─ Task: User C (paused at await) ``` **Without async (sequential):** ``` User A: 0-75s (completes at 75s) User B: 75-150s (WAITS 75s, then runs 75s = 150s total) User C: 150-225s (WAITS 150s, then runs 75s = 225s total) ``` **With async (concurrent):** ``` User A: 0-75s (completes at 75s) User B: 1-76s (starts 1s later, runs concurrently = 76s total) User C: 2-77s (starts 2s later, runs concurrently = 77s total) ``` ### Performance Comparison | Metric | Without Async | With Async | |--------|--------------|------------| | **User A wait** | 75s | 75s | | **User B wait** | 150s | ~76s | | **User C wait** | 225s | ~77s | | **UI responsiveness** | Frozen | Live updates | | **Progress tracking** | Can't update | Works | | **Concurrent users** | Sequential | Interleaved | ### Gradio + Async Integration Gradio uses **FastAPI** internally, which is async-native: ```python # Gradio internals (simplified) from fastapi import FastAPI app = FastAPI() @app.post("/api/predict") async def predict(request): result = await 
your_gradio_function(request.data) return result ``` **Why this matters:** - `gr.Progress()` only works with async (sends WebSocket updates) - Gradio's event loop can handle multiple users - Your async functions integrate seamlessly ### Async Syntax Rules **Defining async functions:** ```python async def my_function(): # Note the 'async' keyword result = await some_async_operation() return result ``` **Calling async functions:** ```python # From another async function: result = await my_function() # From synchronous code: import asyncio result = asyncio.run(my_function()) ``` **Common mistake:** ```python # ❌ Wrong - missing await async def process(): result = some_async_function() # This returns a coroutine, not the result! # βœ… Correct - with await async def process(): result = await some_async_function() # This waits and gets the actual result ``` ### The Async Chain in Science Storyteller ``` app.py: process_topic (async) ↓ await agents/research_agent.py: search (async) ↓ await mcp_tools/arxiv_tool.py: search_papers (async) ↓ await session.call_tool() (MCP I/O) ↓ [Network request to arXiv server] ``` **Every step must be async** because: - MCP communication uses async I/O - Can't `await` inside a non-async function - Event loop requires async all the way up --- ## Module-by-Module Learning Path ### Level 1: Foundation (Start Here) #### 1. `utils/audio_processor.py` **What it does:** File system operations for audio files **Key concepts:** - Creating directories with `Path.mkdir()` - Checking file sizes with `os.path.getsize()` - Working with file paths **Learning exercise:** ```python from utils.audio_processor import ensure_audio_dir, get_file_size_mb # Create the audio directory ensure_audio_dir() # Check size of a file (if it exists) # size = get_file_size_mb("assets/audio/podcast_123.mp3") ``` **What to look for:** - How does it handle file paths in a cross-platform way (`pathlib.Path`)? - The use of `exist_ok=True` to prevent errors. 
- Simple, pure functions that have no side effects other than interacting with the filesystem. **Questions to answer:** - Why use `Path` instead of strings for file paths? - What happens if the directory already exists? - How is file size converted from bytes to MB? --- #### 2. `utils/script_formatter.py` **What it does:** Clean and format podcast scripts for TTS **Key concepts:** - String manipulation (`strip()`, `replace()`) - Regular expressions (if used) - Estimating audio duration from text **Learning exercise:** ```python from utils.script_formatter import format_podcast_script, estimate_duration script = """ Hello! This is a test. With extra spaces and newlines. """ cleaned = format_podcast_script(script) duration = estimate_duration(cleaned) print(f"Cleaned: {cleaned}") print(f"Duration: {duration} seconds") ``` **What to look for:** - How simple string methods (`.strip()`, `.replace()`) are used for cleaning. - The logic for `estimate_duration`: it's a heuristic, not an exact calculation. - This is another example of pure functions that are easy to test. **Questions to answer:** - How does text length relate to audio duration? - What characters need to be cleaned for TTS? - Why estimate duration before generating audio? --- ### Level 2: MCP Tools (Core Hackathon Requirement) #### 3. 
`mcp_tools/arxiv_tool.py` **What it does:** Connects to arXiv MCP server to search papers **Key concepts:** - Model Context Protocol (MCP) - Stdio transport (stdin/stdout communication) - Async context managers (`__aenter__`, `__aexit__`) - JSON-RPC messaging **Important code sections:** **Connection setup:** ```python server_params = StdioServerParameters( command="npx", args=["-y", "@blindnotation/arxiv-mcp-server"], env=None ) self.exit_stack = stdio_client(server_params) stdio_transport = await self.exit_stack.__aenter__() read_stream, write_stream = stdio_transport self.session = ClientSession(read_stream, write_stream) await self.session.__aenter__() ``` **Calling tools:** ```python result = await self.session.call_tool( "search_arxiv", { "query": query, "max_results": max_results, "sort_by": sort_by } ) ``` **Learning exercise:** ```python import asyncio from mcp_tools.arxiv_tool import ArxivTool async def explore_arxiv(): tool = ArxivTool() # Connect to MCP server connected = await tool.connect() print(f"Connected: {connected}") # Search for papers papers = await tool.search_papers("quantum computing", max_results=3) print(f"Found {len(papers)} papers:") for paper in papers: print(f"\n Title: {paper.get('title', 'N/A')}") print(f" Authors: {paper.get('authors', [])[:2]}") # Clean up await tool.disconnect() asyncio.run(explore_arxiv()) ``` **Questions to answer:** - What is stdio transport and why use it? - Why do we need both `exit_stack` and `session`? - What happens if the MCP server crashes? - How does `call_tool` send messages to the server? **Deep dive topics:** - JSON-RPC protocol format - Async context managers (what `__aenter__` and `__aexit__` do) - Process communication (pipes and streams) --- #### 4. 
`mcp_tools/llm_tool.py` **What it does:** Calls Anthropic Claude API for summarization **Key concepts:** - HTTP API requests with async - Prompt engineering - API authentication - Response parsing **Important code sections:** **API call:** ```python message = self.client.messages.create( model=self.model, max_tokens=max_tokens, messages=[ {"role": "user", "content": prompt} ] ) summary = message.content[0].text ``` **Learning exercise:** ```python import asyncio from mcp_tools.llm_tool import LLMTool async def test_llm(): tool = LLMTool() # Needs ANTHROPIC_API_KEY in .env # Fake paper data paper = { "title": "Quantum Computing Fundamentals", "summary": "This paper explores the basic principles of quantum computing...", "authors": [{"name": "Alice"}, {"name": "Bob"}] } # Generate summary summary = await tool.summarize_paper(paper, max_tokens=500) print(f"Summary:\n{summary}") asyncio.run(test_llm()) ``` **Questions to answer:** - How is the prompt structured for summarization? - What's the difference between `max_tokens` in the request and actual tokens used? - How does prompt engineering affect output quality? - What happens if the API returns an error? --- ### Level 3: Agents (Business Logic) #### 5. `agents/research_agent.py` **What it does:** Autonomous paper retrieval and search optimization **Key concepts:** - Query enhancement (autonomous planning) - Fallback strategies (self-correction) - Agent initialization and cleanup **Autonomous behaviors:** ```python def _enhance_query(self, topic: str) -> str: """ Autonomous planning - agent decides how to optimize search. 
""" topic_lower = topic.lower() enhancements = { 'ai': 'artificial intelligence machine learning', 'ml': 'machine learning', 'quantum': 'quantum computing physics', } for key, value in enhancements.items(): if key in topic_lower and value not in topic_lower: return f"{topic} {value}" return topic ``` **Self-correction:** ```python papers = await self.arxiv_tool.search_papers(enhanced_query) if not papers: # Fallback: try original query papers = await self.arxiv_tool.search_papers(topic) ``` **Learning exercise:** ```python from agents.research_agent import ResearchAgent async def test_research(): agent = ResearchAgent() await agent.initialize() # Test query enhancement original = "AI" enhanced = agent._enhance_query(original) print(f"Original: {original}") print(f"Enhanced: {enhanced}") # Test search papers = await agent.search("AlphaFold", max_results=3) print(f"\nFound {len(papers)} papers") await agent.cleanup() asyncio.run(test_research()) ``` **Questions to answer:** - Why enhance queries? What problem does it solve? - When should you use the fallback strategy? - Why initialize and cleanup separately from `__init__`? --- #### 6. `agents/analysis_agent.py` **What it does:** Paper analysis and podcast script generation **Key concepts:** - Paper selection (reasoning) - LLM-based summarization - Script generation with prompt engineering - Fallback content for LLM failures **Autonomous reasoning:** ```python async def select_best(self, papers: list, topic: str): """ Reasoning - evaluate and select most relevant paper. 
""" scored_papers = [] for paper in papers: score = 0 # Has abstract if paper.get('summary') or paper.get('abstract'): score += 1 # Recent paper pub_date = paper.get('published', '') if '2024' in pub_date or '2023' in pub_date: score += 2 scored_papers.append((score, paper)) scored_papers.sort(key=lambda x: x[0], reverse=True) return scored_papers[0][1] if scored_papers else papers[0] ``` **Learning exercise:** ```python from agents.analysis_agent import AnalysisAgent async def test_analysis(): agent = AnalysisAgent() # Mock paper data papers = [ {"title": "Old Paper", "published": "2020-01-01", "summary": "..."}, {"title": "New Paper", "published": "2024-01-01", "summary": "..."}, ] best = await agent.select_best(papers, "quantum computing") print(f"Selected: {best['title']}") asyncio.run(test_analysis()) ``` **Questions to answer:** - What criteria determine "best" paper? - Why fallback to template content instead of failing? - How does prompt engineering affect script quality? --- #### 7. `agents/audio_agent.py` **What it does:** Text-to-speech conversion via ElevenLabs **Key concepts:** - HTTP POST with binary response - File I/O (saving MP3 bytes) - API timeout handling - Voice configuration **Learning exercise:** ```python from agents.audio_agent import AudioAgent async def test_audio(): agent = AudioAgent() # Needs ELEVENLABS_API_KEY script = "Welcome to Science Storyteller. Today we explore quantum computing." audio_path = await agent.text_to_speech(script) if audio_path: print(f"Audio saved to: {audio_path}") else: print("Audio generation failed") asyncio.run(test_audio()) ``` **Questions to answer:** - Why does TTS take so long (30-60 seconds)? - What happens if the API times out? - How are MP3 bytes different from text? --- ### Level 4: Orchestration (Integration) #### 8. 
`app.py` - `ScienceStoryteller` Class **What it does:** Coordinates all agents into a complete workflow **Key concepts:** - Orchestrator pattern - Error recovery - Progress tracking - State management **Learning exercise:** ```python from app import ScienceStoryteller async def test_orchestrator(): storyteller = ScienceStoryteller() # Test full workflow result = await storyteller.process_topic("quantum entanglement") summary, script, audio, paper_info, status = result print(f"Status: {status}") if summary: print(f"Summary length: {len(summary)} chars") asyncio.run(test_orchestrator()) ``` **Questions to answer:** - How does the orchestrator handle partial failures? - Why return a tuple instead of a dict? - What's the role of `gr.Progress()`? --- #### 9. `app.py` - Gradio Interface **What it does:** Web UI for user interaction **Key concepts:** - Gradio Blocks API - Event handlers - Async in Gradio - UI layout **Learning exercise:** ```python # Just run the app python app.py # Then interact with the UI to see the flow ``` **Questions to answer:** - How does Gradio handle async functions? - What's the difference between `gr.Blocks` and `gr.Interface`? - How are outputs mapped to UI components? --- ## Hands-On Exercises ### Exercise 1: Test Individual Tools **Goal:** Verify MCP connection works ```python # File: test_my_learning.py import asyncio from mcp_tools.arxiv_tool import ArxivTool async def main(): print("Testing ArxivTool...") tool = ArxivTool() connected = await tool.connect() if connected: print("βœ“ Connected to MCP server") papers = await tool.search_papers("AlphaFold", max_results=2) print(f"βœ“ Found {len(papers)} papers") for i, paper in enumerate(papers, 1): print(f"\n{i}. 
{paper.get('title', 'N/A')}") await tool.disconnect() print("\nβœ“ Disconnected") else: print("βœ— Failed to connect") if __name__ == "__main__": asyncio.run(main()) ``` Run: `python test_my_learning.py` --- ### Exercise 2: Trace the Async Chain **Goal:** Understand how async calls propagate Add print statements to trace execution: ```python # In arxiv_tool.py async def search_papers(self, query: str, ...): print(f"[ArxivTool] Starting search for: {query}") result = await self.session.call_tool("search_arxiv", {...}) print(f"[ArxivTool] Search complete, parsing results...") return papers # In research_agent.py async def search(self, topic: str, max_results: int = 5): print(f"[ResearchAgent] Enhancing query: {topic}") enhanced = self._enhance_query(topic) print(f"[ResearchAgent] Enhanced to: {enhanced}") papers = await self.arxiv_tool.search_papers(enhanced) print(f"[ResearchAgent] Got {len(papers)} papers") return papers ``` Then run and watch the flow! --- ### Exercise 3: Mock External Dependencies **Goal:** Test without API keys ```python # test_mock.py from unittest.mock import AsyncMock, Mock from agents.research_agent import ResearchAgent async def test_with_mock(): agent = ResearchAgent() # Mock the arxiv_tool to avoid real API calls agent.arxiv_tool.search_papers = AsyncMock(return_value=[ {"title": "Fake Paper 1", "summary": "Test"}, {"title": "Fake Paper 2", "summary": "Test"}, ]) papers = await agent.search("test topic") assert len(papers) == 2 print(f"βœ“ Mock test passed: {len(papers)} papers") asyncio.run(test_with_mock()) ``` --- ### Exercise 4: Build a Mini Version **Goal:** Understand the workflow by simplifying ```python # mini_storyteller.py import asyncio class MiniStoryteller: """Simplified version to understand the flow""" def __init__(self): print("πŸ“š Initializing agents...") self.research = "ResearchAgent" self.analysis = "AnalysisAgent" self.audio = "AudioAgent" async def process(self, topic): print(f"\nπŸ” Step 1: Search for '{topic}'") 
await asyncio.sleep(1) # Simulate API call papers = ["Paper 1", "Paper 2"] print(f"πŸ“ Step 2: Select best paper") await asyncio.sleep(1) best = papers[0] print(f"✍️ Step 3: Summarize '{best}'") await asyncio.sleep(1) summary = "This is a summary..." print(f"πŸŽ™οΈ Step 4: Generate script") await asyncio.sleep(1) script = "Welcome to the podcast..." print(f"πŸ”Š Step 5: Convert to audio") await asyncio.sleep(2) audio = "podcast.mp3" print(f"βœ… Done!") return summary, script, audio async def main(): storyteller = MiniStoryteller() result = await storyteller.process("AlphaFold") print(f"\nResult: {result}") asyncio.run(main()) ``` --- ## Common Patterns Explained ### Pattern 1: Async Context Managers **What you see:** ```python self.exit_stack = stdio_client(server_params) stdio_transport = await self.exit_stack.__aenter__() # ... use the connection ... await self.exit_stack.__aexit__(None, None, None) ``` **What it means:** - `__aenter__`: Setup (open connection, allocate resources) - `__aexit__`: Cleanup (close connection, free resources) **Better syntax:** ```python async with stdio_client(server_params) as stdio_transport: # Connection is open here read_stream, write_stream = stdio_transport # ... use streams ... 
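    # For example (sketch; assumes the MCP Python SDK's ClientSession API):
    # async with ClientSession(read_stream, write_stream) as session:
    #     await session.initialize()
    #     result = await session.call_tool("search_arxiv", {"query": "AI"})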
# Connection automatically closed when block exits ``` **Why the manual version in the code?** - Need to keep connection alive for multiple operations - Can't use `async with` because connection persists beyond one function call --- ### Pattern 2: Optional Parameters with Defaults ```python async def search(self, topic: str, max_results: int = 5): """Search with default max_results""" ``` **Usage:** ```python # Use default papers = await agent.search("AI") # max_results=5 # Override default papers = await agent.search("AI", max_results=10) ``` --- ### Pattern 3: Type Hints ```python async def search_papers( self, query: str, # Must be a string max_results: int = 5, # Must be an int, defaults to 5 sort_by: str = "relevance" # Must be a string, defaults to "relevance" ) -> List[Dict[str, Any]]: # Returns a list of dictionaries ``` **Benefits:** - Self-documenting code - IDE autocomplete - Type checking tools (mypy) - Easier to catch bugs --- ### Pattern 4: Dictionary `.get()` with Defaults ```python title = paper.get('title', 'Unknown') # Returns 'Unknown' if 'title' key missing ``` **Why not just `paper['title']`?** - `paper['title']` β†’ Raises `KeyError` if missing - `paper.get('title', 'Unknown')` β†’ Returns default if missing (safer) --- ### Pattern 5: List Comprehension ```python author_names = [ author.get('name', '') for author in authors[:5] if isinstance(author, dict) ] ``` **Equivalent to:** ```python author_names = [] for author in authors[:5]: if isinstance(author, dict): author_names.append(author.get('name', '')) ``` --- ### Pattern 6: Try/Except for Error Handling ```python try: result = await api_call() return result except Exception as e: logger.error(f"API error: {e}") return fallback_result() ``` **Why?** - External APIs can fail - Network can be unreliable - Graceful degradation instead of crashes --- ## Debugging Tips ### Tip 1: Use Print Debugging Add strategic print statements: ```python async def search(self, topic: str): print(f"πŸ” 
[DEBUG] Searching for: {topic}") enhanced = self._enhance_query(topic) print(f"πŸ” [DEBUG] Enhanced to: {enhanced}") papers = await self.arxiv_tool.search_papers(enhanced) print(f"πŸ” [DEBUG] Found {len(papers)} papers") return papers ``` --- ### Tip 2: Check Logs The app uses Python's logging: ```python logging.basicConfig( level=logging.INFO, # Change to DEBUG for more detail format='%(levelname)s - %(name)s - %(message)s' ) ``` Run with verbose logging: ```bash python app.py 2>&1 | tee app.log ``` --- ### Tip 3: Use Python REPL Test small pieces interactively: ```bash $ python >>> from utils.script_formatter import estimate_duration >>> text = "Hello world, this is a test." >>> duration = estimate_duration(text) >>> print(duration) 5 ``` --- ### Tip 4: Check Environment Variables ```bash # Verify API keys are set echo $ANTHROPIC_API_KEY echo $ELEVENLABS_API_KEY # Or in Python import os print(os.getenv("ANTHROPIC_API_KEY")) ``` --- ### Tip 5: Test Error Cases ```python # Test with invalid input result = await storyteller.process_topic("") # Empty string result = await storyteller.process_topic("xyzinvalidtopic999") # No results ``` --- ### Tip 6: Use Async Debugger For complex async issues: ```python import asyncio asyncio.run(my_function(), debug=True) # Enables debug mode ``` --- ## Further Resources ### Official Documentation - **Python Async/Await**: [RealPython Guide](https://realpython.com/async-io-python/) - **MCP Protocol**: [Official Docs](https://modelcontextprotocol.io/) - **Anthropic Claude API**: [API Reference](https://docs.anthropic.com/claude/reference) - **Gradio**: [Documentation](https://www.gradio.app/docs) - **ElevenLabs**: [API Docs](https://elevenlabs.io/docs/api-reference) ### Learning Paths **If you're new to async:** 1. Read RealPython's async guide 2. Practice with simple async examples 3. Understand event loops 4. Study this project's async chain **If you're new to OOP:** 1. Python classes tutorial 2. 
Understand `self` and `__init__` 3. Practice with simple class examples 4. Study `ScienceStoryteller` class **If you're new to MCP:** 1. Read MCP specification 2. Understand stdio transport 3. Study `ArxivTool` implementation 4. Try building your own MCP tool ### Practice Projects **After understanding this codebase:** 1. **Add a new MCP tool**: Try Semantic Scholar instead of arXiv 2. **Add a new agent**: Create a fact-checking agent 3. **Extend functionality**: Add multiple podcast voices 4. **Improve error handling**: Better retry logic 5. **Add caching**: Cache arXiv results for 24 hours --- ## Review Checklist Before moving on, can you answer: - [ ] What's the difference between a class and an object? - [ ] What does `self` refer to? - [ ] When does `__init__` run? - [ ] Why use `async`/`await`? - [ ] How does the event loop work? - [ ] What is MCP and why use it? - [ ] How do the three agents differ? - [ ] What does the orchestrator do? - [ ] How does Gradio integrate with async? - [ ] Where would you add error handling? - [ ] What is the difference between a unit and an integration test? --- ## Your Learning Journey **Recommended 3-Week Plan:** ### Week 1: Fundamentals - Day 1-2: OOP basics (`__init__`, `self`, methods) - Day 3-4: Async/await concepts - Day 5-7: Study `utils/` and `mcp_tools/` ### Week 2: Implementation - Day 8-10: Understand all three agents - Day 11-12: Study orchestrator - Day 13-14: Explore Gradio interface ### Week 3: Integration & Polish - Day 15-17: Test full workflow - Day 18-19: Fix bugs, improve error handling - Day 20-21: Polish UI, prepare demo --- **Remember:** Deep understanding takes time. Don't rush. Each module builds on the previous one. Master the basics before tackling integration! --- **Last Updated:** November 17, 2025 **Version:** 1.0 **For:** MCP's 1st Birthday Hackathon 2025 --- ## πŸ§ͺ Testing Strategy A good testing strategy is crucial for building reliable software. 
For this project, we can use a model called the "Testing Pyramid."

### Unit Tests

**Definition:** Test individual components in isolation.

- **What to test:** Pure functions, methods with no external dependencies.
- **Tools:** Python's built-in `unittest` or `pytest`.
- **Example:**

```python
import unittest

from utils.script_formatter import estimate_duration

class TestScriptFormatter(unittest.TestCase):
    def test_estimate_duration(self):
        # Pure function: no network, no API keys, runs in isolation
        duration = estimate_duration("Hello world, this is a test.")
        self.assertGreater(duration, 0)
```

### Integration Tests

**Definition:** Test how components work together.

- **What to test:** Interactions between modules, like agent and tool communication.
- **Tools:** `pytest` with async support (e.g. `pytest-asyncio`).
- **Example:**

```python
import pytest

from agents.research_agent import ResearchAgent

@pytest.mark.asyncio
async def test_agent_tool_integration():
    agent = ResearchAgent()
    await agent.initialize()

    papers = await agent.search("AI")
    assert isinstance(papers, list)
    assert len(papers) > 0

    await agent.cleanup()
```

### End-to-End Tests

**Definition:** Test the complete workflow from start to finish.

- **What to test:** User scenarios, like submitting a topic and receiving audio.
- **Tools:** Gradio's built-in testing, Selenium for UI tests.
- **Example:**

```python
def test_gradio_interface(client):
    response = client.post("/api/predict", json={"data": "AI in healthcare"})
    assert response.status_code == 200
    assert "audio" in response.json()
```

### Load Tests

**Definition:** Test system behavior under heavy load.

- **What to test:** How the system handles many requests at once.
- **Tools:** Locust, JMeter.
- **Example:**

```bash
locust -f load_test.py
```

### Security Tests

**Definition:** Identify vulnerabilities in the application.

- **What to test:** API security, data validation, authentication.
- **Tools:** OWASP ZAP, Burp Suite.
- **Example:**

```bash
zap-cli quick-scan --self-contained --spider -r http://localhost:7860
```

### Best Practices

- **Automate tests**: Use CI/CD pipelines to run tests automatically.
- **Test coverage**: Aim for at least 80% coverage, but prioritize critical paths. - **Mock external services**: Use tools like `vcr.py` or `responses` to mock API calls. - **Data-driven tests**: Use parameterized tests to cover multiple scenarios. - **Regularly review and update tests**: As the code evolves, so should the tests. ---
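The "data-driven tests" practice above can be sketched with the standard library alone. The example below is a standalone copy of the `_enhance_query` mapping shown earlier (re-implemented as a plain function so it runs without the project installed) and drives one test through several input/expected pairs with `unittest.subTest`:

```python
import unittest

# Standalone copy of the _enhance_query mapping shown earlier,
# so this file runs without the full project installed.
ENHANCEMENTS = {
    'ai': 'artificial intelligence machine learning',
    'ml': 'machine learning',
    'quantum': 'quantum computing physics',
}

def enhance_query(topic: str) -> str:
    """Append extra search terms when a known keyword appears in the topic."""
    topic_lower = topic.lower()
    for key, value in ENHANCEMENTS.items():
        if key in topic_lower and value not in topic_lower:
            return f"{topic} {value}"
    return topic

class TestEnhanceQuery(unittest.TestCase):
    def test_enhancements(self):
        # Each (input, expected) pair becomes its own sub-test,
        # so one failing case doesn't hide the others.
        cases = [
            ("AI", "AI artificial intelligence machine learning"),
            ("quantum", "quantum quantum computing physics"),
            ("biology", "biology"),  # no matching keyword -> unchanged
        ]
        for topic, expected in cases:
            with self.subTest(topic=topic):
                self.assertEqual(enhance_query(topic), expected)
```

Run it with `python -m unittest -v test_enhance.py` (filename is up to you). With `pytest` installed, the same idea is usually written with `@pytest.mark.parametrize`, which reports each case as a separate test.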