science-storyteller / README.md
tuhulab's picture
feat: Implement Kokoro-82M TTS via HF Inference API
f05f9c7

A newer version of the Gradio SDK is available: 6.1.0

Upgrade
metadata
title: Science storyteller
sdk: gradio
emoji: ๐Ÿ“š
pinned: true
short_description: science told with ease

๐ŸŽง Science Storyteller: Research to Podcast

MCP's 1st Birthday Hackathon Submission
Track: Track 2 - MCP in Action (Multimodal)
Tag: mcp-in-action-track-multimodal

๐ŸŽฏ Project Overview

Science Storyteller transforms complex scientific research papers into accessible, engaging audio podcasts. Enter any research topic, and our AI-powered system will:

  1. Search for relevant papers using Semantic Scholar API (all research fields)
  2. Analyze and summarize the research using Claude AI
  3. Generate an engaging podcast script optimized for storytelling
  4. Convert to audio using Kokoro-82M (HF Inference API) - high-quality, open-source
  5. Deliver a complete podcast episode you can listen to anywhere

This project makes cutting-edge science accessible to everyoneโ€”from researchers to curious learnersโ€”through the power of audio storytelling.

โœจ Key Features

๐Ÿค– Autonomous Agent Behavior

  • Planning: Intelligently enhances search queries for better results
  • Reasoning: Evaluates and selects the most relevant paper from multiple results
  • Execution: Orchestrates multi-step workflow from search to audio generation
  • Self-correction: Implements fallback strategies when API calls fail

๐Ÿ”ง Direct API Integration

  • Semantic Scholar API: Research paper retrieval across all scientific fields
  • Direct HTTP requests: Simple, reliable, production-ready (no MCP subprocess overhead)
  • Claude AI: Advanced summarization and script generation via Anthropic API
  • Proper error handling: Retry logic, rate limiting, fallback strategies

๐ŸŽจ Polished User Experience

  • Clean, responsive Gradio interface
  • Real-time progress indicators
  • Mobile-friendly design
  • Example topics for quick start
  • Tabbed output (Audio, Summary, Script, Source)

๐ŸŽต Multimodal Output

  • Text: Comprehensive summaries and podcast scripts
  • Audio: High-quality WAV podcasts via Kokoro-82M (HF Inference API)
  • Metadata: Full source paper citations and links

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   User      โ”‚ Enters research topic
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚
       โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚    Gradio Interface (app.py)        โ”‚
โ”‚  - User input handling              โ”‚
โ”‚  - Progress tracking                โ”‚
โ”‚  - Result display                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚
       โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Science Storyteller Orchestrator   โ”‚
โ”‚  - Autonomous workflow planning     โ”‚
โ”‚  - Agent coordination               โ”‚
โ”‚  - Error handling & recovery        โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚
       โ”œโ”€โ”€โ–บ ResearchAgent โ”€โ”€โ–บ Semantic Scholar API (Direct HTTP)
       โ”‚     (Search & retrieve papers across all fields)
       โ”‚
       โ”œโ”€โ”€โ–บ AnalysisAgent โ”€โ”€โ–บ Claude AI โ”€โ”€โ–บ Anthropic API
       โ”‚     (Summarize & create script)
       โ”‚
       โ””โ”€โ”€โ–บ AudioAgent โ”€โ”€โ–บ Kokoro-82M โ”€โ”€โ–บ HF Inference API
             (Text-to-speech conversion - high quality, open-source)

Directory Structure

app/
โ”œโ”€โ”€ app.py                      # Main Gradio application
โ”œโ”€โ”€ requirements.txt            # Python dependencies
โ”œโ”€โ”€ README.md                   # This file
โ”œโ”€โ”€ .env.example               # Environment variable template
โ”œโ”€โ”€ .gitignore                 # Git ignore rules
โ”‚
โ”œโ”€โ”€ agents/                    # Autonomous agents
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ research_agent.py      # Paper search & retrieval
โ”‚   โ”œโ”€โ”€ analysis_agent.py      # Summarization & scripting
โ”‚   โ””โ”€โ”€ audio_agent.py         # Text-to-speech conversion
โ”‚
โ”œโ”€โ”€ mcp_tools/                 # API integrations
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ scholar_tool.py        # Semantic Scholar Direct API client
โ”‚   โ””โ”€โ”€ llm_tool.py            # Claude AI wrapper
โ”‚
โ”œโ”€โ”€ utils/                     # Utility functions
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ script_formatter.py    # Script formatting
โ”‚   โ””โ”€โ”€ audio_processor.py     # Audio file handling
โ”‚
โ””โ”€โ”€ assets/                    # Generated content
    โ”œโ”€โ”€ audio/                 # Generated podcasts
    โ””โ”€โ”€ examples/              # Example outputs

๐Ÿš€ Getting Started

Prerequisites

Installation

  1. Clone the repository:

    git clone <your-repo-url>
    cd app
    
  2. Install Python dependencies:

    pip install -r requirements.txt
    
  3. Set up environment variables:

    cp .env.example .env
    # Edit .env and add your API keys
    
  4. Configure your .env file:

    SEMANTIC_SCHOLAR_API=your_semantic_scholar_api_key_here  # Optional
    ANTHROPIC_API_KEY=your_anthropic_api_key_here
    HUGGINGFACE_TOKEN=your_hf_token_here  # For Kokoro-82M TTS
    
  5. Run the application:

    python app.py
    
  6. Open your browser: Navigate to http://localhost:7860

Using in Hugging Face Spaces

This project is designed to run seamlessly on Hugging Face Spaces:

  1. Add your API keys in Space Settings โ†’ Secrets:

    • SEMANTIC_SCHOLAR_API (optional, but recommended for higher rate limits)
    • ANTHROPIC_API_KEY
    • HUGGINGFACE_TOKEN (for Kokoro-82M TTS via Inference API)
  2. The Space will automatically install dependencies and launch

๐ŸŽฌ Usage

  1. Enter a research topic (e.g., "AlphaFold", "CRISPR gene editing", "quantum computing")
  2. Click "Generate Podcast"
  3. Wait for the AI agents to search, analyze, and generate content (~1-2 minutes)
  4. Listen to your podcast in the Audio tab
  5. Read the summary and script in their respective tabs
  6. Check the source paper in the Source Paper tab

Example Topics

Artificial Intelligence:

  • Transformer neural network architecture
  • AlphaFold 3 protein structure prediction
  • GPT language models
  • Diffusion models for image generation

Medicine & Health:

  • mRNA vaccine technology and development
  • Tuberculosis vaccine BCG immunotherapy
  • Cancer immunotherapy checkpoint inhibitors
  • CRISPR Cas9 gene editing applications

Astronomy & Physics:

  • Comet 3I/ATLAS interstellar trajectory
  • Gravitational waves detection
  • Quantum entanglement Bell inequality
  • Dark matter detection experiments

Climate & Environment:

  • Climate change ocean acidification
  • Carbon capture and storage technologies
  • Renewable energy grid integration
  • Arctic ice sheet dynamics

Biology:

  • Gut microbiome metabolic pathways
  • Neuroscience brain plasticity
  • Evolutionary genetics adaptation

๐Ÿ› ๏ธ Technology Stack

Component Technology Purpose
Frontend Gradio 5.x Interactive web interface
Backend Python 3.10+ Application logic
Research API Semantic Scholar Direct HTTP API for paper retrieval
AI Analysis Claude 3.5 Sonnet Summarization & script generation
Audio Kokoro-82M HF Inference API TTS (Apache-2.0)
HTTP Client requests library Reliable API communication
Deployment Hugging Face Spaces Cloud hosting

๐ŸŽฏ Hackathon Requirements Coverage

โœ… Track 2: MCP in Action

  • Autonomous Agent Behavior:

    • Planning (query enhancement, paper selection)
    • Reasoning (best paper evaluation)
    • Execution (multi-step workflow orchestration)
    • Self-correction (fallback strategies)
  • API Integration:

    • Uses Semantic Scholar API directly for reliable research retrieval
    • Follows REST API best practices
    • Demonstrates proper async HTTP client usage
    • Rate limiting and retry logic implemented
  • Gradio Application:

    • Built with Gradio 5.x
    • Professional UI/UX
    • Progress indicators
    • Mobile-responsive
  • Real-world Value:

    • Makes research accessible to non-experts
    • Saves time for researchers doing literature review
    • Educational tool for science communication
    • Multimodal output (text + audio)

๐ŸŽ–๏ธ Advanced Features (Bonus)

  • Context Engineering: Optimized prompts for summarization and script generation
  • Error Handling: Comprehensive fallback strategies with retry logic
  • Caching: Efficient file management
  • Multimodal: Combines text analysis with audio generation
  • Production-ready: Direct API calls, no subprocess dependencies

๐Ÿ“Š Performance

  • Search Speed: < 5 seconds for paper retrieval
  • Analysis Time: 10-20 seconds for summarization
  • Script Generation: 10-20 seconds
  • Audio Synthesis: 30-60 seconds (varies by length)
  • Total Time: ~1-2 minutes for complete workflow

๐ŸŽฅ Demo & Links

๐Ÿ“น Demo Video

Coming Soon: Watch the demo (1-5 minutes)

The demo showcases:

  • Complete workflow from topic input to podcast output
  • Autonomous agent behavior
  • Direct API integration
  • User interface features

๐Ÿ“ฑ Social Media

Coming Soon: Social media post link

๐Ÿงช Testing

Run the test suite to verify all components:

# Test Semantic Scholar API integration
python test_scholar_direct.py

# Test individual components
python test_components.py

๐Ÿค Contributing

This project was created for the MCP's 1st Birthday Hackathon (November 14-30, 2025). Feel free to:

  • Report bugs via Issues
  • Suggest improvements
  • Fork and extend for your own use cases

๐Ÿ“ License

MIT License - feel free to use this project for learning and development.

๐Ÿ™ Acknowledgments

  • Anthropic for the Model Context Protocol and Claude AI
  • Gradio for the amazing web framework
  • Semantic Scholar for comprehensive research paper access across all fields
  • Kokoro-82M (@hexgrad) for the excellent open-source TTS model
  • Hugging Face for hosting, infrastructure, and Inference API
  • MCP Community for the hackathon opportunity

๐Ÿ”ฎ Future Enhancements

Potential improvements for future versions:

  • Support for multiple research sources (arXiv, PubMed, etc.)
  • Multiple voice options for narration
  • Podcast series generation for related topics
  • Export to various audio formats
  • Integration with podcast platforms
  • Multi-language support
  • User accounts for saving favorite podcasts
  • Custom voice training
  • Background music and sound effects
  • Batch processing for multiple topics

๐Ÿ“ง Contact

Created for MCP's 1st Birthday Hackathon 2025
Track 2: MCP in Action (Multimodal)


Made with โค๏ธ for science communication and AI innovation