
title: Science storyteller
sdk: gradio
emoji: 📚
pinned: true
short_description: science told with ease

🎧 Science Storyteller: Research to Podcast

MCP's 1st Birthday Hackathon Submission
Track: Track 2 - MCP in Action (Multimodal)
Tag: mcp-in-action-track-multimodal

🎯 Project Overview

Science Storyteller transforms complex scientific research papers into accessible, engaging audio podcasts. Enter any research topic, and our AI-powered system will:

Search for relevant papers using Semantic Scholar API (all research fields)
Analyze and summarize the research using Claude AI
Generate an engaging podcast script optimized for storytelling
Convert the script to audio using Kokoro-82M, a high-quality open-source TTS model, via the HF Inference API
Deliver a complete podcast episode you can listen to anywhere

This project makes cutting-edge science accessible to everyone—from researchers to curious learners—through the power of audio storytelling.

✨ Key Features

🤖 Autonomous Agent Behavior

Planning: Intelligently enhances search queries for better results
Reasoning: Evaluates and selects the most relevant paper from multiple results
Execution: Orchestrates multi-step workflow from search to audio generation
Self-correction: Implements fallback strategies when API calls fail
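
The planning, reasoning, and self-correction steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's agent code: `enhance_query`, `score_paper`, `select_best_paper`, `search_with_fallback`, and the scoring heuristic (keyword overlap plus a small citation bonus) are all hypothetical, and the `title`, `abstract`, and `citationCount` keys assume Semantic Scholar-style paper records.

```python
from __future__ import annotations

import math
import re


def enhance_query(topic: str) -> str:
    """Planning: normalise whitespace in the topic (hypothetical enhancement step)."""
    return " ".join(topic.split())


def score_paper(paper: dict, topic: str) -> float:
    """Reasoning: hypothetical relevance score = keyword overlap + citation bonus."""
    words = set(re.findall(r"\w+", topic.lower()))
    text = f"{paper.get('title', '')} {paper.get('abstract') or ''}".lower()
    overlap = sum(1 for w in words if w in text)
    bonus = math.log1p(paper.get("citationCount") or 0) / 10
    return overlap + bonus


def select_best_paper(papers: list[dict], topic: str) -> dict | None:
    """Pick the highest-scoring paper that has an abstract to summarise."""
    usable = [p for p in papers if p.get("abstract")]
    return max(usable, key=lambda p: score_paper(p, topic), default=None)


def search_with_fallback(topic: str, search) -> list[dict]:
    """Self-correction: retry with a shorter query if the search fails or is empty."""
    query = enhance_query(topic)
    for candidate in (query, " ".join(query.split()[:2])):
        try:
            results = search(candidate)
        except Exception:
            continue  # treat an API failure like an empty result, try the fallback
        if results:
            return results
    return []
```

Here `search` is any callable that takes a query string and returns a list of paper dictionaries, so the same logic can be exercised with a stub instead of a live API.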

🔧 Direct API Integration

Semantic Scholar API: Research paper retrieval across all scientific fields
Direct HTTP requests: Simple, reliable, production-ready (no MCP subprocess overhead)
Claude AI: Advanced summarization and script generation via Anthropic API
Proper error handling: Retry logic, rate limiting, fallback strategies
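
A direct HTTP call with retry logic might look like the sketch below. It targets the public Semantic Scholar Graph API paper-search endpoint and sends the optional key in the `x-api-key` header. To stay dependency-free it uses the standard library's `urllib` rather than the `requests` library listed in the technology stack. The helper names, the requested fields, and the backoff settings are assumptions, not the project's actual `scholar_tool.py`.

```python
from __future__ import annotations

import json
import os
import time
import urllib.error
import urllib.parse
import urllib.request

# Public Semantic Scholar Graph API paper-search endpoint.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query: str, limit: int = 5) -> str:
    """Build the search URL; the requested fields are an illustrative choice."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,abstract,year,url,citationCount",
    }
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on a network error wait base_delay * 2**n seconds and retry."""
    for n in range(attempts):
        try:
            return fn()
        except (urllib.error.URLError, TimeoutError):
            if n == attempts - 1:
                raise  # out of attempts: let the caller apply its own fallback
            sleep(base_delay * 2 ** n)


def search_papers(query: str, limit: int = 5) -> list[dict]:
    """Search for papers, sending the optional API key when it is configured."""
    headers = {}
    api_key = os.getenv("SEMANTIC_SCHOLAR_API")  # env var name used in this README
    if api_key:
        headers["x-api-key"] = api_key
    request = urllib.request.Request(build_search_url(query, limit), headers=headers)

    def fetch():
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)

    return with_retries(fetch).get("data", [])
```

Because `urllib.error.HTTPError` is a subclass of `URLError`, rate-limit responses such as HTTP 429 are retried with the same exponential backoff as connection failures.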

🎨 Polished User Experience

Clean, responsive Gradio interface
Real-time progress indicators
Mobile-friendly design
Example topics for quick start
Tabbed output (Audio, Summary, Script, Source)

🎵 Multimodal Output

Text: Comprehensive summaries and podcast scripts
Audio: High-quality WAV podcasts via Kokoro-82M (HF Inference API)
Metadata: Full source paper citations and links
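
The audio step might look roughly like this. `InferenceClient.text_to_speech` is part of the `huggingface_hub` package and `hexgrad/Kokoro-82M` is the model's Hub ID, but everything else is an assumption: the `synthesize_podcast` helper, the output path, and the choice to return `None` on failure are hypothetical. Whether the model is served, and in which audio format, depends on the inference provider available to your token.

```python
from __future__ import annotations

import os
from pathlib import Path

MODEL_ID = "hexgrad/Kokoro-82M"


def hf_text_to_speech(text: str) -> bytes:
    """Call the HF Inference API; requires the huggingface_hub package."""
    from huggingface_hub import InferenceClient  # lazy import: optional dependency

    client = InferenceClient(token=os.getenv("HUGGINGFACE_TOKEN"))
    return client.text_to_speech(text, model=MODEL_ID)


def synthesize_podcast(script: str, out_dir: str = "assets/audio",
                       name: str = "podcast.wav",
                       tts=hf_text_to_speech) -> Path | None:
    """Convert a script to audio and save it; return None if synthesis fails."""
    try:
        audio = tts(script)
    except Exception:
        return None  # the caller can still show the text-only result
    path = Path(out_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return path
```

Passing the TTS function as a parameter keeps the file handling testable without network access and leaves room for a different backend later.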

🏗️ Architecture

┌─────────────┐
│   User      │ Enters research topic
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│    Gradio Interface (app.py)        │
│  - User input handling              │
│  - Progress tracking                │
│  - Result display                   │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│  Science Storyteller Orchestrator   │
│  - Autonomous workflow planning     │
│  - Agent coordination               │
│  - Error handling & recovery        │
└──────┬──────────────────────────────┘
       │
       ├──► ResearchAgent ──► Semantic Scholar API (Direct HTTP)
       │     (Search & retrieve papers across all fields)
       │
       ├──► AnalysisAgent ──► Claude AI ──► Anthropic API
       │     (Summarize & create script)
       │
       └──► AudioAgent ──► Kokoro-82M ──► HF Inference API
             (Text-to-speech conversion - high quality, open-source)
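
The flow in the diagram can be sketched as a small orchestrator that calls each stage in turn and reports progress. This is a minimal illustration, not the code in `app.py`: the `Orchestrator` and `PodcastResult` names, the callable-based agents, and the progress fractions are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class PodcastResult:
    paper: dict
    summary: str
    script: str
    audio_path: str | None  # None when audio generation failed


class Orchestrator:
    """Runs search -> summary -> script -> audio, reporting progress per step."""

    def __init__(self, search: Callable, summarize: Callable,
                 write_script: Callable, synthesize: Callable):
        self.search = search              # topic -> list of paper dicts
        self.summarize = summarize        # paper dict -> summary text
        self.write_script = write_script  # summary text -> podcast script
        self.synthesize = synthesize      # script -> audio path, or None

    def run(self, topic: str, progress: Callable = lambda frac, msg: None) -> PodcastResult:
        progress(0.1, "Searching for papers")
        papers = self.search(topic)
        if not papers:
            raise ValueError(f"No papers found for topic: {topic!r}")
        paper = papers[0]  # stand-in for the ResearchAgent's relevance ranking
        progress(0.4, "Summarizing")
        summary = self.summarize(paper)
        progress(0.6, "Writing script")
        script = self.write_script(summary)
        progress(0.8, "Generating audio")
        audio_path = self.synthesize(script)
        progress(1.0, "Done")
        return PodcastResult(paper, summary, script, audio_path)
```

Injecting the stages as plain callables means each agent can be swapped for a stub, and the `progress` callback has the same shape a UI progress indicator would need.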

Directory Structure

app/
├── app.py                      # Main Gradio application
├── requirements.txt            # Python dependencies
├── README.md                   # This file
├── .env.example               # Environment variable template
├── .gitignore                 # Git ignore rules
│
├── agents/                    # Autonomous agents
│   ├── __init__.py
│   ├── research_agent.py      # Paper search & retrieval
│   ├── analysis_agent.py      # Summarization & scripting
│   └── audio_agent.py         # Text-to-speech conversion
│
├── mcp_tools/                 # API integrations
│   ├── __init__.py
│   ├── scholar_tool.py        # Semantic Scholar Direct API client
│   └── llm_tool.py            # Claude AI wrapper
│
├── utils/                     # Utility functions
│   ├── __init__.py
│   ├── script_formatter.py    # Script formatting
│   └── audio_processor.py     # Audio file handling
│
└── assets/                    # Generated content
    ├── audio/                 # Generated podcasts
    └── examples/              # Example outputs

🚀 Getting Started

Prerequisites

Python 3.10+
API Keys:
- Semantic Scholar API (optional, for higher rate limits)
- Anthropic API for Claude AI
- Hugging Face Token for Kokoro-82M TTS

Installation

Clone the repository:
```
git clone <your-repo-url>
cd app
```
Install Python dependencies:
```
pip install -r requirements.txt
```

Set up environment variables:
```
cp .env.example .env
# Edit .env and add your API keys
```
Configure your .env file:
```
SEMANTIC_SCHOLAR_API=your_semantic_scholar_api_key_here  # Optional
ANTHROPIC_API_KEY=your_anthropic_api_key_here
HUGGINGFACE_TOKEN=your_hf_token_here  # For Kokoro-82M TTS
```

Run the application:
```
python app.py
```
Open your browser: Navigate to http://localhost:7860
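
If the app fails at startup, a quick check of the environment variables listed above can help narrow things down. The sketch below assumes the values are already in the process environment (for example exported in the shell, set as Space secrets, or loaded from `.env` by a tool such as python-dotenv); `load_config` and the two key tuples are hypothetical names, not part of the project.

```python
from __future__ import annotations

import os

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "HUGGINGFACE_TOKEN")
OPTIONAL_KEYS = ("SEMANTIC_SCHOLAR_API",)


def load_config(env=os.environ) -> dict:
    """Return the configured keys, raising if a required one is missing or empty."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    # Optional keys that are unset or empty are reported as None.
    return {k: env.get(k) or None for k in REQUIRED_KEYS + OPTIONAL_KEYS}
```

Calling `load_config()` before launching the interface turns a missing key into one clear error message instead of a failure partway through a podcast run.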

Using in Hugging Face Spaces

This project is designed to run seamlessly on Hugging Face Spaces:

Add your API keys in Space Settings → Secrets:
- SEMANTIC_SCHOLAR_API (optional, but recommended for higher rate limits)
- ANTHROPIC_API_KEY
- HUGGINGFACE_TOKEN (for Kokoro-82M TTS via Inference API)
The Space will automatically install dependencies and launch

🎬 Usage

Enter a research topic (e.g., "AlphaFold", "CRISPR gene editing", "quantum computing")
Click "Generate Podcast"
Wait for the AI agents to search, analyze, and generate content (~1-2 minutes)
Listen to your podcast in the Audio tab
Read the summary and script in their respective tabs
Check the source paper in the Source Paper tab

Example Topics

Artificial Intelligence:

Transformer neural network architecture
AlphaFold 3 protein structure prediction
GPT language models
Diffusion models for image generation

Medicine & Health:

mRNA vaccine technology and development
Tuberculosis vaccine BCG immunotherapy
Cancer immunotherapy checkpoint inhibitors
CRISPR Cas9 gene editing applications

Astronomy & Physics:

Comet 3I/ATLAS interstellar trajectory
Gravitational waves detection
Quantum entanglement Bell inequality
Dark matter detection experiments

Climate & Environment:

Climate change ocean acidification
Carbon capture and storage technologies
Renewable energy grid integration
Arctic ice sheet dynamics

Biology:

Gut microbiome metabolic pathways
Neuroscience brain plasticity
Evolutionary genetics adaptation

🛠️ Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Frontend | Gradio 5.x | Interactive web interface |
| Backend | Python 3.10+ | Application logic |
| Research API | Semantic Scholar | Direct HTTP API for paper retrieval |
| AI Analysis | Claude 3.5 Sonnet | Summarization & script generation |
| Audio | Kokoro-82M | HF Inference API TTS (Apache-2.0) |
| HTTP Client | requests library | Reliable API communication |
| Deployment | Hugging Face Spaces | Cloud hosting |

🎯 Hackathon Requirements Coverage

✅ Track 2: MCP in Action

Autonomous Agent Behavior:
- Planning (query enhancement, paper selection)
- Reasoning (best paper evaluation)
- Execution (multi-step workflow orchestration)
- Self-correction (fallback strategies)
API Integration:
- Uses Semantic Scholar API directly for reliable research retrieval
- Follows REST API best practices
- Demonstrates proper HTTP client usage with the requests library
- Rate limiting and retry logic implemented
Gradio Application:
- Built with Gradio 5.x
- Professional UI/UX
- Progress indicators
- Mobile-responsive
Real-world Value:
- Makes research accessible to non-experts
- Saves time for researchers doing literature review
- Educational tool for science communication
- Multimodal output (text + audio)

🎖️ Advanced Features (Bonus)

Context Engineering: Optimized prompts for summarization and script generation
Error Handling: Comprehensive fallback strategies with retry logic
Caching: Efficient file management
Multimodal: Combines text analysis with audio generation
Production-ready: Direct API calls, no subprocess dependencies

📊 Performance

Search Speed: < 5 seconds for paper retrieval
Analysis Time: 10-20 seconds for summarization
Script Generation: 10-20 seconds
Audio Synthesis: 30-60 seconds (varies by length)
Total Time: ~1-2 minutes for complete workflow
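
As a sanity check, the per-stage figures above can be summed to confirm the stated total. The ranges are taken from the list; treating "< 5 seconds" as a 0 to 5 second range is an assumption.

```python
# (min_seconds, max_seconds) per stage, taken from the figures listed above.
# The lower bound of 0 for search is an assumption ("< 5 seconds").
STAGES = {
    "search": (0, 5),
    "analysis": (10, 20),
    "script": (10, 20),
    "audio": (30, 60),
}

total_min = sum(low for low, _ in STAGES.values())
total_max = sum(high for _, high in STAGES.values())
print(f"Estimated total: {total_min}-{total_max} seconds")
```

The stages add up to 50 to 105 seconds, which is consistent with the stated total of roughly 1-2 minutes.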

🎥 Demo & Links

📹 Demo Video

Coming Soon: Watch the demo (1-5 minutes)

The demo showcases:

Complete workflow from topic input to podcast output
Autonomous agent behavior
Direct API integration
User interface features

📱 Social Media

Coming Soon: Social media post link

🧪 Testing

Run the test suite to verify all components:

```
# Test Semantic Scholar API integration
python test_scholar_direct.py

# Test individual components
python test_components.py
```

🤝 Contributing

This project was created for MCP's 1st Birthday Hackathon (November 14-30, 2025). Feel free to:

Report bugs via Issues
Suggest improvements
Fork and extend for your own use cases

📝 License

MIT License - feel free to use this project for learning and development.

🙏 Acknowledgments

Anthropic for the Model Context Protocol and Claude AI
Gradio for the amazing web framework
Semantic Scholar for comprehensive research paper access across all fields
Kokoro-82M (@hexgrad) for the excellent open-source TTS model
Hugging Face for hosting, infrastructure, and Inference API
MCP Community for the hackathon opportunity

🔮 Future Enhancements

Potential improvements for future versions:

Support for multiple research sources (arXiv, PubMed, etc.)
Multiple voice options for narration
Podcast series generation for related topics
Export to various audio formats
Integration with podcast platforms
Multi-language support
User accounts for saving favorite podcasts
Custom voice training
Background music and sound effects
Batch processing for multiple topics

📧 Contact

Created for MCP's 1st Birthday Hackathon 2025
Track 2: MCP in Action (Multimodal)

Made with ❤️ for science communication and AI innovation