---
title: BirdScope AI - MCP Multi-Agent System
emoji: 🦅
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.0.1
python_version: 3.11
app_file: app.py
pinned: false
license: mit
short_description: AI-powered bird identification with MCP multi-agent system
tags:
- building-mcp-track-enterprise
- building-mcp-track-consumer
- building-mcp-track-creative
- mcp-in-action-track-enterprise
- mcp-in-action-track-consumer
- mcp-in-action-track-creative
---

# 🦅 BirdScope AI - MCP Multi-Agent System

**AI-powered bird identification with specialized MCP agents**

Built for the [MCP 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)

---

## 📢 Hackathon Submission

**Social Media:** [Twitter/X Post](https://x.com/zulucoconuts/status/1995255281064755708)

**Demo Video:** [Watch on YouTube/Loom](https://youtu.be/V_ZoOkyjEyU)

**Track Submissions:**

- 🔧 **Track 1 (Building MCP)**: Two custom MCP servers
  - **Nuthatch MCP Server** - 7 tools for the bird species database (search, species info, images, audio, family search, conservation filtering)
  - **Modal Bird Classifier MCP** - 2 Modal-hosted, GPU-powered image classification tools (base64 & URL inputs)
  - Categories: Enterprise (wildlife conservation) | Consumer (bird enthusiasts and education) | Creative (multimedia exploration)
- 🤖 **Track 2 (MCP in Action)**: Full multi-agent system with supervisor routing
  - LangGraph-based supervisor orchestrating 3 specialized subagents
  - Integrates both MCP servers with intelligent tool routing
  - Categories: Enterprise (conservation orgs) | Consumer (bird watchers) | Creative (educational multimedia)

**Author:** [@facemelter](https://huggingface.co/facemelter)

**Built with:** Gradio 6 | LangGraph | FastMCP | Modal (GPU) | OpenAI/Anthropic/HuggingFace LLMs

---

## 🌐 Project Overview

BirdScope AI showcases an advanced multi-agent system powered by **Gradio 6** and **LangGraph**, designed to identify bird species, explore multimedia content, and provide educational information about birds worldwide.

**Our innovation:** we built **two complete systems in one**:

- 🔧 **Two Custom MCP Servers** (Track 1): Nuthatch species database (7 tools) + Modal GPU classifier (2 tools)
- 🤖 **Multi-Agent Application** (Track 2): Supervisor-orchestrated specialist agents

This dual approach demonstrates both **building MCP infrastructure** and **leveraging MCP for autonomous agents**.
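To give a concrete flavor of the Track 1 side, below is a minimal, hypothetical FastMCP tool in the spirit of the Nuthatch server. The tool name, endpoint path, query parameters, header, and environment variable are illustrative assumptions rather than the project's actual code; check the Nuthatch API documentation before reusing them.

```python
# Hypothetical sketch of one Nuthatch-style FastMCP tool (the real server exposes 7).
# Assumes fastmcp and httpx are installed and NUTHATCH_API_KEY is set; the endpoint
# path, query parameters, and header name below are illustrative guesses.
import os

import httpx
from fastmcp import FastMCP

mcp = FastMCP("nuthatch-birds")
NUTHATCH_BASE = "https://nuthatch.lastelm.software/v2"  # assumed base URL


@mcp.tool()
async def search_birds(query: str, page_size: int = 5) -> dict:
    """Search the Nuthatch database for bird species matching a common name."""
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.get(
            f"{NUTHATCH_BASE}/birds",
            params={"name": query, "pageSize": page_size},
            headers={"API-Key": os.environ["NUTHATCH_API_KEY"]},
        )
        resp.raise_for_status()
        return resp.json()


if __name__ == "__main__":
    # Defaults to the STDIO transport (used on HF Spaces); FastMCP can also serve
    # over HTTP for local debugging, matching the dual-transport design below.
    mcp.run()
```

Run as a subprocess, a server like this speaks MCP over STDIO, which is how the Spaces deployment described in the features below consumes it.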
---

## ✨ Key Features

### 🤖 Multi-Agent Orchestration

- **LangGraph Supervisor Pattern** with intelligent LLM-based routing
- **3 Specialized Subagents** (Image Identifier, Species Explorer, Taxonomy Specialist)
- **Session-based Agent Caching** - Agents are reused within user sessions for 10x faster responses
- **Provider-Specific Prompts** - Optimized system prompts for OpenAI, Anthropic, and HuggingFace

### 🔧 Dual MCP Server Architecture

- **Modal Bird Classifier** ([modal.com](https://modal.com))
  - [prithivMLmods/Bird-Species-Classifier-526](https://huggingface.co/prithivMLmods/Bird-Species-Classifier-526) from HuggingFace
  - Classifies 526 bird species on a Modal T4 GPU
  - Serverless GPU deployment for on-demand classification
  - Streamable HTTP transport with base64 and URL input support
- **Nuthatch MCP Server** (Custom Built - Track 1)
  - FastMCP framework with 7 specialized tools
  - Integrates the [Nuthatch API](https://nuthatch.lastelm.software) (1000+ species)
  - **Dual Transport Support**: STDIO (subprocess) for HF Spaces + HTTP for local debugging
  - Data sources: Nuthatch DB, Unsplash (images), xeno-canto (audio)

### 📡 Dual Streaming Output

- **Chat Response Stream** - Real-time markdown rendering with embedded media
- **Tool Execution Log Stream** - Parallel visibility into MCP tool calls (inputs/outputs)
- **Async Progress Indicators** - Immediate user feedback before processing begins

### 🎨 Structured Output Parsing

- **LlamaIndex Pydantic Models** - Type-safe response formatting
- **Regex URL Extraction** - Automatic detection of image and audio URLs
- **Smart Audio Normalization** - xeno-canto links converted to a browser-friendly format (`/download` → playable)
- **Markdown Media Embedding** - Images and audio formatted automatically

### 🌐 Multi-Provider LLM Support

- **OpenAI** (GPT-4o-mini) - Recommended for reliability
- **Anthropic** (Claude Sonnet 4) - Best for complex reasoning
- **HuggingFace Inference API** - Open-source models (limited tool calling)
- **User-Provided Keys** - No backend API key required; users supply their own

### 💅 Production UI/UX

- **Gradio 6.0 SSR** - Server-side rendering for enhanced performance
- **Custom Cloud Theme** - Sky-inspired CSS with mobile-responsive design
- **Dynamic Examples** - Example queries adapt to the selected agent mode
- **Instant Feedback** - A "⏳ Starting..." indicator appears immediately on submit

---

## 🗂️ Data Sources & MCP Servers

We built **two custom MCP servers** that integrate with bird data APIs and GPU-powered classification.

**Data Sources:**

- **Nuthatch API** ([nuthatch.lastelm.software](https://nuthatch.lastelm.software)) - 1000+ bird species database by Last Elm Software
- **Unsplash** - High-quality reference images for visual identification
- **xeno-canto.org** - Community-contributed bird audio recordings worldwide
- **HuggingFace Model** - [prithivMLmods/Bird-Species-Classifier-526](https://huggingface.co/prithivMLmods/Bird-Species-Classifier-526) for GPU classification

**MCP Servers:**

1. **Nuthatch MCP Server** (Track 1 - Building MCP)
   - 7 specialized tools: search, species info, images, audio, family search, conservation filtering
   - STDIO transport for HF Spaces, HTTP option for local debugging
   - FastMCP framework with async API integration
2. **Modal Bird Classifier** (GPU-powered)
   - Image classification tools with URL and base64 input support
   - Serverless GPU deployment via Modal
   - Streamable HTTP transport

---

## 🧩 Core Components

**Multi-Agent Orchestration:**

- **LangGraph Supervisor Pattern** - LLM-based routing between specialist agents
- **3 Specialized Subagents** - Each with a focused tool subset (image ID, species exploration, taxonomy)
- **Session-based Caching** - Agent instances reused within user sessions for performance
- **Dual Streaming** - Parallel chat response + tool execution log streams

**Agent Architecture:**

- `subagent_supervisor.py` - Creates the supervisor workflow with LangGraph (sketched below)
- `subagent_factory.py` - Builds specialists with filtered tool access
- `subagent_config.py` - Defines agent modes and tool allocations
- `prompts.py` - Provider-specific system prompts (OpenAI, Anthropic, HuggingFace)

**UI & UX:**

- **Gradio 6.0** with SSR for enhanced performance
- Custom cloud-themed CSS with mobile-responsive design
- Dynamic examples that adapt to agent mode selection
- Immediate processing feedback with async streaming updates
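To make the orchestration concrete, here is a minimal sketch of the supervisor pattern assuming recent versions of `langgraph` and the `langgraph-supervisor` helper. The agent names, prompts, and stand-in tools are illustrative, not the project's actual `subagent_supervisor.py` / `subagent_factory.py` wiring, which draws its tools from the two MCP servers.

```python
# Minimal supervisor-pattern sketch (illustrative names; not the project's code).
# Assumes langgraph, langgraph-supervisor, and langchain-openai are installed
# and OPENAI_API_KEY is set.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor


@tool
def search_species(name: str) -> str:
    """Look up a bird species by common name (stand-in for a Nuthatch MCP tool)."""
    return f"Placeholder record for {name}"


@tool
def classify_image(image_url: str) -> str:
    """Classify a bird photo by URL (stand-in for the Modal classifier MCP tool)."""
    return "Placeholder: Northern Cardinal (0.97)"


llm = ChatOpenAI(model="gpt-4o-mini")

species_explorer = create_react_agent(
    llm, tools=[search_species], name="species_explorer",
    prompt="Answer questions about bird species using the database tools.",
)
image_identifier = create_react_agent(
    llm, tools=[classify_image], name="image_identifier",
    prompt="Identify birds from photos using the classifier tool.",
)

# The supervisor routes each user turn to the appropriate specialist.
supervisor = create_supervisor(
    [species_explorer, image_identifier],
    model=llm,
    prompt="Route bird questions to the right specialist and summarize their answer.",
).compile()

result = supervisor.invoke(
    {"messages": [{"role": "user", "content": "What bird is at https://example.com/bird.jpg?"}]}
)
print(result["messages"][-1].content)
```

In the real app, each specialist would receive a filtered subset of the MCP tools rather than local stand-ins, and the compiled graph is cached per user session (the session-based agent caching described above).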
---

## 🚀 Quick Start

**Try the Live Demo:** Just provide your LLM API key (OpenAI, Anthropic, or HuggingFace) in the sidebar and start exploring!

**For Developers:**

```bash
# Clone and install
git clone <repo-url>
cd hackathon_draft
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Run locally
python app.py
```

**Deploy to HuggingFace Spaces:**

```bash
python upload_to_space.py
# Configure Secrets in Space Settings (see docs/dev/main-README.md)
```

**Full Setup Guide:** See [docs/dev/main-README.md](docs/dev/main-README.md) for comprehensive deployment instructions.

---

## 🏆 Credits & License

Built for the [HuggingFace MCP 1st Birthday Hackathon](https://huggingface.co/MCP-1st-Birthday)

**Data Sources:** [Nuthatch API](https://nuthatch.lastelm.software) (Last Elm Software) | [xeno-canto.org](https://xeno-canto.org) | [Unsplash](https://unsplash.com)

**Technology:** [Model Context Protocol](https://github.com/anthropics/mcp) | [LangGraph](https://github.com/langchain-ai/langgraph) | [Gradio 6](https://gradio.app) | [Modal](https://modal.com)

MIT License - Educational and research purposes