Spaces:
Running
Running
Commit
Β·
e720905
1
Parent(s):
631e5fc
fix: complete audit fixes for documentation accuracy
Browse filesCode Changes:
- Remove DeepCriticalError backwards-compat alias (src/utils/exceptions.py)
Documentation Accuracy Fixes:
- Update CLAUDE.md, AGENTS.md, GEMINI.md: "Phases 1-13" β "Phases 1-14"
- Replace 11_phase_biorxiv.md with 11_phase_europepmc.md (actual implementation)
- Fix all bioRxiv β Europe PMC references in:
- workflow-diagrams.md (mermaid diagrams)
- 05_phase_magentic.md (code examples)
- 04_phase_ui.md (imports)
- roadmap.md (directory tree, phase list)
- index.md (links)
All 127 tests still pass. Documentation now accurately reflects codebase.
- AGENTS.md +1 -1
- CLAUDE.md +1 -1
- GEMINI.md +1 -6
- docs/implementation/04_phase_ui.md +4 -4
- docs/implementation/05_phase_magentic.md +15 -15
- docs/implementation/11_phase_biorxiv.md +0 -572
- docs/implementation/11_phase_europepmc.md +181 -0
- docs/implementation/roadmap.md +4 -4
- docs/index.md +1 -1
- docs/workflow-diagrams.md +16 -16
- src/utils/exceptions.py +0 -4
AGENTS.md
CHANGED
|
@@ -6,7 +6,7 @@ This file provides guidance to AI agents when working with code in this reposito
|
|
| 6 |
|
| 7 |
DeepBoner is an AI-native sexual health research agent. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) and synthesize evidence for queries like "What drugs improve female libido post-menopause?" or "Evidence for testosterone therapy in women with HSDD?".
|
| 8 |
|
| 9 |
-
**Current Status:** Phases 1-
|
| 10 |
|
| 11 |
## Development Commands
|
| 12 |
|
|
|
|
| 6 |
|
| 7 |
DeepBoner is an AI-native sexual health research agent. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) and synthesize evidence for queries like "What drugs improve female libido post-menopause?" or "Evidence for testosterone therapy in women with HSDD?".
|
| 8 |
|
| 9 |
+
**Current Status:** Phases 1-14 COMPLETE (Foundation through Demo Submission).
|
| 10 |
|
| 11 |
## Development Commands
|
| 12 |
|
CLAUDE.md
CHANGED
|
@@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
|
|
| 6 |
|
| 7 |
DeepBoner is an AI-native sexual health research agent. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) and synthesize evidence for queries like "What drugs improve female libido post-menopause?" or "Evidence for testosterone therapy in women with HSDD?".
|
| 8 |
|
| 9 |
-
**Current Status:** Phases 1-
|
| 10 |
|
| 11 |
## Development Commands
|
| 12 |
|
|
|
|
| 6 |
|
| 7 |
DeepBoner is an AI-native sexual health research agent. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) and synthesize evidence for queries like "What drugs improve female libido post-menopause?" or "Evidence for testosterone therapy in women with HSDD?".
|
| 8 |
|
| 9 |
+
**Current Status:** Phases 1-14 COMPLETE (Foundation through Demo Submission).
|
| 10 |
|
| 11 |
## Development Commands
|
| 12 |
|
GEMINI.md
CHANGED
|
@@ -8,12 +8,7 @@
|
|
| 8 |
**Architecture:**
|
| 9 |
The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).
|
| 10 |
|
| 11 |
-
**Current Status:**
|
| 12 |
-
|
| 13 |
-
- **Phases 1-9:** COMPLETE. Foundation, Search, Judge, UI, Orchestrator, Embeddings, Hypothesis, Report, Cleanup.
|
| 14 |
-
- **Phases 10-11:** COMPLETE. ClinicalTrials.gov and Europe PMC integration.
|
| 15 |
-
- **Phase 12:** COMPLETE. MCP Server integration (Gradio MCP at `/gradio_api/mcp/`).
|
| 16 |
-
- **Phase 13:** COMPLETE. Modal sandbox for statistical analysis.
|
| 17 |
|
| 18 |
## Tech Stack & Tooling
|
| 19 |
|
|
|
|
| 8 |
**Architecture:**
|
| 9 |
The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).
|
| 10 |
|
| 11 |
+
**Current Status:** Phases 1-14 COMPLETE (Foundation through Demo Submission).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
|
| 13 |
## Tech Stack & Tooling
|
| 14 |
|
docs/implementation/04_phase_ui.md
CHANGED
|
@@ -409,7 +409,7 @@ from typing import AsyncGenerator
|
|
| 409 |
from src.orchestrator import Orchestrator
|
| 410 |
from src.tools.pubmed import PubMedTool
|
| 411 |
from src.tools.clinicaltrials import ClinicalTrialsTool
|
| 412 |
-
from src.tools.
|
| 413 |
from src.tools.search_handler import SearchHandler
|
| 414 |
from src.agent_factory.judges import JudgeHandler, HFInferenceJudgeHandler
|
| 415 |
from src.utils.models import OrchestratorConfig, AgentEvent
|
|
@@ -443,7 +443,7 @@ def create_orchestrator(
|
|
| 443 |
|
| 444 |
# Create search tools
|
| 445 |
search_handler = SearchHandler(
|
| 446 |
-
tools=[PubMedTool(), ClinicalTrialsTool(),
|
| 447 |
timeout=30.0,
|
| 448 |
)
|
| 449 |
|
|
@@ -1033,7 +1033,7 @@ uv run python -m src.app
|
|
| 1033 |
import asyncio
|
| 1034 |
from src.orchestrator import Orchestrator
|
| 1035 |
from src.tools.pubmed import PubMedTool
|
| 1036 |
-
from src.tools.
|
| 1037 |
from src.tools.clinicaltrials import ClinicalTrialsTool
|
| 1038 |
from src.tools.search_handler import SearchHandler
|
| 1039 |
from src.agent_factory.judges import HFInferenceJudgeHandler, MockJudgeHandler
|
|
@@ -1041,7 +1041,7 @@ from src.utils.models import OrchestratorConfig
|
|
| 1041 |
|
| 1042 |
async def test_full_flow():
|
| 1043 |
# Create components
|
| 1044 |
-
search_handler = SearchHandler([PubMedTool(), ClinicalTrialsTool(),
|
| 1045 |
|
| 1046 |
# Option 1: Use FREE HuggingFace Inference (real AI analysis)
|
| 1047 |
judge_handler = HFInferenceJudgeHandler()
|
|
|
|
| 409 |
from src.orchestrator import Orchestrator
|
| 410 |
from src.tools.pubmed import PubMedTool
|
| 411 |
from src.tools.clinicaltrials import ClinicalTrialsTool
|
| 412 |
+
from src.tools.europepmc import EuropePMCTool
|
| 413 |
from src.tools.search_handler import SearchHandler
|
| 414 |
from src.agent_factory.judges import JudgeHandler, HFInferenceJudgeHandler
|
| 415 |
from src.utils.models import OrchestratorConfig, AgentEvent
|
|
|
|
| 443 |
|
| 444 |
# Create search tools
|
| 445 |
search_handler = SearchHandler(
|
| 446 |
+
tools=[PubMedTool(), ClinicalTrialsTool(), EuropePMCTool()],
|
| 447 |
timeout=30.0,
|
| 448 |
)
|
| 449 |
|
|
|
|
| 1033 |
import asyncio
|
| 1034 |
from src.orchestrator import Orchestrator
|
| 1035 |
from src.tools.pubmed import PubMedTool
|
| 1036 |
+
from src.tools.europepmc import EuropePMCTool
|
| 1037 |
from src.tools.clinicaltrials import ClinicalTrialsTool
|
| 1038 |
from src.tools.search_handler import SearchHandler
|
| 1039 |
from src.agent_factory.judges import HFInferenceJudgeHandler, MockJudgeHandler
|
|
|
|
| 1041 |
|
| 1042 |
async def test_full_flow():
|
| 1043 |
# Create components
|
| 1044 |
+
search_handler = SearchHandler([PubMedTool(), ClinicalTrialsTool(), EuropePMCTool()])
|
| 1045 |
|
| 1046 |
# Option 1: Use FREE HuggingFace Inference (real AI analysis)
|
| 1047 |
judge_handler = HFInferenceJudgeHandler()
|
docs/implementation/05_phase_magentic.md
CHANGED
|
@@ -97,9 +97,9 @@ async def search_pubmed(query: str, max_results: int = 10) -> str:
|
|
| 97 |
search_agent = ChatAgent(
|
| 98 |
name="SearchAgent",
|
| 99 |
description="Searches biomedical databases for drug repurposing evidence",
|
| 100 |
-
instructions="You search PubMed, ClinicalTrials.gov, and
|
| 101 |
chat_client=OpenAIChatClient(model_id="gpt-4o-mini"), # INTERNAL LLM
|
| 102 |
-
tools=[search_pubmed, search_clinicaltrials,
|
| 103 |
)
|
| 104 |
```
|
| 105 |
|
|
@@ -286,14 +286,14 @@ This preserves semantic deduplication and structured citation access.
|
|
| 286 |
from agent_framework import AIFunction
|
| 287 |
|
| 288 |
from src.agents.state import get_magentic_state
|
| 289 |
-
from src.tools.
|
| 290 |
from src.tools.clinicaltrials import ClinicalTrialsTool
|
| 291 |
from src.tools.pubmed import PubMedTool
|
| 292 |
|
| 293 |
# Singleton tool instances
|
| 294 |
_pubmed = PubMedTool()
|
| 295 |
_clinicaltrials = ClinicalTrialsTool()
|
| 296 |
-
|
| 297 |
|
| 298 |
|
| 299 |
def _format_results(results: list, source_name: str, query: str) -> str:
|
|
@@ -382,21 +382,21 @@ async def search_clinical_trials(query: str, max_results: int = 10) -> str:
|
|
| 382 |
|
| 383 |
|
| 384 |
@AIFunction
|
| 385 |
-
async def
|
| 386 |
-
"""Search
|
| 387 |
|
| 388 |
-
Use this tool to find the latest research
|
| 389 |
-
|
| 390 |
|
| 391 |
Args:
|
| 392 |
query: Search terms (e.g., "long covid treatment")
|
| 393 |
max_results: Maximum results to return (default 10)
|
| 394 |
|
| 395 |
Returns:
|
| 396 |
-
Formatted list of
|
| 397 |
"""
|
| 398 |
# 1. Execute search
|
| 399 |
-
results = await
|
| 400 |
|
| 401 |
# 2. Update shared state
|
| 402 |
state = get_magentic_state()
|
|
@@ -406,7 +406,7 @@ async def search_preprints(query: str, max_results: int = 10) -> str:
|
|
| 406 |
total_new = len(unique)
|
| 407 |
total_stored = len(state.evidence_store)
|
| 408 |
|
| 409 |
-
output = _format_results(results, "
|
| 410 |
output += f"\n[State: {total_new} new, {total_stored} total in evidence store]"
|
| 411 |
|
| 412 |
return output
|
|
@@ -513,7 +513,7 @@ def create_search_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgen
|
|
| 513 |
|
| 514 |
return ChatAgent(
|
| 515 |
name="SearchAgent",
|
| 516 |
-
description="Searches biomedical databases (PubMed, ClinicalTrials.gov,
|
| 517 |
instructions="""You are a biomedical search specialist. When asked to find evidence:
|
| 518 |
|
| 519 |
1. Analyze the request to determine what to search for
|
|
@@ -521,13 +521,13 @@ def create_search_agent(chat_client: OpenAIChatClient | None = None) -> ChatAgen
|
|
| 521 |
3. Use the appropriate search tools:
|
| 522 |
- search_pubmed for peer-reviewed papers
|
| 523 |
- search_clinical_trials for clinical studies
|
| 524 |
-
-
|
| 525 |
4. Summarize what you found and highlight key evidence
|
| 526 |
|
| 527 |
Be thorough - search multiple databases when appropriate.
|
| 528 |
Focus on finding: mechanisms of action, clinical evidence, and specific drug candidates.""",
|
| 529 |
chat_client=client,
|
| 530 |
-
tools=[search_pubmed, search_clinical_trials,
|
| 531 |
temperature=0.3, # More deterministic for tool use
|
| 532 |
)
|
| 533 |
|
|
@@ -790,7 +790,7 @@ class MagenticOrchestrator:
|
|
| 790 |
task = f"""Research drug repurposing opportunities for: {query}
|
| 791 |
|
| 792 |
Workflow:
|
| 793 |
-
1. SearchAgent: Find evidence from PubMed, ClinicalTrials.gov, and
|
| 794 |
2. HypothesisAgent: Generate mechanistic hypotheses (Drug β Target β Pathway β Effect)
|
| 795 |
3. JudgeAgent: Evaluate if evidence is sufficient
|
| 796 |
4. If insufficient β SearchAgent refines search based on gaps
|
|
|
|
| 97 |
search_agent = ChatAgent(
|
| 98 |
name="SearchAgent",
|
| 99 |
description="Searches biomedical databases for drug repurposing evidence",
|
| 100 |
+
instructions="You search PubMed, ClinicalTrials.gov, and Europe PMC for evidence.",
|
| 101 |
chat_client=OpenAIChatClient(model_id="gpt-4o-mini"), # INTERNAL LLM
|
| 102 |
+
tools=[search_pubmed, search_clinicaltrials, search_europepmc], # TOOLS
|
| 103 |
)
|
| 104 |
```
|
| 105 |
|
|
|
|
| 286 |
from agent_framework import AIFunction
|
| 287 |
|
| 288 |
from src.agents.state import get_magentic_state
|
| 289 |
+
from src.tools.europepmc import EuropePMCTool
|
| 290 |
from src.tools.clinicaltrials import ClinicalTrialsTool
|
| 291 |
from src.tools.pubmed import PubMedTool
|
| 292 |
|
| 293 |
# Singleton tool instances
|
| 294 |
_pubmed = PubMedTool()
|
| 295 |
_clinicaltrials = ClinicalTrialsTool()
|
| 296 |
+
_europepmc = EuropePMCTool()
|
| 297 |
|
| 298 |
|
| 299 |
def _format_results(results: list, source_name: str, query: str) -> str:
|
|
|
|
| 382 |
|
| 383 |
|
| 384 |
@AIFunction
|
| 385 |
+
async def search_europepmc(query: str, max_results: int = 10) -> str:
|
| 386 |
+
"""Search Europe PMC for preprints and papers.
|
| 387 |
|
| 388 |
+
Use this tool to find the latest research including preprints
|
| 389 |
+
from bioRxiv, medRxiv, and peer-reviewed papers.
|
| 390 |
|
| 391 |
Args:
|
| 392 |
query: Search terms (e.g., "long covid treatment")
|
| 393 |
max_results: Maximum results to return (default 10)
|
| 394 |
|
| 395 |
Returns:
|
| 396 |
+
Formatted list of papers with abstracts and links
|
| 397 |
"""
|
| 398 |
# 1. Execute search
|
| 399 |
+
results = await _europepmc.search(query, max_results)
|
| 400 |
|
| 401 |
# 2. Update shared state
|
| 402 |
state = get_magentic_state()
|
|
|
|
| 406 |
total_new = len(unique)
|
| 407 |
total_stored = len(state.evidence_store)
|
| 408 |
|
| 409 |
+
output = _format_results(results, "Europe PMC", query)
|
| 410 |
output += f"\n[State: {total_new} new, {total_stored} total in evidence store]"
|
| 411 |
|
| 412 |
return output
|
|
|
|
| 513 |
|
| 514 |
return ChatAgent(
|
| 515 |
name="SearchAgent",
|
| 516 |
+
description="Searches biomedical databases (PubMed, ClinicalTrials.gov, Europe PMC) for drug repurposing evidence",
|
| 517 |
instructions="""You are a biomedical search specialist. When asked to find evidence:
|
| 518 |
|
| 519 |
1. Analyze the request to determine what to search for
|
|
|
|
| 521 |
3. Use the appropriate search tools:
|
| 522 |
- search_pubmed for peer-reviewed papers
|
| 523 |
- search_clinical_trials for clinical studies
|
| 524 |
+
- search_europepmc for preprints and additional papers
|
| 525 |
4. Summarize what you found and highlight key evidence
|
| 526 |
|
| 527 |
Be thorough - search multiple databases when appropriate.
|
| 528 |
Focus on finding: mechanisms of action, clinical evidence, and specific drug candidates.""",
|
| 529 |
chat_client=client,
|
| 530 |
+
tools=[search_pubmed, search_clinical_trials, search_europepmc],
|
| 531 |
temperature=0.3, # More deterministic for tool use
|
| 532 |
)
|
| 533 |
|
|
|
|
| 790 |
task = f"""Research drug repurposing opportunities for: {query}
|
| 791 |
|
| 792 |
Workflow:
|
| 793 |
+
1. SearchAgent: Find evidence from PubMed, ClinicalTrials.gov, and Europe PMC
|
| 794 |
2. HypothesisAgent: Generate mechanistic hypotheses (Drug β Target β Pathway β Effect)
|
| 795 |
3. JudgeAgent: Evaluate if evidence is sufficient
|
| 796 |
4. If insufficient β SearchAgent refines search based on gaps
|
docs/implementation/11_phase_biorxiv.md
DELETED
|
@@ -1,572 +0,0 @@
|
|
| 1 |
-
# Phase 11 Implementation Spec: bioRxiv Preprint Integration
|
| 2 |
-
|
| 3 |
-
**Goal**: Add cutting-edge preprint search for the latest research.
|
| 4 |
-
**Philosophy**: "Preprints are where breakthroughs appear first."
|
| 5 |
-
**Prerequisite**: Phase 10 complete (ClinicalTrials.gov working)
|
| 6 |
-
**Estimated Time**: 2-3 hours
|
| 7 |
-
|
| 8 |
-
---
|
| 9 |
-
|
| 10 |
-
## 1. Why bioRxiv?
|
| 11 |
-
|
| 12 |
-
### Scientific Value
|
| 13 |
-
|
| 14 |
-
| Feature | Value for Drug Repurposing |
|
| 15 |
-
|---------|---------------------------|
|
| 16 |
-
| **Cutting-edge research** | 6-12 months ahead of PubMed |
|
| 17 |
-
| **Rapid publication** | Days, not months |
|
| 18 |
-
| **Free full-text** | Complete papers, not just abstracts |
|
| 19 |
-
| **medRxiv included** | Medical preprints via same API |
|
| 20 |
-
| **No API key required** | Free and open |
|
| 21 |
-
|
| 22 |
-
### The Preprint Advantage
|
| 23 |
-
|
| 24 |
-
```
|
| 25 |
-
Traditional Publication Timeline:
|
| 26 |
-
Research β Submit β Review β Revise β Accept β Publish
|
| 27 |
-
|___________________________ 6-18 months _______________|
|
| 28 |
-
|
| 29 |
-
Preprint Timeline:
|
| 30 |
-
Research β Upload β Available
|
| 31 |
-
|______ 1-3 days ______|
|
| 32 |
-
```
|
| 33 |
-
|
| 34 |
-
**For drug repurposing**: Preprints contain the newest hypotheses and evidence!
|
| 35 |
-
|
| 36 |
-
---
|
| 37 |
-
|
| 38 |
-
## 2. API Specification
|
| 39 |
-
|
| 40 |
-
### Endpoint
|
| 41 |
-
|
| 42 |
-
```
|
| 43 |
-
Base URL: https://api.biorxiv.org/details/[server]/[interval]/[cursor]/[format]
|
| 44 |
-
```
|
| 45 |
-
|
| 46 |
-
### Servers
|
| 47 |
-
|
| 48 |
-
| Server | Content |
|
| 49 |
-
|--------|---------|
|
| 50 |
-
| `biorxiv` | Biology preprints |
|
| 51 |
-
| `medrxiv` | Medical preprints (more relevant for us!) |
|
| 52 |
-
|
| 53 |
-
### Interval Formats
|
| 54 |
-
|
| 55 |
-
| Format | Example | Description |
|
| 56 |
-
|--------|---------|-------------|
|
| 57 |
-
| Date range | `2024-01-01/2024-12-31` | Papers between dates |
|
| 58 |
-
| Recent N | `50` | Most recent N papers |
|
| 59 |
-
| Recent N days | `30d` | Papers from last N days |
|
| 60 |
-
|
| 61 |
-
### Response Format
|
| 62 |
-
|
| 63 |
-
```json
|
| 64 |
-
{
|
| 65 |
-
"collection": [
|
| 66 |
-
{
|
| 67 |
-
"doi": "10.1101/2024.01.15.123456",
|
| 68 |
-
"title": "Metformin repurposing for neurodegeneration",
|
| 69 |
-
"authors": "Smith, J; Jones, A",
|
| 70 |
-
"date": "2024-01-15",
|
| 71 |
-
"category": "neuroscience",
|
| 72 |
-
"abstract": "We investigated metformin's potential..."
|
| 73 |
-
}
|
| 74 |
-
],
|
| 75 |
-
"messages": [{"status": "ok", "count": 100}]
|
| 76 |
-
}
|
| 77 |
-
```
|
| 78 |
-
|
| 79 |
-
### Rate Limits
|
| 80 |
-
|
| 81 |
-
- No official limit, but be respectful
|
| 82 |
-
- Results paginated (100 per call)
|
| 83 |
-
- Use cursor for pagination
|
| 84 |
-
|
| 85 |
-
### Documentation
|
| 86 |
-
|
| 87 |
-
- [bioRxiv API](https://api.biorxiv.org/)
|
| 88 |
-
- [medrxivr R package docs](https://docs.ropensci.org/medrxivr/)
|
| 89 |
-
|
| 90 |
-
---
|
| 91 |
-
|
| 92 |
-
## 3. Search Strategy
|
| 93 |
-
|
| 94 |
-
### Challenge: bioRxiv API Limitations
|
| 95 |
-
|
| 96 |
-
The bioRxiv API does NOT support keyword search directly. It returns papers by:
|
| 97 |
-
- Date range
|
| 98 |
-
- Recent count
|
| 99 |
-
|
| 100 |
-
### Solution: Client-Side Filtering
|
| 101 |
-
|
| 102 |
-
```python
|
| 103 |
-
# Strategy:
|
| 104 |
-
# 1. Fetch recent papers (e.g., last 90 days)
|
| 105 |
-
# 2. Filter by keyword matching in title/abstract
|
| 106 |
-
# 3. Use embeddings for semantic matching (leverage Phase 6!)
|
| 107 |
-
```
|
| 108 |
-
|
| 109 |
-
### Alternative: Content Search Endpoint
|
| 110 |
-
|
| 111 |
-
```
|
| 112 |
-
https://api.biorxiv.org/pubs/[server]/[doi_prefix]
|
| 113 |
-
```
|
| 114 |
-
|
| 115 |
-
For searching, we can use the publisher endpoint with filtering.
|
| 116 |
-
|
| 117 |
-
---
|
| 118 |
-
|
| 119 |
-
## 4. Data Model
|
| 120 |
-
|
| 121 |
-
### 4.1 Update Citation Source Type (`src/utils/models.py`)
|
| 122 |
-
|
| 123 |
-
```python
|
| 124 |
-
# After Phase 11
|
| 125 |
-
source: Literal["pubmed", "clinicaltrials", "biorxiv"]
|
| 126 |
-
```
|
| 127 |
-
|
| 128 |
-
### 4.2 Evidence from Preprints
|
| 129 |
-
|
| 130 |
-
```python
|
| 131 |
-
Evidence(
|
| 132 |
-
content=abstract[:2000],
|
| 133 |
-
citation=Citation(
|
| 134 |
-
source="biorxiv", # or "medrxiv"
|
| 135 |
-
title=title,
|
| 136 |
-
url=f"https://doi.org/{doi}",
|
| 137 |
-
date=date,
|
| 138 |
-
authors=authors.split("; ")[:5]
|
| 139 |
-
),
|
| 140 |
-
relevance=0.75 # Preprints slightly lower than peer-reviewed
|
| 141 |
-
)
|
| 142 |
-
```
|
| 143 |
-
|
| 144 |
-
---
|
| 145 |
-
|
| 146 |
-
## 5. Implementation
|
| 147 |
-
|
| 148 |
-
### 5.1 bioRxiv Tool (`src/tools/biorxiv.py`)
|
| 149 |
-
|
| 150 |
-
```python
|
| 151 |
-
"""bioRxiv/medRxiv preprint search tool."""
|
| 152 |
-
|
| 153 |
-
import re
|
| 154 |
-
from datetime import datetime, timedelta
|
| 155 |
-
|
| 156 |
-
import httpx
|
| 157 |
-
from tenacity import retry, stop_after_attempt, wait_exponential
|
| 158 |
-
|
| 159 |
-
from src.utils.exceptions import SearchError
|
| 160 |
-
from src.utils.models import Citation, Evidence
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
class BioRxivTool:
|
| 164 |
-
"""Search tool for bioRxiv and medRxiv preprints."""
|
| 165 |
-
|
| 166 |
-
BASE_URL = "https://api.biorxiv.org/details"
|
| 167 |
-
# Use medRxiv for medical/clinical content (more relevant for drug repurposing)
|
| 168 |
-
DEFAULT_SERVER = "medrxiv"
|
| 169 |
-
# Fetch papers from last N days
|
| 170 |
-
DEFAULT_DAYS = 90
|
| 171 |
-
|
| 172 |
-
def __init__(self, server: str = DEFAULT_SERVER, days: int = DEFAULT_DAYS):
|
| 173 |
-
"""
|
| 174 |
-
Initialize bioRxiv tool.
|
| 175 |
-
|
| 176 |
-
Args:
|
| 177 |
-
server: "biorxiv" or "medrxiv"
|
| 178 |
-
days: How many days back to search
|
| 179 |
-
"""
|
| 180 |
-
self.server = server
|
| 181 |
-
self.days = days
|
| 182 |
-
|
| 183 |
-
@property
|
| 184 |
-
def name(self) -> str:
|
| 185 |
-
return "biorxiv"
|
| 186 |
-
|
| 187 |
-
@retry(
|
| 188 |
-
stop=stop_after_attempt(3),
|
| 189 |
-
wait=wait_exponential(multiplier=1, min=1, max=10),
|
| 190 |
-
reraise=True,
|
| 191 |
-
)
|
| 192 |
-
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
|
| 193 |
-
"""
|
| 194 |
-
Search bioRxiv/medRxiv for preprints matching query.
|
| 195 |
-
|
| 196 |
-
Note: bioRxiv API doesn't support keyword search directly.
|
| 197 |
-
We fetch recent papers and filter client-side.
|
| 198 |
-
|
| 199 |
-
Args:
|
| 200 |
-
query: Search query (keywords)
|
| 201 |
-
max_results: Maximum results to return
|
| 202 |
-
|
| 203 |
-
Returns:
|
| 204 |
-
List of Evidence objects from preprints
|
| 205 |
-
"""
|
| 206 |
-
# Build date range for last N days
|
| 207 |
-
end_date = datetime.now().strftime("%Y-%m-%d")
|
| 208 |
-
start_date = (datetime.now() - timedelta(days=self.days)).strftime("%Y-%m-%d")
|
| 209 |
-
interval = f"{start_date}/{end_date}"
|
| 210 |
-
|
| 211 |
-
# Fetch recent papers
|
| 212 |
-
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
|
| 213 |
-
|
| 214 |
-
async with httpx.AsyncClient(timeout=30.0) as client:
|
| 215 |
-
try:
|
| 216 |
-
response = await client.get(url)
|
| 217 |
-
response.raise_for_status()
|
| 218 |
-
except httpx.HTTPStatusError as e:
|
| 219 |
-
raise SearchError(f"bioRxiv search failed: {e}") from e
|
| 220 |
-
|
| 221 |
-
data = response.json()
|
| 222 |
-
papers = data.get("collection", [])
|
| 223 |
-
|
| 224 |
-
# Filter papers by query keywords
|
| 225 |
-
query_terms = self._extract_terms(query)
|
| 226 |
-
matching = self._filter_by_keywords(papers, query_terms, max_results)
|
| 227 |
-
|
| 228 |
-
return [self._paper_to_evidence(paper) for paper in matching]
|
| 229 |
-
|
| 230 |
-
def _extract_terms(self, query: str) -> list[str]:
|
| 231 |
-
"""Extract search terms from query."""
|
| 232 |
-
# Simple tokenization, lowercase
|
| 233 |
-
terms = re.findall(r'\b\w+\b', query.lower())
|
| 234 |
-
# Filter out common stop words
|
| 235 |
-
stop_words = {'the', 'a', 'an', 'in', 'on', 'for', 'and', 'or', 'of', 'to'}
|
| 236 |
-
return [t for t in terms if t not in stop_words and len(t) > 2]
|
| 237 |
-
|
| 238 |
-
def _filter_by_keywords(
|
| 239 |
-
self, papers: list[dict], terms: list[str], max_results: int
|
| 240 |
-
) -> list[dict]:
|
| 241 |
-
"""Filter papers that contain query terms in title or abstract."""
|
| 242 |
-
scored_papers = []
|
| 243 |
-
|
| 244 |
-
for paper in papers:
|
| 245 |
-
title = paper.get("title", "").lower()
|
| 246 |
-
abstract = paper.get("abstract", "").lower()
|
| 247 |
-
text = f"{title} {abstract}"
|
| 248 |
-
|
| 249 |
-
# Count matching terms
|
| 250 |
-
matches = sum(1 for term in terms if term in text)
|
| 251 |
-
|
| 252 |
-
if matches > 0:
|
| 253 |
-
scored_papers.append((matches, paper))
|
| 254 |
-
|
| 255 |
-
# Sort by match count (descending)
|
| 256 |
-
scored_papers.sort(key=lambda x: x[0], reverse=True)
|
| 257 |
-
|
| 258 |
-
return [paper for _, paper in scored_papers[:max_results]]
|
| 259 |
-
|
| 260 |
-
def _paper_to_evidence(self, paper: dict) -> Evidence:
|
| 261 |
-
"""Convert a preprint paper to Evidence."""
|
| 262 |
-
doi = paper.get("doi", "")
|
| 263 |
-
title = paper.get("title", "Untitled")
|
| 264 |
-
authors_str = paper.get("authors", "Unknown")
|
| 265 |
-
date = paper.get("date", "Unknown")
|
| 266 |
-
abstract = paper.get("abstract", "No abstract available.")
|
| 267 |
-
category = paper.get("category", "")
|
| 268 |
-
|
| 269 |
-
# Parse authors (format: "Smith, J; Jones, A")
|
| 270 |
-
authors = [a.strip() for a in authors_str.split(";")][:5]
|
| 271 |
-
|
| 272 |
-
# Note this is a preprint in the content
|
| 273 |
-
content = (
|
| 274 |
-
f"[PREPRINT - Not peer-reviewed] "
|
| 275 |
-
f"{abstract[:1800]}... "
|
| 276 |
-
f"Category: {category}."
|
| 277 |
-
)
|
| 278 |
-
|
| 279 |
-
return Evidence(
|
| 280 |
-
content=content[:2000],
|
| 281 |
-
citation=Citation(
|
| 282 |
-
source="biorxiv",
|
| 283 |
-
title=title[:500],
|
| 284 |
-
url=f"https://doi.org/{doi}" if doi else f"https://www.medrxiv.org/",
|
| 285 |
-
date=date,
|
| 286 |
-
authors=authors,
|
| 287 |
-
),
|
| 288 |
-
relevance=0.75, # Slightly lower than peer-reviewed
|
| 289 |
-
)
|
| 290 |
-
```
|
| 291 |
-
|
| 292 |
-
---
|
| 293 |
-
|
| 294 |
-
## 6. TDD Test Suite
|
| 295 |
-
|
| 296 |
-
### 6.1 Unit Tests (`tests/unit/tools/test_biorxiv.py`)
|
| 297 |
-
|
| 298 |
-
```python
|
| 299 |
-
"""Unit tests for bioRxiv tool."""
|
| 300 |
-
|
| 301 |
-
import pytest
|
| 302 |
-
import respx
|
| 303 |
-
from httpx import Response
|
| 304 |
-
|
| 305 |
-
from src.tools.biorxiv import BioRxivTool
|
| 306 |
-
from src.utils.models import Evidence
|
| 307 |
-
|
| 308 |
-
|
| 309 |
-
@pytest.fixture
|
| 310 |
-
def mock_biorxiv_response():
|
| 311 |
-
"""Mock bioRxiv API response."""
|
| 312 |
-
return {
|
| 313 |
-
"collection": [
|
| 314 |
-
{
|
| 315 |
-
"doi": "10.1101/2024.01.15.24301234",
|
| 316 |
-
"title": "Metformin repurposing for Alzheimer's disease: a systematic review",
|
| 317 |
-
"authors": "Smith, John; Jones, Alice; Brown, Bob",
|
| 318 |
-
"date": "2024-01-15",
|
| 319 |
-
"category": "neurology",
|
| 320 |
-
"abstract": "Background: Metformin has shown neuroprotective effects. "
|
| 321 |
-
"We conducted a systematic review of metformin's potential "
|
| 322 |
-
"for Alzheimer's disease treatment."
|
| 323 |
-
},
|
| 324 |
-
{
|
| 325 |
-
"doi": "10.1101/2024.01.10.24301111",
|
| 326 |
-
"title": "COVID-19 vaccine efficacy study",
|
| 327 |
-
"authors": "Wilson, C",
|
| 328 |
-
"date": "2024-01-10",
|
| 329 |
-
"category": "infectious diseases",
|
| 330 |
-
"abstract": "This study evaluates COVID-19 vaccine efficacy."
|
| 331 |
-
}
|
| 332 |
-
],
|
| 333 |
-
"messages": [{"status": "ok", "count": 2}]
|
| 334 |
-
}
|
| 335 |
-
|
| 336 |
-
|
| 337 |
-
class TestBioRxivTool:
|
| 338 |
-
"""Tests for BioRxivTool."""
|
| 339 |
-
|
| 340 |
-
def test_tool_name(self):
|
| 341 |
-
"""Tool should have correct name."""
|
| 342 |
-
tool = BioRxivTool()
|
| 343 |
-
assert tool.name == "biorxiv"
|
| 344 |
-
|
| 345 |
-
def test_default_server_is_medrxiv(self):
|
| 346 |
-
"""Default server should be medRxiv for medical relevance."""
|
| 347 |
-
tool = BioRxivTool()
|
| 348 |
-
assert tool.server == "medrxiv"
|
| 349 |
-
|
| 350 |
-
@pytest.mark.asyncio
|
| 351 |
-
@respx.mock
|
| 352 |
-
async def test_search_returns_evidence(self, mock_biorxiv_response):
|
| 353 |
-
"""Search should return Evidence objects."""
|
| 354 |
-
respx.get(url__startswith="https://api.biorxiv.org/details").mock(
|
| 355 |
-
return_value=Response(200, json=mock_biorxiv_response)
|
| 356 |
-
)
|
| 357 |
-
|
| 358 |
-
tool = BioRxivTool()
|
| 359 |
-
results = await tool.search("metformin alzheimer", max_results=5)
|
| 360 |
-
|
| 361 |
-
assert len(results) == 1 # Only the matching paper
|
| 362 |
-
assert isinstance(results[0], Evidence)
|
| 363 |
-
assert results[0].citation.source == "biorxiv"
|
| 364 |
-
assert "metformin" in results[0].citation.title.lower()
|
| 365 |
-
|
| 366 |
-
@pytest.mark.asyncio
|
| 367 |
-
@respx.mock
|
| 368 |
-
async def test_search_filters_by_keywords(self, mock_biorxiv_response):
|
| 369 |
-
"""Search should filter papers by query keywords."""
|
| 370 |
-
respx.get(url__startswith="https://api.biorxiv.org/details").mock(
|
| 371 |
-
return_value=Response(200, json=mock_biorxiv_response)
|
| 372 |
-
)
|
| 373 |
-
|
| 374 |
-
tool = BioRxivTool()
|
| 375 |
-
|
| 376 |
-
# Search for metformin - should match first paper
|
| 377 |
-
results = await tool.search("metformin")
|
| 378 |
-
assert len(results) == 1
|
| 379 |
-
assert "metformin" in results[0].citation.title.lower()
|
| 380 |
-
|
| 381 |
-
# Search for COVID - should match second paper
|
| 382 |
-
results = await tool.search("covid vaccine")
|
| 383 |
-
assert len(results) == 1
|
| 384 |
-
assert "covid" in results[0].citation.title.lower()
|
| 385 |
-
|
| 386 |
-
@pytest.mark.asyncio
|
| 387 |
-
@respx.mock
|
| 388 |
-
async def test_search_marks_as_preprint(self, mock_biorxiv_response):
|
| 389 |
-
"""Evidence content should note it's a preprint."""
|
| 390 |
-
respx.get(url__startswith="https://api.biorxiv.org/details").mock(
|
| 391 |
-
return_value=Response(200, json=mock_biorxiv_response)
|
| 392 |
-
)
|
| 393 |
-
|
| 394 |
-
tool = BioRxivTool()
|
| 395 |
-
results = await tool.search("metformin")
|
| 396 |
-
|
| 397 |
-
assert "PREPRINT" in results[0].content
|
| 398 |
-
assert "Not peer-reviewed" in results[0].content
|
| 399 |
-
|
| 400 |
-
@pytest.mark.asyncio
|
| 401 |
-
@respx.mock
|
| 402 |
-
async def test_search_empty_results(self):
|
| 403 |
-
"""Search should handle empty results gracefully."""
|
| 404 |
-
respx.get(url__startswith="https://api.biorxiv.org/details").mock(
|
| 405 |
-
return_value=Response(200, json={"collection": [], "messages": []})
|
| 406 |
-
)
|
| 407 |
-
|
| 408 |
-
tool = BioRxivTool()
|
| 409 |
-
results = await tool.search("xyznonexistent")
|
| 410 |
-
|
| 411 |
-
assert results == []
|
| 412 |
-
|
| 413 |
-
@pytest.mark.asyncio
|
| 414 |
-
@respx.mock
|
| 415 |
-
async def test_search_api_error(self):
|
| 416 |
-
"""Search should raise SearchError on API failure."""
|
| 417 |
-
from src.utils.exceptions import SearchError
|
| 418 |
-
|
| 419 |
-
respx.get(url__startswith="https://api.biorxiv.org/details").mock(
|
| 420 |
-
return_value=Response(500, text="Internal Server Error")
|
| 421 |
-
)
|
| 422 |
-
|
| 423 |
-
tool = BioRxivTool()
|
| 424 |
-
|
| 425 |
-
with pytest.raises(SearchError):
|
| 426 |
-
await tool.search("metformin")
|
| 427 |
-
|
| 428 |
-
def test_extract_terms(self):
|
| 429 |
-
"""Should extract meaningful search terms."""
|
| 430 |
-
tool = BioRxivTool()
|
| 431 |
-
|
| 432 |
-
terms = tool._extract_terms("metformin for Alzheimer's disease")
|
| 433 |
-
|
| 434 |
-
assert "metformin" in terms
|
| 435 |
-
assert "alzheimer" in terms
|
| 436 |
-
assert "disease" in terms
|
| 437 |
-
assert "for" not in terms # Stop word
|
| 438 |
-
assert "the" not in terms # Stop word
|
| 439 |
-
|
| 440 |
-
|
| 441 |
-
class TestBioRxivIntegration:
|
| 442 |
-
"""Integration tests (marked for separate run)."""
|
| 443 |
-
|
| 444 |
-
@pytest.mark.integration
|
| 445 |
-
@pytest.mark.asyncio
|
| 446 |
-
async def test_real_api_call(self):
|
| 447 |
-
"""Test actual API call (requires network)."""
|
| 448 |
-
tool = BioRxivTool(days=30) # Last 30 days
|
| 449 |
-
results = await tool.search("diabetes", max_results=3)
|
| 450 |
-
|
| 451 |
-
# May or may not find results depending on recent papers
|
| 452 |
-
assert isinstance(results, list)
|
| 453 |
-
for r in results:
|
| 454 |
-
assert isinstance(r, Evidence)
|
| 455 |
-
assert r.citation.source == "biorxiv"
|
| 456 |
-
```
|
| 457 |
-
|
| 458 |
-
---
|
| 459 |
-
|
| 460 |
-
## 7. Integration with SearchHandler
|
| 461 |
-
|
| 462 |
-
### 7.1 Final SearchHandler Configuration
|
| 463 |
-
|
| 464 |
-
```python
|
| 465 |
-
# examples/search_demo/run_search.py
|
| 466 |
-
from src.tools.biorxiv import BioRxivTool
|
| 467 |
-
from src.tools.clinicaltrials import ClinicalTrialsTool
|
| 468 |
-
from src.tools.pubmed import PubMedTool
|
| 469 |
-
from src.tools.search_handler import SearchHandler
|
| 470 |
-
|
| 471 |
-
search_handler = SearchHandler(
|
| 472 |
-
tools=[
|
| 473 |
-
PubMedTool(), # Peer-reviewed papers
|
| 474 |
-
ClinicalTrialsTool(), # Clinical trials
|
| 475 |
-
BioRxivTool(), # Preprints (cutting edge)
|
| 476 |
-
],
|
| 477 |
-
timeout=30.0
|
| 478 |
-
)
|
| 479 |
-
```
|
| 480 |
-
|
| 481 |
-
### 7.2 Final Type Definition
|
| 482 |
-
|
| 483 |
-
```python
|
| 484 |
-
# src/utils/models.py
|
| 485 |
-
sources_searched: list[Literal["pubmed", "clinicaltrials", "biorxiv"]]
|
| 486 |
-
```
|
| 487 |
-
|
| 488 |
-
---
|
| 489 |
-
|
| 490 |
-
## 8. Definition of Done
|
| 491 |
-
|
| 492 |
-
Phase 11 is **COMPLETE** when:
|
| 493 |
-
|
| 494 |
-
- [ ] `src/tools/biorxiv.py` implemented
|
| 495 |
-
- [ ] Unit tests in `tests/unit/tools/test_biorxiv.py`
|
| 496 |
-
- [ ] Integration test marked with `@pytest.mark.integration`
|
| 497 |
-
- [ ] SearchHandler updated to include BioRxivTool
|
| 498 |
-
- [ ] Type definitions updated in models.py
|
| 499 |
-
- [ ] Example files updated
|
| 500 |
-
- [ ] All unit tests pass
|
| 501 |
-
- [ ] Lints pass
|
| 502 |
-
- [ ] Manual verification with real API
|
| 503 |
-
|
| 504 |
-
---
|
| 505 |
-
|
| 506 |
-
## 9. Verification Commands
|
| 507 |
-
|
| 508 |
-
```bash
|
| 509 |
-
# 1. Run unit tests
|
| 510 |
-
uv run pytest tests/unit/tools/test_biorxiv.py -v
|
| 511 |
-
|
| 512 |
-
# 2. Run integration test (requires network)
|
| 513 |
-
uv run pytest tests/unit/tools/test_biorxiv.py -v -m integration
|
| 514 |
-
|
| 515 |
-
# 3. Run full test suite
|
| 516 |
-
uv run pytest tests/unit/ -v
|
| 517 |
-
|
| 518 |
-
# 4. Run example with all three sources
|
| 519 |
-
source .env && uv run python examples/search_demo/run_search.py "metformin diabetes"
|
| 520 |
-
# Should show results from PubMed, ClinicalTrials.gov, AND bioRxiv/medRxiv
|
| 521 |
-
```
|
| 522 |
-
|
| 523 |
-
---
|
| 524 |
-
|
| 525 |
-
## 10. Value Delivered
|
| 526 |
-
|
| 527 |
-
| Before | After |
|
| 528 |
-
|--------|-------|
|
| 529 |
-
| Only published papers | Published + Preprints |
|
| 530 |
-
| 6-18 month lag | Near real-time research |
|
| 531 |
-
| Miss cutting-edge | Catch breakthroughs early |
|
| 532 |
-
|
| 533 |
-
**Demo pitch (final)**:
|
| 534 |
-
> "DeepBoner searches PubMed for peer-reviewed evidence, ClinicalTrials.gov for 400,000+ clinical trials, and bioRxiv/medRxiv for cutting-edge preprints - then uses LLMs to generate mechanistic hypotheses and synthesize findings into publication-quality reports."
|
| 535 |
-
|
| 536 |
-
---
|
| 537 |
-
|
| 538 |
-
## 11. Complete Source Architecture (After Phase 11)
|
| 539 |
-
|
| 540 |
-
```
|
| 541 |
-
User Query: "Can metformin treat Alzheimer's?"
|
| 542 |
-
|
|
| 543 |
-
v
|
| 544 |
-
SearchHandler
|
| 545 |
-
|
|
| 546 |
-
βββββββββββββββββΌββββββββββββββββ
|
| 547 |
-
| | |
|
| 548 |
-
v v v
|
| 549 |
-
PubMedTool ClinicalTrials BioRxivTool
|
| 550 |
-
| Tool |
|
| 551 |
-
| | |
|
| 552 |
-
v v v
|
| 553 |
-
"15 peer- "3 Phase II "2 preprints
|
| 554 |
-
reviewed trials from last
|
| 555 |
-
papers" recruiting" 90 days"
|
| 556 |
-
| | |
|
| 557 |
-
βββββββββββββββββΌββββββββββββββββ
|
| 558 |
-
|
|
| 559 |
-
v
|
| 560 |
-
Evidence Pool
|
| 561 |
-
|
|
| 562 |
-
v
|
| 563 |
-
EmbeddingService.deduplicate()
|
| 564 |
-
|
|
| 565 |
-
v
|
| 566 |
-
HypothesisAgent β JudgeAgent β ReportAgent
|
| 567 |
-
|
|
| 568 |
-
v
|
| 569 |
-
Structured Research Report
|
| 570 |
-
```
|
| 571 |
-
|
| 572 |
-
**This is the Gucci Banger stack.**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
docs/implementation/11_phase_europepmc.md
ADDED
|
@@ -0,0 +1,181 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Phase 11 Implementation Spec: Europe PMC Integration
|
| 2 |
+
|
| 3 |
+
> **Status**: β
COMPLETE
|
| 4 |
+
> **Implemented**: `src/tools/europepmc.py`
|
| 5 |
+
> **Tests**: `tests/unit/tools/test_europepmc.py`
|
| 6 |
+
|
| 7 |
+
## Overview
|
| 8 |
+
|
| 9 |
+
Europe PMC provides access to preprints and peer-reviewed literature through a single, well-designed REST API. This replaces the originally planned bioRxiv integration due to bioRxiv's API limitations (no keyword search).
|
| 10 |
+
|
| 11 |
+
## Why Europe PMC Over bioRxiv?
|
| 12 |
+
|
| 13 |
+
### bioRxiv API Limitations (Why We Abandoned It)
|
| 14 |
+
- bioRxiv API does NOT support keyword search
|
| 15 |
+
- Only supports date-range queries returning all papers
|
| 16 |
+
- Would require downloading entire date ranges and filtering client-side
|
| 17 |
+
- Inefficient and impractical for our use case
|
| 18 |
+
|
| 19 |
+
### Europe PMC Advantages
|
| 20 |
+
1. **Full keyword search** - Query by any term
|
| 21 |
+
2. **Aggregates preprints** - Includes bioRxiv, medRxiv, ChemRxiv content
|
| 22 |
+
3. **No authentication required** - Free, open API
|
| 23 |
+
4. **34+ preprint servers indexed** - Not just bioRxiv
|
| 24 |
+
5. **REST API with JSON** - Easy integration
|
| 25 |
+
|
| 26 |
+
## API Reference
|
| 27 |
+
|
| 28 |
+
**Base URL**: `https://www.ebi.ac.uk/europepmc/webservices/rest/search`
|
| 29 |
+
|
| 30 |
+
**Documentation**: https://europepmc.org/RestfulWebService
|
| 31 |
+
|
| 32 |
+
### Parameters
|
| 33 |
+
|
| 34 |
+
| Parameter | Value | Description |
|
| 35 |
+
|-----------|-------|-------------|
|
| 36 |
+
| `query` | string | Search keywords |
|
| 37 |
+
| `resultType` | `core` | Full metadata including abstracts |
|
| 38 |
+
| `pageSize` | 1-100 | Results per page |
|
| 39 |
+
| `format` | `json` | Response format |
|
| 40 |
+
|
| 41 |
+
### Example Request
|
| 42 |
+
|
| 43 |
+
```
|
| 44 |
+
GET https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=metformin+alzheimer&resultType=core&pageSize=10&format=json
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
## Implementation
|
| 48 |
+
|
| 49 |
+
### EuropePMCTool (`src/tools/europepmc.py`)
|
| 50 |
+
|
| 51 |
+
```python
|
| 52 |
+
class EuropePMCTool:
|
| 53 |
+
"""
|
| 54 |
+
Search Europe PMC for papers and preprints.
|
| 55 |
+
|
| 56 |
+
Europe PMC indexes:
|
| 57 |
+
- PubMed/MEDLINE articles
|
| 58 |
+
- PMC full-text articles
|
| 59 |
+
- Preprints from bioRxiv, medRxiv, ChemRxiv, etc.
|
| 60 |
+
- Patents and clinical guidelines
|
| 61 |
+
"""
|
| 62 |
+
|
| 63 |
+
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
|
| 64 |
+
|
| 65 |
+
@property
|
| 66 |
+
def name(self) -> str:
|
| 67 |
+
return "europepmc"
|
| 68 |
+
|
| 69 |
+
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
|
| 70 |
+
"""Search Europe PMC for papers matching query."""
|
| 71 |
+
# Implementation with retry logic, error handling
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
### Key Features
|
| 75 |
+
|
| 76 |
+
1. **Preprint Detection**: Automatically identifies preprints via `pubTypeList`
|
| 77 |
+
2. **Preprint Marking**: Adds `[PREPRINT - Not peer-reviewed]` prefix to content
|
| 78 |
+
3. **Relevance Scoring**: Preprints get 0.75, peer-reviewed get 0.9
|
| 79 |
+
4. **URL Resolution**: DOI β PubMed β Europe PMC fallback chain
|
| 80 |
+
5. **Retry Logic**: 3 attempts with exponential backoff via tenacity
|
| 81 |
+
|
| 82 |
+
### Response Mapping
|
| 83 |
+
|
| 84 |
+
| Europe PMC Field | Evidence Field |
|
| 85 |
+
|------------------|----------------|
|
| 86 |
+
| `title` | `citation.title` |
|
| 87 |
+
| `abstractText` | `content` |
|
| 88 |
+
| `doi` | Used for URL |
|
| 89 |
+
| `pubYear` | `citation.date` |
|
| 90 |
+
| `authorList.author` | `citation.authors` |
|
| 91 |
+
| `pubTypeList.pubType` | Determines `citation.source` ("preprint" or "europepmc") |
|
| 92 |
+
|
| 93 |
+
## Unit Tests
|
| 94 |
+
|
| 95 |
+
### Test Coverage (`tests/unit/tools/test_europepmc.py`)
|
| 96 |
+
|
| 97 |
+
| Test | Description |
|
| 98 |
+
|------|-------------|
|
| 99 |
+
| `test_tool_name` | Verifies tool name is "europepmc" |
|
| 100 |
+
| `test_search_returns_evidence` | Basic search returns Evidence objects |
|
| 101 |
+
| `test_search_marks_preprints` | Preprints have [PREPRINT] marker and source="preprint" |
|
| 102 |
+
| `test_search_empty_results` | Handles empty results gracefully |
|
| 103 |
+
|
| 104 |
+
### Integration Test
|
| 105 |
+
|
| 106 |
+
```python
|
| 107 |
+
@pytest.mark.integration
|
| 108 |
+
async def test_real_api_call():
|
| 109 |
+
"""Test actual API returns relevant results."""
|
| 110 |
+
tool = EuropePMCTool()
|
| 111 |
+
results = await tool.search("long covid treatment", max_results=3)
|
| 112 |
+
assert len(results) > 0
|
| 113 |
+
```
|
| 114 |
+
|
| 115 |
+
## SearchHandler Integration
|
| 116 |
+
|
| 117 |
+
Europe PMC is included in `src/tools/search_handler.py` alongside PubMed and ClinicalTrials:
|
| 118 |
+
|
| 119 |
+
```python
|
| 120 |
+
from src.tools.europepmc import EuropePMCTool
|
| 121 |
+
|
| 122 |
+
class SearchHandler:
|
| 123 |
+
def __init__(self):
|
| 124 |
+
self.tools = [
|
| 125 |
+
PubMedTool(),
|
| 126 |
+
ClinicalTrialsTool(),
|
| 127 |
+
EuropePMCTool(), # Preprints + peer-reviewed
|
| 128 |
+
]
|
| 129 |
+
```
|
| 130 |
+
|
| 131 |
+
## MCP Tools Integration
|
| 132 |
+
|
| 133 |
+
Europe PMC is exposed via MCP in `src/mcp_tools.py`:
|
| 134 |
+
|
| 135 |
+
```python
|
| 136 |
+
async def search_europepmc(query: str, max_results: int = 10) -> str:
|
| 137 |
+
"""Search Europe PMC for preprints and papers."""
|
| 138 |
+
results = await _europepmc.search(query, max_results)
|
| 139 |
+
# Format and return
|
| 140 |
+
```
|
| 141 |
+
|
| 142 |
+
## Verification
|
| 143 |
+
|
| 144 |
+
```bash
|
| 145 |
+
# Run unit tests
|
| 146 |
+
uv run pytest tests/unit/tools/test_europepmc.py -v
|
| 147 |
+
|
| 148 |
+
# Run integration test (real API)
|
| 149 |
+
uv run pytest tests/unit/tools/test_europepmc.py -v -m integration
|
| 150 |
+
```
|
| 151 |
+
|
| 152 |
+
## Completion Checklist
|
| 153 |
+
|
| 154 |
+
- [x] `src/tools/europepmc.py` implemented
|
| 155 |
+
- [x] Unit tests in `tests/unit/tools/test_europepmc.py`
|
| 156 |
+
- [x] Integration test with real API
|
| 157 |
+
- [x] SearchHandler includes EuropePMCTool
|
| 158 |
+
- [x] MCP wrapper in `src/mcp_tools.py`
|
| 159 |
+
- [x] Preprint detection and marking
|
| 160 |
+
- [x] Retry logic with exponential backoff
|
| 161 |
+
|
| 162 |
+
## Architecture Diagram
|
| 163 |
+
|
| 164 |
+
```
|
| 165 |
+
οΏ½οΏ½ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 166 |
+
β SearchHandler β
|
| 167 |
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
|
| 168 |
+
β βββββββββββββββ ββββββββββββββββ βββββββββββββββββ β
|
| 169 |
+
β β PubMedTool β βClinicalTrialsβ β EuropePMCTool β β
|
| 170 |
+
β β β β Tool β β β β
|
| 171 |
+
β β Peer-review β β Trials β β Preprints + β β
|
| 172 |
+
β β articles β β data β β peer-review β β
|
| 173 |
+
β ββββββββ¬βββββββ ββββββββ¬ββββββββ βββββββββ¬ββββββββ β
|
| 174 |
+
β β β β β
|
| 175 |
+
β βΌ βΌ βΌ β
|
| 176 |
+
β βββββββββββββββββββββββββββββββββββββββββββββββ β
|
| 177 |
+
β β Evidence List β β
|
| 178 |
+
β β (deduplicated, scored, with citations) β β
|
| 179 |
+
β βββββββββββββββββββββββββββββββββββββββββββββββ β
|
| 180 |
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 181 |
+
```
|
docs/implementation/roadmap.md
CHANGED
|
@@ -42,7 +42,7 @@ src/
|
|
| 42 |
β βββ __init__.py
|
| 43 |
β βββ pubmed.py # PubMed E-utilities tool
|
| 44 |
β βββ clinicaltrials.py # ClinicalTrials.gov API
|
| 45 |
-
β βββ
|
| 46 |
β βββ code_execution.py # Modal sandbox execution
|
| 47 |
β βββ search_handler.py # Orchestrates multiple tools
|
| 48 |
βββ prompts/ # Prompt templates
|
|
@@ -64,7 +64,7 @@ tests/
|
|
| 64 |
β βββ tools/
|
| 65 |
β β βββ test_pubmed.py
|
| 66 |
β β βββ test_clinicaltrials.py
|
| 67 |
-
β β βββ
|
| 68 |
β β βββ test_search_handler.py
|
| 69 |
β βββ agent_factory/
|
| 70 |
β β βββ test_judges.py
|
|
@@ -201,7 +201,7 @@ Structured Research Report
|
|
| 201 |
|
| 202 |
9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** β
|
| 203 |
10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** β
|
| 204 |
-
11. **[Phase 11 Spec:
|
| 205 |
|
| 206 |
### Hackathon Integration (Phases 12-14)
|
| 207 |
|
|
@@ -225,7 +225,7 @@ Structured Research Report
|
|
| 225 |
| Phase 8: Report | β
COMPLETE | Structured scientific reports |
|
| 226 |
| Phase 9: Source Cleanup | β
COMPLETE | Remove DuckDuckGo |
|
| 227 |
| Phase 10: ClinicalTrials | β
COMPLETE | ClinicalTrials.gov API |
|
| 228 |
-
| Phase 11:
|
| 229 |
| Phase 12: MCP Server | β
COMPLETE | MCP protocol integration |
|
| 230 |
| Phase 13: Modal Pipeline | π SPEC READY | Sandboxed code execution |
|
| 231 |
| Phase 14: Demo & Submit | π SPEC READY | Hackathon submission |
|
|
|
|
| 42 |
β βββ __init__.py
|
| 43 |
β βββ pubmed.py # PubMed E-utilities tool
|
| 44 |
β βββ clinicaltrials.py # ClinicalTrials.gov API
|
| 45 |
+
β βββ europepmc.py # Europe PMC (preprints + papers)
|
| 46 |
β βββ code_execution.py # Modal sandbox execution
|
| 47 |
β βββ search_handler.py # Orchestrates multiple tools
|
| 48 |
βββ prompts/ # Prompt templates
|
|
|
|
| 64 |
β βββ tools/
|
| 65 |
β β βββ test_pubmed.py
|
| 66 |
β β βββ test_clinicaltrials.py
|
| 67 |
+
β β βββ test_europepmc.py
|
| 68 |
β β βββ test_search_handler.py
|
| 69 |
β βββ agent_factory/
|
| 70 |
β β βββ test_judges.py
|
|
|
|
| 201 |
|
| 202 |
9. **[Phase 9 Spec: Remove DuckDuckGo](09_phase_source_cleanup.md)** β
|
| 203 |
10. **[Phase 10 Spec: ClinicalTrials.gov](10_phase_clinicaltrials.md)** β
|
| 204 |
+
11. **[Phase 11 Spec: Europe PMC](11_phase_europepmc.md)** β
|
| 205 |
|
| 206 |
### Hackathon Integration (Phases 12-14)
|
| 207 |
|
|
|
|
| 225 |
| Phase 8: Report | β
COMPLETE | Structured scientific reports |
|
| 226 |
| Phase 9: Source Cleanup | β
COMPLETE | Remove DuckDuckGo |
|
| 227 |
| Phase 10: ClinicalTrials | β
COMPLETE | ClinicalTrials.gov API |
|
| 228 |
+
| Phase 11: Europe PMC | β
COMPLETE | Preprint search |
|
| 229 |
| Phase 12: MCP Server | β
COMPLETE | MCP protocol integration |
|
| 230 |
| Phase 13: Modal Pipeline | π SPEC READY | Sandboxed code execution |
|
| 231 |
| Phase 14: Demo & Submit | π SPEC READY | Hackathon submission |
|
docs/index.md
CHANGED
|
@@ -25,7 +25,7 @@ AI-powered deep research system for sexual wellness, reproductive health, and ho
|
|
| 25 |
- **[Phase 8: Report](implementation/08_phase_report.md)** β
- Structured scientific reports
|
| 26 |
- **[Phase 9: Source Cleanup](implementation/09_phase_source_cleanup.md)** β
- Remove DuckDuckGo
|
| 27 |
- **[Phase 10: ClinicalTrials](implementation/10_phase_clinicaltrials.md)** β
- Clinical trials API
|
| 28 |
-
- **[Phase 11: Europe PMC](implementation/
|
| 29 |
- **[Phase 12: MCP Server](implementation/12_phase_mcp_server.md)** β
- Claude Desktop integration
|
| 30 |
- **[Phase 13: Modal Integration](implementation/13_phase_modal_integration.md)** β
- Secure code execution
|
| 31 |
- **[Phase 14: Demo Submission](implementation/14_phase_demo_submission.md)** β
- Hackathon submission
|
|
|
|
| 25 |
- **[Phase 8: Report](implementation/08_phase_report.md)** β
- Structured scientific reports
|
| 26 |
- **[Phase 9: Source Cleanup](implementation/09_phase_source_cleanup.md)** β
- Remove DuckDuckGo
|
| 27 |
- **[Phase 10: ClinicalTrials](implementation/10_phase_clinicaltrials.md)** β
- Clinical trials API
|
| 28 |
+
- **[Phase 11: Europe PMC](implementation/11_phase_europepmc.md)** β
- Preprint search
|
| 29 |
- **[Phase 12: MCP Server](implementation/12_phase_mcp_server.md)** β
- Claude Desktop integration
|
| 30 |
- **[Phase 13: Modal Integration](implementation/13_phase_modal_integration.md)** β
- Secure code execution
|
| 31 |
- **[Phase 14: Demo Submission](implementation/14_phase_demo_submission.md)** β
- Hackathon submission
|
docs/workflow-diagrams.md
CHANGED
|
@@ -85,7 +85,7 @@ graph TB
|
|
| 85 |
end
|
| 86 |
|
| 87 |
subgraph "MCP Tools"
|
| 88 |
-
WebSearch[Web Search<br/>PubMed β’
|
| 89 |
CodeExec[Code Execution<br/>Sandboxed Python]
|
| 90 |
RAG[RAG Retrieval<br/>Vector DB β’ Embeddings]
|
| 91 |
Viz[Visualization<br/>Charts β’ Graphs]
|
|
@@ -229,12 +229,12 @@ flowchart TD
|
|
| 229 |
Strategy --> Multi[Multi-Source Search]
|
| 230 |
|
| 231 |
Multi --> PubMed[PubMed Search<br/>via MCP]
|
| 232 |
-
Multi -->
|
| 233 |
-
Multi -->
|
| 234 |
|
| 235 |
PubMed --> Aggregate[Aggregate Results]
|
| 236 |
-
|
| 237 |
-
|
| 238 |
|
| 239 |
Aggregate --> Filter[Filter & Rank<br/>by Relevance]
|
| 240 |
Filter --> Dedup[Deduplicate<br/>Cross-Reference]
|
|
@@ -388,7 +388,7 @@ graph TB
|
|
| 388 |
end
|
| 389 |
|
| 390 |
subgraph "MCP Servers"
|
| 391 |
-
Server1[Web Search Server<br/>localhost:8001<br/>β’ PubMed<br/>β’
|
| 392 |
Server2[Code Execution Server<br/>localhost:8002<br/>β’ Sandboxed Python<br/>β’ Package management]
|
| 393 |
Server3[RAG Server<br/>localhost:8003<br/>β’ Vector embeddings<br/>β’ Similarity search]
|
| 394 |
Server4[Visualization Server<br/>localhost:8004<br/>β’ Chart generation<br/>β’ Plot rendering]
|
|
@@ -396,8 +396,8 @@ graph TB
|
|
| 396 |
|
| 397 |
subgraph "External Services"
|
| 398 |
PubMed[PubMed API]
|
| 399 |
-
|
| 400 |
-
|
| 401 |
Modal[Modal Sandbox]
|
| 402 |
ChromaDB[(ChromaDB)]
|
| 403 |
end
|
|
@@ -412,8 +412,8 @@ graph TB
|
|
| 412 |
Registry --> Server4
|
| 413 |
|
| 414 |
Server1 --> PubMed
|
| 415 |
-
Server1 -->
|
| 416 |
-
Server1 -->
|
| 417 |
Server2 --> Modal
|
| 418 |
Server3 --> ChromaDB
|
| 419 |
|
|
@@ -517,8 +517,8 @@ graph LR
|
|
| 517 |
User[π€ Researcher<br/>Asks research questions] -->|Submits query| DC[DeepBoner<br/>Magentic Workflow]
|
| 518 |
|
| 519 |
DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
|
| 520 |
-
DC -->|
|
| 521 |
-
DC -->|
|
| 522 |
DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
|
| 523 |
DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
|
| 524 |
DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
|
|
@@ -526,8 +526,8 @@ graph LR
|
|
| 526 |
DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
|
| 527 |
|
| 528 |
PubMed -->|Results| DC
|
| 529 |
-
|
| 530 |
-
|
| 531 |
Claude -->|Responses| DC
|
| 532 |
Modal -->|Output| DC
|
| 533 |
Chroma -->|Context| DC
|
|
@@ -537,8 +537,8 @@ graph LR
|
|
| 537 |
style User fill:#e1f5e1
|
| 538 |
style DC fill:#ffe6e6
|
| 539 |
style PubMed fill:#e6f3ff
|
| 540 |
-
style
|
| 541 |
-
style
|
| 542 |
style Claude fill:#ffd6d6
|
| 543 |
style Modal fill:#f0f0f0
|
| 544 |
style Chroma fill:#ffe6f0
|
|
|
|
| 85 |
end
|
| 86 |
|
| 87 |
subgraph "MCP Tools"
|
| 88 |
+
WebSearch[Web Search<br/>PubMed β’ ClinicalTrials β’ Europe PMC]
|
| 89 |
CodeExec[Code Execution<br/>Sandboxed Python]
|
| 90 |
RAG[RAG Retrieval<br/>Vector DB β’ Embeddings]
|
| 91 |
Viz[Visualization<br/>Charts β’ Graphs]
|
|
|
|
| 229 |
Strategy --> Multi[Multi-Source Search]
|
| 230 |
|
| 231 |
Multi --> PubMed[PubMed Search<br/>via MCP]
|
| 232 |
+
Multi --> Trials[ClinicalTrials Search<br/>via MCP]
|
| 233 |
+
Multi --> EuropePMC[Europe PMC Search<br/>via MCP]
|
| 234 |
|
| 235 |
PubMed --> Aggregate[Aggregate Results]
|
| 236 |
+
Trials --> Aggregate
|
| 237 |
+
EuropePMC --> Aggregate
|
| 238 |
|
| 239 |
Aggregate --> Filter[Filter & Rank<br/>by Relevance]
|
| 240 |
Filter --> Dedup[Deduplicate<br/>Cross-Reference]
|
|
|
|
| 388 |
end
|
| 389 |
|
| 390 |
subgraph "MCP Servers"
|
| 391 |
+
Server1[Web Search Server<br/>localhost:8001<br/>β’ PubMed<br/>β’ ClinicalTrials<br/>β’ Europe PMC]
|
| 392 |
Server2[Code Execution Server<br/>localhost:8002<br/>β’ Sandboxed Python<br/>β’ Package management]
|
| 393 |
Server3[RAG Server<br/>localhost:8003<br/>β’ Vector embeddings<br/>β’ Similarity search]
|
| 394 |
Server4[Visualization Server<br/>localhost:8004<br/>β’ Chart generation<br/>β’ Plot rendering]
|
|
|
|
| 396 |
|
| 397 |
subgraph "External Services"
|
| 398 |
PubMed[PubMed API]
|
| 399 |
+
Trials[ClinicalTrials.gov API]
|
| 400 |
+
EuropePMC[Europe PMC API]
|
| 401 |
Modal[Modal Sandbox]
|
| 402 |
ChromaDB[(ChromaDB)]
|
| 403 |
end
|
|
|
|
| 412 |
Registry --> Server4
|
| 413 |
|
| 414 |
Server1 --> PubMed
|
| 415 |
+
Server1 --> Trials
|
| 416 |
+
Server1 --> EuropePMC
|
| 417 |
Server2 --> Modal
|
| 418 |
Server3 --> ChromaDB
|
| 419 |
|
|
|
|
| 517 |
User[π€ Researcher<br/>Asks research questions] -->|Submits query| DC[DeepBoner<br/>Magentic Workflow]
|
| 518 |
|
| 519 |
DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
|
| 520 |
+
DC -->|Clinical trials| Trials[ClinicalTrials.gov<br/>Trial data]
|
| 521 |
+
DC -->|Preprints| EuropePMC[Europe PMC API<br/>Preprints & papers]
|
| 522 |
DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
|
| 523 |
DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
|
| 524 |
DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
|
|
|
|
| 526 |
DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
|
| 527 |
|
| 528 |
PubMed -->|Results| DC
|
| 529 |
+
Trials -->|Results| DC
|
| 530 |
+
EuropePMC -->|Results| DC
|
| 531 |
Claude -->|Responses| DC
|
| 532 |
Modal -->|Output| DC
|
| 533 |
Chroma -->|Context| DC
|
|
|
|
| 537 |
style User fill:#e1f5e1
|
| 538 |
style DC fill:#ffe6e6
|
| 539 |
style PubMed fill:#e6f3ff
|
| 540 |
+
style Trials fill:#e6f3ff
|
| 541 |
+
style EuropePMC fill:#e6f3ff
|
| 542 |
style Claude fill:#ffd6d6
|
| 543 |
style Modal fill:#f0f0f0
|
| 544 |
style Chroma fill:#ffe6f0
|
src/utils/exceptions.py
CHANGED
|
@@ -29,7 +29,3 @@ class RateLimitError(SearchError):
|
|
| 29 |
"""Raised when we hit API rate limits."""
|
| 30 |
|
| 31 |
pass
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
# Backwards compatibility alias
|
| 35 |
-
DeepCriticalError = DeepBonerError
|
|
|
|
| 29 |
"""Raised when we hit API rate limits."""
|
| 30 |
|
| 31 |
pass
|
|
|
|
|
|
|
|
|
|
|
|