VibecoderMcSwaggins committed on
Commit
2c5db87
·
unverified ·
1 Parent(s): 91a017e

feat(search): SPEC_13 Evidence Deduplication (#98)

Browse files

* feat(search): implement evidence deduplication (SPEC_13)

- Add PMID extraction to OpenAlex tool (from ids.pmid)
- Implement deduplicate_evidence and extract_paper_id in SearchHandler
- Support deduplication across PubMed, Europe PMC (MED, PMC, PPR, PAT), OpenAlex, and ClinicalTrials
- Add comprehensive unit tests for deduplication logic

* test: complete SPEC_13 test coverage (senior review findings)

Missing tests added per spec:
- test_extracts_europepmc_pat_id_eu_format (EP patent format)
- test_preserves_openalex_without_pmid (critical edge case)
- test_keeps_unidentifiable_evidence (conservative behavior)
- test_clinicaltrials_unique_per_nct (NCT uniqueness)
- test_preprints_preserved_separately (PPR vs PMID)
- test_extracts_pmid_from_ids_object (OpenAlex PMID extraction)
- test_pmid_is_none_when_not_present (null case)

Code improvements:
- Move `import re` to module level in openalex.py (efficiency)
- Add SAMPLE_OPENALEX_WITH_PMID fixture for cross-dedup testing

All 35 tests pass. 100% spec compliance.

* fix: add defensive isinstance check for pmid_url

Per CodeRabbit review: guards against unexpected API response types
that could cause a TypeError when checking string containment.
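
A minimal sketch of the guard described above (illustrative only, not the exact committed diff):

```python
import re

# Example OpenAlex "ids" payload; the live API returns the PMID as a full URL.
ids_obj = {"pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894"}

pmid_url = ids_obj.get("pmid")  # may be a str URL, None, or an unexpected type
pmid = None
# Defensive isinstance check: only attempt string containment/regex on real strings.
if isinstance(pmid_url, str) and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
    if match := re.search(r"/(\d+)/?$", pmid_url):
        pmid = match.group(1)  # "29456894"
```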

Files changed (27)
  1. docs/bugs/ACTIVE_BUGS.md +4 -2
  2. docs/bugs/{P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md → archive/P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md} +0 -0
  3. docs/bugs/{P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md → archive/P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md} +0 -0
  4. docs/bugs/{P1_NARRATIVE_SYNTHESIS_FALLBACK.md → archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md} +0 -0
  5. docs/bugs/{P2_GRADIO_EXAMPLE_NOT_FILLING.md → archive/P2_GRADIO_EXAMPLE_NOT_FILLING.md} +0 -0
  6. docs/bugs/{P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md} +0 -0
  7. docs/bugs/{P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md} +0 -0
  8. docs/bugs/{P3_MAGENTIC_NO_TERMINATION_EVENT.md → archive/P3_MAGENTIC_NO_TERMINATION_EVENT.md} +0 -0
  9. docs/specs/SPEC_13_EVIDENCE_DEDUPLICATION.md +566 -0
  10. docs/specs/SPEC_14_CLINICALTRIALS_OUTCOMES.md +466 -0
  11. docs/specs/SPEC_15_ADVANCED_MODE_PERFORMANCE.md +478 -0
  12. docs/specs/{SPEC_01_DEMO_TERMINATION.md → archive/SPEC_01_DEMO_TERMINATION.md} +0 -0
  13. docs/specs/{SPEC_02_E2E_TESTING.md → archive/SPEC_02_E2E_TESTING.md} +0 -0
  14. docs/specs/{SPEC_03_OPENALEX_INTEGRATION.md → archive/SPEC_03_OPENALEX_INTEGRATION.md} +0 -0
  15. docs/specs/{SPEC_04_MAGENTIC_UX.md → archive/SPEC_04_MAGENTIC_UX.md} +0 -0
  16. docs/specs/{SPEC_05_ORCHESTRATOR_CLEANUP.md → archive/SPEC_05_ORCHESTRATOR_CLEANUP.md} +0 -0
  17. docs/specs/{SPEC_06_SIMPLE_MODE_SYNTHESIS.md → archive/SPEC_06_SIMPLE_MODE_SYNTHESIS.md} +0 -0
  18. docs/specs/{SPEC_07_LANGGRAPH_MEMORY_ARCH.md → archive/SPEC_07_LANGGRAPH_MEMORY_ARCH.md} +0 -0
  19. docs/specs/{SPEC_08_INTEGRATE_MEMORY_LAYER.md → archive/SPEC_08_INTEGRATE_MEMORY_LAYER.md} +0 -0
  20. docs/specs/{SPEC_09_LLAMAINDEX_INTEGRATION.md → archive/SPEC_09_LLAMAINDEX_INTEGRATION.md} +0 -0
  21. docs/specs/{SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md → archive/SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md} +0 -0
  22. docs/specs/{SPEC_11_SEXUAL_HEALTH_FOCUS.md → archive/SPEC_11_SEXUAL_HEALTH_FOCUS.md} +0 -0
  23. docs/specs/{SPEC_12_NARRATIVE_SYNTHESIS.md → archive/SPEC_12_NARRATIVE_SYNTHESIS.md} +0 -0
  24. src/tools/openalex.py +12 -0
  25. src/tools/search_handler.py +130 -1
  26. tests/unit/tools/test_openalex.py +44 -0
  27. tests/unit/tools/test_search_handler.py +178 -41
docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -1,6 +1,8 @@
1
  # Active Bugs
2
 
3
  > Last updated: 2025-11-30
 
 
4
 
5
  ## P0 - Blocker
6
 
@@ -10,8 +12,8 @@
10
 
11
  ## P1 - Important
12
 
13
- ### P1 - Narrative Synthesis Falls Back to Template (NEW)
14
- **File:** `P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
15
  **Related:** SPEC_12 (implemented but falling back)
16
 
17
  **Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
 
1
  # Active Bugs
2
 
3
  > Last updated: 2025-11-30
4
+ >
5
+ > **Note:** Completed bug docs archived to `docs/bugs/archive/`
6
 
7
  ## P0 - Blocker
8
 
 
12
 
13
  ## P1 - Important
14
 
15
+ ### P1 - Narrative Synthesis Falls Back to Template
16
+ **File:** `archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
17
  **Related:** SPEC_12 (implemented but falling back)
18
 
19
  **Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
docs/bugs/{P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md → archive/P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md} RENAMED
File without changes
docs/bugs/{P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md → archive/P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md} RENAMED
File without changes
docs/bugs/{P1_NARRATIVE_SYNTHESIS_FALLBACK.md → archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md} RENAMED
File without changes
docs/bugs/{P2_GRADIO_EXAMPLE_NOT_FILLING.md → archive/P2_GRADIO_EXAMPLE_NOT_FILLING.md} RENAMED
File without changes
docs/bugs/{P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md} RENAMED
File without changes
docs/bugs/{P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md} RENAMED
File without changes
docs/bugs/{P3_MAGENTIC_NO_TERMINATION_EVENT.md → archive/P3_MAGENTIC_NO_TERMINATION_EVENT.md} RENAMED
File without changes
docs/specs/SPEC_13_EVIDENCE_DEDUPLICATION.md ADDED
@@ -0,0 +1,566 @@
1
+ # SPEC_13: Evidence Deduplication in SearchHandler
2
+
3
+ **Status**: Draft (Validated via API Documentation Review)
4
+ **Priority**: P1
5
+ **GitHub Issue**: #94
6
+ **Estimated Effort**: Medium (~100 lines of code, includes OpenAlex metadata extraction)
7
+ **Last Updated**: 2025-11-30
8
+
9
+ ---
10
+
11
+ ## Problem Statement
12
+
13
+ DeepBoner searches 4 sources in parallel:
14
+ 1. **PubMed** - NCBI's biomedical literature database
15
+ 2. **ClinicalTrials.gov** - Clinical trial registry
16
+ 3. **Europe PMC** - Indexes **ALL of PubMed/MEDLINE** plus preprints
17
+ 4. **OpenAlex** - Indexes **250M+ works** including most of PubMed
18
+
19
+ **Result**: The same paper appears 2-3 times in evidence because:
20
+ - Europe PMC includes 100% of PubMed content
21
+ - OpenAlex includes ~90% of PubMed content
22
+
23
+ ### Impact
24
+
25
+ | Metric | Current | After Fix |
26
+ |--------|---------|-----------|
27
+ | Duplicate evidence | 30-50% | <5% |
28
+ | Token waste | High | Low |
29
+ | Judge confusion | Yes | No |
30
+ | Report redundancy | Yes | No |
31
+
32
+ ---
33
+
34
+ ## API Documentation Review (2025-11-30)
35
+
36
+ ### OpenAlex Work Object - `ids` Field
37
+
38
+ **Source**: [OpenAlex Work Object Docs](https://docs.openalex.org/api-entities/works/work-object)
39
+
40
+ OpenAlex returns PMIDs in the `ids` field as **full URLs** (validated via live API testing 2025-11-30):
41
+ ```json
42
+ {
43
+ "ids": {
44
+ "openalex": "https://openalex.org/W2741809807",
45
+ "doi": "https://doi.org/10.7717/peerj.4375",
46
+ "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894"
47
+ }
48
+ }
49
+ ```
50
+
51
+ **Live API Test (2025-11-30):**
52
+ ```bash
53
+ curl "https://api.openalex.org/works?filter=ids.pmid:29456894&select=id,ids"
54
+ # Returns: "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894" (always URL format, never just number)
55
+ ```
56
+
57
+ **Current Issue**: Our `openalex.py` does NOT extract `ids.pmid` - it only uses DOI.
58
+ **Fix Required**: Extract PMID from OpenAlex response URL and store numeric PMID in `Evidence.metadata`.
59
+
60
+ ### Europe PMC URL Patterns
61
+
62
+ **Source**: [Europe PMC Help](https://europepmc.org/help)
63
+
64
+ Four URL patterns exist:
65
+ - `/article/MED/{PMID}` - PubMed/MEDLINE records
66
+ - `/article/PMC/{PMCID}` - PubMed Central full text
67
+ - `/article/PPR/{PPRID}` - Preprints (medRxiv, bioRxiv, etc.)
68
+ - `/article/PAT/{PatentID}` - Patents (e.g., `PAT/WO8601415`, `PAT/EP1234567`)
69
+
70
+ **Live API Test (2025-11-30):**
71
+ ```bash
72
+ curl "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=SRC:PAT&pageSize=1&format=json"
73
+ # Returns: source: "PAT", id: "WO8601415", pmid: None
74
+ ```
75
+
76
+ **Current Behavior**: Our `europepmc.py:92-95` already uses PubMed URL when PMID exists:
77
+ ```python
78
+ if doi:
79
+ url = f"https://doi.org/{doi}"
80
+ elif result.get("pmid"):
81
+ url = f"https://pubmed.ncbi.nlm.nih.gov/{result['pmid']}/" # ← Good!
82
+ else:
83
+ url = f"https://europepmc.org/article/{source_db}/{result.get('id', '')}"
84
+ ```
85
+
86
+ **Implications**:
87
+ 1. For MEDLINE records with PMIDs → PubMed URL used → deduplication works
88
+ 2. For PMC/PPR/PAT records without PMIDs → Europe PMC URL used → treated as unique (correct behavior)
89
+
90
+ ### ClinicalTrials.gov URL Patterns
91
+
92
+ Two URL formats exist:
93
+ - Modern: `/study/NCT12345678` (current API v2)
94
+ - Legacy: `/ct2/show/NCT12345678` (deprecated but may exist in old data)
95
+
96
+ ---
97
+
98
+ ## Current Code
99
+
100
+ ```python
101
+ # src/tools/search_handler.py:51-62
102
+ for tool, result in zip(self.tools, results, strict=True):
103
+ if isinstance(result, Exception):
104
+ errors.append(f"{tool.name}: {result!s}")
105
+ else:
106
+ success_result = cast(list[Evidence], result)
107
+ all_evidence.extend(success_result) # ← NO DEDUPLICATION
108
+ ```
109
+
110
+ ---
111
+
112
+ ## Proposed Solution (Two Parts)
113
+
114
+ ### Part 1: Extract PMID from OpenAlex (REQUIRED for cross-source dedup)
115
+
116
+ ```python
117
+ # src/tools/openalex.py - Update _to_evidence() method
118
+
119
+ def _to_evidence(self, work: dict[str, Any]) -> Evidence:
120
+ """Convert OpenAlex work to Evidence with rich metadata."""
121
+ # ... existing code ...
122
+
123
+ # NEW: Extract PMID from ids object for deduplication
124
+ ids_obj = work.get("ids", {})
125
+ pmid_url = ids_obj.get("pmid") # "https://pubmed.ncbi.nlm.nih.gov/29456894"
126
+ pmid = None
127
+ if pmid_url and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
128
+ # Extract numeric PMID from URL
129
+ import re
130
+ pmid_match = re.search(r'/(\d+)/?$', pmid_url)
131
+ if pmid_match:
132
+ pmid = pmid_match.group(1)
133
+
134
+ # ... rest of existing code ...
135
+
136
+ return Evidence(
137
+ content=content[:2000],
138
+ citation=Citation(
139
+ source="openalex",
140
+ title=title[:500],
141
+ url=url,
142
+ date=str(year),
143
+ authors=authors,
144
+ ),
145
+ relevance=relevance,
146
+ metadata={
147
+ "cited_by_count": cited_by_count,
148
+ "concepts": concepts,
149
+ "is_open_access": is_oa,
150
+ "pdf_url": pdf_url,
151
+ "pmid": pmid, # NEW: Store PMID for deduplication
152
+ },
153
+ )
154
+ ```
155
+
156
+ ### Part 2: Enhanced Deduplication Function
157
+
158
+ ```python
159
+ # src/tools/search_handler.py
160
+
161
+ import re
162
+ from typing import TYPE_CHECKING
163
+
164
+ if TYPE_CHECKING:
165
+ from src.utils.models import Evidence
166
+
167
+
168
+ def extract_paper_id(evidence: "Evidence") -> str | None:
169
+ """Extract unique paper identifier from Evidence.
170
+
171
+ Strategy:
172
+ 1. Check metadata.pmid first (OpenAlex provides this)
173
+ 2. Fall back to URL pattern matching
174
+
175
+ Supports:
176
+ - PubMed: https://pubmed.ncbi.nlm.nih.gov/12345678/
177
+ - Europe PMC MED: https://europepmc.org/article/MED/12345678
178
+ - Europe PMC PMC: https://europepmc.org/article/PMC/PMC1234567
179
+ - Europe PMC PPR: https://europepmc.org/article/PPR/PPR123456
180
+ - Europe PMC PAT: https://europepmc.org/article/PAT/WO8601415
181
+ - DOI: https://doi.org/10.1234/...
182
+ - OpenAlex: https://openalex.org/W1234567890
183
+ - ClinicalTrials: https://clinicaltrials.gov/study/NCT12345678
184
+ - ClinicalTrials (legacy): https://clinicaltrials.gov/ct2/show/NCT12345678
185
+ """
186
+ url = evidence.citation.url
187
+ metadata = evidence.metadata or {}
188
+
189
+ # Strategy 1: Check metadata.pmid (from OpenAlex)
190
+ if pmid := metadata.get("pmid"):
191
+ return f"PMID:{pmid}"
192
+
193
+ # Strategy 2: URL pattern matching
194
+
195
+ # PubMed URL pattern
196
+ pmid_match = re.search(r'pubmed\.ncbi\.nlm\.nih\.gov/(\d+)', url)
197
+ if pmid_match:
198
+ return f"PMID:{pmid_match.group(1)}"
199
+
200
+ # Europe PMC MED pattern (same as PMID)
201
+ epmc_med_match = re.search(r'europepmc\.org/article/MED/(\d+)', url)
202
+ if epmc_med_match:
203
+ return f"PMID:{epmc_med_match.group(1)}"
204
+
205
+ # Europe PMC PMC pattern (PubMed Central ID - different from PMID!)
206
+ epmc_pmc_match = re.search(r'europepmc\.org/article/PMC/(PMC\d+)', url)
207
+ if epmc_pmc_match:
208
+ return f"PMCID:{epmc_pmc_match.group(1)}"
209
+
210
+ # Europe PMC PPR pattern (Preprint ID - unique per preprint)
211
+ epmc_ppr_match = re.search(r'europepmc\.org/article/PPR/(PPR\d+)', url)
212
+ if epmc_ppr_match:
213
+ return f"PPRID:{epmc_ppr_match.group(1)}"
214
+
215
+ # Europe PMC PAT pattern (Patent ID - e.g., WO8601415, EP1234567)
216
+ epmc_pat_match = re.search(r'europepmc\.org/article/PAT/([A-Z]{2}\d+)', url)
217
+ if epmc_pat_match:
218
+ return f"PATID:{epmc_pat_match.group(1)}"
219
+
220
+ # DOI pattern (normalize trailing slash/characters)
221
+ doi_match = re.search(r'doi\.org/(10\.\d+/[^\s\]>]+)', url)
222
+ if doi_match:
223
+ doi = doi_match.group(1).rstrip('/')
224
+ return f"DOI:{doi}"
225
+
226
+ # OpenAlex ID pattern (fallback if no PMID in metadata)
227
+ openalex_match = re.search(r'openalex\.org/(W\d+)', url)
228
+ if openalex_match:
229
+ return f"OAID:{openalex_match.group(1)}"
230
+
231
+ # ClinicalTrials NCT ID (modern format)
232
+ nct_match = re.search(r'clinicaltrials\.gov/study/(NCT\d+)', url)
233
+ if nct_match:
234
+ return f"NCT:{nct_match.group(1)}"
235
+
236
+ # ClinicalTrials NCT ID (legacy format)
237
+ nct_legacy_match = re.search(r'clinicaltrials\.gov/ct2/show/(NCT\d+)', url)
238
+ if nct_legacy_match:
239
+ return f"NCT:{nct_legacy_match.group(1)}"
240
+
241
+ return None
242
+
243
+
244
+ def deduplicate_evidence(evidence_list: list["Evidence"]) -> list["Evidence"]:
245
+ """Remove duplicate evidence based on paper ID.
246
+
247
+ Deduplication priority:
248
+ 1. PubMed (authoritative source)
249
+ 2. Europe PMC (full text links)
250
+ 3. OpenAlex (citation data)
251
+ 4. ClinicalTrials (unique, never duplicated)
252
+
253
+ Returns:
254
+ Deduplicated list preserving source priority order.
255
+ """
256
+ seen_ids: set[str] = set()
257
+ unique: list[Evidence] = []
258
+
259
+ # Sort by source priority (PubMed first)
260
+ source_priority = {"pubmed": 0, "europepmc": 1, "openalex": 2, "clinicaltrials": 3}
261
+ sorted_evidence = sorted(
262
+ evidence_list,
263
+ key=lambda e: source_priority.get(e.citation.source, 99)
264
+ )
265
+
266
+ for evidence in sorted_evidence:
267
+ paper_id = extract_paper_id(evidence)
268
+
269
+ if paper_id is None:
270
+ # Can't identify - keep it (conservative)
271
+ unique.append(evidence)
272
+ continue
273
+
274
+ if paper_id not in seen_ids:
275
+ seen_ids.add(paper_id)
276
+ unique.append(evidence)
277
+
278
+ return unique
279
+ ```
280
+
281
+ ### Part 3: Integrate into SearchHandler.execute()
282
+
283
+ ```python
284
+ # src/tools/search_handler.py:execute() - AFTER gathering results
285
+
286
+ # ... existing code to collect all_evidence ...
287
+
288
+ # DEDUPLICATION STEP
289
+ original_count = len(all_evidence)
290
+ all_evidence = deduplicate_evidence(all_evidence)
291
+ dedup_count = original_count - len(all_evidence)
292
+
293
+ if dedup_count > 0:
294
+ logger.info(
295
+ "Deduplicated evidence",
296
+ original=original_count,
297
+ unique=len(all_evidence),
298
+ removed=dedup_count,
299
+ )
300
+
301
+ return SearchResult(
302
+ query=query,
303
+ evidence=all_evidence, # Now deduplicated
304
+ sources_searched=sources_searched,
305
+ total_found=len(all_evidence),
306
+ errors=errors,
307
+ )
308
+ ```
309
+
310
+ ---
311
+
312
+ ## Test Plan
313
+
314
+ ### Unit Tests (`tests/unit/tools/test_search_handler.py`)
315
+
316
+ ```python
317
+ import pytest
318
+ from src.tools.search_handler import extract_paper_id, deduplicate_evidence
319
+ from src.utils.models import Citation, Evidence
320
+
321
+
322
+ def _make_evidence(source: str, url: str, metadata: dict | None = None) -> Evidence:
323
+ """Helper to create Evidence objects for testing."""
324
+ return Evidence(
325
+ content="Test content",
326
+ citation=Citation(
327
+ source=source,
328
+ title="Test",
329
+ url=url,
330
+ date="2024",
331
+ authors=[],
332
+ ),
333
+ metadata=metadata,
334
+ )
335
+
336
+
337
+ class TestExtractPaperId:
338
+ """Tests for paper ID extraction from Evidence objects."""
339
+
340
+ def test_extracts_pubmed_id(self) -> None:
341
+ evidence = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
342
+ assert extract_paper_id(evidence) == "PMID:12345678"
343
+
344
+ def test_extracts_europepmc_med_id(self) -> None:
345
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
346
+ assert extract_paper_id(evidence) == "PMID:12345678"
347
+
348
+ def test_extracts_europepmc_pmc_id(self) -> None:
349
+ """Europe PMC PMC articles have different ID format."""
350
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PMC/PMC7654321")
351
+ assert extract_paper_id(evidence) == "PMCID:PMC7654321"
352
+
353
+ def test_extracts_europepmc_ppr_id(self) -> None:
354
+ """Europe PMC preprints have PPR IDs."""
355
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR123456")
356
+ assert extract_paper_id(evidence) == "PPRID:PPR123456"
357
+
358
+ def test_extracts_europepmc_pat_id(self) -> None:
359
+ """Europe PMC patents have PAT IDs (e.g., WO8601415, EP1234567)."""
360
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/WO8601415")
361
+ assert extract_paper_id(evidence) == "PATID:WO8601415"
362
+
363
+ def test_extracts_europepmc_pat_id_eu_format(self) -> None:
364
+ """European patent format should also work."""
365
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/EP1234567")
366
+ assert extract_paper_id(evidence) == "PATID:EP1234567"
367
+
368
+ def test_extracts_doi(self) -> None:
369
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345")
370
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
371
+
372
+ def test_extracts_doi_with_trailing_slash(self) -> None:
373
+ """DOIs should be normalized (trailing slash removed)."""
374
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345/")
375
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
376
+
377
+ def test_extracts_openalex_id_from_url(self) -> None:
378
+ """OpenAlex ID from URL (fallback when no PMID in metadata)."""
379
+ evidence = _make_evidence("openalex", "https://openalex.org/W1234567890")
380
+ assert extract_paper_id(evidence) == "OAID:W1234567890"
381
+
382
+ def test_extracts_openalex_pmid_from_metadata(self) -> None:
383
+ """OpenAlex PMID from metadata takes priority over URL."""
384
+ evidence = _make_evidence(
385
+ "openalex",
386
+ "https://openalex.org/W1234567890",
387
+ metadata={"pmid": "98765432"},
388
+ )
389
+ assert extract_paper_id(evidence) == "PMID:98765432"
390
+
391
+ def test_extracts_nct_id_modern(self) -> None:
392
+ evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT12345678")
393
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
394
+
395
+ def test_extracts_nct_id_legacy(self) -> None:
396
+ """Legacy ClinicalTrials.gov URL format should also work."""
397
+ evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/ct2/show/NCT12345678")
398
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
399
+
400
+ def test_returns_none_for_unknown_url(self) -> None:
401
+ evidence = _make_evidence("unknown", "https://example.com/unknown")
402
+ assert extract_paper_id(evidence) is None
403
+
404
+
405
+ class TestDeduplicateEvidence:
406
+ """Tests for evidence deduplication."""
407
+
408
+ def test_removes_pubmed_europepmc_duplicate(self) -> None:
409
+ """Same paper from PubMed and Europe PMC should dedupe to PubMed."""
410
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
411
+ europepmc = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
412
+
413
+ result = deduplicate_evidence([pubmed, europepmc])
414
+
415
+ assert len(result) == 1
416
+ assert result[0].citation.source == "pubmed"
417
+
418
+ def test_removes_pubmed_openalex_duplicate_via_metadata(self) -> None:
419
+ """OpenAlex with PMID in metadata should dedupe against PubMed."""
420
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
421
+ openalex = _make_evidence(
422
+ "openalex",
423
+ "https://openalex.org/W9999999",
424
+ metadata={"pmid": "12345678", "cited_by_count": 100},
425
+ )
426
+
427
+ result = deduplicate_evidence([pubmed, openalex])
428
+
429
+ assert len(result) == 1
430
+ assert result[0].citation.source == "pubmed"
431
+
432
+ def test_preserves_openalex_without_pmid(self) -> None:
433
+ """OpenAlex papers without PMID should NOT be deduplicated."""
434
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
435
+ openalex_no_pmid = _make_evidence(
436
+ "openalex",
437
+ "https://openalex.org/W9999999",
438
+ metadata={"cited_by_count": 100}, # No pmid key
439
+ )
440
+
441
+ result = deduplicate_evidence([pubmed, openalex_no_pmid])
442
+
443
+ assert len(result) == 2 # Both preserved (different IDs)
444
+
445
+ def test_preserves_unique_evidence(self) -> None:
446
+ """Different papers should not be deduplicated."""
447
+ e1 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/11111111/")
448
+ e2 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/22222222/")
449
+
450
+ result = deduplicate_evidence([e1, e2])
451
+
452
+ assert len(result) == 2
453
+
454
+ def test_keeps_unidentifiable_evidence(self) -> None:
455
+ """Evidence with unrecognized URLs should be preserved."""
456
+ unknown = _make_evidence("unknown", "https://example.com/paper/123")
457
+
458
+ result = deduplicate_evidence([unknown])
459
+
460
+ assert len(result) == 1
461
+
462
+ def test_clinicaltrials_unique_per_nct(self) -> None:
463
+ """ClinicalTrials entries have unique NCT IDs."""
464
+ trial1 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT11111111")
465
+ trial2 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT22222222")
466
+
467
+ result = deduplicate_evidence([trial1, trial2])
468
+
469
+ assert len(result) == 2
470
+
471
+ def test_preprints_preserved_separately(self) -> None:
472
+ """Preprints (PPR IDs) should not dedupe against peer-reviewed papers."""
473
+ peer_reviewed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
474
+ preprint = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR999999")
475
+
476
+ result = deduplicate_evidence([peer_reviewed, preprint])
477
+
478
+ assert len(result) == 2 # Both preserved (different ID types)
479
+ ```
480
+
481
+ ### Integration Test
482
+
483
+ ```python
484
+ @pytest.mark.integration
485
+ async def test_real_search_deduplicates() -> None:
486
+ """Integration test: Real search should deduplicate PubMed/Europe PMC."""
487
+ from src.tools.pubmed import PubMedTool
488
+ from src.tools.europepmc import EuropePMCTool
489
+ from src.tools.openalex import OpenAlexTool
490
+ from src.tools.search_handler import SearchHandler, extract_paper_id
491
+
492
+ handler = SearchHandler(
493
+ tools=[PubMedTool(), EuropePMCTool(), OpenAlexTool()],
494
+ timeout=30.0,
495
+ )
496
+
497
+ result = await handler.execute("sildenafil erectile dysfunction", max_results_per_tool=5)
498
+
499
+ # Should have fewer results than 3x max_results due to deduplication
500
+ assert result.total_found < 15
501
+
502
+ # Check no duplicate paper IDs in final result
503
+ paper_ids = [
504
+ extract_paper_id(e)
505
+ for e in result.evidence
506
+ if extract_paper_id(e)
507
+ ]
508
+ assert len(paper_ids) == len(set(paper_ids)), f"Duplicate IDs found: {paper_ids}"
509
+ ```
510
+
511
+ ---
512
+
513
+ ## Files to Modify
514
+
515
+ | File | Change |
516
+ |------|--------|
517
+ | `src/tools/openalex.py` | Extract PMID from `ids.pmid` and store in `Evidence.metadata` |
518
+ | `src/tools/search_handler.py` | Add `extract_paper_id()`, `deduplicate_evidence()`, integrate into `execute()` |
519
+ | `tests/unit/tools/test_search_handler.py` | Add comprehensive deduplication tests |
520
+ | `tests/unit/tools/test_openalex.py` | Add test for PMID extraction |
521
+
522
+ ---
523
+
524
+ ## Acceptance Criteria
525
+
526
+ - [ ] `openalex.py` extracts PMID from `work.ids.pmid` and stores in `Evidence.metadata`
527
+ - [ ] `extract_paper_id()` checks `Evidence.metadata.pmid` first (for OpenAlex cross-dedup)
528
+ - [ ] `extract_paper_id()` correctly parses all URL patterns:
529
+ - PubMed URLs
530
+ - Europe PMC MED, PMC, PPR, and PAT URLs
531
+ - DOIs (normalized without trailing slash)
532
+ - OpenAlex W-IDs (fallback)
533
+ - ClinicalTrials NCT IDs (modern and legacy formats)
534
+ - [ ] `deduplicate_evidence()` removes duplicates while preserving source priority
535
+ - [ ] Unit tests cover all edge cases including OpenAlex with/without PMID
536
+ - [ ] Integration test confirms real searches are deduplicated
537
+ - [ ] Logging shows deduplication metrics
538
+
539
+ ---
540
+
541
+ ## Known Limitations
542
+
543
+ 1. **OpenAlex without PMID**: Some OpenAlex papers (especially older or non-biomedical) may not have PMIDs. These will NOT be deduplicated against PubMed. This is acceptable - we prefer false negatives (keeping duplicates) over false positives (removing unique papers).
544
+
545
+ 2. **PMC vs PMID**: PubMed Central IDs (PMC1234567) are different from PubMed IDs (12345678). A paper may have both, but we treat them as different identifiers. This may result in some duplicates not being caught when Europe PMC returns a PMC URL and PubMed returns a PMID URL for the same paper. Future enhancement: use the NCBI ID Converter API (a sketch follows this list).
546
+
547
+ 3. **Preprint → Peer-Reviewed**: When a preprint (PPR ID) is later published as a peer-reviewed paper (PMID), they will have different IDs and won't be deduplicated. This is intentional - users may want to see both versions.
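
For limitation 2, a future PMCID→PMID bridge could look roughly like the sketch below. It assumes the public NCBI ID Converter endpoint and its documented JSON shape (`records[].pmid`); treat it as a direction, not part of this spec's proposed diff.

```python
import requests

# NCBI PMC ID Converter (assumed endpoint; check NCBI docs before relying on it).
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"


def pmcid_to_pmid(pmcid: str) -> str | None:
    """Resolve a PMCID (e.g. 'PMC7654321') to its PMID, or None if unmapped."""
    resp = requests.get(IDCONV_URL, params={"ids": pmcid, "format": "json"}, timeout=10)
    resp.raise_for_status()
    records = resp.json().get("records", [])
    # Records without a mapping carry no "pmid" key, so .get() returns None.
    return records[0].get("pmid") if records else None
```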
548
+
549
+ ---
550
+
551
+ ## Rollback Plan
552
+
553
+ If deduplication causes issues:
554
+ 1. Remove the `deduplicate_evidence()` call from `execute()`
555
+ 2. All tests should still pass (deduplication is additive)
556
+ 3. No need to revert OpenAlex PMID extraction (metadata only)
557
+
558
+ ---
559
+
560
+ ## References
561
+
562
+ - GitHub Issue #94
563
+ - [OpenAlex Work Object Documentation](https://docs.openalex.org/api-entities/works/work-object)
564
+ - [Europe PMC Help - ID Types](https://europepmc.org/help)
565
+ - [ClinicalTrials.gov API v2](https://clinicaltrials.gov/data-api/api)
566
+ - `TOOL_ANALYSIS_CRITICAL.md` - "Cross-Tool Issues > Issue 1: MASSIVE DUPLICATION"
docs/specs/SPEC_14_CLINICALTRIALS_OUTCOMES.md ADDED
@@ -0,0 +1,466 @@
1
+ # SPEC_14: Add Outcome Measures to ClinicalTrials.gov Fields
2
+
3
+ **Status**: Draft (Validated via API Documentation Review)
4
+ **Priority**: P1
5
+ **GitHub Issue**: #95
6
+ **Estimated Effort**: Small (~40 lines of code)
7
+ **Last Updated**: 2025-11-30
8
+
9
+ ---
10
+
11
+ ## Problem Statement
12
+
13
+ The `ClinicalTrialsTool` retrieves trial metadata but **misses critical efficacy data**:
14
+
15
+ ### Current Fields Retrieved
16
+
17
+ ```python
18
+ # src/tools/clinicaltrials.py:24-33
19
+ FIELDS: ClassVar[list[str]] = [
20
+ "NCTId",
21
+ "BriefTitle",
22
+ "Phase",
23
+ "OverallStatus",
24
+ "Condition",
25
+ "InterventionName",
26
+ "StartDate",
27
+ "BriefSummary",
28
+ ]
29
+ ```
30
+
31
+ ### Missing Data (Critical for Research)
32
+
33
+ | Data | Location in Response | Purpose |
34
+ |------|---------------------|---------|
35
+ | Primary Outcomes | `protocolSection.outcomesModule.primaryOutcomes[].measure` | Main efficacy endpoint |
36
+ | Secondary Outcomes | `protocolSection.outcomesModule.secondaryOutcomes[].measure` | Additional endpoints |
37
+ | Has Results | `study.hasResults` (top-level) | Whether results are posted |
38
+ | Results Date | `protocolSection.statusModule.resultsFirstPostDateStruct.date` | When results posted |
39
+
40
+ ### Impact
41
+
42
+ **Current Output**:
43
+ ```
44
+ Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
45
+ Interventions: Sildenafil.
46
+ ```
47
+
48
+ **Desired Output**:
49
+ ```
50
+ Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
51
+ Interventions: Sildenafil.
52
+ Primary Outcome: Change from baseline in IIEF-EF domain score at Week 12.
53
+ Results Available: Yes (posted 2024-01-15).
54
+ ```
55
+
56
+ ---
57
+
58
+ ## API Documentation Review (2025-11-30)
59
+
60
+ ### ClinicalTrials.gov API v2 Response Structure
61
+
62
+ **Source**: [Stack Overflow - ClinicalTrials.gov API v2](https://stackoverflow.com/questions/78415818)
63
+
64
+ The API returns nested JSON. Key findings:
65
+
66
+ 1. **`hasResults`** is a **top-level** field on each study object (NOT inside `protocolSection`)
67
+ 2. **Outcomes** are in `protocolSection.outcomesModule`:
68
+ ```python
69
+ study['protocolSection']['outcomesModule']['primaryOutcomes'] # List
70
+ study['protocolSection']['outcomesModule']['secondaryOutcomes'] # List
71
+ ```
72
+ 3. **Results date** is in `protocolSection.statusModule.resultsFirstPostDateStruct.date`
73
+
74
+ ### `fields` Parameter Behavior (VERIFIED VIA LIVE API TESTING)
75
+
76
+ The `fields` query parameter filters what the API returns. **If you don't request a field, you don't get it.**
77
+
78
+ **Live API Test Results (2025-11-30):**
79
+
80
+ ```bash
81
+ # Test 1: With limited fields - NO outcomesModule returned
82
+ curl "...&fields=NCTId,BriefTitle"
83
+ # → Returns ONLY: protocolSection.identificationModule.{nctId, briefTitle}
84
+
85
+ # Test 2: Without fields param - outcomesModule IS present
86
+ curl "...&pageSize=1"
87
+ # → Returns: hasResults: false, outcomesModule: {primaryOutcomes, secondaryOutcomes, otherOutcomes}
88
+
89
+ # Test 3: Valid field names for outcomes
90
+ curl "...&fields=NCTId,OutcomesModule" # βœ… Works - returns full outcomesModule
91
+ curl "...&fields=NCTId,PrimaryOutcome" # βœ… Works - returns only primaryOutcomes
92
+ curl "...&fields=NCTId,HasResults" # βœ… Works - returns hasResults at top level
93
+ ```
94
+
95
+ **Valid Field Names (Tested):**
96
+ - `OutcomesModule` → Returns full `protocolSection.outcomesModule` with all outcomes
97
+ - `PrimaryOutcome` → Returns only `primaryOutcomes` array
98
+ - `SecondaryOutcome` → Returns only `secondaryOutcomes` array
99
+ - `HasResults` → Returns `hasResults` at study top level
100
+
101
+ ---
102
+
103
+ ## Proposed Solution
104
+
105
+ ### ✅ UPDATE FIELDS Constant (REQUIRED)
106
+
107
+ The current implementation explicitly passes `fields=",".join(self.FIELDS)` at line 67.
108
+ **The API ONLY returns requested fields.** We MUST add the new field names.
109
+
110
+ ```python
111
+ # src/tools/clinicaltrials.py - UPDATE FIELDS
112
+ FIELDS: ClassVar[list[str]] = [
113
+ "NCTId",
114
+ "BriefTitle",
115
+ "Phase",
116
+ "OverallStatus",
117
+ "Condition",
118
+ "InterventionName",
119
+ "StartDate",
120
+ "BriefSummary",
121
+ # NEW: Outcome measures (verified via live API testing 2025-11-30)
122
+ "OutcomesModule", # Returns protocolSection.outcomesModule.{primaryOutcomes, secondaryOutcomes}
123
+ "HasResults", # Returns study.hasResults (top-level boolean)
124
+ ]
125
+ ```
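
For orientation, the constant reaches the API roughly as sketched below; `fields` and `pageSize` mirror the curl tests above, while `query.term` and the endpoint path are assumptions about the existing tool, not verified here.

```python
import requests

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"  # assumed v2 studies endpoint


def fetch_studies(query: str, fields: list[str], max_results: int = 10) -> list[dict]:
    """Minimal sketch: only the requested fields come back in the response.

    Usage (hypothetical): fetch_studies("sildenafil erectile dysfunction", FIELDS)
    """
    params = {
        "query.term": query,          # assumption: free-text query parameter
        "pageSize": max_results,
        "fields": ",".join(fields),   # same join the current tool performs at line 67
    }
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("studies", [])
```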
126
+
127
+ ### ✅ Update `_study_to_evidence()` Method
128
+
129
+ ```python
130
+ def _study_to_evidence(self, study: dict[str, Any]) -> Evidence:
131
+ """Convert a clinical trial study to Evidence."""
132
+ # Navigate nested structure
133
+ protocol = study.get("protocolSection", {})
134
+ id_module = protocol.get("identificationModule", {})
135
+ status_module = protocol.get("statusModule", {})
136
+ desc_module = protocol.get("descriptionModule", {})
137
+ design_module = protocol.get("designModule", {})
138
+ conditions_module = protocol.get("conditionsModule", {})
139
+ arms_module = protocol.get("armsInterventionsModule", {})
140
+ outcomes_module = protocol.get("outcomesModule", {}) # NEW
141
+
142
+ # ... existing field extraction (nct_id, title, status, phase, etc.) ...
143
+
144
+ # NEW: Extract outcome measures
145
+ primary_outcomes = outcomes_module.get("primaryOutcomes", [])
146
+ primary_outcome_str = ""
147
+ if primary_outcomes:
148
+ # Get first primary outcome measure and timeframe
149
+ first = primary_outcomes[0]
150
+ measure = first.get("measure", "")
151
+ timeframe = first.get("timeFrame", "")
152
+ # Truncate long outcome descriptions
153
+ primary_outcome_str = measure[:200]
154
+ if timeframe:
155
+ primary_outcome_str += f" (measured at {timeframe})"
156
+
157
+ secondary_outcomes = outcomes_module.get("secondaryOutcomes", [])
158
+ secondary_count = len(secondary_outcomes)
159
+
160
+ # NEW: Check if results are available (hasResults is TOP-LEVEL, not in protocol!)
161
+ has_results = study.get("hasResults", False)
162
+
163
+ # Results date is in statusModule (nested inside date struct)
164
+ results_date_struct = status_module.get("resultsFirstPostDateStruct", {})
165
+ results_date = results_date_struct.get("date", "")
166
+
167
+ # Build content with key trial info (UPDATED)
168
+ content_parts = [
169
+ f"{summary[:400]}...",
170
+ f"Trial Phase: {phase}.",
171
+ f"Status: {status}.",
172
+ f"Conditions: {conditions_str}.",
173
+ f"Interventions: {interventions_str}.",
174
+ ]
175
+
176
+ if primary_outcome_str:
177
+ content_parts.append(f"Primary Outcome: {primary_outcome_str}.")
178
+
179
+ if secondary_count > 0:
180
+ content_parts.append(f"Secondary Outcomes: {secondary_count} additional endpoints.")
181
+
182
+ if has_results:
183
+ results_info = "Results Available: Yes"
184
+ if results_date:
185
+ results_info += f" (posted {results_date})"
186
+ content_parts.append(results_info + ".")
187
+ else:
188
+ content_parts.append("Results Available: Not yet posted.")
189
+
190
+ content = " ".join(content_parts)
191
+
192
+ return Evidence(
193
+ content=content[:2000],
194
+ citation=Citation(
195
+ source="clinicaltrials",
196
+ title=title[:500],
197
+ url=f"https://clinicaltrials.gov/study/{nct_id}",
198
+ date=start_date,
199
+ authors=[],
200
+ ),
201
+ relevance=0.90 if has_results else 0.85, # Boost relevance for trials with results
202
+ )
203
+ ```
204
+
205
+ ---
206
+
207
+ ## API Reference
208
+
209
+ The ClinicalTrials.gov API v2 returns nested JSON:
210
+
211
+ ```json
212
+ {
213
+ "protocolSection": {
214
+ "outcomesModule": {
215
+ "primaryOutcomes": [
216
+ {
217
+ "measure": "Change from Baseline in IIEF-EF Domain Score",
218
+ "description": "...",
219
+ "timeFrame": "Baseline to Week 12"
220
+ }
221
+ ],
222
+ "secondaryOutcomes": [
223
+ {
224
+ "measure": "Subject Global Assessment Question",
225
+ "timeFrame": "Week 12"
226
+ }
227
+ ]
228
+ }
229
+ },
230
+ "hasResults": true
231
+ }
232
+ ```
233
+
234
+ See: https://clinicaltrials.gov/data-api/api
235
+
236
+ ---
237
+
238
+ ## Test Plan
239
+
240
+ ### Unit Tests (`tests/unit/tools/test_clinicaltrials.py`)
241
+
242
+ ```python
243
+ @pytest.mark.unit
244
+ class TestClinicalTrialsOutcomes:
245
+ """Tests for outcome measure extraction."""
246
+
247
+ @pytest.mark.asyncio
248
+ async def test_extracts_primary_outcome(self, tool: ClinicalTrialsTool) -> None:
249
+ """Test that primary outcome is extracted from response."""
250
+ mock_study = {
251
+ "protocolSection": {
252
+ "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
253
+ "statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
254
+ "descriptionModule": {"briefSummary": "Summary"},
255
+ "designModule": {"phases": ["PHASE3"]},
256
+ "conditionsModule": {"conditions": ["ED"]},
257
+ "armsInterventionsModule": {"interventions": []},
258
+ "outcomesModule": {
259
+ "primaryOutcomes": [
260
+ {
261
+ "measure": "Change in IIEF-EF score",
262
+ "timeFrame": "Week 12"
263
+ }
264
+ ]
265
+ },
266
+ },
267
+ "hasResults": True,
268
+ }
269
+
270
+ mock_response = MagicMock()
271
+ mock_response.json.return_value = {"studies": [mock_study]}
272
+ mock_response.raise_for_status = MagicMock()
273
+
274
+ with patch("requests.get", return_value=mock_response):
275
+ results = await tool.search("test", max_results=1)
276
+
277
+ assert len(results) == 1
278
+ assert "Primary Outcome" in results[0].content
279
+ assert "IIEF-EF" in results[0].content
280
+ assert "Week 12" in results[0].content
281
+
282
+ @pytest.mark.asyncio
283
+ async def test_includes_results_status(self, tool: ClinicalTrialsTool) -> None:
284
+ """Test that results availability is shown."""
285
+ mock_study = {
286
+ "protocolSection": {
287
+ "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
288
+ "statusModule": {
289
+ "overallStatus": "COMPLETED",
290
+ "startDateStruct": {"date": "2023"},
291
+ # Note: resultsFirstPostDateStruct, not resultsFirstSubmitDate
292
+ "resultsFirstPostDateStruct": {"date": "2024-06-15"},
293
+ },
294
+ "descriptionModule": {"briefSummary": "Summary"},
295
+ "designModule": {"phases": ["PHASE3"]},
296
+ "conditionsModule": {"conditions": ["ED"]},
297
+ "armsInterventionsModule": {"interventions": []},
298
+ "outcomesModule": {},
299
+ },
300
+ "hasResults": True, # Note: hasResults is TOP-LEVEL
301
+ }
302
+
303
+ mock_response = MagicMock()
304
+ mock_response.json.return_value = {"studies": [mock_study]}
305
+ mock_response.raise_for_status = MagicMock()
306
+
307
+ with patch("requests.get", return_value=mock_response):
308
+ results = await tool.search("test", max_results=1)
309
+
310
+ assert "Results Available: Yes" in results[0].content
311
+ assert "2024-06-15" in results[0].content
312
+
313
+ @pytest.mark.asyncio
314
+ async def test_shows_no_results_when_missing(self, tool: ClinicalTrialsTool) -> None:
315
+ """Test that missing results are indicated."""
316
+ mock_study = {
317
+ "protocolSection": {
318
+ "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
319
+ "statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
320
+ "descriptionModule": {"briefSummary": "Summary"},
321
+ "designModule": {"phases": ["PHASE2"]},
322
+ "conditionsModule": {"conditions": ["ED"]},
323
+ "armsInterventionsModule": {"interventions": []},
324
+ "outcomesModule": {},
325
+ },
326
+ "hasResults": False,
327
+ }
328
+
329
+ mock_response = MagicMock()
330
+ mock_response.json.return_value = {"studies": [mock_study]}
331
+ mock_response.raise_for_status = MagicMock()
332
+
333
+ with patch("requests.get", return_value=mock_response):
334
+ results = await tool.search("test", max_results=1)
335
+
336
+ assert "Results Available: Not yet posted" in results[0].content
337
+
338
+ @pytest.mark.asyncio
339
+ async def test_boosts_relevance_for_results(self, tool: ClinicalTrialsTool) -> None:
340
+ """Trials with results should have higher relevance score."""
341
+ with_results = {
342
+ "protocolSection": {
343
+ "identificationModule": {"nctId": "NCT11111111", "briefTitle": "With Results"},
344
+ "statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
345
+ "descriptionModule": {"briefSummary": "Summary"},
346
+ "designModule": {"phases": []},
347
+ "conditionsModule": {"conditions": []},
348
+ "armsInterventionsModule": {"interventions": []},
349
+ "outcomesModule": {},
350
+ },
351
+ "hasResults": True,
352
+ }
353
+ without_results = {
354
+ "protocolSection": {
355
+ "identificationModule": {"nctId": "NCT22222222", "briefTitle": "No Results"},
356
+ "statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
357
+ "descriptionModule": {"briefSummary": "Summary"},
358
+ "designModule": {"phases": []},
359
+ "conditionsModule": {"conditions": []},
360
+ "armsInterventionsModule": {"interventions": []},
361
+ "outcomesModule": {},
362
+ },
363
+ "hasResults": False,
364
+ }
365
+
366
+ mock_response = MagicMock()
367
+ mock_response.json.return_value = {"studies": [with_results, without_results]}
368
+ mock_response.raise_for_status = MagicMock()
369
+
370
+ with patch("requests.get", return_value=mock_response):
371
+ results = await tool.search("test", max_results=2)
372
+
373
+ assert results[0].relevance == 0.90 # With results
374
+ assert results[1].relevance == 0.85 # Without results
375
+ ```
376
+
377
+ ### Integration Test
378
+
379
+ ```python
380
+ @pytest.mark.integration
381
+ class TestClinicalTrialsOutcomesIntegration:
382
+ """Integration tests with real API."""
383
+
384
+ @pytest.mark.asyncio
385
+ async def test_real_completed_trial_has_outcome(self) -> None:
386
+ """Real completed Phase 3 trials should have outcome measures."""
387
+ tool = ClinicalTrialsTool()
388
+
389
+ # Search for completed Phase 3 ED trials (likely to have outcomes)
390
+ results = await tool.search(
391
+ "sildenafil erectile dysfunction Phase 3 COMPLETED",
392
+ max_results=3
393
+ )
394
+
395
+ # At least one should have primary outcome
396
+ has_outcome = any("Primary Outcome" in r.content for r in results)
397
+ assert has_outcome, "No completed trials with outcome measures found"
398
+ ```
399
+
400
+ ---
401
+
402
+ ## Files to Modify
403
+
404
+ | File | Change |
405
+ |------|--------|
406
+ | `src/tools/clinicaltrials.py` | **ADD** `OutcomesModule` and `HasResults` to `FIELDS`, update `_study_to_evidence()` |
407
+ | `tests/unit/tools/test_clinicaltrials.py` | Add outcome parsing tests |
408
+
409
+ ---
410
+
411
+ ## Acceptance Criteria
412
+
413
+ ### FIELDS Constant (REQUIRED CHANGE)
414
+ - [ ] `FIELDS` includes `"OutcomesModule"` (returns full outcomesModule)
415
+ - [ ] `FIELDS` includes `"HasResults"` (returns top-level boolean)
416
+
417
+ ### `_study_to_evidence()` Method
418
+ - [ ] Extracts `protocolSection.outcomesModule.primaryOutcomes`
419
+ - [ ] Accesses `study.hasResults` at TOP LEVEL (not inside protocolSection)
420
+ - [ ] Results date extracted from `statusModule.resultsFirstPostDateStruct.date`
421
+ - [ ] Evidence content includes primary outcome measure when available
422
+ - [ ] Evidence content shows results availability status
423
+ - [ ] Outcome measure text truncated to 200 chars
424
+ - [ ] Trials with results have boosted relevance (0.90 vs 0.85)
425
+
426
+ ### Testing
427
+ - [ ] All unit tests pass
428
+ - [ ] Integration test confirms real trials return outcome data
429
+ - [ ] Live API test confirms `OutcomesModule` and `HasResults` fields work
430
+
431
+ ---
432
+
433
+ ## Edge Cases
434
+
435
+ 1. **No outcomes defined**: Some early-phase trials don't have outcomes yet
436
+ - Solution: Gracefully skip outcome section if `outcomesModule` is empty or missing
437
+
438
+ 2. **Multiple primary outcomes**: Some trials have 2-3 primary outcomes
439
+ - Solution: Show first outcome only, mention count of others (see the sketch after this list)
440
+
441
+ 3. **Long outcome descriptions**: Some measures are very verbose (500+ chars)
442
+ - Solution: Truncate measure to 200 chars with `[:200]`
443
+
444
+ 4. **hasResults without resultsFirstPostDateStruct**: Some completed trials may have results without a posted date
445
+ - Solution: Show "Results Available: Yes" without date
446
+
447
+ 5. **outcomesModule missing entirely**: Not all API responses include this module
448
+ - Solution: Use `.get("outcomesModule", {})` for safe access
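
A small helper along these lines would cover edge cases 2 and 3 together (a sketch; the helper name is hypothetical and not part of the proposed `_study_to_evidence()` changes):

```python
from typing import Any


def summarize_primary_outcomes(outcomes_module: dict[str, Any]) -> str:
    """First primary outcome only, truncated to 200 chars, plus a count of the rest."""
    primary = outcomes_module.get("primaryOutcomes", [])
    if not primary:
        return ""  # edge case 1: no outcomes defined yet
    first = primary[0]
    text = first.get("measure", "")[:200]  # edge case 3: truncate verbose measures
    timeframe = first.get("timeFrame", "")
    if timeframe:
        text += f" (measured at {timeframe})"
    if len(primary) > 1:  # edge case 2: mention the additional primary outcomes
        text += f" [plus {len(primary) - 1} additional primary outcome(s)]"
    return text
```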
449
+
450
+ ---
451
+
452
+ ## Rollback Plan
453
+
454
+ If outcome extraction causes issues:
455
+ 1. DO NOT modify `FIELDS` - nothing to revert there
456
+ 2. Remove outcome extraction code from `_study_to_evidence()`
457
+ 3. Existing tests should still pass
458
+
459
+ ---
460
+
461
+ ## References
462
+
463
+ - GitHub Issue #95
464
+ - [ClinicalTrials.gov API v2 Studies Endpoint](https://clinicaltrials.gov/data-api/api)
465
+ - [Stack Overflow - ClinicalTrials.gov API v2 Response Structure](https://stackoverflow.com/questions/78415818)
466
+ - `TOOL_ANALYSIS_CRITICAL.md` - "Tool 2: ClinicalTrials.gov > Current Implementation Gaps"
docs/specs/SPEC_15_ADVANCED_MODE_PERFORMANCE.md ADDED
@@ -0,0 +1,478 @@
1
+ # SPEC_15: Advanced Mode Performance Optimization
2
+
3
+ **Status**: Draft (Validated - Implement All Solutions)
4
+ **Priority**: P1
5
+ **GitHub Issue**: #65
6
+ **Estimated Effort**: Medium (config changes + early termination logic)
7
+ **Last Updated**: 2025-11-30
8
+
9
+ > **Senior Review Verdict**: ✅ APPROVED
10
+ > **Recommendation**: Implement Solution A + B + C together. Solution B (Early Termination) is NOT "post-hackathon" - it's the core fix that solves the root cause. The patterns used are consistent with Microsoft Agent Framework best practices.
11
+
12
+ ---
13
+
14
+ ## Problem Statement
15
+
16
+ Advanced (Multi-Agent) mode runs **10 rounds of multi-agent coordination** which takes **10-15+ minutes**.
17
+
18
+ **For hackathon demos**: No judge will wait this long. They'll close the tab before seeing results.
19
+
20
+ ### Observed Behavior
21
+
22
+ - System works correctly (no crashes)
23
+ - Produces detailed, high-quality research output
24
+ - Takes too long for practical demo use
25
+ - User had to manually terminate after ~10 minutes
26
+
27
+ ### Current Configuration
28
+
29
+ ```python
30
+ # src/orchestrators/advanced.py:133
31
+ .with_standard_manager(
32
+ chat_client=manager_client,
33
+ max_round_count=self._max_rounds, # Default: 10
34
+ max_stall_count=3,
35
+ max_reset_count=2,
36
+ )
37
+ ```
38
+
39
+ ### Time Breakdown (Estimated)
40
+
41
+ | Component | Time per Round | Notes |
42
+ |-----------|---------------|-------|
43
+ | Manager LLM call | 2-5s | Decides next agent |
44
+ | Search Agent | 10-20s | 4 API calls (PubMed, CT, EPMC, OA) |
45
+ | Hypothesis Agent | 5-10s | LLM reasoning |
46
+ | Judge Agent | 5-10s | LLM evaluation |
47
+ | Report Agent | 10-20s | LLM synthesis (when called) |
48
+
49
+ **Total per round**: ~30-60 seconds
50
+ **10 rounds**: 5-10 minutes minimum
51
+
52
+ ---
53
+
54
+ ## Root Cause Analysis
55
+
56
+ ### Issue 1: Default `max_rounds=10` is Too High
57
+
58
+ The Microsoft Agent Framework keeps iterating until:
59
+ 1. `max_rounds` reached, OR
60
+ 2. Manager decides workflow is complete
61
+
62
+ For research tasks, the manager often wants "more evidence" and keeps searching.
63
+
64
+ ### Issue 2: No Early Termination Heuristic
65
+
66
+ Even when the Judge says `sufficient=True` with high confidence, the workflow continues because the manager wants to be thorough.
67
+
68
+ ### Issue 3: No User Expectation Setting
69
+
70
+ Users don't know how long to expect. Progress indication is minimal.
71
+
72
+ ---
73
+
74
+ ## Proposed Solutions
75
+
76
+ ### Solution A: Reduce Default `max_rounds` (QUICK FIX)
77
+
78
+ **Change**: Reduce `max_rounds` from 10 to 5 (or make configurable via env).
79
+
80
+ ```python
81
+ # src/orchestrators/advanced.py
82
+
83
+ def __init__(
84
+ self,
85
+ max_rounds: int | None = None, # Changed from 10
86
+ ...
87
+ ) -> None:
88
+ # Read from environment, default to 5 for faster demos
89
+ default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
90
+ self._max_rounds = max_rounds if max_rounds is not None else default_rounds
91
+ ```
92
+
93
+ **Pros**:
94
+ - Simple, 2-line change
95
+ - Immediately halves demo time
96
+
97
+ **Cons**:
98
+ - Less thorough research
99
+ - Trade-off: speed vs. quality
100
+
101
+ ### Solution B: Early Termination on High-Confidence Judge (RECOMMENDED)
102
+
103
+ **Change**: Add workflow termination signal when Judge returns `sufficient=True` with confidence > 70%.
104
+
105
+ This requires modifying the JudgeAgent to signal completion:
106
+
107
+ ```python
108
+ # src/agents/magentic_agents.py - create_judge_agent()
109
+
110
+ @chat_agent.on_message
111
+ async def handle_judge_message(message: str, context: Context) -> ChatMessage:
112
+ """Process judge request and potentially signal completion."""
113
+ # ... existing judge logic ...
114
+
115
+ assessment = await judge_handler.evaluate(evidence, query)
116
+
117
+ if assessment.sufficient and assessment.confidence >= 0.70:
118
+ # Signal to manager that we have enough evidence
119
+ # The manager prompt should respect this signal
120
+ return ChatMessage(
121
+ content=f"SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
122
+ f"Recommend immediate synthesis. {assessment.reasoning}",
123
+ metadata={"sufficient": True, "confidence": assessment.confidence},
124
+ )
125
+
126
+ return ChatMessage(content=f"INSUFFICIENT: {assessment.reasoning}")
127
+ ```
128
+
129
+ And update the manager's system prompt to respect this:
130
+
131
+ ```python
132
+ # src/orchestrators/advanced.py - _build_workflow()
133
+
134
+ manager_system_prompt = """You are a research workflow manager.
135
+
136
+ IMPORTANT: When JudgeAgent returns "SUFFICIENT EVIDENCE", immediately
137
+ delegate to ReportAgent for final synthesis. Do NOT continue searching.
138
+
139
+ Workflow:
140
+ 1. SearchAgent finds evidence
141
+ 2. HypothesisAgent generates hypotheses
142
+ 3. JudgeAgent evaluates sufficiency
143
+ 4. IF sufficient → ReportAgent synthesizes (END)
144
+ 5. IF insufficient → SearchAgent refines search (CONTINUE)
145
+ """
146
+ ```
147
+
148
+ **Pros**:
149
+ - Respects actual evidence quality
150
+ - Can terminate early (round 3-4) when evidence is strong
151
+ - Maintains quality for complex queries
152
+
153
+ **Cons**:
154
+ - Requires testing to ensure manager respects signal
155
+ - More complex change
156
+
157
+ ### Solution C: Better Progress Indication
158
+
159
+ Add estimated time remaining to UI:
160
+
161
+ ```python
162
+ # src/orchestrators/advanced.py - run()
163
+
164
+ yield AgentEvent(
165
+ type="progress",
166
+ message=f"Round {iteration}/{self._max_rounds} "
167
+ f"(~{(self._max_rounds - iteration) * 45}s remaining)",
168
+ iteration=iteration,
169
+ )
170
+ ```
171
+
172
+ **Pros**:
173
+ - Sets user expectations
174
+ - Doesn't change workflow behavior
175
+
176
+ **Cons**:
177
+ - Doesn't actually speed up the workflow
178
+
179
+ ---
180
+
181
+ ## Recommended Implementation
182
+
183
+ **IMPLEMENT ALL THREE SOLUTIONS NOW**:
184
+
185
+ 1. **Solution A**: Reduce `max_rounds` to 5 via environment variable
186
+ 2. **Solution B**: Early termination when Judge returns `sufficient=True` with confidence ≥70%
187
+ 3. **Solution C**: Better progress indication with time estimates
188
+
189
+ > **Why Solution B NOW?** The Manager acting as a "termination condition" based on Judge feedback is a standard multi-agent pattern (Critique/Refine loop with exit). This aligns with Microsoft Agent Framework best practices and solves the ROOT CAUSE, not just a symptom.
190
+
191
+ ---
192
+
193
+ ## Implementation Details
194
+
195
+ ### Phase 1: All Solutions Together (A + B + C)
196
+
197
+ #### 1. Update Advanced Orchestrator Constructor
198
+
199
+ ```python
200
+ # src/orchestrators/advanced.py
201
+
202
+ import os
203
+
204
+ class AdvancedOrchestrator(OrchestratorProtocol):
205
+ def __init__(
206
+ self,
207
+ max_rounds: int | None = None,
208
+ chat_client: OpenAIChatClient | None = None,
209
+ api_key: str | None = None,
210
+ timeout_seconds: float = 300.0, # Reduced from 600 to 5 min
211
+ domain: ResearchDomain | str | None = None,
212
+ ) -> None:
213
+ # Environment-configurable rounds (default 5 for demos)
214
+ default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
215
+ self._max_rounds = max_rounds if max_rounds is not None else default_rounds
216
+ self._timeout_seconds = timeout_seconds
217
+ # ... rest unchanged ...
218
+ ```
219
+
220
+ #### 2. Add Progress Estimation
221
+
222
+ ```python
223
+ # src/orchestrators/advanced.py - run()
224
+
225
+ # After processing MagenticAgentMessageEvent:
226
+ if isinstance(event, MagenticAgentMessageEvent):
227
+ iteration += 1
228
+ rounds_remaining = self._max_rounds - iteration
229
+ # Estimate ~45s per round based on observed timing
230
+ est_seconds = rounds_remaining * 45
231
+ est_display = f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
232
+
233
+ yield AgentEvent(
234
+ type="progress",
235
+ message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
236
+ iteration=iteration,
237
+ )
238
+ ```
239
+
240
+ #### 3. Update UI Message (Solution C)
241
+
242
+ ```python
243
+ # src/orchestrators/advanced.py - run()
244
+
245
+ # UX FIX: More accurate timing message
246
+ yield AgentEvent(
247
+ type="thinking",
248
+ message=(
249
+ f"Multi-agent reasoning in progress ({self._max_rounds} rounds max)... "
250
+ f"Estimated time: {self._max_rounds * 45 // 60}-{self._max_rounds * 60 // 60} minutes."
251
+ ),
252
+ iteration=0,
253
+ )
254
+ ```
255
+
256
+ #### 4. Add Early Termination Signal (Solution B)
257
+
258
+ ```python
259
+ # src/agents/magentic_agents.py - Update create_judge_agent()
260
+
261
+ @chat_agent.on_message
262
+ async def handle_judge_message(message: str, context: Context) -> ChatMessage:
263
+ """Process judge request and signal completion when evidence is sufficient."""
264
+ # ... existing parsing logic to extract evidence and query ...
265
+
266
+ assessment = await judge_handler.evaluate(evidence, query)
267
+
268
+ # NEW: Strong termination signal for high-confidence assessments
269
+ if assessment.sufficient and assessment.confidence >= 0.70:
270
+ # Clear, unambiguous signal that Manager should respect
271
+ return ChatMessage(
272
+ content=(
273
+ f"βœ… SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
274
+ f"STOP SEARCHING. Delegate to ReportAgent NOW for final synthesis. "
275
+ f"Reasoning: {assessment.reasoning}"
276
+ ),
277
+ metadata={"sufficient": True, "confidence": assessment.confidence},
278
+ )
279
+
280
+ # Insufficient - continue the loop
281
+ return ChatMessage(
282
+ content=(
283
+ f"❌ INSUFFICIENT: {assessment.reasoning}. "
284
+ f"Confidence: {assessment.confidence:.0%}. "
285
+ f"Suggested refinements: {', '.join(assessment.next_search_queries[:2])}"
286
+ )
287
+ )
288
+ ```
289
+
290
+ #### 5. Update Manager System Prompt (Solution B)
291
+
292
+ ```python
293
+ # src/orchestrators/advanced.py - _build_workflow()
294
+
295
+ MANAGER_SYSTEM_PROMPT = """You are a medical research workflow manager.
296
+
297
+ ## CRITICAL RULE
298
+ When JudgeAgent says "SUFFICIENT EVIDENCE" or "STOP SEARCHING":
299
+ → IMMEDIATELY delegate to ReportAgent for synthesis
300
+ → Do NOT continue searching or gathering more evidence
301
+ → The Judge has determined evidence quality is adequate
302
+
303
+ ## Standard Workflow
304
+ 1. SearchAgent → finds evidence from PubMed, ClinicalTrials, etc.
305
+ 2. HypothesisAgent → generates testable hypotheses
306
+ 3. JudgeAgent → evaluates evidence sufficiency
307
+ 4. IF sufficient → ReportAgent (DONE)
308
+ 5. IF insufficient → SearchAgent with refined queries (CONTINUE)
309
+
310
+ ## Your Role
311
+ - Coordinate agents efficiently
312
+ - Respect the Judge's termination signal
313
+ - Prioritize completing the task over perfection
314
+ """
315
+ ```
316
+
317
+ ---
318
+
319
+ ## Test Plan
320
+
321
+ ### Unit Tests
322
+
323
+ ```python
324
+ # tests/unit/orchestrators/test_advanced_orchestrator.py
325
+
326
+ import os
327
+ from unittest.mock import patch
328
+
329
+ import pytest
330
+
331
+ from src.orchestrators.advanced import AdvancedOrchestrator
332
+
333
+
334
+ class TestAdvancedOrchestratorConfig:
335
+ """Tests for configuration options."""
336
+
337
+ def test_default_max_rounds_is_five(self) -> None:
338
+ """Default max_rounds should be 5 for faster demos."""
339
+ with patch.dict(os.environ, {}, clear=True):
340
+ # Clear any existing env var
341
+ os.environ.pop("ADVANCED_MAX_ROUNDS", None)
342
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
343
+ orch.__init__()
344
+ assert orch._max_rounds == 5
345
+
346
+ def test_max_rounds_from_env(self) -> None:
347
+ """max_rounds should be configurable via environment."""
348
+ with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
349
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
350
+ orch.__init__()
351
+ assert orch._max_rounds == 3
352
+
353
+ def test_explicit_max_rounds_overrides_env(self) -> None:
354
+ """Explicit parameter should override environment."""
355
+ with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
356
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
357
+ orch.__init__(max_rounds=7)
358
+ assert orch._max_rounds == 7
359
+
360
+ def test_timeout_default_is_five_minutes(self) -> None:
361
+ """Default timeout should be 300s (5 min) for faster failure."""
362
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
363
+ orch.__init__()
364
+ assert orch._timeout_seconds == 300.0
365
+ ```
366
+
367
+ ### Integration Test (Manual)
368
+
369
+ ```bash
370
+ # Run advanced mode with reduced rounds
371
+ ADVANCED_MAX_ROUNDS=3 uv run python -c "
372
+ import asyncio
373
+ from src.orchestrators.advanced import AdvancedOrchestrator
374
+
375
+ async def test():
376
+ orch = AdvancedOrchestrator()
377
+ print(f'Max rounds: {orch._max_rounds}') # Should be 3
378
+
379
+ async for event in orch.run('sildenafil mechanism'):
380
+ print(f'{event.type}: {event.message[:100]}...')
381
+
382
+ asyncio.run(test())
383
+ "
384
+ ```
385
+
386
+ ### Timing Benchmark
387
+
388
+ Create a benchmark script to measure actual performance:
389
+
390
+ ```python
391
+ # examples/benchmark_advanced.py
392
+ """Benchmark Advanced mode with different max_rounds settings."""
393
+
394
+ import asyncio
395
+ import os
396
+ import time
397
+
398
+
399
+ async def benchmark(max_rounds: int) -> float:
400
+ """Run benchmark with specified rounds, return elapsed time."""
401
+ os.environ["ADVANCED_MAX_ROUNDS"] = str(max_rounds)
402
+
403
+ # Import after setting env
404
+ from src.orchestrators.advanced import AdvancedOrchestrator
405
+
406
+ orch = AdvancedOrchestrator()
407
+ start = time.time()
408
+
409
+ async for event in orch.run("sildenafil erectile dysfunction"):
410
+ if event.type == "complete":
411
+ break
412
+
413
+ return time.time() - start
414
+
415
+
416
+ async def main() -> None:
417
+ """Run benchmarks for different configurations."""
418
+ for rounds in [3, 5, 7, 10]:
419
+ elapsed = await benchmark(rounds)
420
+ print(f"max_rounds={rounds}: {elapsed:.1f}s ({elapsed/60:.1f}min)")
421
+
422
+
423
+ if __name__ == "__main__":
424
+ asyncio.run(main())
425
+ ```
426
+
427
+ ---
428
+
429
+ ## Files to Modify
430
+
431
+ | File | Change |
432
+ |------|--------|
433
+ | `src/orchestrators/advanced.py` | Add env-configurable `max_rounds`, reduce default to 5, add progress estimation, update Manager prompt |
434
+ | `src/agents/magentic_agents.py` | Add early termination signal in JudgeAgent |
435
+ | `tests/unit/orchestrators/test_advanced_orchestrator.py` | Add config tests |
436
+ | `tests/unit/agents/test_magentic_judge_termination.py` | Add termination signal tests (sketch below) |
437
+ | `examples/benchmark_advanced.py` | Add timing benchmark (optional) |
438
+
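+ The termination tests listed above do not exist yet. The sketch below (for the planned `tests/unit/agents/test_magentic_judge_termination.py`) pins down only the decision rule, `sufficient=True` with confidence ≥70%; `FakeAssessment` and `should_terminate` are hypothetical stand-ins because the real handler is nested inside `create_judge_agent()`.
+
+ ```python
+ # Sketch only: FakeAssessment and should_terminate are hypothetical stand-ins
+ # that mirror the proposed rule in handle_judge_message.
+ from dataclasses import dataclass
+
+
+ @dataclass
+ class FakeAssessment:
+     sufficient: bool
+     confidence: float
+
+
+ def should_terminate(assessment: FakeAssessment, threshold: float = 0.70) -> bool:
+     """Early-termination rule: sufficient AND confidence >= threshold."""
+     return assessment.sufficient and assessment.confidence >= threshold
+
+
+ def test_terminates_when_sufficient_and_confident() -> None:
+     assert should_terminate(FakeAssessment(sufficient=True, confidence=0.85))
+
+
+ def test_does_not_terminate_when_insufficient() -> None:
+     assert not should_terminate(FakeAssessment(sufficient=False, confidence=0.95))
+
+
+ def test_does_not_terminate_below_confidence_threshold() -> None:
+     assert not should_terminate(FakeAssessment(sufficient=True, confidence=0.50))
+ ```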
439
+ ---
440
+
441
+ ## Acceptance Criteria
442
+
443
+ ### Solution A: Configuration
444
+ - [ ] Default `max_rounds` is 5 (not 10)
445
+ - [ ] `max_rounds` configurable via `ADVANCED_MAX_ROUNDS` env var
446
+ - [ ] Explicit `max_rounds` parameter overrides env var
447
+ - [ ] Default timeout is 5 minutes (300s, not 600s)
448
+
449
+ ### Solution B: Early Termination
450
+ - [ ] JudgeAgent returns "SUFFICIENT EVIDENCE" message when `sufficient=True` and confidence ≥70%
451
+ - [ ] JudgeAgent returns "STOP SEARCHING" in termination signal
452
+ - [ ] Manager system prompt includes explicit termination instructions
453
+ - [ ] Workflow terminates early when Judge signals sufficiency (observed in logs)
454
+
455
+ ### Solution C: Progress Indication
456
+ - [ ] Progress events show current round / max rounds
457
+ - [ ] Progress events show estimated time remaining
458
+ - [ ] Initial "thinking" message shows estimated total time
459
+
460
+ ### Overall
461
+ - [ ] Demo completes in <5 minutes with useful output
462
+ - [ ] Quality of output is maintained (no degradation from early termination)
463
+
464
+ ---
465
+
466
+ ## Rollback Plan
467
+
468
+ If reduced rounds cause quality issues:
469
+ 1. Increase the `ADVANCED_MAX_ROUNDS` environment variable (see the sketch below)
470
+ 2. No code changes needed
471
+
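+ A minimal rollback sketch, assuming the env-configurable default from Solution A is in place. In deployment the variable would be set in the process environment (e.g., Space settings); Python is used here only for a quick local check, and `_max_rounds` follows the constructor shown earlier.
+
+ ```python
+ # Rollback sketch: restore deeper multi-round search without code changes.
+ import os
+
+ os.environ["ADVANCED_MAX_ROUNDS"] = "10"  # back to the previous default
+
+ from src.orchestrators.advanced import AdvancedOrchestrator
+
+ orch = AdvancedOrchestrator()  # constructor reads ADVANCED_MAX_ROUNDS
+ assert orch._max_rounds == 10
+ ```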
472
+ ---
473
+
474
+ ## References
475
+
476
+ - GitHub Issue #65
477
+ - Microsoft Agent Framework: https://github.com/microsoft/agent-framework
478
+ - MagenticBuilder docs: Round configuration
docs/specs/{SPEC_01_DEMO_TERMINATION.md → archive/SPEC_01_DEMO_TERMINATION.md} RENAMED
File without changes
docs/specs/{SPEC_02_E2E_TESTING.md → archive/SPEC_02_E2E_TESTING.md} RENAMED
File without changes
docs/specs/{SPEC_03_OPENALEX_INTEGRATION.md → archive/SPEC_03_OPENALEX_INTEGRATION.md} RENAMED
File without changes
docs/specs/{SPEC_04_MAGENTIC_UX.md → archive/SPEC_04_MAGENTIC_UX.md} RENAMED
File without changes
docs/specs/{SPEC_05_ORCHESTRATOR_CLEANUP.md → archive/SPEC_05_ORCHESTRATOR_CLEANUP.md} RENAMED
File without changes
docs/specs/{SPEC_06_SIMPLE_MODE_SYNTHESIS.md → archive/SPEC_06_SIMPLE_MODE_SYNTHESIS.md} RENAMED
File without changes
docs/specs/{SPEC_07_LANGGRAPH_MEMORY_ARCH.md → archive/SPEC_07_LANGGRAPH_MEMORY_ARCH.md} RENAMED
File without changes
docs/specs/{SPEC_08_INTEGRATE_MEMORY_LAYER.md → archive/SPEC_08_INTEGRATE_MEMORY_LAYER.md} RENAMED
File without changes
docs/specs/{SPEC_09_LLAMAINDEX_INTEGRATION.md → archive/SPEC_09_LLAMAINDEX_INTEGRATION.md} RENAMED
File without changes
docs/specs/{SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md → archive/SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md} RENAMED
File without changes
docs/specs/{SPEC_11_SEXUAL_HEALTH_FOCUS.md → archive/SPEC_11_SEXUAL_HEALTH_FOCUS.md} RENAMED
File without changes
docs/specs/{SPEC_12_NARRATIVE_SYNTHESIS.md → archive/SPEC_12_NARRATIVE_SYNTHESIS.md} RENAMED
File without changes
src/tools/openalex.py CHANGED
@@ -1,5 +1,6 @@
1
  """OpenAlex search tool - citation-aware scholarly search."""
2
 
 
3
  from typing import Any
4
 
5
  import httpx
@@ -104,6 +105,16 @@ class OpenAlexTool:
104
  openalex_id = work.get("id", "")
105
  url = openalex_id if openalex_id else "https://openalex.org"
106

107
  # Prepend citation badge to content
108
  citation_badge = f"[Cited by {cited_by_count}] " if cited_by_count > 0 else ""
109
  content = f"{citation_badge}{abstract[:1900]}"
@@ -127,6 +138,7 @@ class OpenAlexTool:
127
  "concepts": concepts,
128
  "is_open_access": is_oa,
129
  "pdf_url": pdf_url,
 
130
  },
131
  )
132
 
 
1
  """OpenAlex search tool - citation-aware scholarly search."""
2
 
3
+ import re
4
  from typing import Any
5
 
6
  import httpx
 
105
  openalex_id = work.get("id", "")
106
  url = openalex_id if openalex_id else "https://openalex.org"
107
 
108
+ # NEW: Extract PMID from ids object for deduplication
109
+ ids_obj = work.get("ids", {})
110
+ pmid_url = ids_obj.get("pmid") # "https://pubmed.ncbi.nlm.nih.gov/29456894"
111
+ pmid = None
112
+ if pmid_url and isinstance(pmid_url, str) and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
113
+ # Extract numeric PMID from URL
114
+ pmid_match = re.search(r"/(\d+)/?$", pmid_url)
115
+ if pmid_match:
116
+ pmid = pmid_match.group(1)
117
+
118
  # Prepend citation badge to content
119
  citation_badge = f"[Cited by {cited_by_count}] " if cited_by_count > 0 else ""
120
  content = f"{citation_badge}{abstract[:1900]}"
 
138
  "concepts": concepts,
139
  "is_open_access": is_oa,
140
  "pdf_url": pdf_url,
141
+ "pmid": pmid, # NEW: Store PMID for deduplication
142
  },
143
  )
144
 
src/tools/search_handler.py CHANGED
@@ -1,7 +1,8 @@
1
  """Search handler - orchestrates multiple search tools."""
2
 
3
  import asyncio
4
- from typing import cast
 
5
 
6
  import structlog
7
 
@@ -9,9 +10,124 @@ from src.tools.base import SearchTool
9
  from src.utils.exceptions import SearchError
10
  from src.utils.models import Evidence, SearchResult, SourceName
11
 
 
 
 
12
  logger = structlog.get_logger()
13
 
14

15
  class SearchHandler:
16
  """Orchestrates parallel searches across multiple tools."""
17
 
@@ -66,6 +182,19 @@ class SearchHandler:
66
  sources_searched.append(tool_name)
67
  logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
68

69
  return SearchResult(
70
  query=query,
71
  evidence=all_evidence,
 
1
  """Search handler - orchestrates multiple search tools."""
2
 
3
  import asyncio
4
+ import re
5
+ from typing import TYPE_CHECKING, cast
6
 
7
  import structlog
8
 
 
10
  from src.utils.exceptions import SearchError
11
  from src.utils.models import Evidence, SearchResult, SourceName
12
 
13
+ if TYPE_CHECKING:
14
+ from src.utils.models import Evidence
15
+
16
  logger = structlog.get_logger()
17
 
18
 
19
+ def extract_paper_id(evidence: "Evidence") -> str | None:
20
+ """Extract unique paper identifier from Evidence.
21
+
22
+ Strategy:
23
+ 1. Check metadata.pmid first (OpenAlex provides this)
24
+ 2. Fall back to URL pattern matching
25
+
26
+ Supports:
27
+ - PubMed: https://pubmed.ncbi.nlm.nih.gov/12345678/
28
+ - Europe PMC MED: https://europepmc.org/article/MED/12345678
29
+ - Europe PMC PMC: https://europepmc.org/article/PMC/PMC1234567
30
+ - Europe PMC PPR: https://europepmc.org/article/PPR/PPR123456
31
+ - Europe PMC PAT: https://europepmc.org/article/PAT/WO8601415
32
+ - DOI: https://doi.org/10.1234/...
33
+ - OpenAlex: https://openalex.org/W1234567890
34
+ - ClinicalTrials: https://clinicaltrials.gov/study/NCT12345678
35
+ - ClinicalTrials (legacy): https://clinicaltrials.gov/ct2/show/NCT12345678
36
+ """
37
+ url = evidence.citation.url
38
+ metadata = evidence.metadata or {}
39
+
40
+ # Strategy 1: Check metadata.pmid (from OpenAlex)
41
+ if pmid := metadata.get("pmid"):
42
+ return f"PMID:{pmid}"
43
+
44
+ # Strategy 2: URL pattern matching
45
+
46
+ # PubMed URL pattern
47
+ pmid_match = re.search(r"pubmed\.ncbi\.nlm\.nih\.gov/(\d+)", url)
48
+ if pmid_match:
49
+ return f"PMID:{pmid_match.group(1)}"
50
+
51
+ # Europe PMC MED pattern (same as PMID)
52
+ epmc_med_match = re.search(r"europepmc\.org/article/MED/(\d+)", url)
53
+ if epmc_med_match:
54
+ return f"PMID:{epmc_med_match.group(1)}"
55
+
56
+ # Europe PMC PMC pattern (PubMed Central ID - different from PMID!)
57
+ epmc_pmc_match = re.search(r"europepmc\.org/article/PMC/(PMC\d+)", url)
58
+ if epmc_pmc_match:
59
+ return f"PMCID:{epmc_pmc_match.group(1)}"
60
+
61
+ # Europe PMC PPR pattern (Preprint ID - unique per preprint)
62
+ epmc_ppr_match = re.search(r"europepmc\.org/article/PPR/(PPR\d+)", url)
63
+ if epmc_ppr_match:
64
+ return f"PPRID:{epmc_ppr_match.group(1)}"
65
+
66
+ # Europe PMC PAT pattern (Patent ID - e.g., WO8601415, EP1234567)
67
+ epmc_pat_match = re.search(r"europepmc\.org/article/PAT/([A-Z]{2}\d+)", url)
68
+ if epmc_pat_match:
69
+ return f"PATID:{epmc_pat_match.group(1)}"
70
+
71
+ # DOI pattern (normalize trailing slash/characters)
72
+ doi_match = re.search(r"doi\.org/(10\.\d+/[^\s\]>]+)", url)
73
+ if doi_match:
74
+ doi = doi_match.group(1).rstrip("/")
75
+ return f"DOI:{doi}"
76
+
77
+ # OpenAlex ID pattern (fallback if no PMID in metadata)
78
+ openalex_match = re.search(r"openalex\.org/(W\d+)", url)
79
+ if openalex_match:
80
+ return f"OAID:{openalex_match.group(1)}"
81
+
82
+ # ClinicalTrials NCT ID (modern format)
83
+ nct_match = re.search(r"clinicaltrials\.gov/study/(NCT\d+)", url)
84
+ if nct_match:
85
+ return f"NCT:{nct_match.group(1)}"
86
+
87
+ # ClinicalTrials NCT ID (legacy format)
88
+ nct_legacy_match = re.search(r"clinicaltrials\.gov/ct2/show/(NCT\d+)", url)
89
+ if nct_legacy_match:
90
+ return f"NCT:{nct_legacy_match.group(1)}"
91
+
92
+ return None
93
+
94
+
95
+ def deduplicate_evidence(evidence_list: list["Evidence"]) -> list["Evidence"]:
96
+ """Remove duplicate evidence based on paper ID.
97
+
98
+ Deduplication priority:
99
+ 1. PubMed (authoritative source)
100
+ 2. Europe PMC (full text links)
101
+ 3. OpenAlex (citation data)
102
+ 4. ClinicalTrials (unique, never duplicated)
103
+
104
+ Returns:
105
+ Deduplicated list preserving source priority order.
106
+ """
107
+ seen_ids: set[str] = set()
108
+ unique: list[Evidence] = []
109
+
110
+ # Sort by source priority (PubMed first)
111
+ source_priority = {"pubmed": 0, "europepmc": 1, "openalex": 2, "clinicaltrials": 3}
112
+ sorted_evidence = sorted(
113
+ evidence_list, key=lambda e: source_priority.get(e.citation.source, 99)
114
+ )
115
+
116
+ for evidence in sorted_evidence:
117
+ paper_id = extract_paper_id(evidence)
118
+
119
+ if paper_id is None:
120
+ # Can't identify - keep it (conservative)
121
+ unique.append(evidence)
122
+ continue
123
+
124
+ if paper_id not in seen_ids:
125
+ seen_ids.add(paper_id)
126
+ unique.append(evidence)
127
+
128
+ return unique
129
+
130
+
131
  class SearchHandler:
132
  """Orchestrates parallel searches across multiple tools."""
133
 
 
182
  sources_searched.append(tool_name)
183
  logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
184
 
185
+ # DEDUPLICATION STEP
186
+ original_count = len(all_evidence)
187
+ all_evidence = deduplicate_evidence(all_evidence)
188
+ dedup_count = original_count - len(all_evidence)
189
+
190
+ if dedup_count > 0:
191
+ logger.info(
192
+ "Deduplicated evidence",
193
+ original=original_count,
194
+ unique=len(all_evidence),
195
+ removed=dedup_count,
196
+ )
197
+
198
  return SearchResult(
199
  query=query,
200
  evidence=all_evidence,
tests/unit/tools/test_openalex.py CHANGED
@@ -38,6 +38,30 @@ SAMPLE_OPENALEX_RESPONSE = {
38
  ]
39
  }
40

41
 
42
  @pytest.mark.unit
43
  class TestOpenAlexTool:
@@ -144,6 +168,26 @@ class TestOpenAlexTool:
144
  assert "sildenafil" in params["search"]
145
  assert params["per_page"] == 3
146

147
 
148
  @pytest.mark.integration
149
  class TestOpenAlexIntegration:
 
38
  ]
39
  }
40
 
41
+ # Sample response WITH PMID (for deduplication testing)
42
+ SAMPLE_OPENALEX_WITH_PMID = {
43
+ "results": [
44
+ {
45
+ "id": "https://openalex.org/W98765",
46
+ "doi": "https://doi.org/10.1038/nature12345",
47
+ "display_name": "Paper with PMID for deduplication",
48
+ "publication_year": 2023,
49
+ "cited_by_count": 50,
50
+ "abstract_inverted_index": {"Test": [0], "abstract": [1]},
51
+ "concepts": [],
52
+ "authorships": [],
53
+ "open_access": {"is_oa": False},
54
+ "best_oa_location": None,
55
+ # CRITICAL: ids object with PMID for cross-source deduplication
56
+ "ids": {
57
+ "openalex": "https://openalex.org/W98765",
58
+ "doi": "https://doi.org/10.1038/nature12345",
59
+ "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894",
60
+ },
61
+ }
62
+ ]
63
+ }
64
+
65
 
66
  @pytest.mark.unit
67
  class TestOpenAlexTool:
 
168
  assert "sildenafil" in params["search"]
169
  assert params["per_page"] == 3
170
 
171
+ @pytest.mark.asyncio
172
+ async def test_extracts_pmid_from_ids_object(self, tool: OpenAlexTool, mock_client) -> None:
173
+ """PMID should be extracted from ids.pmid for cross-source deduplication."""
174
+ mock_client.get.return_value.json.return_value = SAMPLE_OPENALEX_WITH_PMID
175
+
176
+ results = await tool.search("test", max_results=1)
177
+
178
+ assert len(results) == 1
179
+ # PMID should be extracted from URL and stored as numeric string
180
+ assert results[0].metadata["pmid"] == "29456894"
181
+
182
+ @pytest.mark.asyncio
183
+ async def test_pmid_is_none_when_not_present(self, tool: OpenAlexTool, mock_client) -> None:
184
+ """PMID should be None when ids.pmid is not in response."""
185
+ # SAMPLE_OPENALEX_RESPONSE has no ids.pmid field
186
+ results = await tool.search("sildenafil ED", max_results=1)
187
+
188
+ assert len(results) == 1
189
+ assert results[0].metadata["pmid"] is None
190
+
191
 
192
  @pytest.mark.integration
193
  class TestOpenAlexIntegration:
tests/unit/tools/test_search_handler.py CHANGED
@@ -5,43 +5,203 @@ from unittest.mock import AsyncMock, create_autospec
5
  import pytest
6
 
7
  from src.tools.base import SearchTool
8
- from src.tools.search_handler import SearchHandler
9
  from src.utils.exceptions import SearchError
10
  from src.utils.models import Citation, Evidence
11
 
12

13
  class TestSearchHandler:
14
  """Tests for SearchHandler."""
15
 
16
  @pytest.mark.asyncio
17
- async def test_execute_aggregates_results(self):
18
- """SearchHandler should aggregate results from all tools."""
19
  # Setup
20
  mock_tool1 = AsyncMock(spec=SearchTool)
21
  mock_tool1.name = "pubmed"
22
  mock_tool1.search.return_value = [
23
- Evidence(
24
- content="C1",
25
- citation=Citation(source="pubmed", title="T1", url="u1", date="2024"),
26
- )
27
  ]
28
 
29
  mock_tool2 = AsyncMock(spec=SearchTool)
30
- mock_tool2.name = "clinicaltrials"
 
31
  mock_tool2.search.return_value = [
32
- Evidence(
33
- content="C2",
34
- citation=Citation(source="clinicaltrials", title="T2", url="u2", date="2024"),
35
- )
36
  ]
37
 
38
  handler = SearchHandler(tools=[mock_tool1, mock_tool2])
39
 
40
  # Execute
41
- result = await handler.execute("testosterone libido", max_results_per_tool=3)
42
- assert result.total_found == 2
 
 
 
 
43
  assert "pubmed" in result.sources_searched
44
- assert "clinicaltrials" in result.sources_searched
45
 
46
  @pytest.mark.asyncio
47
  async def test_execute_handles_tool_failure(self):
@@ -49,16 +209,11 @@ class TestSearchHandler:
49
  mock_tool_ok = create_autospec(SearchTool, instance=True)
50
  mock_tool_ok.name = "pubmed"
51
  mock_tool_ok.search = AsyncMock(
52
- return_value=[
53
- Evidence(
54
- content="Good result",
55
- citation=Citation(source="pubmed", title="T", url="u", date="2024"),
56
- )
57
- ]
58
  )
59
 
60
  mock_tool_fail = create_autospec(SearchTool, instance=True)
61
- mock_tool_fail.name = "pubmed" # Mocking a second pubmed instance failing
62
  mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))
63
 
64
  handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
@@ -67,22 +222,4 @@ class TestSearchHandler:
67
  assert result.total_found == 1
68
  assert "pubmed" in result.sources_searched
69
  assert len(result.errors) == 1
70
- # The error message format is "{tool.name}: {error!s}"
71
- assert "pubmed: API down" in result.errors[0]
72
-
73
- @pytest.mark.asyncio
74
- async def test_search_handler_pubmed_only(self):
75
- """SearchHandler should work with only PubMed tool."""
76
- # This is the specific test requested in Phase 9 spec
77
- from src.tools.pubmed import PubMedTool
78
-
79
- mock_pubmed = AsyncMock(spec=PubMedTool)
80
- mock_pubmed.name = "pubmed"
81
- mock_pubmed.search.return_value = []
82
-
83
- handler = SearchHandler(tools=[mock_pubmed], timeout=30.0)
84
- result = await handler.execute("testosterone libido", max_results_per_tool=3)
85
-
86
- assert result.sources_searched == ["pubmed"]
87
- assert "web" not in result.sources_searched
88
- assert len(result.errors) == 0
 
5
  import pytest
6
 
7
  from src.tools.base import SearchTool
8
+ from src.tools.search_handler import SearchHandler, deduplicate_evidence, extract_paper_id
9
  from src.utils.exceptions import SearchError
10
  from src.utils.models import Citation, Evidence
11
 
12
 
13
+ def _make_evidence(source: str, url: str, metadata: dict | None = None) -> Evidence:
14
+ """Helper to create Evidence objects for testing."""
15
+ return Evidence(
16
+ content="Test content",
17
+ citation=Citation(
18
+ source=source,
19
+ title="Test",
20
+ url=url,
21
+ date="2024",
22
+ authors=[],
23
+ ),
24
+ metadata=metadata or {},
25
+ )
26
+
27
+
28
+ class TestExtractPaperId:
29
+ """Tests for paper ID extraction from Evidence objects."""
30
+
31
+ def test_extracts_pubmed_id(self) -> None:
32
+ evidence = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
33
+ assert extract_paper_id(evidence) == "PMID:12345678"
34
+
35
+ def test_extracts_europepmc_med_id(self) -> None:
36
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
37
+ assert extract_paper_id(evidence) == "PMID:12345678"
38
+
39
+ def test_extracts_europepmc_pmc_id(self) -> None:
40
+ """Europe PMC PMC articles have different ID format."""
41
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PMC/PMC7654321")
42
+ assert extract_paper_id(evidence) == "PMCID:PMC7654321"
43
+
44
+ def test_extracts_europepmc_ppr_id(self) -> None:
45
+ """Europe PMC preprints have PPR IDs."""
46
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR123456")
47
+ assert extract_paper_id(evidence) == "PPRID:PPR123456"
48
+
49
+ def test_extracts_europepmc_pat_id(self) -> None:
50
+ """Europe PMC patents have PAT IDs (WIPO format)."""
51
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/WO8601415")
52
+ assert extract_paper_id(evidence) == "PATID:WO8601415"
53
+
54
+ def test_extracts_europepmc_pat_id_eu_format(self) -> None:
55
+ """European patent format should also work."""
56
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/EP1234567")
57
+ assert extract_paper_id(evidence) == "PATID:EP1234567"
58
+
59
+ def test_extracts_doi(self) -> None:
60
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345")
61
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
62
+
63
+ def test_extracts_doi_with_trailing_slash(self) -> None:
64
+ """DOIs should be normalized (trailing slash removed)."""
65
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345/")
66
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
67
+
68
+ def test_extracts_openalex_id_from_url(self) -> None:
69
+ """OpenAlex ID from URL (fallback when no PMID in metadata)."""
70
+ evidence = _make_evidence("openalex", "https://openalex.org/W1234567890")
71
+ assert extract_paper_id(evidence) == "OAID:W1234567890"
72
+
73
+ def test_extracts_openalex_pmid_from_metadata(self) -> None:
74
+ """OpenAlex PMID from metadata takes priority over URL."""
75
+ evidence = _make_evidence(
76
+ "openalex",
77
+ "https://openalex.org/W1234567890",
78
+ metadata={"pmid": "98765432"},
79
+ )
80
+ assert extract_paper_id(evidence) == "PMID:98765432"
81
+
82
+ def test_extracts_nct_id_modern(self) -> None:
83
+ evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT12345678")
84
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
85
+
86
+ def test_extracts_nct_id_legacy(self) -> None:
87
+ """Legacy ClinicalTrials.gov URL format should also work."""
88
+ evidence = _make_evidence(
89
+ "clinicaltrials", "https://clinicaltrials.gov/ct2/show/NCT12345678"
90
+ )
91
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
92
+
93
+ def test_returns_none_for_unknown_url(self) -> None:
94
+ evidence = _make_evidence("web", "https://example.com/unknown")
95
+ assert extract_paper_id(evidence) is None
96
+
97
+
98
+ class TestDeduplicateEvidence:
99
+ """Tests for evidence deduplication."""
100
+
101
+ def test_removes_pubmed_europepmc_duplicate(self) -> None:
102
+ """Same paper from PubMed and Europe PMC should dedupe to PubMed."""
103
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
104
+ europepmc = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
105
+
106
+ result = deduplicate_evidence([pubmed, europepmc])
107
+
108
+ assert len(result) == 1
109
+ assert result[0].citation.source == "pubmed"
110
+
111
+ def test_removes_pubmed_openalex_duplicate_via_metadata(self) -> None:
112
+ """OpenAlex with PMID in metadata should dedupe against PubMed."""
113
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
114
+ openalex = _make_evidence(
115
+ "openalex",
116
+ "https://openalex.org/W9999999",
117
+ metadata={"pmid": "12345678", "cited_by_count": 100},
118
+ )
119
+
120
+ result = deduplicate_evidence([pubmed, openalex])
121
+
122
+ assert len(result) == 1
123
+ assert result[0].citation.source == "pubmed"
124
+
125
+ def test_preserves_unique_evidence(self) -> None:
126
+ """Different papers should not be deduplicated."""
127
+ e1 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/11111111/")
128
+ e2 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/22222222/")
129
+
130
+ result = deduplicate_evidence([e1, e2])
131
+
132
+ assert len(result) == 2
133
+
134
+ def test_preserves_openalex_without_pmid(self) -> None:
135
+ """OpenAlex papers without PMID should NOT be deduplicated against PubMed."""
136
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
137
+ openalex_no_pmid = _make_evidence(
138
+ "openalex",
139
+ "https://openalex.org/W9999999",
140
+ metadata={"cited_by_count": 100}, # No pmid key
141
+ )
142
+
143
+ result = deduplicate_evidence([pubmed, openalex_no_pmid])
144
+
145
+ assert len(result) == 2 # Both preserved (different IDs)
146
+
147
+ def test_keeps_unidentifiable_evidence(self) -> None:
148
+ """Evidence with unrecognized URLs should be preserved."""
149
+ unknown = _make_evidence("web", "https://example.com/paper/123")
150
+
151
+ result = deduplicate_evidence([unknown])
152
+
153
+ assert len(result) == 1
154
+
155
+ def test_clinicaltrials_unique_per_nct(self) -> None:
156
+ """ClinicalTrials entries have unique NCT IDs."""
157
+ trial1 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT11111111")
158
+ trial2 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT22222222")
159
+
160
+ result = deduplicate_evidence([trial1, trial2])
161
+
162
+ assert len(result) == 2
163
+
164
+ def test_preprints_preserved_separately(self) -> None:
165
+ """Preprints (PPR IDs) should not dedupe against peer-reviewed papers."""
166
+ peer_reviewed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
167
+ preprint = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR999999")
168
+
169
+ result = deduplicate_evidence([peer_reviewed, preprint])
170
+
171
+ assert len(result) == 2 # Both preserved (different ID types)
172
+
173
+
174
  class TestSearchHandler:
175
  """Tests for SearchHandler."""
176
 
177
  @pytest.mark.asyncio
178
+ async def test_execute_aggregates_and_deduplicates(self):
179
+ """SearchHandler should aggregate results and deduplicate them."""
180
  # Setup
181
  mock_tool1 = AsyncMock(spec=SearchTool)
182
  mock_tool1.name = "pubmed"
183
  mock_tool1.search.return_value = [
184
+ _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
 
 
 
185
  ]
186
 
187
  mock_tool2 = AsyncMock(spec=SearchTool)
188
+ mock_tool2.name = "europepmc"
189
+ # Duplicate of the pubmed result
190
  mock_tool2.search.return_value = [
191
+ _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
 
 
 
192
  ]
193
 
194
  handler = SearchHandler(tools=[mock_tool1, mock_tool2])
195
 
196
  # Execute
197
+ result = await handler.execute("test")
198
+
199
+ # Should only have 1 result after deduplication
200
+ assert result.total_found == 1
201
+ assert len(result.evidence) == 1
202
+ assert result.evidence[0].citation.source == "pubmed" # Priority source kept
203
  assert "pubmed" in result.sources_searched
204
+ assert "europepmc" in result.sources_searched
205
 
206
  @pytest.mark.asyncio
207
  async def test_execute_handles_tool_failure(self):
 
209
  mock_tool_ok = create_autospec(SearchTool, instance=True)
210
  mock_tool_ok.name = "pubmed"
211
  mock_tool_ok.search = AsyncMock(
212
+ return_value=[_make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")]
 
 
 
 
 
213
  )
214
 
215
  mock_tool_fail = create_autospec(SearchTool, instance=True)
216
+ mock_tool_fail.name = "clinicaltrials"
217
  mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))
218
 
219
  handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
 
222
  assert result.total_found == 1
223
  assert "pubmed" in result.sources_searched
224
  assert len(result.errors) == 1
225
+ assert "clinicaltrials: API down" in result.errors[0]