Spaces:
Running
feat(search): SPEC_13 Evidence Deduplication (#98)
Browse files* feat(search): implement evidence deduplication (SPEC_13)
- Add PMID extraction to OpenAlex tool (from ids.pmid)
- Implement deduplicate_evidence and extract_paper_id in SearchHandler
- Support deduplication across PubMed, Europe PMC (MED, PMC, PPR, PAT), OpenAlex, and ClinicalTrials
- Add comprehensive unit tests for deduplication logic
* test: complete SPEC_13 test coverage (senior review findings)
Missing tests added per spec:
- test_extracts_europepmc_pat_id_eu_format (EP patent format)
- test_preserves_openalex_without_pmid (critical edge case)
- test_keeps_unidentifiable_evidence (conservative behavior)
- test_clinicaltrials_unique_per_nct (NCT uniqueness)
- test_preprints_preserved_separately (PPR vs PMID)
- test_extracts_pmid_from_ids_object (OpenAlex PMID extraction)
- test_pmid_is_none_when_not_present (null case)
Code improvements:
- Move `import re` to module level in openalex.py (efficiency)
- Add SAMPLE_OPENALEX_WITH_PMID fixture for cross-dedup testing
All 35 tests pass. 100% spec compliance.
* fix: add defensive isinstance check for pmid_url
Per CodeRabbit review - guards against unexpected API response types
that could cause TypeError when checking string containment.
- docs/bugs/ACTIVE_BUGS.md +4 -2
- docs/bugs/{P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md β archive/P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md} +0 -0
- docs/bugs/{P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md β archive/P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md} +0 -0
- docs/bugs/{P1_NARRATIVE_SYNTHESIS_FALLBACK.md β archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md} +0 -0
- docs/bugs/{P2_GRADIO_EXAMPLE_NOT_FILLING.md β archive/P2_GRADIO_EXAMPLE_NOT_FILLING.md} +0 -0
- docs/bugs/{P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md β archive/P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md} +0 -0
- docs/bugs/{P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md β archive/P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md} +0 -0
- docs/bugs/{P3_MAGENTIC_NO_TERMINATION_EVENT.md β archive/P3_MAGENTIC_NO_TERMINATION_EVENT.md} +0 -0
- docs/specs/SPEC_13_EVIDENCE_DEDUPLICATION.md +566 -0
- docs/specs/SPEC_14_CLINICALTRIALS_OUTCOMES.md +466 -0
- docs/specs/SPEC_15_ADVANCED_MODE_PERFORMANCE.md +478 -0
- docs/specs/{SPEC_01_DEMO_TERMINATION.md β archive/SPEC_01_DEMO_TERMINATION.md} +0 -0
- docs/specs/{SPEC_02_E2E_TESTING.md β archive/SPEC_02_E2E_TESTING.md} +0 -0
- docs/specs/{SPEC_03_OPENALEX_INTEGRATION.md β archive/SPEC_03_OPENALEX_INTEGRATION.md} +0 -0
- docs/specs/{SPEC_04_MAGENTIC_UX.md β archive/SPEC_04_MAGENTIC_UX.md} +0 -0
- docs/specs/{SPEC_05_ORCHESTRATOR_CLEANUP.md β archive/SPEC_05_ORCHESTRATOR_CLEANUP.md} +0 -0
- docs/specs/{SPEC_06_SIMPLE_MODE_SYNTHESIS.md β archive/SPEC_06_SIMPLE_MODE_SYNTHESIS.md} +0 -0
- docs/specs/{SPEC_07_LANGGRAPH_MEMORY_ARCH.md β archive/SPEC_07_LANGGRAPH_MEMORY_ARCH.md} +0 -0
- docs/specs/{SPEC_08_INTEGRATE_MEMORY_LAYER.md β archive/SPEC_08_INTEGRATE_MEMORY_LAYER.md} +0 -0
- docs/specs/{SPEC_09_LLAMAINDEX_INTEGRATION.md β archive/SPEC_09_LLAMAINDEX_INTEGRATION.md} +0 -0
- docs/specs/{SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md β archive/SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md} +0 -0
- docs/specs/{SPEC_11_SEXUAL_HEALTH_FOCUS.md β archive/SPEC_11_SEXUAL_HEALTH_FOCUS.md} +0 -0
- docs/specs/{SPEC_12_NARRATIVE_SYNTHESIS.md β archive/SPEC_12_NARRATIVE_SYNTHESIS.md} +0 -0
- src/tools/openalex.py +12 -0
- src/tools/search_handler.py +130 -1
- tests/unit/tools/test_openalex.py +44 -0
- tests/unit/tools/test_search_handler.py +178 -41
|
@@ -1,6 +1,8 @@
|
|
| 1 |
# Active Bugs
|
| 2 |
|
| 3 |
> Last updated: 2025-11-30
|
|
|
|
|
|
|
| 4 |
|
| 5 |
## P0 - Blocker
|
| 6 |
|
|
@@ -10,8 +12,8 @@
|
|
| 10 |
|
| 11 |
## P1 - Important
|
| 12 |
|
| 13 |
-
### P1 - Narrative Synthesis Falls Back to Template
|
| 14 |
-
**File:** `P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
|
| 15 |
**Related:** SPEC_12 (implemented but falling back)
|
| 16 |
|
| 17 |
**Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
|
|
|
|
| 1 |
# Active Bugs
|
| 2 |
|
| 3 |
> Last updated: 2025-11-30
|
| 4 |
+
>
|
| 5 |
+
> **Note:** Completed bug docs archived to `docs/bugs/archive/`
|
| 6 |
|
| 7 |
## P0 - Blocker
|
| 8 |
|
|
|
|
| 12 |
|
| 13 |
## P1 - Important
|
| 14 |
|
| 15 |
+
### P1 - Narrative Synthesis Falls Back to Template
|
| 16 |
+
**File:** `archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
|
| 17 |
**Related:** SPEC_12 (implemented but falling back)
|
| 18 |
|
| 19 |
**Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
@@ -0,0 +1,566 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# SPEC_13: Evidence Deduplication in SearchHandler
|
| 2 |
+
|
| 3 |
+
**Status**: Draft (Validated via API Documentation Review)
|
| 4 |
+
**Priority**: P1
|
| 5 |
+
**GitHub Issue**: #94
|
| 6 |
+
**Estimated Effort**: Medium (~100 lines of code, includes OpenAlex metadata extraction)
|
| 7 |
+
**Last Updated**: 2025-11-30
|
| 8 |
+
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
## Problem Statement
|
| 12 |
+
|
| 13 |
+
DeepBoner searches 4 sources in parallel:
|
| 14 |
+
1. **PubMed** - NCBI's biomedical literature database
|
| 15 |
+
2. **ClinicalTrials.gov** - Clinical trial registry
|
| 16 |
+
3. **Europe PMC** - Indexes **ALL of PubMed/MEDLINE** plus preprints
|
| 17 |
+
4. **OpenAlex** - Indexes **250M+ works** including most of PubMed
|
| 18 |
+
|
| 19 |
+
**Result**: The same paper appears 2-3 times in evidence because:
|
| 20 |
+
- Europe PMC includes 100% of PubMed content
|
| 21 |
+
- OpenAlex includes ~90% of PubMed content
|
| 22 |
+
|
| 23 |
+
### Impact
|
| 24 |
+
|
| 25 |
+
| Metric | Current | After Fix |
|
| 26 |
+
|--------|---------|-----------|
|
| 27 |
+
| Duplicate evidence | 30-50% | <5% |
|
| 28 |
+
| Token waste | High | Low |
|
| 29 |
+
| Judge confusion | Yes | No |
|
| 30 |
+
| Report redundancy | Yes | No |
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
## API Documentation Review (2025-11-30)
|
| 35 |
+
|
| 36 |
+
### OpenAlex Work Object - `ids` Field
|
| 37 |
+
|
| 38 |
+
**Source**: [OpenAlex Work Object Docs](https://docs.openalex.org/api-entities/works/work-object)
|
| 39 |
+
|
| 40 |
+
OpenAlex returns PMIDs in the `ids` field as **full URLs** (validated via live API testing 2025-11-30):
|
| 41 |
+
```json
|
| 42 |
+
{
|
| 43 |
+
"ids": {
|
| 44 |
+
"openalex": "https://openalex.org/W2741809807",
|
| 45 |
+
"doi": "https://doi.org/10.7717/peerj.4375",
|
| 46 |
+
"pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894"
|
| 47 |
+
}
|
| 48 |
+
}
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
**Live API Test (2025-11-30):**
|
| 52 |
+
```bash
|
| 53 |
+
curl "https://api.openalex.org/works?filter=ids.pmid:29456894&select=id,ids"
|
| 54 |
+
# Returns: "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894" (always URL format, never just number)
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
**Current Issue**: Our `openalex.py` does NOT extract `ids.pmid` - it only uses DOI.
|
| 58 |
+
**Fix Required**: Extract PMID from OpenAlex response URL and store numeric PMID in `Evidence.metadata`.
|
| 59 |
+
|
| 60 |
+
### Europe PMC URL Patterns
|
| 61 |
+
|
| 62 |
+
**Source**: [Europe PMC Help](https://europepmc.org/help)
|
| 63 |
+
|
| 64 |
+
Four URL patterns exist:
|
| 65 |
+
- `/article/MED/{PMID}` - PubMed/MEDLINE records
|
| 66 |
+
- `/article/PMC/{PMCID}` - PubMed Central full text
|
| 67 |
+
- `/article/PPR/{PPRID}` - Preprints (medRxiv, bioRxiv, etc.)
|
| 68 |
+
- `/article/PAT/{PatentID}` - Patents (e.g., `PAT/WO8601415`, `PAT/EP1234567`)
|
| 69 |
+
|
| 70 |
+
**Live API Test (2025-11-30):**
|
| 71 |
+
```bash
|
| 72 |
+
curl "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=SRC:PAT&pageSize=1&format=json"
|
| 73 |
+
# Returns: source: "PAT", id: "WO8601415", pmid: None
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
**Current Behavior**: Our `europepmc.py:92-95` already uses PubMed URL when PMID exists:
|
| 77 |
+
```python
|
| 78 |
+
if doi:
|
| 79 |
+
url = f"https://doi.org/{doi}"
|
| 80 |
+
elif result.get("pmid"):
|
| 81 |
+
url = f"https://pubmed.ncbi.nlm.nih.gov/{result['pmid']}/" # β Good!
|
| 82 |
+
else:
|
| 83 |
+
url = f"https://europepmc.org/article/{source_db}/{result.get('id', '')}"
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
+
**Implications**:
|
| 87 |
+
1. For MEDLINE records with PMIDs β PubMed URL used β deduplication works
|
| 88 |
+
2. For PMC/PPR/PAT records without PMIDs β Europe PMC URL used β treated as unique (correct behavior)
|
| 89 |
+
|
| 90 |
+
### ClinicalTrials.gov URL Patterns
|
| 91 |
+
|
| 92 |
+
Two URL formats exist:
|
| 93 |
+
- Modern: `/study/NCT12345678` (current API v2)
|
| 94 |
+
- Legacy: `/ct2/show/NCT12345678` (deprecated but may exist in old data)
|
| 95 |
+
|
| 96 |
+
---
|
| 97 |
+
|
| 98 |
+
## Current Code
|
| 99 |
+
|
| 100 |
+
```python
|
| 101 |
+
# src/tools/search_handler.py:51-62
|
| 102 |
+
for tool, result in zip(self.tools, results, strict=True):
|
| 103 |
+
if isinstance(result, Exception):
|
| 104 |
+
errors.append(f"{tool.name}: {result!s}")
|
| 105 |
+
else:
|
| 106 |
+
success_result = cast(list[Evidence], result)
|
| 107 |
+
all_evidence.extend(success_result) # β NO DEDUPLICATION
|
| 108 |
+
```
|
| 109 |
+
|
| 110 |
+
---
|
| 111 |
+
|
| 112 |
+
## Proposed Solution (Two Parts)
|
| 113 |
+
|
| 114 |
+
### Part 1: Extract PMID from OpenAlex (REQUIRED for cross-source dedup)
|
| 115 |
+
|
| 116 |
+
```python
|
| 117 |
+
# src/tools/openalex.py - Update _to_evidence() method
|
| 118 |
+
|
| 119 |
+
def _to_evidence(self, work: dict[str, Any]) -> Evidence:
|
| 120 |
+
"""Convert OpenAlex work to Evidence with rich metadata."""
|
| 121 |
+
# ... existing code ...
|
| 122 |
+
|
| 123 |
+
# NEW: Extract PMID from ids object for deduplication
|
| 124 |
+
ids_obj = work.get("ids", {})
|
| 125 |
+
pmid_url = ids_obj.get("pmid") # "https://pubmed.ncbi.nlm.nih.gov/29456894"
|
| 126 |
+
pmid = None
|
| 127 |
+
if pmid_url and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
|
| 128 |
+
# Extract numeric PMID from URL
|
| 129 |
+
import re
|
| 130 |
+
pmid_match = re.search(r'/(\d+)/?$', pmid_url)
|
| 131 |
+
if pmid_match:
|
| 132 |
+
pmid = pmid_match.group(1)
|
| 133 |
+
|
| 134 |
+
# ... rest of existing code ...
|
| 135 |
+
|
| 136 |
+
return Evidence(
|
| 137 |
+
content=content[:2000],
|
| 138 |
+
citation=Citation(
|
| 139 |
+
source="openalex",
|
| 140 |
+
title=title[:500],
|
| 141 |
+
url=url,
|
| 142 |
+
date=str(year),
|
| 143 |
+
authors=authors,
|
| 144 |
+
),
|
| 145 |
+
relevance=relevance,
|
| 146 |
+
metadata={
|
| 147 |
+
"cited_by_count": cited_by_count,
|
| 148 |
+
"concepts": concepts,
|
| 149 |
+
"is_open_access": is_oa,
|
| 150 |
+
"pdf_url": pdf_url,
|
| 151 |
+
"pmid": pmid, # NEW: Store PMID for deduplication
|
| 152 |
+
},
|
| 153 |
+
)
|
| 154 |
+
```
|
| 155 |
+
|
| 156 |
+
### Part 2: Enhanced Deduplication Function
|
| 157 |
+
|
| 158 |
+
```python
|
| 159 |
+
# src/tools/search_handler.py
|
| 160 |
+
|
| 161 |
+
import re
|
| 162 |
+
from typing import TYPE_CHECKING
|
| 163 |
+
|
| 164 |
+
if TYPE_CHECKING:
|
| 165 |
+
from src.utils.models import Evidence
|
| 166 |
+
|
| 167 |
+
|
| 168 |
+
def extract_paper_id(evidence: "Evidence") -> str | None:
|
| 169 |
+
"""Extract unique paper identifier from Evidence.
|
| 170 |
+
|
| 171 |
+
Strategy:
|
| 172 |
+
1. Check metadata.pmid first (OpenAlex provides this)
|
| 173 |
+
2. Fall back to URL pattern matching
|
| 174 |
+
|
| 175 |
+
Supports:
|
| 176 |
+
- PubMed: https://pubmed.ncbi.nlm.nih.gov/12345678/
|
| 177 |
+
- Europe PMC MED: https://europepmc.org/article/MED/12345678
|
| 178 |
+
- Europe PMC PMC: https://europepmc.org/article/PMC/PMC1234567
|
| 179 |
+
- Europe PMC PPR: https://europepmc.org/article/PPR/PPR123456
|
| 180 |
+
- Europe PMC PAT: https://europepmc.org/article/PAT/WO8601415
|
| 181 |
+
- DOI: https://doi.org/10.1234/...
|
| 182 |
+
- OpenAlex: https://openalex.org/W1234567890
|
| 183 |
+
- ClinicalTrials: https://clinicaltrials.gov/study/NCT12345678
|
| 184 |
+
- ClinicalTrials (legacy): https://clinicaltrials.gov/ct2/show/NCT12345678
|
| 185 |
+
"""
|
| 186 |
+
url = evidence.citation.url
|
| 187 |
+
metadata = evidence.metadata or {}
|
| 188 |
+
|
| 189 |
+
# Strategy 1: Check metadata.pmid (from OpenAlex)
|
| 190 |
+
if pmid := metadata.get("pmid"):
|
| 191 |
+
return f"PMID:{pmid}"
|
| 192 |
+
|
| 193 |
+
# Strategy 2: URL pattern matching
|
| 194 |
+
|
| 195 |
+
# PubMed URL pattern
|
| 196 |
+
pmid_match = re.search(r'pubmed\.ncbi\.nlm\.nih\.gov/(\d+)', url)
|
| 197 |
+
if pmid_match:
|
| 198 |
+
return f"PMID:{pmid_match.group(1)}"
|
| 199 |
+
|
| 200 |
+
# Europe PMC MED pattern (same as PMID)
|
| 201 |
+
epmc_med_match = re.search(r'europepmc\.org/article/MED/(\d+)', url)
|
| 202 |
+
if epmc_med_match:
|
| 203 |
+
return f"PMID:{epmc_med_match.group(1)}"
|
| 204 |
+
|
| 205 |
+
# Europe PMC PMC pattern (PubMed Central ID - different from PMID!)
|
| 206 |
+
epmc_pmc_match = re.search(r'europepmc\.org/article/PMC/(PMC\d+)', url)
|
| 207 |
+
if epmc_pmc_match:
|
| 208 |
+
return f"PMCID:{epmc_pmc_match.group(1)}"
|
| 209 |
+
|
| 210 |
+
# Europe PMC PPR pattern (Preprint ID - unique per preprint)
|
| 211 |
+
epmc_ppr_match = re.search(r'europepmc\.org/article/PPR/(PPR\d+)', url)
|
| 212 |
+
if epmc_ppr_match:
|
| 213 |
+
return f"PPRID:{epmc_ppr_match.group(1)}"
|
| 214 |
+
|
| 215 |
+
# Europe PMC PAT pattern (Patent ID - e.g., WO8601415, EP1234567)
|
| 216 |
+
epmc_pat_match = re.search(r'europepmc\.org/article/PAT/([A-Z]{2}\d+)', url)
|
| 217 |
+
if epmc_pat_match:
|
| 218 |
+
return f"PATID:{epmc_pat_match.group(1)}"
|
| 219 |
+
|
| 220 |
+
# DOI pattern (normalize trailing slash/characters)
|
| 221 |
+
doi_match = re.search(r'doi\.org/(10\.\d+/[^\s\]>]+)', url)
|
| 222 |
+
if doi_match:
|
| 223 |
+
doi = doi_match.group(1).rstrip('/')
|
| 224 |
+
return f"DOI:{doi}"
|
| 225 |
+
|
| 226 |
+
# OpenAlex ID pattern (fallback if no PMID in metadata)
|
| 227 |
+
openalex_match = re.search(r'openalex\.org/(W\d+)', url)
|
| 228 |
+
if openalex_match:
|
| 229 |
+
return f"OAID:{openalex_match.group(1)}"
|
| 230 |
+
|
| 231 |
+
# ClinicalTrials NCT ID (modern format)
|
| 232 |
+
nct_match = re.search(r'clinicaltrials\.gov/study/(NCT\d+)', url)
|
| 233 |
+
if nct_match:
|
| 234 |
+
return f"NCT:{nct_match.group(1)}"
|
| 235 |
+
|
| 236 |
+
# ClinicalTrials NCT ID (legacy format)
|
| 237 |
+
nct_legacy_match = re.search(r'clinicaltrials\.gov/ct2/show/(NCT\d+)', url)
|
| 238 |
+
if nct_legacy_match:
|
| 239 |
+
return f"NCT:{nct_legacy_match.group(1)}"
|
| 240 |
+
|
| 241 |
+
return None
|
| 242 |
+
|
| 243 |
+
|
| 244 |
+
def deduplicate_evidence(evidence_list: list[Evidence]) -> list[Evidence]:
|
| 245 |
+
"""Remove duplicate evidence based on paper ID.
|
| 246 |
+
|
| 247 |
+
Deduplication priority:
|
| 248 |
+
1. PubMed (authoritative source)
|
| 249 |
+
2. Europe PMC (full text links)
|
| 250 |
+
3. OpenAlex (citation data)
|
| 251 |
+
4. ClinicalTrials (unique, never duplicated)
|
| 252 |
+
|
| 253 |
+
Returns:
|
| 254 |
+
Deduplicated list preserving source priority order.
|
| 255 |
+
"""
|
| 256 |
+
seen_ids: set[str] = set()
|
| 257 |
+
unique: list[Evidence] = []
|
| 258 |
+
|
| 259 |
+
# Sort by source priority (PubMed first)
|
| 260 |
+
source_priority = {"pubmed": 0, "europepmc": 1, "openalex": 2, "clinicaltrials": 3}
|
| 261 |
+
sorted_evidence = sorted(
|
| 262 |
+
evidence_list,
|
| 263 |
+
key=lambda e: source_priority.get(e.citation.source, 99)
|
| 264 |
+
)
|
| 265 |
+
|
| 266 |
+
for evidence in sorted_evidence:
|
| 267 |
+
paper_id = extract_paper_id(evidence.citation.url)
|
| 268 |
+
|
| 269 |
+
if paper_id is None:
|
| 270 |
+
# Can't identify - keep it (conservative)
|
| 271 |
+
unique.append(evidence)
|
| 272 |
+
continue
|
| 273 |
+
|
| 274 |
+
if paper_id not in seen_ids:
|
| 275 |
+
seen_ids.add(paper_id)
|
| 276 |
+
unique.append(evidence)
|
| 277 |
+
|
| 278 |
+
return unique
|
| 279 |
+
```
|
| 280 |
+
|
| 281 |
+
### 2. Integrate into SearchHandler.execute()
|
| 282 |
+
|
| 283 |
+
```python
|
| 284 |
+
# src/tools/search_handler.py:execute() - AFTER gathering results
|
| 285 |
+
|
| 286 |
+
# ... existing code to collect all_evidence ...
|
| 287 |
+
|
| 288 |
+
# DEDUPLICATION STEP
|
| 289 |
+
original_count = len(all_evidence)
|
| 290 |
+
all_evidence = deduplicate_evidence(all_evidence)
|
| 291 |
+
dedup_count = original_count - len(all_evidence)
|
| 292 |
+
|
| 293 |
+
if dedup_count > 0:
|
| 294 |
+
logger.info(
|
| 295 |
+
"Deduplicated evidence",
|
| 296 |
+
original=original_count,
|
| 297 |
+
unique=len(all_evidence),
|
| 298 |
+
removed=dedup_count,
|
| 299 |
+
)
|
| 300 |
+
|
| 301 |
+
return SearchResult(
|
| 302 |
+
query=query,
|
| 303 |
+
evidence=all_evidence, # Now deduplicated
|
| 304 |
+
sources_searched=sources_searched,
|
| 305 |
+
total_found=len(all_evidence),
|
| 306 |
+
errors=errors,
|
| 307 |
+
)
|
| 308 |
+
```
|
| 309 |
+
|
| 310 |
+
---
|
| 311 |
+
|
| 312 |
+
## Test Plan
|
| 313 |
+
|
| 314 |
+
### Unit Tests (`tests/unit/tools/test_search_handler.py`)
|
| 315 |
+
|
| 316 |
+
```python
|
| 317 |
+
import pytest
|
| 318 |
+
from src.tools.search_handler import extract_paper_id, deduplicate_evidence
|
| 319 |
+
from src.utils.models import Citation, Evidence
|
| 320 |
+
|
| 321 |
+
|
| 322 |
+
def _make_evidence(source: str, url: str, metadata: dict | None = None) -> Evidence:
|
| 323 |
+
"""Helper to create Evidence objects for testing."""
|
| 324 |
+
return Evidence(
|
| 325 |
+
content="Test content",
|
| 326 |
+
citation=Citation(
|
| 327 |
+
source=source,
|
| 328 |
+
title="Test",
|
| 329 |
+
url=url,
|
| 330 |
+
date="2024",
|
| 331 |
+
authors=[],
|
| 332 |
+
),
|
| 333 |
+
metadata=metadata,
|
| 334 |
+
)
|
| 335 |
+
|
| 336 |
+
|
| 337 |
+
class TestExtractPaperId:
|
| 338 |
+
"""Tests for paper ID extraction from Evidence objects."""
|
| 339 |
+
|
| 340 |
+
def test_extracts_pubmed_id(self) -> None:
|
| 341 |
+
evidence = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 342 |
+
assert extract_paper_id(evidence) == "PMID:12345678"
|
| 343 |
+
|
| 344 |
+
def test_extracts_europepmc_med_id(self) -> None:
|
| 345 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
|
| 346 |
+
assert extract_paper_id(evidence) == "PMID:12345678"
|
| 347 |
+
|
| 348 |
+
def test_extracts_europepmc_pmc_id(self) -> None:
|
| 349 |
+
"""Europe PMC PMC articles have different ID format."""
|
| 350 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PMC/PMC7654321")
|
| 351 |
+
assert extract_paper_id(evidence) == "PMCID:PMC7654321"
|
| 352 |
+
|
| 353 |
+
def test_extracts_europepmc_ppr_id(self) -> None:
|
| 354 |
+
"""Europe PMC preprints have PPR IDs."""
|
| 355 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR123456")
|
| 356 |
+
assert extract_paper_id(evidence) == "PPRID:PPR123456"
|
| 357 |
+
|
| 358 |
+
def test_extracts_europepmc_pat_id(self) -> None:
|
| 359 |
+
"""Europe PMC patents have PAT IDs (e.g., WO8601415, EP1234567)."""
|
| 360 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/WO8601415")
|
| 361 |
+
assert extract_paper_id(evidence) == "PATID:WO8601415"
|
| 362 |
+
|
| 363 |
+
def test_extracts_europepmc_pat_id_eu_format(self) -> None:
|
| 364 |
+
"""European patent format should also work."""
|
| 365 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/EP1234567")
|
| 366 |
+
assert extract_paper_id(evidence) == "PATID:EP1234567"
|
| 367 |
+
|
| 368 |
+
def test_extracts_doi(self) -> None:
|
| 369 |
+
evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345")
|
| 370 |
+
assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
|
| 371 |
+
|
| 372 |
+
def test_extracts_doi_with_trailing_slash(self) -> None:
|
| 373 |
+
"""DOIs should be normalized (trailing slash removed)."""
|
| 374 |
+
evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345/")
|
| 375 |
+
assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
|
| 376 |
+
|
| 377 |
+
def test_extracts_openalex_id_from_url(self) -> None:
|
| 378 |
+
"""OpenAlex ID from URL (fallback when no PMID in metadata)."""
|
| 379 |
+
evidence = _make_evidence("openalex", "https://openalex.org/W1234567890")
|
| 380 |
+
assert extract_paper_id(evidence) == "OAID:W1234567890"
|
| 381 |
+
|
| 382 |
+
def test_extracts_openalex_pmid_from_metadata(self) -> None:
|
| 383 |
+
"""OpenAlex PMID from metadata takes priority over URL."""
|
| 384 |
+
evidence = _make_evidence(
|
| 385 |
+
"openalex",
|
| 386 |
+
"https://openalex.org/W1234567890",
|
| 387 |
+
metadata={"pmid": "98765432"},
|
| 388 |
+
)
|
| 389 |
+
assert extract_paper_id(evidence) == "PMID:98765432"
|
| 390 |
+
|
| 391 |
+
def test_extracts_nct_id_modern(self) -> None:
|
| 392 |
+
evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT12345678")
|
| 393 |
+
assert extract_paper_id(evidence) == "NCT:NCT12345678"
|
| 394 |
+
|
| 395 |
+
def test_extracts_nct_id_legacy(self) -> None:
|
| 396 |
+
"""Legacy ClinicalTrials.gov URL format should also work."""
|
| 397 |
+
evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/ct2/show/NCT12345678")
|
| 398 |
+
assert extract_paper_id(evidence) == "NCT:NCT12345678"
|
| 399 |
+
|
| 400 |
+
def test_returns_none_for_unknown_url(self) -> None:
|
| 401 |
+
evidence = _make_evidence("unknown", "https://example.com/unknown")
|
| 402 |
+
assert extract_paper_id(evidence) is None
|
| 403 |
+
|
| 404 |
+
|
| 405 |
+
class TestDeduplicateEvidence:
|
| 406 |
+
"""Tests for evidence deduplication."""
|
| 407 |
+
|
| 408 |
+
def test_removes_pubmed_europepmc_duplicate(self) -> None:
|
| 409 |
+
"""Same paper from PubMed and Europe PMC should dedupe to PubMed."""
|
| 410 |
+
pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 411 |
+
europepmc = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
|
| 412 |
+
|
| 413 |
+
result = deduplicate_evidence([pubmed, europepmc])
|
| 414 |
+
|
| 415 |
+
assert len(result) == 1
|
| 416 |
+
assert result[0].citation.source == "pubmed"
|
| 417 |
+
|
| 418 |
+
def test_removes_pubmed_openalex_duplicate_via_metadata(self) -> None:
|
| 419 |
+
"""OpenAlex with PMID in metadata should dedupe against PubMed."""
|
| 420 |
+
pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 421 |
+
openalex = _make_evidence(
|
| 422 |
+
"openalex",
|
| 423 |
+
"https://openalex.org/W9999999",
|
| 424 |
+
metadata={"pmid": "12345678", "cited_by_count": 100},
|
| 425 |
+
)
|
| 426 |
+
|
| 427 |
+
result = deduplicate_evidence([pubmed, openalex])
|
| 428 |
+
|
| 429 |
+
assert len(result) == 1
|
| 430 |
+
assert result[0].citation.source == "pubmed"
|
| 431 |
+
|
| 432 |
+
def test_preserves_openalex_without_pmid(self) -> None:
|
| 433 |
+
"""OpenAlex papers without PMID should NOT be deduplicated."""
|
| 434 |
+
pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 435 |
+
openalex_no_pmid = _make_evidence(
|
| 436 |
+
"openalex",
|
| 437 |
+
"https://openalex.org/W9999999",
|
| 438 |
+
metadata={"cited_by_count": 100}, # No pmid key
|
| 439 |
+
)
|
| 440 |
+
|
| 441 |
+
result = deduplicate_evidence([pubmed, openalex_no_pmid])
|
| 442 |
+
|
| 443 |
+
assert len(result) == 2 # Both preserved (different IDs)
|
| 444 |
+
|
| 445 |
+
def test_preserves_unique_evidence(self) -> None:
|
| 446 |
+
"""Different papers should not be deduplicated."""
|
| 447 |
+
e1 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/11111111/")
|
| 448 |
+
e2 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/22222222/")
|
| 449 |
+
|
| 450 |
+
result = deduplicate_evidence([e1, e2])
|
| 451 |
+
|
| 452 |
+
assert len(result) == 2
|
| 453 |
+
|
| 454 |
+
def test_keeps_unidentifiable_evidence(self) -> None:
|
| 455 |
+
"""Evidence with unrecognized URLs should be preserved."""
|
| 456 |
+
unknown = _make_evidence("unknown", "https://example.com/paper/123")
|
| 457 |
+
|
| 458 |
+
result = deduplicate_evidence([unknown])
|
| 459 |
+
|
| 460 |
+
assert len(result) == 1
|
| 461 |
+
|
| 462 |
+
def test_clinicaltrials_unique_per_nct(self) -> None:
|
| 463 |
+
"""ClinicalTrials entries have unique NCT IDs."""
|
| 464 |
+
trial1 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT11111111")
|
| 465 |
+
trial2 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT22222222")
|
| 466 |
+
|
| 467 |
+
result = deduplicate_evidence([trial1, trial2])
|
| 468 |
+
|
| 469 |
+
assert len(result) == 2
|
| 470 |
+
|
| 471 |
+
def test_preprints_preserved_separately(self) -> None:
|
| 472 |
+
"""Preprints (PPR IDs) should not dedupe against peer-reviewed papers."""
|
| 473 |
+
peer_reviewed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 474 |
+
preprint = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR999999")
|
| 475 |
+
|
| 476 |
+
result = deduplicate_evidence([peer_reviewed, preprint])
|
| 477 |
+
|
| 478 |
+
assert len(result) == 2 # Both preserved (different ID types)
|
| 479 |
+
```
|
| 480 |
+
|
| 481 |
+
### Integration Test
|
| 482 |
+
|
| 483 |
+
```python
|
| 484 |
+
@pytest.mark.integration
|
| 485 |
+
async def test_real_search_deduplicates() -> None:
|
| 486 |
+
"""Integration test: Real search should deduplicate PubMed/Europe PMC."""
|
| 487 |
+
from src.tools.pubmed import PubMedTool
|
| 488 |
+
from src.tools.europepmc import EuropePMCTool
|
| 489 |
+
from src.tools.openalex import OpenAlexTool
|
| 490 |
+
from src.tools.search_handler import SearchHandler, extract_paper_id
|
| 491 |
+
|
| 492 |
+
handler = SearchHandler(
|
| 493 |
+
tools=[PubMedTool(), EuropePMCTool(), OpenAlexTool()],
|
| 494 |
+
timeout=30.0,
|
| 495 |
+
)
|
| 496 |
+
|
| 497 |
+
result = await handler.execute("sildenafil erectile dysfunction", max_results_per_tool=5)
|
| 498 |
+
|
| 499 |
+
# Should have fewer results than 3x max_results due to deduplication
|
| 500 |
+
assert result.total_found < 15
|
| 501 |
+
|
| 502 |
+
# Check no duplicate paper IDs in final result
|
| 503 |
+
paper_ids = [
|
| 504 |
+
extract_paper_id(e)
|
| 505 |
+
for e in result.evidence
|
| 506 |
+
if extract_paper_id(e)
|
| 507 |
+
]
|
| 508 |
+
assert len(paper_ids) == len(set(paper_ids)), f"Duplicate IDs found: {paper_ids}"
|
| 509 |
+
```
|
| 510 |
+
|
| 511 |
+
---
|
| 512 |
+
|
| 513 |
+
## Files to Modify
|
| 514 |
+
|
| 515 |
+
| File | Change |
|
| 516 |
+
|------|--------|
|
| 517 |
+
| `src/tools/openalex.py` | Extract PMID from `ids.pmid` and store in `Evidence.metadata` |
|
| 518 |
+
| `src/tools/search_handler.py` | Add `extract_paper_id()`, `deduplicate_evidence()`, integrate into `execute()` |
|
| 519 |
+
| `tests/unit/tools/test_search_handler.py` | Add comprehensive deduplication tests |
|
| 520 |
+
| `tests/unit/tools/test_openalex.py` | Add test for PMID extraction |
|
| 521 |
+
|
| 522 |
+
---
|
| 523 |
+
|
| 524 |
+
## Acceptance Criteria
|
| 525 |
+
|
| 526 |
+
- [ ] `openalex.py` extracts PMID from `work.ids.pmid` and stores in `Evidence.metadata`
|
| 527 |
+
- [ ] `extract_paper_id()` checks `Evidence.metadata.pmid` first (for OpenAlex cross-dedup)
|
| 528 |
+
- [ ] `extract_paper_id()` correctly parses all URL patterns:
|
| 529 |
+
- PubMed URLs
|
| 530 |
+
- Europe PMC MED, PMC, PPR, and PAT URLs
|
| 531 |
+
- DOIs (normalized without trailing slash)
|
| 532 |
+
- OpenAlex W-IDs (fallback)
|
| 533 |
+
- ClinicalTrials NCT IDs (modern and legacy formats)
|
| 534 |
+
- [ ] `deduplicate_evidence()` removes duplicates while preserving source priority
|
| 535 |
+
- [ ] Unit tests cover all edge cases including OpenAlex with/without PMID
|
| 536 |
+
- [ ] Integration test confirms real searches are deduplicated
|
| 537 |
+
- [ ] Logging shows deduplication metrics
|
| 538 |
+
|
| 539 |
+
---
|
| 540 |
+
|
| 541 |
+
## Known Limitations
|
| 542 |
+
|
| 543 |
+
1. **OpenAlex without PMID**: Some OpenAlex papers (especially older or non-biomedical) may not have PMIDs. These will NOT be deduplicated against PubMed. This is acceptable - we prefer false negatives (keeping duplicates) over false positives (removing unique papers).
|
| 544 |
+
|
| 545 |
+
2. **PMC vs PMID**: PubMed Central IDs (PMC1234567) are different from PubMed IDs (12345678). A paper may have both, but we treat them as different identifiers. This may result in some duplicates not being caught when Europe PMC returns a PMC URL and PubMed returns a PMID URL for the same paper. Future enhancement: Use NCBI ID Converter API.
|
| 546 |
+
|
| 547 |
+
3. **Preprint β Peer-Reviewed**: When a preprint (PPR ID) is later published as a peer-reviewed paper (PMID), they will have different IDs and won't be deduplicated. This is intentional - users may want to see both versions.
|
| 548 |
+
|
| 549 |
+
---
|
| 550 |
+
|
| 551 |
+
## Rollback Plan
|
| 552 |
+
|
| 553 |
+
If deduplication causes issues:
|
| 554 |
+
1. Remove the `deduplicate_evidence()` call from `execute()`
|
| 555 |
+
2. All tests should still pass (deduplication is additive)
|
| 556 |
+
3. No need to revert OpenAlex PMID extraction (metadata only)
|
| 557 |
+
|
| 558 |
+
---
|
| 559 |
+
|
| 560 |
+
## References
|
| 561 |
+
|
| 562 |
+
- GitHub Issue #94
|
| 563 |
+
- [OpenAlex Work Object Documentation](https://docs.openalex.org/api-entities/works/work-object)
|
| 564 |
+
- [Europe PMC Help - ID Types](https://europepmc.org/help)
|
| 565 |
+
- [ClinicalTrials.gov API v2](https://clinicaltrials.gov/data-api/api)
|
| 566 |
+
- `TOOL_ANALYSIS_CRITICAL.md` - "Cross-Tool Issues > Issue 1: MASSIVE DUPLICATION"
|
|
@@ -0,0 +1,466 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# SPEC_14: Add Outcome Measures to ClinicalTrials.gov Fields
|
| 2 |
+
|
| 3 |
+
**Status**: Draft (Validated via API Documentation Review)
|
| 4 |
+
**Priority**: P1
|
| 5 |
+
**GitHub Issue**: #95
|
| 6 |
+
**Estimated Effort**: Small (~40 lines of code)
|
| 7 |
+
**Last Updated**: 2025-11-30
|
| 8 |
+
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
## Problem Statement
|
| 12 |
+
|
| 13 |
+
The `ClinicalTrialsTool` retrieves trial metadata but **misses critical efficacy data**:
|
| 14 |
+
|
| 15 |
+
### Current Fields Retrieved
|
| 16 |
+
|
| 17 |
+
```python
|
| 18 |
+
# src/tools/clinicaltrials.py:24-33
|
| 19 |
+
FIELDS: ClassVar[list[str]] = [
|
| 20 |
+
"NCTId",
|
| 21 |
+
"BriefTitle",
|
| 22 |
+
"Phase",
|
| 23 |
+
"OverallStatus",
|
| 24 |
+
"Condition",
|
| 25 |
+
"InterventionName",
|
| 26 |
+
"StartDate",
|
| 27 |
+
"BriefSummary",
|
| 28 |
+
]
|
| 29 |
+
```
|
| 30 |
+
|
| 31 |
+
### Missing Data (Critical for Research)
|
| 32 |
+
|
| 33 |
+
| Data | Location in Response | Purpose |
|
| 34 |
+
|------|---------------------|---------|
|
| 35 |
+
| Primary Outcomes | `protocolSection.outcomesModule.primaryOutcomes[].measure` | Main efficacy endpoint |
|
| 36 |
+
| Secondary Outcomes | `protocolSection.outcomesModule.secondaryOutcomes[].measure` | Additional endpoints |
|
| 37 |
+
| Has Results | `study.hasResults` (top-level) | Whether results are posted |
|
| 38 |
+
| Results Date | `protocolSection.statusModule.resultsFirstPostDateStruct.date` | When results posted |
|
| 39 |
+
|
| 40 |
+
### Impact
|
| 41 |
+
|
| 42 |
+
**Current Output**:
|
| 43 |
+
```
|
| 44 |
+
Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
|
| 45 |
+
Interventions: Sildenafil.
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
**Desired Output**:
|
| 49 |
+
```
|
| 50 |
+
Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
|
| 51 |
+
Interventions: Sildenafil.
|
| 52 |
+
Primary Outcome: Change from baseline in IIEF-EF domain score at Week 12.
|
| 53 |
+
Results Available: Yes (posted 2024-01-15).
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
---
|
| 57 |
+
|
| 58 |
+
## API Documentation Review (2025-11-30)
|
| 59 |
+
|
| 60 |
+
### ClinicalTrials.gov API v2 Response Structure
|
| 61 |
+
|
| 62 |
+
**Source**: [Stack Overflow - ClinicalTrials.gov API v2](https://stackoverflow.com/questions/78415818)
|
| 63 |
+
|
| 64 |
+
The API returns nested JSON. Key findings:
|
| 65 |
+
|
| 66 |
+
1. **`hasResults`** is a **top-level** field on each study object (NOT inside `protocolSection`)
|
| 67 |
+
2. **Outcomes** are in `protocolSection.outcomesModule`:
|
| 68 |
+
```python
|
| 69 |
+
study['protocolSection']['outcomesModule']['primaryOutcomes'] # List
|
| 70 |
+
study['protocolSection']['outcomesModule']['secondaryOutcomes'] # List
|
| 71 |
+
```
|
| 72 |
+
3. **Results date** is in `protocolSection.statusModule.resultsFirstPostDateStruct.date`
|
| 73 |
+
|
| 74 |
+
### `fields` Parameter Behavior (VERIFIED VIA LIVE API TESTING)
|
| 75 |
+
|
| 76 |
+
The `fields` query parameter filters what the API returns. **If you don't request a field, you don't get it.**
|
| 77 |
+
|
| 78 |
+
**Live API Test Results (2025-11-30):**
|
| 79 |
+
|
| 80 |
+
```bash
|
| 81 |
+
# Test 1: With limited fields - NO outcomesModule returned
|
| 82 |
+
curl "...&fields=NCTId,BriefTitle"
|
| 83 |
+
# β Returns ONLY: protocolSection.identificationModule.{nctId, briefTitle}
|
| 84 |
+
|
| 85 |
+
# Test 2: Without fields param - outcomesModule IS present
|
| 86 |
+
curl "...&pageSize=1"
|
| 87 |
+
# β Returns: hasResults: false, outcomesModule: {primaryOutcomes, secondaryOutcomes, otherOutcomes}
|
| 88 |
+
|
| 89 |
+
# Test 3: Valid field names for outcomes
|
| 90 |
+
curl "...&fields=NCTId,OutcomesModule" # β
Works - returns full outcomesModule
|
| 91 |
+
curl "...&fields=NCTId,PrimaryOutcome" # β
Works - returns only primaryOutcomes
|
| 92 |
+
curl "...&fields=NCTId,HasResults" # β
Works - returns hasResults at top level
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
**Valid Field Names (Tested):**
|
| 96 |
+
- `OutcomesModule` β Returns full `protocolSection.outcomesModule` with all outcomes
|
| 97 |
+
- `PrimaryOutcome` β Returns only `primaryOutcomes` array
|
| 98 |
+
- `SecondaryOutcome` β Returns only `secondaryOutcomes` array
|
| 99 |
+
- `HasResults` β Returns `hasResults` at study top level
|
| 100 |
+
|
| 101 |
+
---
|
| 102 |
+
|
| 103 |
+
## Proposed Solution
|
| 104 |
+
|
| 105 |
+
### β
UPDATE FIELDS Constant (REQUIRED)
|
| 106 |
+
|
| 107 |
+
The current implementation explicitly passes `fields=",".join(self.FIELDS)` at line 67.
|
| 108 |
+
**The API ONLY returns requested fields.** We MUST add the new field names.
|
| 109 |
+
|
| 110 |
+
```python
|
| 111 |
+
# src/tools/clinicaltrials.py - UPDATE FIELDS
|
| 112 |
+
FIELDS: ClassVar[list[str]] = [
|
| 113 |
+
"NCTId",
|
| 114 |
+
"BriefTitle",
|
| 115 |
+
"Phase",
|
| 116 |
+
"OverallStatus",
|
| 117 |
+
"Condition",
|
| 118 |
+
"InterventionName",
|
| 119 |
+
"StartDate",
|
| 120 |
+
"BriefSummary",
|
| 121 |
+
# NEW: Outcome measures (verified via live API testing 2025-11-30)
|
| 122 |
+
"OutcomesModule", # Returns protocolSection.outcomesModule.{primaryOutcomes, secondaryOutcomes}
|
| 123 |
+
"HasResults", # Returns study.hasResults (top-level boolean)
|
| 124 |
+
]
|
| 125 |
+
```
|
| 126 |
+
|
| 127 |
+
### β
Update `_study_to_evidence()` Method
|
| 128 |
+
|
| 129 |
+
```python
|
| 130 |
+
def _study_to_evidence(self, study: dict[str, Any]) -> Evidence:
|
| 131 |
+
"""Convert a clinical trial study to Evidence."""
|
| 132 |
+
# Navigate nested structure
|
| 133 |
+
protocol = study.get("protocolSection", {})
|
| 134 |
+
id_module = protocol.get("identificationModule", {})
|
| 135 |
+
status_module = protocol.get("statusModule", {})
|
| 136 |
+
desc_module = protocol.get("descriptionModule", {})
|
| 137 |
+
design_module = protocol.get("designModule", {})
|
| 138 |
+
conditions_module = protocol.get("conditionsModule", {})
|
| 139 |
+
arms_module = protocol.get("armsInterventionsModule", {})
|
| 140 |
+
outcomes_module = protocol.get("outcomesModule", {}) # NEW
|
| 141 |
+
|
| 142 |
+
# ... existing field extraction (nct_id, title, status, phase, etc.) ...
|
| 143 |
+
|
| 144 |
+
# NEW: Extract outcome measures
|
| 145 |
+
primary_outcomes = outcomes_module.get("primaryOutcomes", [])
|
| 146 |
+
primary_outcome_str = ""
|
| 147 |
+
if primary_outcomes:
|
| 148 |
+
# Get first primary outcome measure and timeframe
|
| 149 |
+
first = primary_outcomes[0]
|
| 150 |
+
measure = first.get("measure", "")
|
| 151 |
+
timeframe = first.get("timeFrame", "")
|
| 152 |
+
# Truncate long outcome descriptions
|
| 153 |
+
primary_outcome_str = measure[:200]
|
| 154 |
+
if timeframe:
|
| 155 |
+
primary_outcome_str += f" (measured at {timeframe})"
|
| 156 |
+
|
| 157 |
+
secondary_outcomes = outcomes_module.get("secondaryOutcomes", [])
|
| 158 |
+
secondary_count = len(secondary_outcomes)
|
| 159 |
+
|
| 160 |
+
# NEW: Check if results are available (hasResults is TOP-LEVEL, not in protocol!)
|
| 161 |
+
has_results = study.get("hasResults", False)
|
| 162 |
+
|
| 163 |
+
# Results date is in statusModule (nested inside date struct)
|
| 164 |
+
results_date_struct = status_module.get("resultsFirstPostDateStruct", {})
|
| 165 |
+
results_date = results_date_struct.get("date", "")
|
| 166 |
+
|
| 167 |
+
# Build content with key trial info (UPDATED)
|
| 168 |
+
content_parts = [
|
| 169 |
+
f"{summary[:400]}...",
|
| 170 |
+
f"Trial Phase: {phase}.",
|
| 171 |
+
f"Status: {status}.",
|
| 172 |
+
f"Conditions: {conditions_str}.",
|
| 173 |
+
f"Interventions: {interventions_str}.",
|
| 174 |
+
]
|
| 175 |
+
|
| 176 |
+
if primary_outcome_str:
|
| 177 |
+
content_parts.append(f"Primary Outcome: {primary_outcome_str}.")
|
| 178 |
+
|
| 179 |
+
if secondary_count > 0:
|
| 180 |
+
content_parts.append(f"Secondary Outcomes: {secondary_count} additional endpoints.")
|
| 181 |
+
|
| 182 |
+
if has_results:
|
| 183 |
+
results_info = "Results Available: Yes"
|
| 184 |
+
if results_date:
|
| 185 |
+
results_info += f" (posted {results_date})"
|
| 186 |
+
content_parts.append(results_info + ".")
|
| 187 |
+
else:
|
| 188 |
+
content_parts.append("Results Available: Not yet posted.")
|
| 189 |
+
|
| 190 |
+
content = " ".join(content_parts)
|
| 191 |
+
|
| 192 |
+
return Evidence(
|
| 193 |
+
content=content[:2000],
|
| 194 |
+
citation=Citation(
|
| 195 |
+
source="clinicaltrials",
|
| 196 |
+
title=title[:500],
|
| 197 |
+
url=f"https://clinicaltrials.gov/study/{nct_id}",
|
| 198 |
+
date=start_date,
|
| 199 |
+
authors=[],
|
| 200 |
+
),
|
| 201 |
+
relevance=0.90 if has_results else 0.85, # Boost relevance for trials with results
|
| 202 |
+
)
|
| 203 |
+
```
|
| 204 |
+
|
| 205 |
+
---
|
| 206 |
+
|
| 207 |
+
## API Reference
|
| 208 |
+
|
| 209 |
+
The ClinicalTrials.gov API v2 returns nested JSON:
|
| 210 |
+
|
| 211 |
+
```json
|
| 212 |
+
{
|
| 213 |
+
"protocolSection": {
|
| 214 |
+
"outcomesModule": {
|
| 215 |
+
"primaryOutcomes": [
|
| 216 |
+
{
|
| 217 |
+
"measure": "Change from Baseline in IIEF-EF Domain Score",
|
| 218 |
+
"description": "...",
|
| 219 |
+
"timeFrame": "Baseline to Week 12"
|
| 220 |
+
}
|
| 221 |
+
],
|
| 222 |
+
"secondaryOutcomes": [
|
| 223 |
+
{
|
| 224 |
+
"measure": "Subject Global Assessment Question",
|
| 225 |
+
"timeFrame": "Week 12"
|
| 226 |
+
}
|
| 227 |
+
]
|
| 228 |
+
}
|
| 229 |
+
},
|
| 230 |
+
"hasResults": true
|
| 231 |
+
}
|
| 232 |
+
```
|
| 233 |
+
|
| 234 |
+
See: https://clinicaltrials.gov/data-api/api
|
| 235 |
+
|
| 236 |
+
---
|
| 237 |
+
|
| 238 |
+
## Test Plan
|
| 239 |
+
|
| 240 |
+
### Unit Tests (`tests/unit/tools/test_clinicaltrials.py`)
|
| 241 |
+
|
| 242 |
+
```python
|
| 243 |
+
@pytest.mark.unit
|
| 244 |
+
class TestClinicalTrialsOutcomes:
|
| 245 |
+
"""Tests for outcome measure extraction."""
|
| 246 |
+
|
| 247 |
+
@pytest.mark.asyncio
|
| 248 |
+
async def test_extracts_primary_outcome(self, tool: ClinicalTrialsTool) -> None:
|
| 249 |
+
"""Test that primary outcome is extracted from response."""
|
| 250 |
+
mock_study = {
|
| 251 |
+
"protocolSection": {
|
| 252 |
+
"identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
|
| 253 |
+
"statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
|
| 254 |
+
"descriptionModule": {"briefSummary": "Summary"},
|
| 255 |
+
"designModule": {"phases": ["PHASE3"]},
|
| 256 |
+
"conditionsModule": {"conditions": ["ED"]},
|
| 257 |
+
"armsInterventionsModule": {"interventions": []},
|
| 258 |
+
"outcomesModule": {
|
| 259 |
+
"primaryOutcomes": [
|
| 260 |
+
{
|
| 261 |
+
"measure": "Change in IIEF-EF score",
|
| 262 |
+
"timeFrame": "Week 12"
|
| 263 |
+
}
|
| 264 |
+
]
|
| 265 |
+
},
|
| 266 |
+
},
|
| 267 |
+
"hasResults": True,
|
| 268 |
+
}
|
| 269 |
+
|
| 270 |
+
mock_response = MagicMock()
|
| 271 |
+
mock_response.json.return_value = {"studies": [mock_study]}
|
| 272 |
+
mock_response.raise_for_status = MagicMock()
|
| 273 |
+
|
| 274 |
+
with patch("requests.get", return_value=mock_response):
|
| 275 |
+
results = await tool.search("test", max_results=1)
|
| 276 |
+
|
| 277 |
+
assert len(results) == 1
|
| 278 |
+
assert "Primary Outcome" in results[0].content
|
| 279 |
+
assert "IIEF-EF" in results[0].content
|
| 280 |
+
assert "Week 12" in results[0].content
|
| 281 |
+
|
| 282 |
+
@pytest.mark.asyncio
|
| 283 |
+
async def test_includes_results_status(self, tool: ClinicalTrialsTool) -> None:
|
| 284 |
+
"""Test that results availability is shown."""
|
| 285 |
+
mock_study = {
|
| 286 |
+
"protocolSection": {
|
| 287 |
+
"identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
|
| 288 |
+
"statusModule": {
|
| 289 |
+
"overallStatus": "COMPLETED",
|
| 290 |
+
"startDateStruct": {"date": "2023"},
|
| 291 |
+
# Note: resultsFirstPostDateStruct, not resultsFirstSubmitDate
|
| 292 |
+
"resultsFirstPostDateStruct": {"date": "2024-06-15"},
|
| 293 |
+
},
|
| 294 |
+
"descriptionModule": {"briefSummary": "Summary"},
|
| 295 |
+
"designModule": {"phases": ["PHASE3"]},
|
| 296 |
+
"conditionsModule": {"conditions": ["ED"]},
|
| 297 |
+
"armsInterventionsModule": {"interventions": []},
|
| 298 |
+
"outcomesModule": {},
|
| 299 |
+
},
|
| 300 |
+
"hasResults": True, # Note: hasResults is TOP-LEVEL
|
| 301 |
+
}
|
| 302 |
+
|
| 303 |
+
mock_response = MagicMock()
|
| 304 |
+
mock_response.json.return_value = {"studies": [mock_study]}
|
| 305 |
+
mock_response.raise_for_status = MagicMock()
|
| 306 |
+
|
| 307 |
+
with patch("requests.get", return_value=mock_response):
|
| 308 |
+
results = await tool.search("test", max_results=1)
|
| 309 |
+
|
| 310 |
+
assert "Results Available: Yes" in results[0].content
|
| 311 |
+
assert "2024-06-15" in results[0].content
|
| 312 |
+
|
| 313 |
+
@pytest.mark.asyncio
|
| 314 |
+
async def test_shows_no_results_when_missing(self, tool: ClinicalTrialsTool) -> None:
|
| 315 |
+
"""Test that missing results are indicated."""
|
| 316 |
+
mock_study = {
|
| 317 |
+
"protocolSection": {
|
| 318 |
+
"identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
|
| 319 |
+
"statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
|
| 320 |
+
"descriptionModule": {"briefSummary": "Summary"},
|
| 321 |
+
"designModule": {"phases": ["PHASE2"]},
|
| 322 |
+
"conditionsModule": {"conditions": ["ED"]},
|
| 323 |
+
"armsInterventionsModule": {"interventions": []},
|
| 324 |
+
"outcomesModule": {},
|
| 325 |
+
},
|
| 326 |
+
"hasResults": False,
|
| 327 |
+
}
|
| 328 |
+
|
| 329 |
+
mock_response = MagicMock()
|
| 330 |
+
mock_response.json.return_value = {"studies": [mock_study]}
|
| 331 |
+
mock_response.raise_for_status = MagicMock()
|
| 332 |
+
|
| 333 |
+
with patch("requests.get", return_value=mock_response):
|
| 334 |
+
results = await tool.search("test", max_results=1)
|
| 335 |
+
|
| 336 |
+
assert "Results Available: Not yet posted" in results[0].content
|
| 337 |
+
|
| 338 |
+
@pytest.mark.asyncio
|
| 339 |
+
async def test_boosts_relevance_for_results(self, tool: ClinicalTrialsTool) -> None:
|
| 340 |
+
"""Trials with results should have higher relevance score."""
|
| 341 |
+
with_results = {
|
| 342 |
+
"protocolSection": {
|
| 343 |
+
"identificationModule": {"nctId": "NCT11111111", "briefTitle": "With Results"},
|
| 344 |
+
"statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
|
| 345 |
+
"descriptionModule": {"briefSummary": "Summary"},
|
| 346 |
+
"designModule": {"phases": []},
|
| 347 |
+
"conditionsModule": {"conditions": []},
|
| 348 |
+
"armsInterventionsModule": {"interventions": []},
|
| 349 |
+
"outcomesModule": {},
|
| 350 |
+
},
|
| 351 |
+
"hasResults": True,
|
| 352 |
+
}
|
| 353 |
+
without_results = {
|
| 354 |
+
"protocolSection": {
|
| 355 |
+
"identificationModule": {"nctId": "NCT22222222", "briefTitle": "No Results"},
|
| 356 |
+
"statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
|
| 357 |
+
"descriptionModule": {"briefSummary": "Summary"},
|
| 358 |
+
"designModule": {"phases": []},
|
| 359 |
+
"conditionsModule": {"conditions": []},
|
| 360 |
+
"armsInterventionsModule": {"interventions": []},
|
| 361 |
+
"outcomesModule": {},
|
| 362 |
+
},
|
| 363 |
+
"hasResults": False,
|
| 364 |
+
}
|
| 365 |
+
|
| 366 |
+
mock_response = MagicMock()
|
| 367 |
+
mock_response.json.return_value = {"studies": [with_results, without_results]}
|
| 368 |
+
mock_response.raise_for_status = MagicMock()
|
| 369 |
+
|
| 370 |
+
with patch("requests.get", return_value=mock_response):
|
| 371 |
+
results = await tool.search("test", max_results=2)
|
| 372 |
+
|
| 373 |
+
assert results[0].relevance == 0.90 # With results
|
| 374 |
+
assert results[1].relevance == 0.85 # Without results
|
| 375 |
+
```
|
| 376 |
+
|
| 377 |
+
### Integration Test
|
| 378 |
+
|
| 379 |
+
```python
|
| 380 |
+
@pytest.mark.integration
|
| 381 |
+
class TestClinicalTrialsOutcomesIntegration:
|
| 382 |
+
"""Integration tests with real API."""
|
| 383 |
+
|
| 384 |
+
@pytest.mark.asyncio
|
| 385 |
+
async def test_real_completed_trial_has_outcome(self) -> None:
|
| 386 |
+
"""Real completed Phase 3 trials should have outcome measures."""
|
| 387 |
+
tool = ClinicalTrialsTool()
|
| 388 |
+
|
| 389 |
+
# Search for completed Phase 3 ED trials (likely to have outcomes)
|
| 390 |
+
results = await tool.search(
|
| 391 |
+
"sildenafil erectile dysfunction Phase 3 COMPLETED",
|
| 392 |
+
max_results=3
|
| 393 |
+
)
|
| 394 |
+
|
| 395 |
+
# At least one should have primary outcome
|
| 396 |
+
has_outcome = any("Primary Outcome" in r.content for r in results)
|
| 397 |
+
assert has_outcome, "No completed trials with outcome measures found"
|
| 398 |
+
```
|
| 399 |
+
|
| 400 |
+
---
|
| 401 |
+
|
| 402 |
+
## Files to Modify
|
| 403 |
+
|
| 404 |
+
| File | Change |
|
| 405 |
+
|------|--------|
|
| 406 |
+
| `src/tools/clinicaltrials.py` | **ADD** `OutcomesModule` and `HasResults` to `FIELDS`, update `_study_to_evidence()` |
|
| 407 |
+
| `tests/unit/tools/test_clinicaltrials.py` | Add outcome parsing tests |
|
| 408 |
+
|
| 409 |
+
---
|
| 410 |
+
|
| 411 |
+
## Acceptance Criteria
|
| 412 |
+
|
| 413 |
+
### FIELDS Constant (REQUIRED CHANGE)
|
| 414 |
+
- [ ] `FIELDS` includes `"OutcomesModule"` (returns full outcomesModule)
|
| 415 |
+
- [ ] `FIELDS` includes `"HasResults"` (returns top-level boolean)
|
| 416 |
+
|
| 417 |
+
### `_study_to_evidence()` Method
|
| 418 |
+
- [ ] Extracts `protocolSection.outcomesModule.primaryOutcomes`
|
| 419 |
+
- [ ] Accesses `study.hasResults` at TOP LEVEL (not inside protocolSection)
|
| 420 |
+
- [ ] Results date extracted from `statusModule.resultsFirstPostDateStruct.date`
|
| 421 |
+
- [ ] Evidence content includes primary outcome measure when available
|
| 422 |
+
- [ ] Evidence content shows results availability status
|
| 423 |
+
- [ ] Outcome measure text truncated to 200 chars
|
| 424 |
+
- [ ] Trials with results have boosted relevance (0.90 vs 0.85)
|
| 425 |
+
|
| 426 |
+
### Testing
|
| 427 |
+
- [ ] All unit tests pass
|
| 428 |
+
- [ ] Integration test confirms real trials return outcome data
|
| 429 |
+
- [ ] Live API test confirms `OutcomesModule` and `HasResults` fields work
|
| 430 |
+
|
| 431 |
+
---
|
| 432 |
+
|
| 433 |
+
## Edge Cases
|
| 434 |
+
|
| 435 |
+
1. **No outcomes defined**: Some early-phase trials don't have outcomes yet
|
| 436 |
+
- Solution: Gracefully skip outcome section if `outcomesModule` is empty or missing
|
| 437 |
+
|
| 438 |
+
2. **Multiple primary outcomes**: Some trials have 2-3 primary outcomes
|
| 439 |
+
- Solution: Show first outcome only, mention count of others
|
| 440 |
+
|
| 441 |
+
3. **Long outcome descriptions**: Some measures are very verbose (500+ chars)
|
| 442 |
+
- Solution: Truncate measure to 200 chars with `[:200]`
|
| 443 |
+
|
| 444 |
+
4. **hasResults without resultsFirstPostDateStruct**: Some completed trials may have results without a posted date
|
| 445 |
+
- Solution: Show "Results Available: Yes" without date
|
| 446 |
+
|
| 447 |
+
5. **outcomesModule missing entirely**: Not all API responses include this module
|
| 448 |
+
- Solution: Use `.get("outcomesModule", {})` for safe access
|
| 449 |
+
|
| 450 |
+
---
|
| 451 |
+
|
| 452 |
+
## Rollback Plan
|
| 453 |
+
|
| 454 |
+
If outcome extraction causes issues:
|
| 455 |
+
1. DO NOT modify `FIELDS` - nothing to revert there
|
| 456 |
+
2. Remove outcome extraction code from `_study_to_evidence()`
|
| 457 |
+
3. Existing tests should still pass
|
| 458 |
+
|
| 459 |
+
---
|
| 460 |
+
|
| 461 |
+
## References
|
| 462 |
+
|
| 463 |
+
- GitHub Issue #95
|
| 464 |
+
- [ClinicalTrials.gov API v2 Studies Endpoint](https://clinicaltrials.gov/data-api/api)
|
| 465 |
+
- [Stack Overflow - ClinicalTrials.gov API v2 Response Structure](https://stackoverflow.com/questions/78415818)
|
| 466 |
+
- `TOOL_ANALYSIS_CRITICAL.md` - "Tool 2: ClinicalTrials.gov > Current Implementation Gaps"
|
|
@@ -0,0 +1,478 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# SPEC_15: Advanced Mode Performance Optimization
|
| 2 |
+
|
| 3 |
+
**Status**: Draft (Validated - Implement All Solutions)
|
| 4 |
+
**Priority**: P1
|
| 5 |
+
**GitHub Issue**: #65
|
| 6 |
+
**Estimated Effort**: Medium (config changes + early termination logic)
|
| 7 |
+
**Last Updated**: 2025-11-30
|
| 8 |
+
|
| 9 |
+
> **Senior Review Verdict**: β
APPROVED
|
| 10 |
+
> **Recommendation**: Implement Solution A + B + C together. Solution B (Early Termination) is NOT "post-hackathon" - it's the core fix that solves the root cause. The patterns used are consistent with Microsoft Agent Framework best practices.
|
| 11 |
+
|
| 12 |
+
---
|
| 13 |
+
|
| 14 |
+
## Problem Statement
|
| 15 |
+
|
| 16 |
+
Advanced (Multi-Agent) mode runs **10 rounds of multi-agent coordination** which takes **10-15+ minutes**.
|
| 17 |
+
|
| 18 |
+
**For hackathon demos**: No judge will wait this long. They'll close the tab before seeing results.
|
| 19 |
+
|
| 20 |
+
### Observed Behavior
|
| 21 |
+
|
| 22 |
+
- System works correctly (no crashes)
|
| 23 |
+
- Produces detailed, high-quality research output
|
| 24 |
+
- Takes too long for practical demo use
|
| 25 |
+
- User had to manually terminate after ~10 minutes
|
| 26 |
+
|
| 27 |
+
### Current Configuration
|
| 28 |
+
|
| 29 |
+
```python
|
| 30 |
+
# src/orchestrators/advanced.py:133
|
| 31 |
+
.with_standard_manager(
|
| 32 |
+
chat_client=manager_client,
|
| 33 |
+
max_round_count=self._max_rounds, # Default: 10
|
| 34 |
+
max_stall_count=3,
|
| 35 |
+
max_reset_count=2,
|
| 36 |
+
)
|
| 37 |
+
```
|
| 38 |
+
|
| 39 |
+
### Time Breakdown (Estimated)
|
| 40 |
+
|
| 41 |
+
| Component | Time per Round | Notes |
|
| 42 |
+
|-----------|---------------|-------|
|
| 43 |
+
| Manager LLM call | 2-5s | Decides next agent |
|
| 44 |
+
| Search Agent | 10-20s | 4 API calls (PubMed, CT, EPMC, OA) |
|
| 45 |
+
| Hypothesis Agent | 5-10s | LLM reasoning |
|
| 46 |
+
| Judge Agent | 5-10s | LLM evaluation |
|
| 47 |
+
| Report Agent | 10-20s | LLM synthesis (when called) |
|
| 48 |
+
|
| 49 |
+
**Total per round**: ~30-60 seconds
|
| 50 |
+
**10 rounds**: 5-10 minutes minimum
|
| 51 |
+
|
| 52 |
+
---
|
| 53 |
+
|
| 54 |
+
## Root Cause Analysis
|
| 55 |
+
|
| 56 |
+
### Issue 1: Default `max_rounds=10` is Too High
|
| 57 |
+
|
| 58 |
+
The Microsoft Agent Framework keeps iterating until:
|
| 59 |
+
1. `max_rounds` reached, OR
|
| 60 |
+
2. Manager decides workflow is complete
|
| 61 |
+
|
| 62 |
+
For research tasks, the manager often wants "more evidence" and keeps searching.
|
| 63 |
+
|
| 64 |
+
### Issue 2: No Early Termination Heuristic
|
| 65 |
+
|
| 66 |
+
Even when the Judge says `sufficient=True` with high confidence, the workflow continues because the manager wants to be thorough.
|
| 67 |
+
|
| 68 |
+
### Issue 3: No User Expectation Setting
|
| 69 |
+
|
| 70 |
+
Users don't know how long to expect. Progress indication is minimal.
|
| 71 |
+
|
| 72 |
+
---
|
| 73 |
+
|
| 74 |
+
## Proposed Solutions
|
| 75 |
+
|
| 76 |
+
### Solution A: Reduce Default `max_rounds` (QUICK FIX)
|
| 77 |
+
|
| 78 |
+
**Change**: Reduce `max_rounds` from 10 to 5 (or make configurable via env).
|
| 79 |
+
|
| 80 |
+
```python
|
| 81 |
+
# src/orchestrators/advanced.py
|
| 82 |
+
|
| 83 |
+
def __init__(
|
| 84 |
+
self,
|
| 85 |
+
max_rounds: int | None = None, # Changed from 10
|
| 86 |
+
...
|
| 87 |
+
) -> None:
|
| 88 |
+
# Read from environment, default to 5 for faster demos
|
| 89 |
+
default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
|
| 90 |
+
self._max_rounds = max_rounds if max_rounds is not None else default_rounds
|
| 91 |
+
```
|
| 92 |
+
|
| 93 |
+
**Pros**:
|
| 94 |
+
- Simple, 2-line change
|
| 95 |
+
- Immediately halves demo time
|
| 96 |
+
|
| 97 |
+
**Cons**:
|
| 98 |
+
- Less thorough research
|
| 99 |
+
- Trade-off: speed vs. quality
|
| 100 |
+
|
| 101 |
+
### Solution B: Early Termination on High-Confidence Judge (RECOMMENDED)
|
| 102 |
+
|
| 103 |
+
**Change**: Add workflow termination signal when Judge returns `sufficient=True` with confidence > 70%.
|
| 104 |
+
|
| 105 |
+
This requires modifying the JudgeAgent to signal completion:
|
| 106 |
+
|
| 107 |
+
```python
|
| 108 |
+
# src/agents/magentic_agents.py - create_judge_agent()
|
| 109 |
+
|
| 110 |
+
@chat_agent.on_message
|
| 111 |
+
async def handle_judge_message(message: str, context: Context) -> ChatMessage:
|
| 112 |
+
"""Process judge request and potentially signal completion."""
|
| 113 |
+
# ... existing judge logic ...
|
| 114 |
+
|
| 115 |
+
assessment = await judge_handler.evaluate(evidence, query)
|
| 116 |
+
|
| 117 |
+
if assessment.sufficient and assessment.confidence >= 0.70:
|
| 118 |
+
# Signal to manager that we have enough evidence
|
| 119 |
+
# The manager prompt should respect this signal
|
| 120 |
+
return ChatMessage(
|
| 121 |
+
content=f"SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
|
| 122 |
+
f"Recommend immediate synthesis. {assessment.reasoning}",
|
| 123 |
+
metadata={"sufficient": True, "confidence": assessment.confidence},
|
| 124 |
+
)
|
| 125 |
+
|
| 126 |
+
return ChatMessage(content=f"INSUFFICIENT: {assessment.reasoning}")
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
And update the manager's system prompt to respect this:
|
| 130 |
+
|
| 131 |
+
```python
|
| 132 |
+
# src/orchestrators/advanced.py - _build_workflow()
|
| 133 |
+
|
| 134 |
+
manager_system_prompt = """You are a research workflow manager.
|
| 135 |
+
|
| 136 |
+
IMPORTANT: When JudgeAgent returns "SUFFICIENT EVIDENCE", immediately
|
| 137 |
+
delegate to ReportAgent for final synthesis. Do NOT continue searching.
|
| 138 |
+
|
| 139 |
+
Workflow:
|
| 140 |
+
1. SearchAgent finds evidence
|
| 141 |
+
2. HypothesisAgent generates hypotheses
|
| 142 |
+
3. JudgeAgent evaluates sufficiency
|
| 143 |
+
4. IF sufficient β ReportAgent synthesizes (END)
|
| 144 |
+
5. IF insufficient β SearchAgent refines search (CONTINUE)
|
| 145 |
+
"""
|
| 146 |
+
```
|
| 147 |
+
|
| 148 |
+
**Pros**:
|
| 149 |
+
- Respects actual evidence quality
|
| 150 |
+
- Can terminate early (round 3-4) when evidence is strong
|
| 151 |
+
- Maintains quality for complex queries
|
| 152 |
+
|
| 153 |
+
**Cons**:
|
| 154 |
+
- Requires testing to ensure manager respects signal
|
| 155 |
+
- More complex change
|
| 156 |
+
|
| 157 |
+
### Solution C: Better Progress Indication
|
| 158 |
+
|
| 159 |
+
Add estimated time remaining to UI:
|
| 160 |
+
|
| 161 |
+
```python
|
| 162 |
+
# src/orchestrators/advanced.py - run()
|
| 163 |
+
|
| 164 |
+
yield AgentEvent(
|
| 165 |
+
type="progress",
|
| 166 |
+
message=f"Round {iteration}/{self._max_rounds} "
|
| 167 |
+
f"(~{(self._max_rounds - iteration) * 45}s remaining)",
|
| 168 |
+
iteration=iteration,
|
| 169 |
+
)
|
| 170 |
+
```
|
| 171 |
+
|
| 172 |
+
**Pros**:
|
| 173 |
+
- Sets user expectations
|
| 174 |
+
- Doesn't change workflow behavior
|
| 175 |
+
|
| 176 |
+
**Cons**:
|
| 177 |
+
- Doesn't actually speed up the workflow
|
| 178 |
+
|
| 179 |
+
---
|
| 180 |
+
|
| 181 |
+
## Recommended Implementation
|
| 182 |
+
|
| 183 |
+
**IMPLEMENT ALL THREE SOLUTIONS NOW**:
|
| 184 |
+
|
| 185 |
+
1. **Solution A**: Reduce `max_rounds` to 5 via environment variable
|
| 186 |
+
2. **Solution B**: Early termination when Judge returns `sufficient=True` with confidence β₯70%
|
| 187 |
+
3. **Solution C**: Better progress indication with time estimates
|
| 188 |
+
|
| 189 |
+
> **Why Solution B NOW?** The Manager acting as a "termination condition" based on Judge feedback is a standard multi-agent pattern (Critique/Refine loop with exit). This aligns with Microsoft Agent Framework best practices and solves the ROOT CAUSE, not just a symptom.
|
| 190 |
+
|
| 191 |
+
---
|
| 192 |
+
|
| 193 |
+
## Implementation Details
|
| 194 |
+
|
| 195 |
+
### Phase 1: All Solutions Together (A + B + C)
|
| 196 |
+
|
| 197 |
+
#### 1. Update Advanced Orchestrator Constructor
|
| 198 |
+
|
| 199 |
+
```python
|
| 200 |
+
# src/orchestrators/advanced.py
|
| 201 |
+
|
| 202 |
+
import os
|
| 203 |
+
|
| 204 |
+
class AdvancedOrchestrator(OrchestratorProtocol):
|
| 205 |
+
def __init__(
|
| 206 |
+
self,
|
| 207 |
+
max_rounds: int | None = None,
|
| 208 |
+
chat_client: OpenAIChatClient | None = None,
|
| 209 |
+
api_key: str | None = None,
|
| 210 |
+
timeout_seconds: float = 300.0, # Reduced from 600 to 5 min
|
| 211 |
+
domain: ResearchDomain | str | None = None,
|
| 212 |
+
) -> None:
|
| 213 |
+
# Environment-configurable rounds (default 5 for demos)
|
| 214 |
+
default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
|
| 215 |
+
self._max_rounds = max_rounds if max_rounds is not None else default_rounds
|
| 216 |
+
self._timeout_seconds = timeout_seconds
|
| 217 |
+
# ... rest unchanged ...
|
| 218 |
+
```
|
| 219 |
+
|
| 220 |
+
#### 2. Add Progress Estimation
|
| 221 |
+
|
| 222 |
+
```python
|
| 223 |
+
# src/orchestrators/advanced.py - run()
|
| 224 |
+
|
| 225 |
+
# After processing MagenticAgentMessageEvent:
|
| 226 |
+
if isinstance(event, MagenticAgentMessageEvent):
|
| 227 |
+
iteration += 1
|
| 228 |
+
rounds_remaining = self._max_rounds - iteration
|
| 229 |
+
# Estimate ~45s per round based on observed timing
|
| 230 |
+
est_seconds = rounds_remaining * 45
|
| 231 |
+
est_display = f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
|
| 232 |
+
|
| 233 |
+
yield AgentEvent(
|
| 234 |
+
type="progress",
|
| 235 |
+
message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
|
| 236 |
+
iteration=iteration,
|
| 237 |
+
)
|
| 238 |
+
```
|
| 239 |
+
|
| 240 |
+
#### 3. Update UI Message (Solution C)
|
| 241 |
+
|
| 242 |
+
```python
|
| 243 |
+
# src/orchestrators/advanced.py - run()
|
| 244 |
+
|
| 245 |
+
# UX FIX: More accurate timing message
|
| 246 |
+
yield AgentEvent(
|
| 247 |
+
type="thinking",
|
| 248 |
+
message=(
|
| 249 |
+
f"Multi-agent reasoning in progress ({self._max_rounds} rounds max)... "
|
| 250 |
+
f"Estimated time: {self._max_rounds * 45 // 60}-{self._max_rounds * 60 // 60} minutes."
|
| 251 |
+
),
|
| 252 |
+
iteration=0,
|
| 253 |
+
)
|
| 254 |
+
```
|
| 255 |
+
|
| 256 |
+
#### 4. Add Early Termination Signal (Solution B)
|
| 257 |
+
|
| 258 |
+
```python
|
| 259 |
+
# src/agents/magentic_agents.py - Update create_judge_agent()
|
| 260 |
+
|
| 261 |
+
@chat_agent.on_message
|
| 262 |
+
async def handle_judge_message(message: str, context: Context) -> ChatMessage:
|
| 263 |
+
"""Process judge request and signal completion when evidence is sufficient."""
|
| 264 |
+
# ... existing parsing logic to extract evidence and query ...
|
| 265 |
+
|
| 266 |
+
assessment = await judge_handler.evaluate(evidence, query)
|
| 267 |
+
|
| 268 |
+
# NEW: Strong termination signal for high-confidence assessments
|
| 269 |
+
if assessment.sufficient and assessment.confidence >= 0.70:
|
| 270 |
+
# Clear, unambiguous signal that Manager should respect
|
| 271 |
+
return ChatMessage(
|
| 272 |
+
content=(
|
| 273 |
+
f"β
SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
|
| 274 |
+
f"STOP SEARCHING. Delegate to ReportAgent NOW for final synthesis. "
|
| 275 |
+
f"Reasoning: {assessment.reasoning}"
|
| 276 |
+
),
|
| 277 |
+
metadata={"sufficient": True, "confidence": assessment.confidence},
|
| 278 |
+
)
|
| 279 |
+
|
| 280 |
+
# Insufficient - continue the loop
|
| 281 |
+
return ChatMessage(
|
| 282 |
+
content=(
|
| 283 |
+
f"β INSUFFICIENT: {assessment.reasoning}. "
|
| 284 |
+
f"Confidence: {assessment.confidence:.0%}. "
|
| 285 |
+
f"Suggested refinements: {', '.join(assessment.next_search_queries[:2])}"
|
| 286 |
+
)
|
| 287 |
+
)
|
| 288 |
+
```
|
| 289 |
+
|
| 290 |
+
#### 5. Update Manager System Prompt (Solution B)
|
| 291 |
+
|
| 292 |
+
```python
|
| 293 |
+
# src/orchestrators/advanced.py - _build_workflow()
|
| 294 |
+
|
| 295 |
+
MANAGER_SYSTEM_PROMPT = """You are a medical research workflow manager.
|
| 296 |
+
|
| 297 |
+
## CRITICAL RULE
|
| 298 |
+
When JudgeAgent says "SUFFICIENT EVIDENCE" or "STOP SEARCHING":
|
| 299 |
+
β IMMEDIATELY delegate to ReportAgent for synthesis
|
| 300 |
+
β Do NOT continue searching or gathering more evidence
|
| 301 |
+
β The Judge has determined evidence quality is adequate
|
| 302 |
+
|
| 303 |
+
## Standard Workflow
|
| 304 |
+
1. SearchAgent β finds evidence from PubMed, ClinicalTrials, etc.
|
| 305 |
+
2. HypothesisAgent β generates testable hypotheses
|
| 306 |
+
3. JudgeAgent β evaluates evidence sufficiency
|
| 307 |
+
4. IF sufficient β ReportAgent (DONE)
|
| 308 |
+
5. IF insufficient β SearchAgent with refined queries (CONTINUE)
|
| 309 |
+
|
| 310 |
+
## Your Role
|
| 311 |
+
- Coordinate agents efficiently
|
| 312 |
+
- Respect the Judge's termination signal
|
| 313 |
+
- Prioritize completing the task over perfection
|
| 314 |
+
"""
|
| 315 |
+
```
|
| 316 |
+
|
| 317 |
+
---
|
| 318 |
+
|
| 319 |
+
## Test Plan
|
| 320 |
+
|
| 321 |
+
### Unit Tests
|
| 322 |
+
|
| 323 |
+
```python
|
| 324 |
+
# tests/unit/orchestrators/test_advanced_orchestrator.py
|
| 325 |
+
|
| 326 |
+
import os
|
| 327 |
+
from unittest.mock import patch
|
| 328 |
+
|
| 329 |
+
import pytest
|
| 330 |
+
|
| 331 |
+
from src.orchestrators.advanced import AdvancedOrchestrator
|
| 332 |
+
|
| 333 |
+
|
| 334 |
+
class TestAdvancedOrchestratorConfig:
|
| 335 |
+
"""Tests for configuration options."""
|
| 336 |
+
|
| 337 |
+
def test_default_max_rounds_is_five(self) -> None:
|
| 338 |
+
"""Default max_rounds should be 5 for faster demos."""
|
| 339 |
+
with patch.dict(os.environ, {}, clear=True):
|
| 340 |
+
# Clear any existing env var
|
| 341 |
+
os.environ.pop("ADVANCED_MAX_ROUNDS", None)
|
| 342 |
+
orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
|
| 343 |
+
orch.__init__()
|
| 344 |
+
assert orch._max_rounds == 5
|
| 345 |
+
|
| 346 |
+
def test_max_rounds_from_env(self) -> None:
|
| 347 |
+
"""max_rounds should be configurable via environment."""
|
| 348 |
+
with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
|
| 349 |
+
orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
|
| 350 |
+
orch.__init__()
|
| 351 |
+
assert orch._max_rounds == 3
|
| 352 |
+
|
| 353 |
+
def test_explicit_max_rounds_overrides_env(self) -> None:
|
| 354 |
+
"""Explicit parameter should override environment."""
|
| 355 |
+
with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
|
| 356 |
+
orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
|
| 357 |
+
orch.__init__(max_rounds=7)
|
| 358 |
+
assert orch._max_rounds == 7
|
| 359 |
+
|
| 360 |
+
def test_timeout_default_is_five_minutes(self) -> None:
|
| 361 |
+
"""Default timeout should be 300s (5 min) for faster failure."""
|
| 362 |
+
orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
|
| 363 |
+
orch.__init__()
|
| 364 |
+
assert orch._timeout_seconds == 300.0
|
| 365 |
+
```
|
| 366 |
+
|
| 367 |
+
### Integration Test (Manual)
|
| 368 |
+
|
| 369 |
+
```bash
|
| 370 |
+
# Run advanced mode with reduced rounds
|
| 371 |
+
ADVANCED_MAX_ROUNDS=3 uv run python -c "
|
| 372 |
+
import asyncio
|
| 373 |
+
from src.orchestrators.advanced import AdvancedOrchestrator
|
| 374 |
+
|
| 375 |
+
async def test():
|
| 376 |
+
orch = AdvancedOrchestrator()
|
| 377 |
+
print(f'Max rounds: {orch._max_rounds}') # Should be 3
|
| 378 |
+
|
| 379 |
+
async for event in orch.run('sildenafil mechanism'):
|
| 380 |
+
print(f'{event.type}: {event.message[:100]}...')
|
| 381 |
+
|
| 382 |
+
asyncio.run(test())
|
| 383 |
+
"
|
| 384 |
+
```
|
| 385 |
+
|
| 386 |
+
### Timing Benchmark
|
| 387 |
+
|
| 388 |
+
Create a benchmark script to measure actual performance:
|
| 389 |
+
|
| 390 |
+
```python
|
| 391 |
+
# examples/benchmark_advanced.py
|
| 392 |
+
"""Benchmark Advanced mode with different max_rounds settings."""
|
| 393 |
+
|
| 394 |
+
import asyncio
|
| 395 |
+
import os
|
| 396 |
+
import time
|
| 397 |
+
|
| 398 |
+
|
| 399 |
+
async def benchmark(max_rounds: int) -> float:
|
| 400 |
+
"""Run benchmark with specified rounds, return elapsed time."""
|
| 401 |
+
os.environ["ADVANCED_MAX_ROUNDS"] = str(max_rounds)
|
| 402 |
+
|
| 403 |
+
# Import after setting env
|
| 404 |
+
from src.orchestrators.advanced import AdvancedOrchestrator
|
| 405 |
+
|
| 406 |
+
orch = AdvancedOrchestrator()
|
| 407 |
+
start = time.time()
|
| 408 |
+
|
| 409 |
+
async for event in orch.run("sildenafil erectile dysfunction"):
|
| 410 |
+
if event.type == "complete":
|
| 411 |
+
break
|
| 412 |
+
|
| 413 |
+
return time.time() - start
|
| 414 |
+
|
| 415 |
+
|
| 416 |
+
async def main() -> None:
|
| 417 |
+
"""Run benchmarks for different configurations."""
|
| 418 |
+
for rounds in [3, 5, 7, 10]:
|
| 419 |
+
elapsed = await benchmark(rounds)
|
| 420 |
+
print(f"max_rounds={rounds}: {elapsed:.1f}s ({elapsed/60:.1f}min)")
|
| 421 |
+
|
| 422 |
+
|
| 423 |
+
if __name__ == "__main__":
|
| 424 |
+
asyncio.run(main())
|
| 425 |
+
```
|
| 426 |
+
|
| 427 |
+
---
|
| 428 |
+
|
| 429 |
+
## Files to Modify
|
| 430 |
+
|
| 431 |
+
| File | Change |
|
| 432 |
+
|------|--------|
|
| 433 |
+
| `src/orchestrators/advanced.py` | Add env-configurable `max_rounds`, reduce default to 5, add progress estimation, update Manager prompt |
|
| 434 |
+
| `src/agents/magentic_agents.py` | Add early termination signal in JudgeAgent |
|
| 435 |
+
| `tests/unit/orchestrators/test_advanced_orchestrator.py` | Add config tests |
|
| 436 |
+
| `tests/unit/agents/test_magentic_judge_termination.py` | Add termination signal tests |
|
| 437 |
+
| `examples/benchmark_advanced.py` | Add timing benchmark (optional) |
|
| 438 |
+
|
| 439 |
+
---
|
| 440 |
+
|
| 441 |
+
## Acceptance Criteria
|
| 442 |
+
|
| 443 |
+
### Solution A: Configuration
|
| 444 |
+
- [ ] Default `max_rounds` is 5 (not 10)
|
| 445 |
+
- [ ] `max_rounds` configurable via `ADVANCED_MAX_ROUNDS` env var
|
| 446 |
+
- [ ] Explicit `max_rounds` parameter overrides env var
|
| 447 |
+
- [ ] Default timeout is 5 minutes (300s, not 600s)
|
| 448 |
+
|
| 449 |
+
### Solution B: Early Termination
|
| 450 |
+
- [ ] JudgeAgent returns "SUFFICIENT EVIDENCE" message when confidence β₯70%
|
| 451 |
+
- [ ] JudgeAgent returns "STOP SEARCHING" in termination signal
|
| 452 |
+
- [ ] Manager system prompt includes explicit termination instructions
|
| 453 |
+
- [ ] Workflow terminates early when Judge signals sufficiency (observed in logs)
|
| 454 |
+
|
| 455 |
+
### Solution C: Progress Indication
|
| 456 |
+
- [ ] Progress events show current round / max rounds
|
| 457 |
+
- [ ] Progress events show estimated time remaining
|
| 458 |
+
- [ ] Initial "thinking" message shows estimated total time
|
| 459 |
+
|
| 460 |
+
### Overall
|
| 461 |
+
- [ ] Demo completes in <5 minutes with useful output
|
| 462 |
+
- [ ] Quality of output is maintained (no degradation from early termination)
|
| 463 |
+
|
| 464 |
+
---
|
| 465 |
+
|
| 466 |
+
## Rollback Plan
|
| 467 |
+
|
| 468 |
+
If reduced rounds cause quality issues:
|
| 469 |
+
1. Increase `ADVANCED_MAX_ROUNDS` environment variable
|
| 470 |
+
2. No code changes needed
|
| 471 |
+
|
| 472 |
+
---
|
| 473 |
+
|
| 474 |
+
## References
|
| 475 |
+
|
| 476 |
+
- GitHub Issue #65
|
| 477 |
+
- Microsoft Agent Framework: https://github.com/microsoft/agent-framework
|
| 478 |
+
- MagenticBuilder docs: Round configuration
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
@@ -1,5 +1,6 @@
|
|
| 1 |
"""OpenAlex search tool - citation-aware scholarly search."""
|
| 2 |
|
|
|
|
| 3 |
from typing import Any
|
| 4 |
|
| 5 |
import httpx
|
|
@@ -104,6 +105,16 @@ class OpenAlexTool:
|
|
| 104 |
openalex_id = work.get("id", "")
|
| 105 |
url = openalex_id if openalex_id else "https://openalex.org"
|
| 106 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 107 |
# Prepend citation badge to content
|
| 108 |
citation_badge = f"[Cited by {cited_by_count}] " if cited_by_count > 0 else ""
|
| 109 |
content = f"{citation_badge}{abstract[:1900]}"
|
|
@@ -127,6 +138,7 @@ class OpenAlexTool:
|
|
| 127 |
"concepts": concepts,
|
| 128 |
"is_open_access": is_oa,
|
| 129 |
"pdf_url": pdf_url,
|
|
|
|
| 130 |
},
|
| 131 |
)
|
| 132 |
|
|
|
|
| 1 |
"""OpenAlex search tool - citation-aware scholarly search."""
|
| 2 |
|
| 3 |
+
import re
|
| 4 |
from typing import Any
|
| 5 |
|
| 6 |
import httpx
|
|
|
|
| 105 |
openalex_id = work.get("id", "")
|
| 106 |
url = openalex_id if openalex_id else "https://openalex.org"
|
| 107 |
|
| 108 |
+
# NEW: Extract PMID from ids object for deduplication
|
| 109 |
+
ids_obj = work.get("ids", {})
|
| 110 |
+
pmid_url = ids_obj.get("pmid") # "https://pubmed.ncbi.nlm.nih.gov/29456894"
|
| 111 |
+
pmid = None
|
| 112 |
+
if pmid_url and isinstance(pmid_url, str) and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
|
| 113 |
+
# Extract numeric PMID from URL
|
| 114 |
+
pmid_match = re.search(r"/(\d+)/?$", pmid_url)
|
| 115 |
+
if pmid_match:
|
| 116 |
+
pmid = pmid_match.group(1)
|
| 117 |
+
|
| 118 |
# Prepend citation badge to content
|
| 119 |
citation_badge = f"[Cited by {cited_by_count}] " if cited_by_count > 0 else ""
|
| 120 |
content = f"{citation_badge}{abstract[:1900]}"
|
|
|
|
| 138 |
"concepts": concepts,
|
| 139 |
"is_open_access": is_oa,
|
| 140 |
"pdf_url": pdf_url,
|
| 141 |
+
"pmid": pmid, # NEW: Store PMID for deduplication
|
| 142 |
},
|
| 143 |
)
|
| 144 |
|
|
@@ -1,7 +1,8 @@
|
|
| 1 |
"""Search handler - orchestrates multiple search tools."""
|
| 2 |
|
| 3 |
import asyncio
|
| 4 |
-
|
|
|
|
| 5 |
|
| 6 |
import structlog
|
| 7 |
|
|
@@ -9,9 +10,124 @@ from src.tools.base import SearchTool
|
|
| 9 |
from src.utils.exceptions import SearchError
|
| 10 |
from src.utils.models import Evidence, SearchResult, SourceName
|
| 11 |
|
|
|
|
|
|
|
|
|
|
| 12 |
logger = structlog.get_logger()
|
| 13 |
|
| 14 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
class SearchHandler:
|
| 16 |
"""Orchestrates parallel searches across multiple tools."""
|
| 17 |
|
|
@@ -66,6 +182,19 @@ class SearchHandler:
|
|
| 66 |
sources_searched.append(tool_name)
|
| 67 |
logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
|
| 68 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
return SearchResult(
|
| 70 |
query=query,
|
| 71 |
evidence=all_evidence,
|
|
|
|
| 1 |
"""Search handler - orchestrates multiple search tools."""
|
| 2 |
|
| 3 |
import asyncio
|
| 4 |
+
import re
|
| 5 |
+
from typing import TYPE_CHECKING, cast
|
| 6 |
|
| 7 |
import structlog
|
| 8 |
|
|
|
|
| 10 |
from src.utils.exceptions import SearchError
|
| 11 |
from src.utils.models import Evidence, SearchResult, SourceName
|
| 12 |
|
| 13 |
+
if TYPE_CHECKING:
|
| 14 |
+
from src.utils.models import Evidence
|
| 15 |
+
|
| 16 |
logger = structlog.get_logger()
|
| 17 |
|
| 18 |
|
| 19 |
+
def extract_paper_id(evidence: "Evidence") -> str | None:
|
| 20 |
+
"""Extract unique paper identifier from Evidence.
|
| 21 |
+
|
| 22 |
+
Strategy:
|
| 23 |
+
1. Check metadata.pmid first (OpenAlex provides this)
|
| 24 |
+
2. Fall back to URL pattern matching
|
| 25 |
+
|
| 26 |
+
Supports:
|
| 27 |
+
- PubMed: https://pubmed.ncbi.nlm.nih.gov/12345678/
|
| 28 |
+
- Europe PMC MED: https://europepmc.org/article/MED/12345678
|
| 29 |
+
- Europe PMC PMC: https://europepmc.org/article/PMC/PMC1234567
|
| 30 |
+
- Europe PMC PPR: https://europepmc.org/article/PPR/PPR123456
|
| 31 |
+
- Europe PMC PAT: https://europepmc.org/article/PAT/WO8601415
|
| 32 |
+
- DOI: https://doi.org/10.1234/...
|
| 33 |
+
- OpenAlex: https://openalex.org/W1234567890
|
| 34 |
+
- ClinicalTrials: https://clinicaltrials.gov/study/NCT12345678
|
| 35 |
+
- ClinicalTrials (legacy): https://clinicaltrials.gov/ct2/show/NCT12345678
|
| 36 |
+
"""
|
| 37 |
+
url = evidence.citation.url
|
| 38 |
+
metadata = evidence.metadata or {}
|
| 39 |
+
|
| 40 |
+
# Strategy 1: Check metadata.pmid (from OpenAlex)
|
| 41 |
+
if pmid := metadata.get("pmid"):
|
| 42 |
+
return f"PMID:{pmid}"
|
| 43 |
+
|
| 44 |
+
# Strategy 2: URL pattern matching
|
| 45 |
+
|
| 46 |
+
# PubMed URL pattern
|
| 47 |
+
pmid_match = re.search(r"pubmed\.ncbi\.nlm\.nih\.gov/(\d+)", url)
|
| 48 |
+
if pmid_match:
|
| 49 |
+
return f"PMID:{pmid_match.group(1)}"
|
| 50 |
+
|
| 51 |
+
# Europe PMC MED pattern (same as PMID)
|
| 52 |
+
epmc_med_match = re.search(r"europepmc\.org/article/MED/(\d+)", url)
|
| 53 |
+
if epmc_med_match:
|
| 54 |
+
return f"PMID:{epmc_med_match.group(1)}"
|
| 55 |
+
|
| 56 |
+
# Europe PMC PMC pattern (PubMed Central ID - different from PMID!)
|
| 57 |
+
epmc_pmc_match = re.search(r"europepmc\.org/article/PMC/(PMC\d+)", url)
|
| 58 |
+
if epmc_pmc_match:
|
| 59 |
+
return f"PMCID:{epmc_pmc_match.group(1)}"
|
| 60 |
+
|
| 61 |
+
# Europe PMC PPR pattern (Preprint ID - unique per preprint)
|
| 62 |
+
epmc_ppr_match = re.search(r"europepmc\.org/article/PPR/(PPR\d+)", url)
|
| 63 |
+
if epmc_ppr_match:
|
| 64 |
+
return f"PPRID:{epmc_ppr_match.group(1)}"
|
| 65 |
+
|
| 66 |
+
# Europe PMC PAT pattern (Patent ID - e.g., WO8601415, EP1234567)
|
| 67 |
+
epmc_pat_match = re.search(r"europepmc\.org/article/PAT/([A-Z]{2}\d+)", url)
|
| 68 |
+
if epmc_pat_match:
|
| 69 |
+
return f"PATID:{epmc_pat_match.group(1)}"
|
| 70 |
+
|
| 71 |
+
# DOI pattern (normalize trailing slash/characters)
|
| 72 |
+
doi_match = re.search(r"doi\.org/(10\.\d+/[^\s\]>]+)", url)
|
| 73 |
+
if doi_match:
|
| 74 |
+
doi = doi_match.group(1).rstrip("/")
|
| 75 |
+
return f"DOI:{doi}"
|
| 76 |
+
|
| 77 |
+
# OpenAlex ID pattern (fallback if no PMID in metadata)
|
| 78 |
+
openalex_match = re.search(r"openalex\.org/(W\d+)", url)
|
| 79 |
+
if openalex_match:
|
| 80 |
+
return f"OAID:{openalex_match.group(1)}"
|
| 81 |
+
|
| 82 |
+
# ClinicalTrials NCT ID (modern format)
|
| 83 |
+
nct_match = re.search(r"clinicaltrials\.gov/study/(NCT\d+)", url)
|
| 84 |
+
if nct_match:
|
| 85 |
+
return f"NCT:{nct_match.group(1)}"
|
| 86 |
+
|
| 87 |
+
# ClinicalTrials NCT ID (legacy format)
|
| 88 |
+
nct_legacy_match = re.search(r"clinicaltrials\.gov/ct2/show/(NCT\d+)", url)
|
| 89 |
+
if nct_legacy_match:
|
| 90 |
+
return f"NCT:{nct_legacy_match.group(1)}"
|
| 91 |
+
|
| 92 |
+
return None
|
| 93 |
+
|
| 94 |
+
|
| 95 |
+
def deduplicate_evidence(evidence_list: list["Evidence"]) -> list["Evidence"]:
|
| 96 |
+
"""Remove duplicate evidence based on paper ID.
|
| 97 |
+
|
| 98 |
+
Deduplication priority:
|
| 99 |
+
1. PubMed (authoritative source)
|
| 100 |
+
2. Europe PMC (full text links)
|
| 101 |
+
3. OpenAlex (citation data)
|
| 102 |
+
4. ClinicalTrials (unique, never duplicated)
|
| 103 |
+
|
| 104 |
+
Returns:
|
| 105 |
+
Deduplicated list preserving source priority order.
|
| 106 |
+
"""
|
| 107 |
+
seen_ids: set[str] = set()
|
| 108 |
+
unique: list[Evidence] = []
|
| 109 |
+
|
| 110 |
+
# Sort by source priority (PubMed first)
|
| 111 |
+
source_priority = {"pubmed": 0, "europepmc": 1, "openalex": 2, "clinicaltrials": 3}
|
| 112 |
+
sorted_evidence = sorted(
|
| 113 |
+
evidence_list, key=lambda e: source_priority.get(e.citation.source, 99)
|
| 114 |
+
)
|
| 115 |
+
|
| 116 |
+
for evidence in sorted_evidence:
|
| 117 |
+
paper_id = extract_paper_id(evidence)
|
| 118 |
+
|
| 119 |
+
if paper_id is None:
|
| 120 |
+
# Can't identify - keep it (conservative)
|
| 121 |
+
unique.append(evidence)
|
| 122 |
+
continue
|
| 123 |
+
|
| 124 |
+
if paper_id not in seen_ids:
|
| 125 |
+
seen_ids.add(paper_id)
|
| 126 |
+
unique.append(evidence)
|
| 127 |
+
|
| 128 |
+
return unique
|
| 129 |
+
|
| 130 |
+
|
| 131 |
class SearchHandler:
|
| 132 |
"""Orchestrates parallel searches across multiple tools."""
|
| 133 |
|
|
|
|
| 182 |
sources_searched.append(tool_name)
|
| 183 |
logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
|
| 184 |
|
| 185 |
+
# DEDUPLICATION STEP
|
| 186 |
+
original_count = len(all_evidence)
|
| 187 |
+
all_evidence = deduplicate_evidence(all_evidence)
|
| 188 |
+
dedup_count = original_count - len(all_evidence)
|
| 189 |
+
|
| 190 |
+
if dedup_count > 0:
|
| 191 |
+
logger.info(
|
| 192 |
+
"Deduplicated evidence",
|
| 193 |
+
original=original_count,
|
| 194 |
+
unique=len(all_evidence),
|
| 195 |
+
removed=dedup_count,
|
| 196 |
+
)
|
| 197 |
+
|
| 198 |
return SearchResult(
|
| 199 |
query=query,
|
| 200 |
evidence=all_evidence,
|
|
@@ -38,6 +38,30 @@ SAMPLE_OPENALEX_RESPONSE = {
|
|
| 38 |
]
|
| 39 |
}
|
| 40 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
|
| 42 |
@pytest.mark.unit
|
| 43 |
class TestOpenAlexTool:
|
|
@@ -144,6 +168,26 @@ class TestOpenAlexTool:
|
|
| 144 |
assert "sildenafil" in params["search"]
|
| 145 |
assert params["per_page"] == 3
|
| 146 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 147 |
|
| 148 |
@pytest.mark.integration
|
| 149 |
class TestOpenAlexIntegration:
|
|
|
|
| 38 |
]
|
| 39 |
}
|
| 40 |
|
| 41 |
+
# Sample response WITH PMID (for deduplication testing)
|
| 42 |
+
SAMPLE_OPENALEX_WITH_PMID = {
|
| 43 |
+
"results": [
|
| 44 |
+
{
|
| 45 |
+
"id": "https://openalex.org/W98765",
|
| 46 |
+
"doi": "https://doi.org/10.1038/nature12345",
|
| 47 |
+
"display_name": "Paper with PMID for deduplication",
|
| 48 |
+
"publication_year": 2023,
|
| 49 |
+
"cited_by_count": 50,
|
| 50 |
+
"abstract_inverted_index": {"Test": [0], "abstract": [1]},
|
| 51 |
+
"concepts": [],
|
| 52 |
+
"authorships": [],
|
| 53 |
+
"open_access": {"is_oa": False},
|
| 54 |
+
"best_oa_location": None,
|
| 55 |
+
# CRITICAL: ids object with PMID for cross-source deduplication
|
| 56 |
+
"ids": {
|
| 57 |
+
"openalex": "https://openalex.org/W98765",
|
| 58 |
+
"doi": "https://doi.org/10.1038/nature12345",
|
| 59 |
+
"pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894",
|
| 60 |
+
},
|
| 61 |
+
}
|
| 62 |
+
]
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
|
| 66 |
@pytest.mark.unit
|
| 67 |
class TestOpenAlexTool:
|
|
|
|
| 168 |
assert "sildenafil" in params["search"]
|
| 169 |
assert params["per_page"] == 3
|
| 170 |
|
| 171 |
+
@pytest.mark.asyncio
|
| 172 |
+
async def test_extracts_pmid_from_ids_object(self, tool: OpenAlexTool, mock_client) -> None:
|
| 173 |
+
"""PMID should be extracted from ids.pmid for cross-source deduplication."""
|
| 174 |
+
mock_client.get.return_value.json.return_value = SAMPLE_OPENALEX_WITH_PMID
|
| 175 |
+
|
| 176 |
+
results = await tool.search("test", max_results=1)
|
| 177 |
+
|
| 178 |
+
assert len(results) == 1
|
| 179 |
+
# PMID should be extracted from URL and stored as numeric string
|
| 180 |
+
assert results[0].metadata["pmid"] == "29456894"
|
| 181 |
+
|
| 182 |
+
@pytest.mark.asyncio
|
| 183 |
+
async def test_pmid_is_none_when_not_present(self, tool: OpenAlexTool, mock_client) -> None:
|
| 184 |
+
"""PMID should be None when ids.pmid is not in response."""
|
| 185 |
+
# SAMPLE_OPENALEX_RESPONSE has no ids.pmid field
|
| 186 |
+
results = await tool.search("sildenafil ED", max_results=1)
|
| 187 |
+
|
| 188 |
+
assert len(results) == 1
|
| 189 |
+
assert results[0].metadata["pmid"] is None
|
| 190 |
+
|
| 191 |
|
| 192 |
@pytest.mark.integration
|
| 193 |
class TestOpenAlexIntegration:
|
|
@@ -5,43 +5,203 @@ from unittest.mock import AsyncMock, create_autospec
|
|
| 5 |
import pytest
|
| 6 |
|
| 7 |
from src.tools.base import SearchTool
|
| 8 |
-
from src.tools.search_handler import SearchHandler
|
| 9 |
from src.utils.exceptions import SearchError
|
| 10 |
from src.utils.models import Citation, Evidence
|
| 11 |
|
| 12 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 13 |
class TestSearchHandler:
|
| 14 |
"""Tests for SearchHandler."""
|
| 15 |
|
| 16 |
@pytest.mark.asyncio
|
| 17 |
-
async def
|
| 18 |
-
"""SearchHandler should aggregate results
|
| 19 |
# Setup
|
| 20 |
mock_tool1 = AsyncMock(spec=SearchTool)
|
| 21 |
mock_tool1.name = "pubmed"
|
| 22 |
mock_tool1.search.return_value = [
|
| 23 |
-
|
| 24 |
-
content="C1",
|
| 25 |
-
citation=Citation(source="pubmed", title="T1", url="u1", date="2024"),
|
| 26 |
-
)
|
| 27 |
]
|
| 28 |
|
| 29 |
mock_tool2 = AsyncMock(spec=SearchTool)
|
| 30 |
-
mock_tool2.name = "
|
|
|
|
| 31 |
mock_tool2.search.return_value = [
|
| 32 |
-
|
| 33 |
-
content="C2",
|
| 34 |
-
citation=Citation(source="clinicaltrials", title="T2", url="u2", date="2024"),
|
| 35 |
-
)
|
| 36 |
]
|
| 37 |
|
| 38 |
handler = SearchHandler(tools=[mock_tool1, mock_tool2])
|
| 39 |
|
| 40 |
# Execute
|
| 41 |
-
result = await handler.execute("
|
| 42 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 43 |
assert "pubmed" in result.sources_searched
|
| 44 |
-
assert "
|
| 45 |
|
| 46 |
@pytest.mark.asyncio
|
| 47 |
async def test_execute_handles_tool_failure(self):
|
|
@@ -49,16 +209,11 @@ class TestSearchHandler:
|
|
| 49 |
mock_tool_ok = create_autospec(SearchTool, instance=True)
|
| 50 |
mock_tool_ok.name = "pubmed"
|
| 51 |
mock_tool_ok.search = AsyncMock(
|
| 52 |
-
return_value=[
|
| 53 |
-
Evidence(
|
| 54 |
-
content="Good result",
|
| 55 |
-
citation=Citation(source="pubmed", title="T", url="u", date="2024"),
|
| 56 |
-
)
|
| 57 |
-
]
|
| 58 |
)
|
| 59 |
|
| 60 |
mock_tool_fail = create_autospec(SearchTool, instance=True)
|
| 61 |
-
mock_tool_fail.name = "
|
| 62 |
mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))
|
| 63 |
|
| 64 |
handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
|
|
@@ -67,22 +222,4 @@ class TestSearchHandler:
|
|
| 67 |
assert result.total_found == 1
|
| 68 |
assert "pubmed" in result.sources_searched
|
| 69 |
assert len(result.errors) == 1
|
| 70 |
-
|
| 71 |
-
assert "pubmed: API down" in result.errors[0]
|
| 72 |
-
|
| 73 |
-
@pytest.mark.asyncio
|
| 74 |
-
async def test_search_handler_pubmed_only(self):
|
| 75 |
-
"""SearchHandler should work with only PubMed tool."""
|
| 76 |
-
# This is the specific test requested in Phase 9 spec
|
| 77 |
-
from src.tools.pubmed import PubMedTool
|
| 78 |
-
|
| 79 |
-
mock_pubmed = AsyncMock(spec=PubMedTool)
|
| 80 |
-
mock_pubmed.name = "pubmed"
|
| 81 |
-
mock_pubmed.search.return_value = []
|
| 82 |
-
|
| 83 |
-
handler = SearchHandler(tools=[mock_pubmed], timeout=30.0)
|
| 84 |
-
result = await handler.execute("testosterone libido", max_results_per_tool=3)
|
| 85 |
-
|
| 86 |
-
assert result.sources_searched == ["pubmed"]
|
| 87 |
-
assert "web" not in result.sources_searched
|
| 88 |
-
assert len(result.errors) == 0
|
|
|
|
| 5 |
import pytest
|
| 6 |
|
| 7 |
from src.tools.base import SearchTool
|
| 8 |
+
from src.tools.search_handler import SearchHandler, deduplicate_evidence, extract_paper_id
|
| 9 |
from src.utils.exceptions import SearchError
|
| 10 |
from src.utils.models import Citation, Evidence
|
| 11 |
|
| 12 |
|
| 13 |
+
def _make_evidence(source: str, url: str, metadata: dict | None = None) -> Evidence:
|
| 14 |
+
"""Helper to create Evidence objects for testing."""
|
| 15 |
+
return Evidence(
|
| 16 |
+
content="Test content",
|
| 17 |
+
citation=Citation(
|
| 18 |
+
source=source,
|
| 19 |
+
title="Test",
|
| 20 |
+
url=url,
|
| 21 |
+
date="2024",
|
| 22 |
+
authors=[],
|
| 23 |
+
),
|
| 24 |
+
metadata=metadata or {},
|
| 25 |
+
)
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
class TestExtractPaperId:
|
| 29 |
+
"""Tests for paper ID extraction from Evidence objects."""
|
| 30 |
+
|
| 31 |
+
def test_extracts_pubmed_id(self) -> None:
|
| 32 |
+
evidence = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 33 |
+
assert extract_paper_id(evidence) == "PMID:12345678"
|
| 34 |
+
|
| 35 |
+
def test_extracts_europepmc_med_id(self) -> None:
|
| 36 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
|
| 37 |
+
assert extract_paper_id(evidence) == "PMID:12345678"
|
| 38 |
+
|
| 39 |
+
def test_extracts_europepmc_pmc_id(self) -> None:
|
| 40 |
+
"""Europe PMC PMC articles have different ID format."""
|
| 41 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PMC/PMC7654321")
|
| 42 |
+
assert extract_paper_id(evidence) == "PMCID:PMC7654321"
|
| 43 |
+
|
| 44 |
+
def test_extracts_europepmc_ppr_id(self) -> None:
|
| 45 |
+
"""Europe PMC preprints have PPR IDs."""
|
| 46 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR123456")
|
| 47 |
+
assert extract_paper_id(evidence) == "PPRID:PPR123456"
|
| 48 |
+
|
| 49 |
+
def test_extracts_europepmc_pat_id(self) -> None:
|
| 50 |
+
"""Europe PMC patents have PAT IDs (WIPO format)."""
|
| 51 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/WO8601415")
|
| 52 |
+
assert extract_paper_id(evidence) == "PATID:WO8601415"
|
| 53 |
+
|
| 54 |
+
def test_extracts_europepmc_pat_id_eu_format(self) -> None:
|
| 55 |
+
"""European patent format should also work."""
|
| 56 |
+
evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/EP1234567")
|
| 57 |
+
assert extract_paper_id(evidence) == "PATID:EP1234567"
|
| 58 |
+
|
| 59 |
+
def test_extracts_doi(self) -> None:
|
| 60 |
+
evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345")
|
| 61 |
+
assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
|
| 62 |
+
|
| 63 |
+
def test_extracts_doi_with_trailing_slash(self) -> None:
|
| 64 |
+
"""DOIs should be normalized (trailing slash removed)."""
|
| 65 |
+
evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345/")
|
| 66 |
+
assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
|
| 67 |
+
|
| 68 |
+
def test_extracts_openalex_id_from_url(self) -> None:
|
| 69 |
+
"""OpenAlex ID from URL (fallback when no PMID in metadata)."""
|
| 70 |
+
evidence = _make_evidence("openalex", "https://openalex.org/W1234567890")
|
| 71 |
+
assert extract_paper_id(evidence) == "OAID:W1234567890"
|
| 72 |
+
|
| 73 |
+
def test_extracts_openalex_pmid_from_metadata(self) -> None:
|
| 74 |
+
"""OpenAlex PMID from metadata takes priority over URL."""
|
| 75 |
+
evidence = _make_evidence(
|
| 76 |
+
"openalex",
|
| 77 |
+
"https://openalex.org/W1234567890",
|
| 78 |
+
metadata={"pmid": "98765432"},
|
| 79 |
+
)
|
| 80 |
+
assert extract_paper_id(evidence) == "PMID:98765432"
|
| 81 |
+
|
| 82 |
+
def test_extracts_nct_id_modern(self) -> None:
|
| 83 |
+
evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT12345678")
|
| 84 |
+
assert extract_paper_id(evidence) == "NCT:NCT12345678"
|
| 85 |
+
|
| 86 |
+
def test_extracts_nct_id_legacy(self) -> None:
|
| 87 |
+
"""Legacy ClinicalTrials.gov URL format should also work."""
|
| 88 |
+
evidence = _make_evidence(
|
| 89 |
+
"clinicaltrials", "https://clinicaltrials.gov/ct2/show/NCT12345678"
|
| 90 |
+
)
|
| 91 |
+
assert extract_paper_id(evidence) == "NCT:NCT12345678"
|
| 92 |
+
|
| 93 |
+
def test_returns_none_for_unknown_url(self) -> None:
|
| 94 |
+
evidence = _make_evidence("web", "https://example.com/unknown")
|
| 95 |
+
assert extract_paper_id(evidence) is None
|
| 96 |
+
|
| 97 |
+
|
| 98 |
+
class TestDeduplicateEvidence:
|
| 99 |
+
"""Tests for evidence deduplication."""
|
| 100 |
+
|
| 101 |
+
def test_removes_pubmed_europepmc_duplicate(self) -> None:
|
| 102 |
+
"""Same paper from PubMed and Europe PMC should dedupe to PubMed."""
|
| 103 |
+
pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 104 |
+
europepmc = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
|
| 105 |
+
|
| 106 |
+
result = deduplicate_evidence([pubmed, europepmc])
|
| 107 |
+
|
| 108 |
+
assert len(result) == 1
|
| 109 |
+
assert result[0].citation.source == "pubmed"
|
| 110 |
+
|
| 111 |
+
def test_removes_pubmed_openalex_duplicate_via_metadata(self) -> None:
|
| 112 |
+
"""OpenAlex with PMID in metadata should dedupe against PubMed."""
|
| 113 |
+
pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 114 |
+
openalex = _make_evidence(
|
| 115 |
+
"openalex",
|
| 116 |
+
"https://openalex.org/W9999999",
|
| 117 |
+
metadata={"pmid": "12345678", "cited_by_count": 100},
|
| 118 |
+
)
|
| 119 |
+
|
| 120 |
+
result = deduplicate_evidence([pubmed, openalex])
|
| 121 |
+
|
| 122 |
+
assert len(result) == 1
|
| 123 |
+
assert result[0].citation.source == "pubmed"
|
| 124 |
+
|
| 125 |
+
def test_preserves_unique_evidence(self) -> None:
|
| 126 |
+
"""Different papers should not be deduplicated."""
|
| 127 |
+
e1 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/11111111/")
|
| 128 |
+
e2 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/22222222/")
|
| 129 |
+
|
| 130 |
+
result = deduplicate_evidence([e1, e2])
|
| 131 |
+
|
| 132 |
+
assert len(result) == 2
|
| 133 |
+
|
| 134 |
+
def test_preserves_openalex_without_pmid(self) -> None:
|
| 135 |
+
"""OpenAlex papers without PMID should NOT be deduplicated against PubMed."""
|
| 136 |
+
pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 137 |
+
openalex_no_pmid = _make_evidence(
|
| 138 |
+
"openalex",
|
| 139 |
+
"https://openalex.org/W9999999",
|
| 140 |
+
metadata={"cited_by_count": 100}, # No pmid key
|
| 141 |
+
)
|
| 142 |
+
|
| 143 |
+
result = deduplicate_evidence([pubmed, openalex_no_pmid])
|
| 144 |
+
|
| 145 |
+
assert len(result) == 2 # Both preserved (different IDs)
|
| 146 |
+
|
| 147 |
+
def test_keeps_unidentifiable_evidence(self) -> None:
|
| 148 |
+
"""Evidence with unrecognized URLs should be preserved."""
|
| 149 |
+
unknown = _make_evidence("web", "https://example.com/paper/123")
|
| 150 |
+
|
| 151 |
+
result = deduplicate_evidence([unknown])
|
| 152 |
+
|
| 153 |
+
assert len(result) == 1
|
| 154 |
+
|
| 155 |
+
def test_clinicaltrials_unique_per_nct(self) -> None:
|
| 156 |
+
"""ClinicalTrials entries have unique NCT IDs."""
|
| 157 |
+
trial1 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT11111111")
|
| 158 |
+
trial2 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT22222222")
|
| 159 |
+
|
| 160 |
+
result = deduplicate_evidence([trial1, trial2])
|
| 161 |
+
|
| 162 |
+
assert len(result) == 2
|
| 163 |
+
|
| 164 |
+
def test_preprints_preserved_separately(self) -> None:
|
| 165 |
+
"""Preprints (PPR IDs) should not dedupe against peer-reviewed papers."""
|
| 166 |
+
peer_reviewed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
| 167 |
+
preprint = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR999999")
|
| 168 |
+
|
| 169 |
+
result = deduplicate_evidence([peer_reviewed, preprint])
|
| 170 |
+
|
| 171 |
+
assert len(result) == 2 # Both preserved (different ID types)
|
| 172 |
+
|
| 173 |
+
|
| 174 |
class TestSearchHandler:
|
| 175 |
"""Tests for SearchHandler."""
|
| 176 |
|
| 177 |
@pytest.mark.asyncio
|
| 178 |
+
async def test_execute_aggregates_and_deduplicates(self):
|
| 179 |
+
"""SearchHandler should aggregate results and deduplicate them."""
|
| 180 |
# Setup
|
| 181 |
mock_tool1 = AsyncMock(spec=SearchTool)
|
| 182 |
mock_tool1.name = "pubmed"
|
| 183 |
mock_tool1.search.return_value = [
|
| 184 |
+
_make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
|
|
|
|
|
|
|
|
|
|
| 185 |
]
|
| 186 |
|
| 187 |
mock_tool2 = AsyncMock(spec=SearchTool)
|
| 188 |
+
mock_tool2.name = "europepmc"
|
| 189 |
+
# Duplicate of the pubmed result
|
| 190 |
mock_tool2.search.return_value = [
|
| 191 |
+
_make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
|
|
|
|
|
|
|
|
|
|
| 192 |
]
|
| 193 |
|
| 194 |
handler = SearchHandler(tools=[mock_tool1, mock_tool2])
|
| 195 |
|
| 196 |
# Execute
|
| 197 |
+
result = await handler.execute("test")
|
| 198 |
+
|
| 199 |
+
# Should only have 1 result after deduplication
|
| 200 |
+
assert result.total_found == 1
|
| 201 |
+
assert len(result.evidence) == 1
|
| 202 |
+
assert result.evidence[0].citation.source == "pubmed" # Priority source kept
|
| 203 |
assert "pubmed" in result.sources_searched
|
| 204 |
+
assert "europepmc" in result.sources_searched
|
| 205 |
|
| 206 |
@pytest.mark.asyncio
|
| 207 |
async def test_execute_handles_tool_failure(self):
|
|
|
|
| 209 |
mock_tool_ok = create_autospec(SearchTool, instance=True)
|
| 210 |
mock_tool_ok.name = "pubmed"
|
| 211 |
mock_tool_ok.search = AsyncMock(
|
| 212 |
+
return_value=[_make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 213 |
)
|
| 214 |
|
| 215 |
mock_tool_fail = create_autospec(SearchTool, instance=True)
|
| 216 |
+
mock_tool_fail.name = "clinicaltrials"
|
| 217 |
mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))
|
| 218 |
|
| 219 |
handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
|
|
|
|
| 222 |
assert result.total_found == 1
|
| 223 |
assert "pubmed" in result.sources_searched
|
| 224 |
assert len(result.errors) == 1
|
| 225 |
+
assert "clinicaltrials: API down" in result.errors[0]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|