VibecoderMcSwaggins committed on
Commit
2c5db87
·
unverified ·
1 Parent(s): 91a017e

feat(search): SPEC_13 Evidence Deduplication (#98)

Browse files

* feat(search): implement evidence deduplication (SPEC_13)

- Add PMID extraction to OpenAlex tool (from ids.pmid)
- Implement deduplicate_evidence and extract_paper_id in SearchHandler
- Support deduplication across PubMed, Europe PMC (MED, PMC, PPR, PAT), OpenAlex, and ClinicalTrials
- Add comprehensive unit tests for deduplication logic

* test: complete SPEC_13 test coverage (senior review findings)

Missing tests added per spec:
- test_extracts_europepmc_pat_id_eu_format (EP patent format)
- test_preserves_openalex_without_pmid (critical edge case)
- test_keeps_unidentifiable_evidence (conservative behavior)
- test_clinicaltrials_unique_per_nct (NCT uniqueness)
- test_preprints_preserved_separately (PPR vs PMID)
- test_extracts_pmid_from_ids_object (OpenAlex PMID extraction)
- test_pmid_is_none_when_not_present (null case)

Code improvements:
- Move `import re` to module level in openalex.py (efficiency)
- Add SAMPLE_OPENALEX_WITH_PMID fixture for cross-dedup testing

All 35 tests pass. 100% spec compliance.

* fix: add defensive isinstance check for pmid_url

Per CodeRabbit review: guards against unexpected API response types
that could cause a TypeError when checking string containment.
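
A minimal sketch of the guard described above (illustrative only, not the exact committed diff):

```python
import re

# Example OpenAlex "ids" payload; the live API returns the PMID as a full URL.
ids_obj = {"pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894"}

pmid_url = ids_obj.get("pmid")  # may be a str URL, None, or an unexpected type
pmid = None
# Defensive isinstance check: only attempt string containment/regex on real strings.
if isinstance(pmid_url, str) and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
    if match := re.search(r"/(\d+)/?$", pmid_url):
        pmid = match.group(1)  # "29456894"
```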

Files changed (27)
  1. docs/bugs/ACTIVE_BUGS.md +4 -2
  2. docs/bugs/{P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md → archive/P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md} +0 -0
  3. docs/bugs/{P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md → archive/P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md} +0 -0
  4. docs/bugs/{P1_NARRATIVE_SYNTHESIS_FALLBACK.md → archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md} +0 -0
  5. docs/bugs/{P2_GRADIO_EXAMPLE_NOT_FILLING.md → archive/P2_GRADIO_EXAMPLE_NOT_FILLING.md} +0 -0
  6. docs/bugs/{P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md} +0 -0
  7. docs/bugs/{P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md} +0 -0
  8. docs/bugs/{P3_MAGENTIC_NO_TERMINATION_EVENT.md → archive/P3_MAGENTIC_NO_TERMINATION_EVENT.md} +0 -0
  9. docs/specs/SPEC_13_EVIDENCE_DEDUPLICATION.md +566 -0
  10. docs/specs/SPEC_14_CLINICALTRIALS_OUTCOMES.md +466 -0
  11. docs/specs/SPEC_15_ADVANCED_MODE_PERFORMANCE.md +478 -0
  12. docs/specs/{SPEC_01_DEMO_TERMINATION.md → archive/SPEC_01_DEMO_TERMINATION.md} +0 -0
  13. docs/specs/{SPEC_02_E2E_TESTING.md → archive/SPEC_02_E2E_TESTING.md} +0 -0
  14. docs/specs/{SPEC_03_OPENALEX_INTEGRATION.md → archive/SPEC_03_OPENALEX_INTEGRATION.md} +0 -0
  15. docs/specs/{SPEC_04_MAGENTIC_UX.md → archive/SPEC_04_MAGENTIC_UX.md} +0 -0
  16. docs/specs/{SPEC_05_ORCHESTRATOR_CLEANUP.md → archive/SPEC_05_ORCHESTRATOR_CLEANUP.md} +0 -0
  17. docs/specs/{SPEC_06_SIMPLE_MODE_SYNTHESIS.md → archive/SPEC_06_SIMPLE_MODE_SYNTHESIS.md} +0 -0
  18. docs/specs/{SPEC_07_LANGGRAPH_MEMORY_ARCH.md → archive/SPEC_07_LANGGRAPH_MEMORY_ARCH.md} +0 -0
  19. docs/specs/{SPEC_08_INTEGRATE_MEMORY_LAYER.md → archive/SPEC_08_INTEGRATE_MEMORY_LAYER.md} +0 -0
  20. docs/specs/{SPEC_09_LLAMAINDEX_INTEGRATION.md → archive/SPEC_09_LLAMAINDEX_INTEGRATION.md} +0 -0
  21. docs/specs/{SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md → archive/SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md} +0 -0
  22. docs/specs/{SPEC_11_SEXUAL_HEALTH_FOCUS.md → archive/SPEC_11_SEXUAL_HEALTH_FOCUS.md} +0 -0
  23. docs/specs/{SPEC_12_NARRATIVE_SYNTHESIS.md → archive/SPEC_12_NARRATIVE_SYNTHESIS.md} +0 -0
  24. src/tools/openalex.py +12 -0
  25. src/tools/search_handler.py +130 -1
  26. tests/unit/tools/test_openalex.py +44 -0
  27. tests/unit/tools/test_search_handler.py +178 -41
docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -1,6 +1,8 @@
1
  # Active Bugs
2
 
3
  > Last updated: 2025-11-30
 
 
4
 
5
  ## P0 - Blocker
6
 
@@ -10,8 +12,8 @@
10
 
11
  ## P1 - Important
12
 
13
- ### P1 - Narrative Synthesis Falls Back to Template (NEW)
14
- **File:** `P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
15
  **Related:** SPEC_12 (implemented but falling back)
16
 
17
  **Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
 
1
  # Active Bugs
2
 
3
  > Last updated: 2025-11-30
4
+ >
5
+ > **Note:** Completed bug docs archived to `docs/bugs/archive/`
6
 
7
  ## P0 - Blocker
8
 
 
12
 
13
  ## P1 - Important
14
 
15
+ ### P1 - Narrative Synthesis Falls Back to Template
16
+ **File:** `archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
17
  **Related:** SPEC_12 (implemented but falling back)
18
 
19
  **Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
docs/bugs/{P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md → archive/P0_ORCHESTRATOR_DEDUP_AND_JUDGE_BUGS.md} RENAMED
File without changes
docs/bugs/{P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md → archive/P0_SIMPLE_MODE_NEVER_SYNTHESIZES.md} RENAMED
File without changes
docs/bugs/{P1_NARRATIVE_SYNTHESIS_FALLBACK.md → archive/P1_NARRATIVE_SYNTHESIS_FALLBACK.md} RENAMED
File without changes
docs/bugs/{P2_GRADIO_EXAMPLE_NOT_FILLING.md → archive/P2_GRADIO_EXAMPLE_NOT_FILLING.md} RENAMED
File without changes
docs/bugs/{P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_EPHEMERAL_MEMORY.md} RENAMED
File without changes
docs/bugs/{P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md → archive/P3_ARCHITECTURAL_GAP_STRUCTURED_MEMORY.md} RENAMED
File without changes
docs/bugs/{P3_MAGENTIC_NO_TERMINATION_EVENT.md → archive/P3_MAGENTIC_NO_TERMINATION_EVENT.md} RENAMED
File without changes
docs/specs/SPEC_13_EVIDENCE_DEDUPLICATION.md ADDED
@@ -0,0 +1,566 @@
1
+ # SPEC_13: Evidence Deduplication in SearchHandler
2
+
3
+ **Status**: Draft (Validated via API Documentation Review)
4
+ **Priority**: P1
5
+ **GitHub Issue**: #94
6
+ **Estimated Effort**: Medium (~100 lines of code, includes OpenAlex metadata extraction)
7
+ **Last Updated**: 2025-11-30
8
+
9
+ ---
10
+
11
+ ## Problem Statement
12
+
13
+ DeepBoner searches 4 sources in parallel:
14
+ 1. **PubMed** - NCBI's biomedical literature database
15
+ 2. **ClinicalTrials.gov** - Clinical trial registry
16
+ 3. **Europe PMC** - Indexes **ALL of PubMed/MEDLINE** plus preprints
17
+ 4. **OpenAlex** - Indexes **250M+ works** including most of PubMed
18
+
19
+ **Result**: The same paper appears 2-3 times in evidence because:
20
+ - Europe PMC includes 100% of PubMed content
21
+ - OpenAlex includes ~90% of PubMed content
22
+
23
+ ### Impact
24
+
25
+ | Metric | Current | After Fix |
26
+ |--------|---------|-----------|
27
+ | Duplicate evidence | 30-50% | <5% |
28
+ | Token waste | High | Low |
29
+ | Judge confusion | Yes | No |
30
+ | Report redundancy | Yes | No |
31
+
32
+ ---
33
+
34
+ ## API Documentation Review (2025-11-30)
35
+
36
+ ### OpenAlex Work Object - `ids` Field
37
+
38
+ **Source**: [OpenAlex Work Object Docs](https://docs.openalex.org/api-entities/works/work-object)
39
+
40
+ OpenAlex returns PMIDs in the `ids` field as **full URLs** (validated via live API testing 2025-11-30):
41
+ ```json
42
+ {
43
+ "ids": {
44
+ "openalex": "https://openalex.org/W2741809807",
45
+ "doi": "https://doi.org/10.7717/peerj.4375",
46
+ "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894"
47
+ }
48
+ }
49
+ ```
50
+
51
+ **Live API Test (2025-11-30):**
52
+ ```bash
53
+ curl "https://api.openalex.org/works?filter=ids.pmid:29456894&select=id,ids"
54
+ # Returns: "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894" (always URL format, never just number)
55
+ ```
56
+
57
+ **Current Issue**: Our `openalex.py` does NOT extract `ids.pmid` - it only uses DOI.
58
+ **Fix Required**: Extract PMID from OpenAlex response URL and store numeric PMID in `Evidence.metadata`.
59
+
60
+ ### Europe PMC URL Patterns
61
+
62
+ **Source**: [Europe PMC Help](https://europepmc.org/help)
63
+
64
+ Four URL patterns exist:
65
+ - `/article/MED/{PMID}` - PubMed/MEDLINE records
66
+ - `/article/PMC/{PMCID}` - PubMed Central full text
67
+ - `/article/PPR/{PPRID}` - Preprints (medRxiv, bioRxiv, etc.)
68
+ - `/article/PAT/{PatentID}` - Patents (e.g., `PAT/WO8601415`, `PAT/EP1234567`)
69
+
70
+ **Live API Test (2025-11-30):**
71
+ ```bash
72
+ curl "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=SRC:PAT&pageSize=1&format=json"
73
+ # Returns: source: "PAT", id: "WO8601415", pmid: None
74
+ ```
75
+
76
+ **Current Behavior**: Our `europepmc.py:92-95` already uses PubMed URL when PMID exists:
77
+ ```python
78
+ if doi:
79
+ url = f"https://doi.org/{doi}"
80
+ elif result.get("pmid"):
81
+ url = f"https://pubmed.ncbi.nlm.nih.gov/{result['pmid']}/" # ← Good!
82
+ else:
83
+ url = f"https://europepmc.org/article/{source_db}/{result.get('id', '')}"
84
+ ```
85
+
86
+ **Implications**:
87
+ 1. For MEDLINE records with PMIDs → PubMed URL used → deduplication works
88
+ 2. For PMC/PPR/PAT records without PMIDs → Europe PMC URL used → treated as unique (correct behavior)
89
+
90
+ ### ClinicalTrials.gov URL Patterns
91
+
92
+ Two URL formats exist:
93
+ - Modern: `/study/NCT12345678` (current API v2)
94
+ - Legacy: `/ct2/show/NCT12345678` (deprecated but may exist in old data)
95
+
96
+ ---
97
+
98
+ ## Current Code
99
+
100
+ ```python
101
+ # src/tools/search_handler.py:51-62
102
+ for tool, result in zip(self.tools, results, strict=True):
103
+ if isinstance(result, Exception):
104
+ errors.append(f"{tool.name}: {result!s}")
105
+ else:
106
+ success_result = cast(list[Evidence], result)
107
+ all_evidence.extend(success_result) # ← NO DEDUPLICATION
108
+ ```
109
+
110
+ ---
111
+
112
+ ## Proposed Solution (Two Parts)
113
+
114
+ ### Part 1: Extract PMID from OpenAlex (REQUIRED for cross-source dedup)
115
+
116
+ ```python
117
+ # src/tools/openalex.py - Update _to_evidence() method
118
+
119
+ def _to_evidence(self, work: dict[str, Any]) -> Evidence:
120
+ """Convert OpenAlex work to Evidence with rich metadata."""
121
+ # ... existing code ...
122
+
123
+ # NEW: Extract PMID from ids object for deduplication
124
+ ids_obj = work.get("ids", {})
125
+ pmid_url = ids_obj.get("pmid") # "https://pubmed.ncbi.nlm.nih.gov/29456894"
126
+ pmid = None
127
+ if pmid_url and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
128
+ # Extract numeric PMID from URL
129
+ import re
130
+ pmid_match = re.search(r'/(\d+)/?$', pmid_url)
131
+ if pmid_match:
132
+ pmid = pmid_match.group(1)
133
+
134
+ # ... rest of existing code ...
135
+
136
+ return Evidence(
137
+ content=content[:2000],
138
+ citation=Citation(
139
+ source="openalex",
140
+ title=title[:500],
141
+ url=url,
142
+ date=str(year),
143
+ authors=authors,
144
+ ),
145
+ relevance=relevance,
146
+ metadata={
147
+ "cited_by_count": cited_by_count,
148
+ "concepts": concepts,
149
+ "is_open_access": is_oa,
150
+ "pdf_url": pdf_url,
151
+ "pmid": pmid, # NEW: Store PMID for deduplication
152
+ },
153
+ )
154
+ ```
155
+
156
+ ### Part 2: Enhanced Deduplication Function
157
+
158
+ ```python
159
+ # src/tools/search_handler.py
160
+
161
+ import re
162
+ from typing import TYPE_CHECKING
163
+
164
+ if TYPE_CHECKING:
165
+ from src.utils.models import Evidence
166
+
167
+
168
+ def extract_paper_id(evidence: "Evidence") -> str | None:
169
+ """Extract unique paper identifier from Evidence.
170
+
171
+ Strategy:
172
+ 1. Check metadata.pmid first (OpenAlex provides this)
173
+ 2. Fall back to URL pattern matching
174
+
175
+ Supports:
176
+ - PubMed: https://pubmed.ncbi.nlm.nih.gov/12345678/
177
+ - Europe PMC MED: https://europepmc.org/article/MED/12345678
178
+ - Europe PMC PMC: https://europepmc.org/article/PMC/PMC1234567
179
+ - Europe PMC PPR: https://europepmc.org/article/PPR/PPR123456
180
+ - Europe PMC PAT: https://europepmc.org/article/PAT/WO8601415
181
+ - DOI: https://doi.org/10.1234/...
182
+ - OpenAlex: https://openalex.org/W1234567890
183
+ - ClinicalTrials: https://clinicaltrials.gov/study/NCT12345678
184
+ - ClinicalTrials (legacy): https://clinicaltrials.gov/ct2/show/NCT12345678
185
+ """
186
+ url = evidence.citation.url
187
+ metadata = evidence.metadata or {}
188
+
189
+ # Strategy 1: Check metadata.pmid (from OpenAlex)
190
+ if pmid := metadata.get("pmid"):
191
+ return f"PMID:{pmid}"
192
+
193
+ # Strategy 2: URL pattern matching
194
+
195
+ # PubMed URL pattern
196
+ pmid_match = re.search(r'pubmed\.ncbi\.nlm\.nih\.gov/(\d+)', url)
197
+ if pmid_match:
198
+ return f"PMID:{pmid_match.group(1)}"
199
+
200
+ # Europe PMC MED pattern (same as PMID)
201
+ epmc_med_match = re.search(r'europepmc\.org/article/MED/(\d+)', url)
202
+ if epmc_med_match:
203
+ return f"PMID:{epmc_med_match.group(1)}"
204
+
205
+ # Europe PMC PMC pattern (PubMed Central ID - different from PMID!)
206
+ epmc_pmc_match = re.search(r'europepmc\.org/article/PMC/(PMC\d+)', url)
207
+ if epmc_pmc_match:
208
+ return f"PMCID:{epmc_pmc_match.group(1)}"
209
+
210
+ # Europe PMC PPR pattern (Preprint ID - unique per preprint)
211
+ epmc_ppr_match = re.search(r'europepmc\.org/article/PPR/(PPR\d+)', url)
212
+ if epmc_ppr_match:
213
+ return f"PPRID:{epmc_ppr_match.group(1)}"
214
+
215
+ # Europe PMC PAT pattern (Patent ID - e.g., WO8601415, EP1234567)
216
+ epmc_pat_match = re.search(r'europepmc\.org/article/PAT/([A-Z]{2}\d+)', url)
217
+ if epmc_pat_match:
218
+ return f"PATID:{epmc_pat_match.group(1)}"
219
+
220
+ # DOI pattern (normalize trailing slash/characters)
221
+ doi_match = re.search(r'doi\.org/(10\.\d+/[^\s\]>]+)', url)
222
+ if doi_match:
223
+ doi = doi_match.group(1).rstrip('/')
224
+ return f"DOI:{doi}"
225
+
226
+ # OpenAlex ID pattern (fallback if no PMID in metadata)
227
+ openalex_match = re.search(r'openalex\.org/(W\d+)', url)
228
+ if openalex_match:
229
+ return f"OAID:{openalex_match.group(1)}"
230
+
231
+ # ClinicalTrials NCT ID (modern format)
232
+ nct_match = re.search(r'clinicaltrials\.gov/study/(NCT\d+)', url)
233
+ if nct_match:
234
+ return f"NCT:{nct_match.group(1)}"
235
+
236
+ # ClinicalTrials NCT ID (legacy format)
237
+ nct_legacy_match = re.search(r'clinicaltrials\.gov/ct2/show/(NCT\d+)', url)
238
+ if nct_legacy_match:
239
+ return f"NCT:{nct_legacy_match.group(1)}"
240
+
241
+ return None
242
+
243
+
244
+ def deduplicate_evidence(evidence_list: list["Evidence"]) -> list["Evidence"]:
245
+ """Remove duplicate evidence based on paper ID.
246
+
247
+ Deduplication priority:
248
+ 1. PubMed (authoritative source)
249
+ 2. Europe PMC (full text links)
250
+ 3. OpenAlex (citation data)
251
+ 4. ClinicalTrials (unique, never duplicated)
252
+
253
+ Returns:
254
+ Deduplicated list preserving source priority order.
255
+ """
256
+ seen_ids: set[str] = set()
257
+ unique: list[Evidence] = []
258
+
259
+ # Sort by source priority (PubMed first)
260
+ source_priority = {"pubmed": 0, "europepmc": 1, "openalex": 2, "clinicaltrials": 3}
261
+ sorted_evidence = sorted(
262
+ evidence_list,
263
+ key=lambda e: source_priority.get(e.citation.source, 99)
264
+ )
265
+
266
+ for evidence in sorted_evidence:
267
+ paper_id = extract_paper_id(evidence)
268
+
269
+ if paper_id is None:
270
+ # Can't identify - keep it (conservative)
271
+ unique.append(evidence)
272
+ continue
273
+
274
+ if paper_id not in seen_ids:
275
+ seen_ids.add(paper_id)
276
+ unique.append(evidence)
277
+
278
+ return unique
279
+ ```
280
+
281
+ ### Part 3: Integrate into SearchHandler.execute()
282
+
283
+ ```python
284
+ # src/tools/search_handler.py:execute() - AFTER gathering results
285
+
286
+ # ... existing code to collect all_evidence ...
287
+
288
+ # DEDUPLICATION STEP
289
+ original_count = len(all_evidence)
290
+ all_evidence = deduplicate_evidence(all_evidence)
291
+ dedup_count = original_count - len(all_evidence)
292
+
293
+ if dedup_count > 0:
294
+ logger.info(
295
+ "Deduplicated evidence",
296
+ original=original_count,
297
+ unique=len(all_evidence),
298
+ removed=dedup_count,
299
+ )
300
+
301
+ return SearchResult(
302
+ query=query,
303
+ evidence=all_evidence, # Now deduplicated
304
+ sources_searched=sources_searched,
305
+ total_found=len(all_evidence),
306
+ errors=errors,
307
+ )
308
+ ```
309
+
310
+ ---
311
+
312
+ ## Test Plan
313
+
314
+ ### Unit Tests (`tests/unit/tools/test_search_handler.py`)
315
+
316
+ ```python
317
+ import pytest
318
+ from src.tools.search_handler import extract_paper_id, deduplicate_evidence
319
+ from src.utils.models import Citation, Evidence
320
+
321
+
322
+ def _make_evidence(source: str, url: str, metadata: dict | None = None) -> Evidence:
323
+ """Helper to create Evidence objects for testing."""
324
+ return Evidence(
325
+ content="Test content",
326
+ citation=Citation(
327
+ source=source,
328
+ title="Test",
329
+ url=url,
330
+ date="2024",
331
+ authors=[],
332
+ ),
333
+ metadata=metadata,
334
+ )
335
+
336
+
337
+ class TestExtractPaperId:
338
+ """Tests for paper ID extraction from Evidence objects."""
339
+
340
+ def test_extracts_pubmed_id(self) -> None:
341
+ evidence = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
342
+ assert extract_paper_id(evidence) == "PMID:12345678"
343
+
344
+ def test_extracts_europepmc_med_id(self) -> None:
345
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
346
+ assert extract_paper_id(evidence) == "PMID:12345678"
347
+
348
+ def test_extracts_europepmc_pmc_id(self) -> None:
349
+ """Europe PMC PMC articles have different ID format."""
350
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PMC/PMC7654321")
351
+ assert extract_paper_id(evidence) == "PMCID:PMC7654321"
352
+
353
+ def test_extracts_europepmc_ppr_id(self) -> None:
354
+ """Europe PMC preprints have PPR IDs."""
355
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR123456")
356
+ assert extract_paper_id(evidence) == "PPRID:PPR123456"
357
+
358
+ def test_extracts_europepmc_pat_id(self) -> None:
359
+ """Europe PMC patents have PAT IDs (e.g., WO8601415, EP1234567)."""
360
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/WO8601415")
361
+ assert extract_paper_id(evidence) == "PATID:WO8601415"
362
+
363
+ def test_extracts_europepmc_pat_id_eu_format(self) -> None:
364
+ """European patent format should also work."""
365
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/EP1234567")
366
+ assert extract_paper_id(evidence) == "PATID:EP1234567"
367
+
368
+ def test_extracts_doi(self) -> None:
369
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345")
370
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
371
+
372
+ def test_extracts_doi_with_trailing_slash(self) -> None:
373
+ """DOIs should be normalized (trailing slash removed)."""
374
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345/")
375
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
376
+
377
+ def test_extracts_openalex_id_from_url(self) -> None:
378
+ """OpenAlex ID from URL (fallback when no PMID in metadata)."""
379
+ evidence = _make_evidence("openalex", "https://openalex.org/W1234567890")
380
+ assert extract_paper_id(evidence) == "OAID:W1234567890"
381
+
382
+ def test_extracts_openalex_pmid_from_metadata(self) -> None:
383
+ """OpenAlex PMID from metadata takes priority over URL."""
384
+ evidence = _make_evidence(
385
+ "openalex",
386
+ "https://openalex.org/W1234567890",
387
+ metadata={"pmid": "98765432"},
388
+ )
389
+ assert extract_paper_id(evidence) == "PMID:98765432"
390
+
391
+ def test_extracts_nct_id_modern(self) -> None:
392
+ evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT12345678")
393
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
394
+
395
+ def test_extracts_nct_id_legacy(self) -> None:
396
+ """Legacy ClinicalTrials.gov URL format should also work."""
397
+ evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/ct2/show/NCT12345678")
398
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
399
+
400
+ def test_returns_none_for_unknown_url(self) -> None:
401
+ evidence = _make_evidence("unknown", "https://example.com/unknown")
402
+ assert extract_paper_id(evidence) is None
403
+
404
+
405
+ class TestDeduplicateEvidence:
406
+ """Tests for evidence deduplication."""
407
+
408
+ def test_removes_pubmed_europepmc_duplicate(self) -> None:
409
+ """Same paper from PubMed and Europe PMC should dedupe to PubMed."""
410
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
411
+ europepmc = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
412
+
413
+ result = deduplicate_evidence([pubmed, europepmc])
414
+
415
+ assert len(result) == 1
416
+ assert result[0].citation.source == "pubmed"
417
+
418
+ def test_removes_pubmed_openalex_duplicate_via_metadata(self) -> None:
419
+ """OpenAlex with PMID in metadata should dedupe against PubMed."""
420
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
421
+ openalex = _make_evidence(
422
+ "openalex",
423
+ "https://openalex.org/W9999999",
424
+ metadata={"pmid": "12345678", "cited_by_count": 100},
425
+ )
426
+
427
+ result = deduplicate_evidence([pubmed, openalex])
428
+
429
+ assert len(result) == 1
430
+ assert result[0].citation.source == "pubmed"
431
+
432
+ def test_preserves_openalex_without_pmid(self) -> None:
433
+ """OpenAlex papers without PMID should NOT be deduplicated."""
434
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
435
+ openalex_no_pmid = _make_evidence(
436
+ "openalex",
437
+ "https://openalex.org/W9999999",
438
+ metadata={"cited_by_count": 100}, # No pmid key
439
+ )
440
+
441
+ result = deduplicate_evidence([pubmed, openalex_no_pmid])
442
+
443
+ assert len(result) == 2 # Both preserved (different IDs)
444
+
445
+ def test_preserves_unique_evidence(self) -> None:
446
+ """Different papers should not be deduplicated."""
447
+ e1 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/11111111/")
448
+ e2 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/22222222/")
449
+
450
+ result = deduplicate_evidence([e1, e2])
451
+
452
+ assert len(result) == 2
453
+
454
+ def test_keeps_unidentifiable_evidence(self) -> None:
455
+ """Evidence with unrecognized URLs should be preserved."""
456
+ unknown = _make_evidence("unknown", "https://example.com/paper/123")
457
+
458
+ result = deduplicate_evidence([unknown])
459
+
460
+ assert len(result) == 1
461
+
462
+ def test_clinicaltrials_unique_per_nct(self) -> None:
463
+ """ClinicalTrials entries have unique NCT IDs."""
464
+ trial1 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT11111111")
465
+ trial2 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT22222222")
466
+
467
+ result = deduplicate_evidence([trial1, trial2])
468
+
469
+ assert len(result) == 2
470
+
471
+ def test_preprints_preserved_separately(self) -> None:
472
+ """Preprints (PPR IDs) should not dedupe against peer-reviewed papers."""
473
+ peer_reviewed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
474
+ preprint = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR999999")
475
+
476
+ result = deduplicate_evidence([peer_reviewed, preprint])
477
+
478
+ assert len(result) == 2 # Both preserved (different ID types)
479
+ ```
480
+
481
+ ### Integration Test
482
+
483
+ ```python
484
+ @pytest.mark.integration
485
+ async def test_real_search_deduplicates() -> None:
486
+ """Integration test: Real search should deduplicate PubMed/Europe PMC."""
487
+ from src.tools.pubmed import PubMedTool
488
+ from src.tools.europepmc import EuropePMCTool
489
+ from src.tools.openalex import OpenAlexTool
490
+ from src.tools.search_handler import SearchHandler, extract_paper_id
491
+
492
+ handler = SearchHandler(
493
+ tools=[PubMedTool(), EuropePMCTool(), OpenAlexTool()],
494
+ timeout=30.0,
495
+ )
496
+
497
+ result = await handler.execute("sildenafil erectile dysfunction", max_results_per_tool=5)
498
+
499
+ # Should have fewer results than 3x max_results due to deduplication
500
+ assert result.total_found < 15
501
+
502
+ # Check no duplicate paper IDs in final result
503
+ paper_ids = [
504
+ extract_paper_id(e)
505
+ for e in result.evidence
506
+ if extract_paper_id(e)
507
+ ]
508
+ assert len(paper_ids) == len(set(paper_ids)), f"Duplicate IDs found: {paper_ids}"
509
+ ```
510
+
511
+ ---
512
+
513
+ ## Files to Modify
514
+
515
+ | File | Change |
516
+ |------|--------|
517
+ | `src/tools/openalex.py` | Extract PMID from `ids.pmid` and store in `Evidence.metadata` |
518
+ | `src/tools/search_handler.py` | Add `extract_paper_id()`, `deduplicate_evidence()`, integrate into `execute()` |
519
+ | `tests/unit/tools/test_search_handler.py` | Add comprehensive deduplication tests |
520
+ | `tests/unit/tools/test_openalex.py` | Add test for PMID extraction |
521
+
522
+ ---
523
+
524
+ ## Acceptance Criteria
525
+
526
+ - [ ] `openalex.py` extracts PMID from `work.ids.pmid` and stores in `Evidence.metadata`
527
+ - [ ] `extract_paper_id()` checks `Evidence.metadata.pmid` first (for OpenAlex cross-dedup)
528
+ - [ ] `extract_paper_id()` correctly parses all URL patterns:
529
+ - PubMed URLs
530
+ - Europe PMC MED, PMC, PPR, and PAT URLs
531
+ - DOIs (normalized without trailing slash)
532
+ - OpenAlex W-IDs (fallback)
533
+ - ClinicalTrials NCT IDs (modern and legacy formats)
534
+ - [ ] `deduplicate_evidence()` removes duplicates while preserving source priority
535
+ - [ ] Unit tests cover all edge cases including OpenAlex with/without PMID
536
+ - [ ] Integration test confirms real searches are deduplicated
537
+ - [ ] Logging shows deduplication metrics
538
+
539
+ ---
540
+
541
+ ## Known Limitations
542
+
543
+ 1. **OpenAlex without PMID**: Some OpenAlex papers (especially older or non-biomedical) may not have PMIDs. These will NOT be deduplicated against PubMed. This is acceptable - we prefer false negatives (keeping duplicates) over false positives (removing unique papers).
544
+
545
+ 2. **PMC vs PMID**: PubMed Central IDs (PMC1234567) are different from PubMed IDs (12345678). A paper may have both, but we treat them as different identifiers. This may result in some duplicates not being caught when Europe PMC returns a PMC URL and PubMed returns a PMID URL for the same paper. Future enhancement: use the NCBI ID Converter API (a sketch follows this list).
546
+
547
+ 3. **Preprint → Peer-Reviewed**: When a preprint (PPR ID) is later published as a peer-reviewed paper (PMID), they will have different IDs and won't be deduplicated. This is intentional - users may want to see both versions.
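
For limitation 2, a future PMCID→PMID bridge could look roughly like the sketch below. It assumes the public NCBI ID Converter endpoint and its documented JSON shape (`records[].pmid`); treat it as a direction, not part of this spec's proposed diff.

```python
import requests

# NCBI PMC ID Converter (assumed endpoint; check NCBI docs before relying on it).
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"


def pmcid_to_pmid(pmcid: str) -> str | None:
    """Resolve a PMCID (e.g. 'PMC7654321') to its PMID, or None if unmapped."""
    resp = requests.get(IDCONV_URL, params={"ids": pmcid, "format": "json"}, timeout=10)
    resp.raise_for_status()
    records = resp.json().get("records", [])
    # Records without a mapping carry no "pmid" key, so .get() returns None.
    return records[0].get("pmid") if records else None
```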
548
+
549
+ ---
550
+
551
+ ## Rollback Plan
552
+
553
+ If deduplication causes issues:
554
+ 1. Remove the `deduplicate_evidence()` call from `execute()`
555
+ 2. All tests should still pass (deduplication is additive)
556
+ 3. No need to revert OpenAlex PMID extraction (metadata only)
557
+
558
+ ---
559
+
560
+ ## References
561
+
562
+ - GitHub Issue #94
563
+ - [OpenAlex Work Object Documentation](https://docs.openalex.org/api-entities/works/work-object)
564
+ - [Europe PMC Help - ID Types](https://europepmc.org/help)
565
+ - [ClinicalTrials.gov API v2](https://clinicaltrials.gov/data-api/api)
566
+ - `TOOL_ANALYSIS_CRITICAL.md` - "Cross-Tool Issues > Issue 1: MASSIVE DUPLICATION"
docs/specs/SPEC_14_CLINICALTRIALS_OUTCOMES.md ADDED
@@ -0,0 +1,466 @@
1
+ # SPEC_14: Add Outcome Measures to ClinicalTrials.gov Fields
2
+
3
+ **Status**: Draft (Validated via API Documentation Review)
4
+ **Priority**: P1
5
+ **GitHub Issue**: #95
6
+ **Estimated Effort**: Small (~40 lines of code)
7
+ **Last Updated**: 2025-11-30
8
+
9
+ ---
10
+
11
+ ## Problem Statement
12
+
13
+ The `ClinicalTrialsTool` retrieves trial metadata but **misses critical efficacy data**:
14
+
15
+ ### Current Fields Retrieved
16
+
17
+ ```python
18
+ # src/tools/clinicaltrials.py:24-33
19
+ FIELDS: ClassVar[list[str]] = [
20
+ "NCTId",
21
+ "BriefTitle",
22
+ "Phase",
23
+ "OverallStatus",
24
+ "Condition",
25
+ "InterventionName",
26
+ "StartDate",
27
+ "BriefSummary",
28
+ ]
29
+ ```
30
+
31
+ ### Missing Data (Critical for Research)
32
+
33
+ | Data | Location in Response | Purpose |
34
+ |------|---------------------|---------|
35
+ | Primary Outcomes | `protocolSection.outcomesModule.primaryOutcomes[].measure` | Main efficacy endpoint |
36
+ | Secondary Outcomes | `protocolSection.outcomesModule.secondaryOutcomes[].measure` | Additional endpoints |
37
+ | Has Results | `study.hasResults` (top-level) | Whether results are posted |
38
+ | Results Date | `protocolSection.statusModule.resultsFirstPostDateStruct.date` | When results posted |
39
+
40
+ ### Impact
41
+
42
+ **Current Output**:
43
+ ```
44
+ Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
45
+ Interventions: Sildenafil.
46
+ ```
47
+
48
+ **Desired Output**:
49
+ ```
50
+ Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
51
+ Interventions: Sildenafil.
52
+ Primary Outcome: Change from baseline in IIEF-EF domain score at Week 12.
53
+ Results Available: Yes (posted 2024-01-15).
54
+ ```
55
+
56
+ ---
57
+
58
+ ## API Documentation Review (2025-11-30)
59
+
60
+ ### ClinicalTrials.gov API v2 Response Structure
61
+
62
+ **Source**: [Stack Overflow - ClinicalTrials.gov API v2](https://stackoverflow.com/questions/78415818)
63
+
64
+ The API returns nested JSON. Key findings:
65
+
66
+ 1. **`hasResults`** is a **top-level** field on each study object (NOT inside `protocolSection`)
67
+ 2. **Outcomes** are in `protocolSection.outcomesModule`:
68
+ ```python
69
+ study['protocolSection']['outcomesModule']['primaryOutcomes'] # List
70
+ study['protocolSection']['outcomesModule']['secondaryOutcomes'] # List
71
+ ```
72
+ 3. **Results date** is in `protocolSection.statusModule.resultsFirstPostDateStruct.date`
73
+
74
+ ### `fields` Parameter Behavior (VERIFIED VIA LIVE API TESTING)
75
+
76
+ The `fields` query parameter filters what the API returns. **If you don't request a field, you don't get it.**
77
+
78
+ **Live API Test Results (2025-11-30):**
79
+
80
+ ```bash
81
+ # Test 1: With limited fields - NO outcomesModule returned
82
+ curl "...&fields=NCTId,BriefTitle"
83
+ # → Returns ONLY: protocolSection.identificationModule.{nctId, briefTitle}
84
+
85
+ # Test 2: Without fields param - outcomesModule IS present
86
+ curl "...&pageSize=1"
87
+ # → Returns: hasResults: false, outcomesModule: {primaryOutcomes, secondaryOutcomes, otherOutcomes}
88
+
89
+ # Test 3: Valid field names for outcomes
90
+ curl "...&fields=NCTId,OutcomesModule" # βœ… Works - returns full outcomesModule
91
+ curl "...&fields=NCTId,PrimaryOutcome" # βœ… Works - returns only primaryOutcomes
92
+ curl "...&fields=NCTId,HasResults" # βœ… Works - returns hasResults at top level
93
+ ```
94
+
95
+ **Valid Field Names (Tested):**
96
+ - `OutcomesModule` → Returns full `protocolSection.outcomesModule` with all outcomes
97
+ - `PrimaryOutcome` → Returns only `primaryOutcomes` array
98
+ - `SecondaryOutcome` → Returns only `secondaryOutcomes` array
99
+ - `HasResults` → Returns `hasResults` at study top level
100
+
101
+ ---
102
+
103
+ ## Proposed Solution
104
+
105
+ ### ✅ UPDATE FIELDS Constant (REQUIRED)
106
+
107
+ The current implementation explicitly passes `fields=",".join(self.FIELDS)` at line 67.
108
+ **The API ONLY returns requested fields.** We MUST add the new field names.
109
+
110
+ ```python
111
+ # src/tools/clinicaltrials.py - UPDATE FIELDS
112
+ FIELDS: ClassVar[list[str]] = [
113
+ "NCTId",
114
+ "BriefTitle",
115
+ "Phase",
116
+ "OverallStatus",
117
+ "Condition",
118
+ "InterventionName",
119
+ "StartDate",
120
+ "BriefSummary",
121
+ # NEW: Outcome measures (verified via live API testing 2025-11-30)
122
+ "OutcomesModule", # Returns protocolSection.outcomesModule.{primaryOutcomes, secondaryOutcomes}
123
+ "HasResults", # Returns study.hasResults (top-level boolean)
124
+ ]
125
+ ```
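
For orientation, the constant reaches the API roughly as sketched below; `fields` and `pageSize` mirror the curl tests above, while `query.term` and the endpoint path are assumptions about the existing tool, not verified here.

```python
import requests

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"  # assumed v2 studies endpoint


def fetch_studies(query: str, fields: list[str], max_results: int = 10) -> list[dict]:
    """Minimal sketch: only the requested fields come back in the response.

    Usage (hypothetical): fetch_studies("sildenafil erectile dysfunction", FIELDS)
    """
    params = {
        "query.term": query,          # assumption: free-text query parameter
        "pageSize": max_results,
        "fields": ",".join(fields),   # same join the current tool performs at line 67
    }
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("studies", [])
```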
126
+
127
+ ### ✅ Update `_study_to_evidence()` Method
128
+
129
+ ```python
130
+ def _study_to_evidence(self, study: dict[str, Any]) -> Evidence:
131
+ """Convert a clinical trial study to Evidence."""
132
+ # Navigate nested structure
133
+ protocol = study.get("protocolSection", {})
134
+ id_module = protocol.get("identificationModule", {})
135
+ status_module = protocol.get("statusModule", {})
136
+ desc_module = protocol.get("descriptionModule", {})
137
+ design_module = protocol.get("designModule", {})
138
+ conditions_module = protocol.get("conditionsModule", {})
139
+ arms_module = protocol.get("armsInterventionsModule", {})
140
+ outcomes_module = protocol.get("outcomesModule", {}) # NEW
141
+
142
+ # ... existing field extraction (nct_id, title, status, phase, etc.) ...
143
+
144
+ # NEW: Extract outcome measures
145
+ primary_outcomes = outcomes_module.get("primaryOutcomes", [])
146
+ primary_outcome_str = ""
147
+ if primary_outcomes:
148
+ # Get first primary outcome measure and timeframe
149
+ first = primary_outcomes[0]
150
+ measure = first.get("measure", "")
151
+ timeframe = first.get("timeFrame", "")
152
+ # Truncate long outcome descriptions
153
+ primary_outcome_str = measure[:200]
154
+ if timeframe:
155
+ primary_outcome_str += f" (measured at {timeframe})"
156
+
157
+ secondary_outcomes = outcomes_module.get("secondaryOutcomes", [])
158
+ secondary_count = len(secondary_outcomes)
159
+
160
+ # NEW: Check if results are available (hasResults is TOP-LEVEL, not in protocol!)
161
+ has_results = study.get("hasResults", False)
162
+
163
+ # Results date is in statusModule (nested inside date struct)
164
+ results_date_struct = status_module.get("resultsFirstPostDateStruct", {})
165
+ results_date = results_date_struct.get("date", "")
166
+
167
+ # Build content with key trial info (UPDATED)
168
+ content_parts = [
169
+ f"{summary[:400]}...",
170
+ f"Trial Phase: {phase}.",
171
+ f"Status: {status}.",
172
+ f"Conditions: {conditions_str}.",
173
+ f"Interventions: {interventions_str}.",
174
+ ]
175
+
176
+ if primary_outcome_str:
177
+ content_parts.append(f"Primary Outcome: {primary_outcome_str}.")
178
+
179
+ if secondary_count > 0:
180
+ content_parts.append(f"Secondary Outcomes: {secondary_count} additional endpoints.")
181
+
182
+ if has_results:
183
+ results_info = "Results Available: Yes"
184
+ if results_date:
185
+ results_info += f" (posted {results_date})"
186
+ content_parts.append(results_info + ".")
187
+ else:
188
+ content_parts.append("Results Available: Not yet posted.")
189
+
190
+ content = " ".join(content_parts)
191
+
192
+ return Evidence(
193
+ content=content[:2000],
194
+ citation=Citation(
195
+ source="clinicaltrials",
196
+ title=title[:500],
197
+ url=f"https://clinicaltrials.gov/study/{nct_id}",
198
+ date=start_date,
199
+ authors=[],
200
+ ),
201
+ relevance=0.90 if has_results else 0.85, # Boost relevance for trials with results
202
+ )
203
+ ```
204
+
205
+ ---
206
+
207
+ ## API Reference
208
+
209
+ The ClinicalTrials.gov API v2 returns nested JSON:
210
+
211
+ ```json
212
+ {
213
+ "protocolSection": {
214
+ "outcomesModule": {
215
+ "primaryOutcomes": [
216
+ {
217
+ "measure": "Change from Baseline in IIEF-EF Domain Score",
218
+ "description": "...",
219
+ "timeFrame": "Baseline to Week 12"
220
+ }
221
+ ],
222
+ "secondaryOutcomes": [
223
+ {
224
+ "measure": "Subject Global Assessment Question",
225
+ "timeFrame": "Week 12"
226
+ }
227
+ ]
228
+ }
229
+ },
230
+ "hasResults": true
231
+ }
232
+ ```
233
+
234
+ See: https://clinicaltrials.gov/data-api/api
235
+
236
+ ---
237
+
238
+ ## Test Plan
239
+
240
+ ### Unit Tests (`tests/unit/tools/test_clinicaltrials.py`)
241
+
242
+ ```python
243
+ @pytest.mark.unit
244
+ class TestClinicalTrialsOutcomes:
245
+ """Tests for outcome measure extraction."""
246
+
247
+ @pytest.mark.asyncio
248
+ async def test_extracts_primary_outcome(self, tool: ClinicalTrialsTool) -> None:
249
+ """Test that primary outcome is extracted from response."""
250
+ mock_study = {
251
+ "protocolSection": {
252
+ "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
253
+ "statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
254
+ "descriptionModule": {"briefSummary": "Summary"},
255
+ "designModule": {"phases": ["PHASE3"]},
256
+ "conditionsModule": {"conditions": ["ED"]},
257
+ "armsInterventionsModule": {"interventions": []},
258
+ "outcomesModule": {
259
+ "primaryOutcomes": [
260
+ {
261
+ "measure": "Change in IIEF-EF score",
262
+ "timeFrame": "Week 12"
263
+ }
264
+ ]
265
+ },
266
+ },
267
+ "hasResults": True,
268
+ }
269
+
270
+ mock_response = MagicMock()
271
+ mock_response.json.return_value = {"studies": [mock_study]}
272
+ mock_response.raise_for_status = MagicMock()
273
+
274
+ with patch("requests.get", return_value=mock_response):
275
+ results = await tool.search("test", max_results=1)
276
+
277
+ assert len(results) == 1
278
+ assert "Primary Outcome" in results[0].content
279
+ assert "IIEF-EF" in results[0].content
280
+ assert "Week 12" in results[0].content
281
+
282
+ @pytest.mark.asyncio
283
+ async def test_includes_results_status(self, tool: ClinicalTrialsTool) -> None:
284
+ """Test that results availability is shown."""
285
+ mock_study = {
286
+ "protocolSection": {
287
+ "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
288
+ "statusModule": {
289
+ "overallStatus": "COMPLETED",
290
+ "startDateStruct": {"date": "2023"},
291
+ # Note: resultsFirstPostDateStruct, not resultsFirstSubmitDate
292
+ "resultsFirstPostDateStruct": {"date": "2024-06-15"},
293
+ },
294
+ "descriptionModule": {"briefSummary": "Summary"},
295
+ "designModule": {"phases": ["PHASE3"]},
296
+ "conditionsModule": {"conditions": ["ED"]},
297
+ "armsInterventionsModule": {"interventions": []},
298
+ "outcomesModule": {},
299
+ },
300
+ "hasResults": True, # Note: hasResults is TOP-LEVEL
301
+ }
302
+
303
+ mock_response = MagicMock()
304
+ mock_response.json.return_value = {"studies": [mock_study]}
305
+ mock_response.raise_for_status = MagicMock()
306
+
307
+ with patch("requests.get", return_value=mock_response):
308
+ results = await tool.search("test", max_results=1)
309
+
310
+ assert "Results Available: Yes" in results[0].content
311
+ assert "2024-06-15" in results[0].content
312
+
313
+ @pytest.mark.asyncio
314
+ async def test_shows_no_results_when_missing(self, tool: ClinicalTrialsTool) -> None:
315
+ """Test that missing results are indicated."""
316
+ mock_study = {
317
+ "protocolSection": {
318
+ "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
319
+ "statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
320
+ "descriptionModule": {"briefSummary": "Summary"},
321
+ "designModule": {"phases": ["PHASE2"]},
322
+ "conditionsModule": {"conditions": ["ED"]},
323
+ "armsInterventionsModule": {"interventions": []},
324
+ "outcomesModule": {},
325
+ },
326
+ "hasResults": False,
327
+ }
328
+
329
+ mock_response = MagicMock()
330
+ mock_response.json.return_value = {"studies": [mock_study]}
331
+ mock_response.raise_for_status = MagicMock()
332
+
333
+ with patch("requests.get", return_value=mock_response):
334
+ results = await tool.search("test", max_results=1)
335
+
336
+ assert "Results Available: Not yet posted" in results[0].content
337
+
338
+ @pytest.mark.asyncio
339
+ async def test_boosts_relevance_for_results(self, tool: ClinicalTrialsTool) -> None:
340
+ """Trials with results should have higher relevance score."""
341
+ with_results = {
342
+ "protocolSection": {
343
+ "identificationModule": {"nctId": "NCT11111111", "briefTitle": "With Results"},
344
+ "statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
345
+ "descriptionModule": {"briefSummary": "Summary"},
346
+ "designModule": {"phases": []},
347
+ "conditionsModule": {"conditions": []},
348
+ "armsInterventionsModule": {"interventions": []},
349
+ "outcomesModule": {},
350
+ },
351
+ "hasResults": True,
352
+ }
353
+ without_results = {
354
+ "protocolSection": {
355
+ "identificationModule": {"nctId": "NCT22222222", "briefTitle": "No Results"},
356
+ "statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
357
+ "descriptionModule": {"briefSummary": "Summary"},
358
+ "designModule": {"phases": []},
359
+ "conditionsModule": {"conditions": []},
360
+ "armsInterventionsModule": {"interventions": []},
361
+ "outcomesModule": {},
362
+ },
363
+ "hasResults": False,
364
+ }
365
+
366
+ mock_response = MagicMock()
367
+ mock_response.json.return_value = {"studies": [with_results, without_results]}
368
+ mock_response.raise_for_status = MagicMock()
369
+
370
+ with patch("requests.get", return_value=mock_response):
371
+ results = await tool.search("test", max_results=2)
372
+
373
+ assert results[0].relevance == 0.90 # With results
374
+ assert results[1].relevance == 0.85 # Without results
375
+ ```
376
+
377
+ ### Integration Test
378
+
379
+ ```python
380
+ @pytest.mark.integration
381
+ class TestClinicalTrialsOutcomesIntegration:
382
+ """Integration tests with real API."""
383
+
384
+ @pytest.mark.asyncio
385
+ async def test_real_completed_trial_has_outcome(self) -> None:
386
+ """Real completed Phase 3 trials should have outcome measures."""
387
+ tool = ClinicalTrialsTool()
388
+
389
+ # Search for completed Phase 3 ED trials (likely to have outcomes)
390
+ results = await tool.search(
391
+ "sildenafil erectile dysfunction Phase 3 COMPLETED",
392
+ max_results=3
393
+ )
394
+
395
+ # At least one should have primary outcome
396
+ has_outcome = any("Primary Outcome" in r.content for r in results)
397
+ assert has_outcome, "No completed trials with outcome measures found"
398
+ ```
399
+
400
+ ---
401
+
402
+ ## Files to Modify
403
+
404
+ | File | Change |
405
+ |------|--------|
406
+ | `src/tools/clinicaltrials.py` | **ADD** `OutcomesModule` and `HasResults` to `FIELDS`, update `_study_to_evidence()` |
407
+ | `tests/unit/tools/test_clinicaltrials.py` | Add outcome parsing tests |
408
+
409
+ ---
410
+
411
+ ## Acceptance Criteria
412
+
413
+ ### FIELDS Constant (REQUIRED CHANGE)
414
+ - [ ] `FIELDS` includes `"OutcomesModule"` (returns full outcomesModule)
415
+ - [ ] `FIELDS` includes `"HasResults"` (returns top-level boolean)
416
+
417
+ ### `_study_to_evidence()` Method
418
+ - [ ] Extracts `protocolSection.outcomesModule.primaryOutcomes`
419
+ - [ ] Accesses `study.hasResults` at TOP LEVEL (not inside protocolSection)
420
+ - [ ] Results date extracted from `statusModule.resultsFirstPostDateStruct.date`
421
+ - [ ] Evidence content includes primary outcome measure when available
422
+ - [ ] Evidence content shows results availability status
423
+ - [ ] Outcome measure text truncated to 200 chars
424
+ - [ ] Trials with results have boosted relevance (0.90 vs 0.85)
425
+
426
+ ### Testing
427
+ - [ ] All unit tests pass
428
+ - [ ] Integration test confirms real trials return outcome data
429
+ - [ ] Live API test confirms `OutcomesModule` and `HasResults` fields work
430
+
431
+ ---
432
+
433
+ ## Edge Cases
434
+
435
+ 1. **No outcomes defined**: Some early-phase trials don't have outcomes yet
436
+ - Solution: Gracefully skip outcome section if `outcomesModule` is empty or missing
437
+
438
+ 2. **Multiple primary outcomes**: Some trials have 2-3 primary outcomes
439
+ - Solution: Show first outcome only, mention count of others (see the sketch after this list)
440
+
441
+ 3. **Long outcome descriptions**: Some measures are very verbose (500+ chars)
442
+ - Solution: Truncate measure to 200 chars with `[:200]`
443
+
444
+ 4. **hasResults without resultsFirstPostDateStruct**: Some completed trials may have results without a posted date
445
+ - Solution: Show "Results Available: Yes" without date
446
+
447
+ 5. **outcomesModule missing entirely**: Not all API responses include this module
448
+ - Solution: Use `.get("outcomesModule", {})` for safe access
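
A small helper along these lines would cover edge cases 2 and 3 together (a sketch; the helper name is hypothetical and not part of the proposed `_study_to_evidence()` changes):

```python
from typing import Any


def summarize_primary_outcomes(outcomes_module: dict[str, Any]) -> str:
    """First primary outcome only, truncated to 200 chars, plus a count of the rest."""
    primary = outcomes_module.get("primaryOutcomes", [])
    if not primary:
        return ""  # edge case 1: no outcomes defined yet
    first = primary[0]
    text = first.get("measure", "")[:200]  # edge case 3: truncate verbose measures
    timeframe = first.get("timeFrame", "")
    if timeframe:
        text += f" (measured at {timeframe})"
    if len(primary) > 1:  # edge case 2: mention the additional primary outcomes
        text += f" [plus {len(primary) - 1} additional primary outcome(s)]"
    return text
```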
449
+
450
+ ---
451
+
452
+ ## Rollback Plan
453
+
454
+ If outcome extraction causes issues:
455
+ 1. DO NOT modify `FIELDS` - nothing to revert there
456
+ 2. Remove outcome extraction code from `_study_to_evidence()`
457
+ 3. Existing tests should still pass
458
+
459
+ ---
460
+
461
+ ## References
462
+
463
+ - GitHub Issue #95
464
+ - [ClinicalTrials.gov API v2 Studies Endpoint](https://clinicaltrials.gov/data-api/api)
465
+ - [Stack Overflow - ClinicalTrials.gov API v2 Response Structure](https://stackoverflow.com/questions/78415818)
466
+ - `TOOL_ANALYSIS_CRITICAL.md` - "Tool 2: ClinicalTrials.gov > Current Implementation Gaps"
docs/specs/SPEC_15_ADVANCED_MODE_PERFORMANCE.md ADDED
@@ -0,0 +1,478 @@
1
+ # SPEC_15: Advanced Mode Performance Optimization
2
+
3
+ **Status**: Draft (Validated - Implement All Solutions)
4
+ **Priority**: P1
5
+ **GitHub Issue**: #65
6
+ **Estimated Effort**: Medium (config changes + early termination logic)
7
+ **Last Updated**: 2025-11-30
8
+
9
+ > **Senior Review Verdict**: ✅ APPROVED
10
+ > **Recommendation**: Implement Solution A + B + C together. Solution B (Early Termination) is NOT "post-hackathon" - it's the core fix that solves the root cause. The patterns used are consistent with Microsoft Agent Framework best practices.
11
+
12
+ ---
13
+
14
+ ## Problem Statement
15
+
16
+ Advanced (Multi-Agent) mode runs **10 rounds of multi-agent coordination** which takes **10-15+ minutes**.
17
+
18
+ **For hackathon demos**: No judge will wait this long. They'll close the tab before seeing results.
19
+
20
+ ### Observed Behavior
21
+
22
+ - System works correctly (no crashes)
23
+ - Produces detailed, high-quality research output
24
+ - Takes too long for practical demo use
25
+ - User had to manually terminate after ~10 minutes
26
+
27
+ ### Current Configuration
28
+
29
+ ```python
30
+ # src/orchestrators/advanced.py:133
31
+ .with_standard_manager(
32
+ chat_client=manager_client,
33
+ max_round_count=self._max_rounds, # Default: 10
34
+ max_stall_count=3,
35
+ max_reset_count=2,
36
+ )
37
+ ```
38
+
39
+ ### Time Breakdown (Estimated)
40
+
41
+ | Component | Time per Round | Notes |
42
+ |-----------|---------------|-------|
43
+ | Manager LLM call | 2-5s | Decides next agent |
44
+ | Search Agent | 10-20s | 4 API calls (PubMed, CT, EPMC, OA) |
45
+ | Hypothesis Agent | 5-10s | LLM reasoning |
46
+ | Judge Agent | 5-10s | LLM evaluation |
47
+ | Report Agent | 10-20s | LLM synthesis (when called) |
48
+
49
+ **Total per round**: ~30-60 seconds
50
+ **10 rounds**: 5-10 minutes minimum
51
+
52
+ ---
53
+
54
+ ## Root Cause Analysis
55
+
56
+ ### Issue 1: Default `max_rounds=10` is Too High
57
+
58
+ The Microsoft Agent Framework keeps iterating until:
59
+ 1. `max_rounds` reached, OR
60
+ 2. Manager decides workflow is complete
61
+
62
+ For research tasks, the manager often wants "more evidence" and keeps searching.
63
+
64
+ ### Issue 2: No Early Termination Heuristic
65
+
66
+ Even when the Judge says `sufficient=True` with high confidence, the workflow continues because the manager wants to be thorough.
67
+
68
+ ### Issue 3: No User Expectation Setting
69
+
70
+ Users don't know how long to expect. Progress indication is minimal.
71
+
72
+ ---
73
+
74
+ ## Proposed Solutions
75
+
76
+ ### Solution A: Reduce Default `max_rounds` (QUICK FIX)
77
+
78
+ **Change**: Reduce `max_rounds` from 10 to 5 (or make configurable via env).
79
+
80
+ ```python
81
+ # src/orchestrators/advanced.py
82
+
83
+ def __init__(
84
+ self,
85
+ max_rounds: int | None = None, # Changed from 10
86
+ ...
87
+ ) -> None:
88
+ # Read from environment, default to 5 for faster demos
89
+ default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
90
+ self._max_rounds = max_rounds if max_rounds is not None else default_rounds
91
+ ```
92
+
93
+ **Pros**:
94
+ - Simple, 2-line change
95
+ - Immediately halves demo time
96
+
97
+ **Cons**:
98
+ - Less thorough research
99
+ - Trade-off: speed vs. quality
100
+
101
+ ### Solution B: Early Termination on High-Confidence Judge (RECOMMENDED)
102
+
103
+ **Change**: Add workflow termination signal when Judge returns `sufficient=True` with confidence > 70%.
104
+
105
+ This requires modifying the JudgeAgent to signal completion:
106
+
107
+ ```python
108
+ # src/agents/magentic_agents.py - create_judge_agent()
109
+
110
+ @chat_agent.on_message
111
+ async def handle_judge_message(message: str, context: Context) -> ChatMessage:
112
+ """Process judge request and potentially signal completion."""
113
+ # ... existing judge logic ...
114
+
115
+ assessment = await judge_handler.evaluate(evidence, query)
116
+
117
+ if assessment.sufficient and assessment.confidence >= 0.70:
118
+ # Signal to manager that we have enough evidence
119
+ # The manager prompt should respect this signal
120
+ return ChatMessage(
121
+ content=f"SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
122
+ f"Recommend immediate synthesis. {assessment.reasoning}",
123
+ metadata={"sufficient": True, "confidence": assessment.confidence},
124
+ )
125
+
126
+ return ChatMessage(content=f"INSUFFICIENT: {assessment.reasoning}")
127
+ ```
128
+
129
+ And update the manager's system prompt to respect this:
130
+
131
+ ```python
132
+ # src/orchestrators/advanced.py - _build_workflow()
133
+
134
+ manager_system_prompt = """You are a research workflow manager.
135
+
136
+ IMPORTANT: When JudgeAgent returns "SUFFICIENT EVIDENCE", immediately
137
+ delegate to ReportAgent for final synthesis. Do NOT continue searching.
138
+
139
+ Workflow:
140
+ 1. SearchAgent finds evidence
141
+ 2. HypothesisAgent generates hypotheses
142
+ 3. JudgeAgent evaluates sufficiency
143
+ 4. IF sufficient → ReportAgent synthesizes (END)
144
+ 5. IF insufficient → SearchAgent refines search (CONTINUE)
145
+ """
146
+ ```
147
+
148
+ **Pros**:
149
+ - Respects actual evidence quality
150
+ - Can terminate early (round 3-4) when evidence is strong
151
+ - Maintains quality for complex queries
152
+
153
+ **Cons**:
154
+ - Requires testing to ensure manager respects signal
155
+ - More complex change
156
+
157
+ ### Solution C: Better Progress Indication
158
+
159
+ Add estimated time remaining to UI:
160
+
161
+ ```python
162
+ # src/orchestrators/advanced.py - run()
163
+
164
+ yield AgentEvent(
165
+ type="progress",
166
+ message=f"Round {iteration}/{self._max_rounds} "
167
+ f"(~{(self._max_rounds - iteration) * 45}s remaining)",
168
+ iteration=iteration,
169
+ )
170
+ ```
171
+
172
+ **Pros**:
173
+ - Sets user expectations
174
+ - Doesn't change workflow behavior
175
+
176
+ **Cons**:
177
+ - Doesn't actually speed up the workflow
178
+
179
+ ---
180
+
181
+ ## Recommended Implementation
182
+
183
+ **IMPLEMENT ALL THREE SOLUTIONS NOW**:
184
+
185
+ 1. **Solution A**: Reduce `max_rounds` to 5 via environment variable
186
+ 2. **Solution B**: Early termination when Judge returns `sufficient=True` with confidence ≥70%
187
+ 3. **Solution C**: Better progress indication with time estimates
188
+
189
+ > **Why Solution B NOW?** The Manager acting as a "termination condition" based on Judge feedback is a standard multi-agent pattern (Critique/Refine loop with exit). This aligns with Microsoft Agent Framework best practices and solves the ROOT CAUSE, not just a symptom.
190
+
191
+ ---
192
+
193
+ ## Implementation Details
194
+
195
+ ### Phase 1: All Solutions Together (A + B + C)
196
+
197
+ #### 1. Update Advanced Orchestrator Constructor
198
+
199
+ ```python
200
+ # src/orchestrators/advanced.py
201
+
202
+ import os
203
+
204
+ class AdvancedOrchestrator(OrchestratorProtocol):
205
+ def __init__(
206
+ self,
207
+ max_rounds: int | None = None,
208
+ chat_client: OpenAIChatClient | None = None,
209
+ api_key: str | None = None,
210
+ timeout_seconds: float = 300.0, # Reduced from 600 to 5 min
211
+ domain: ResearchDomain | str | None = None,
212
+ ) -> None:
213
+ # Environment-configurable rounds (default 5 for demos)
214
+ default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
215
+ self._max_rounds = max_rounds if max_rounds is not None else default_rounds
216
+ self._timeout_seconds = timeout_seconds
217
+ # ... rest unchanged ...
218
+ ```
219
+
220
+ #### 2. Add Progress Estimation
221
+
222
+ ```python
223
+ # src/orchestrators/advanced.py - run()
224
+
225
+ # After processing MagenticAgentMessageEvent:
226
+ if isinstance(event, MagenticAgentMessageEvent):
227
+ iteration += 1
228
+ rounds_remaining = self._max_rounds - iteration
229
+ # Estimate ~45s per round based on observed timing
230
+ est_seconds = rounds_remaining * 45
231
+ est_display = f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
232
+
233
+ yield AgentEvent(
234
+ type="progress",
235
+ message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
236
+ iteration=iteration,
237
+ )
238
+ ```
239
+
240
+ #### 3. Update UI Message (Solution C)
241
+
242
+ ```python
243
+ # src/orchestrators/advanced.py - run()
244
+
245
+ # UX FIX: More accurate timing message
246
+ yield AgentEvent(
247
+ type="thinking",
248
+ message=(
249
+ f"Multi-agent reasoning in progress ({self._max_rounds} rounds max)... "
250
+ f"Estimated time: {self._max_rounds * 45 // 60}-{self._max_rounds * 60 // 60} minutes."
251
+ ),
252
+ iteration=0,
253
+ )
254
+ ```
255
+
256
+ #### 4. Add Early Termination Signal (Solution B)
257
+
258
+ ```python
259
+ # src/agents/magentic_agents.py - Update create_judge_agent()
260
+
261
+ @chat_agent.on_message
262
+ async def handle_judge_message(message: str, context: Context) -> ChatMessage:
263
+ """Process judge request and signal completion when evidence is sufficient."""
264
+ # ... existing parsing logic to extract evidence and query ...
265
+
266
+ assessment = await judge_handler.evaluate(evidence, query)
267
+
268
+ # NEW: Strong termination signal for high-confidence assessments
269
+ if assessment.sufficient and assessment.confidence >= 0.70:
270
+ # Clear, unambiguous signal that Manager should respect
271
+ return ChatMessage(
272
+ content=(
273
+ f"βœ… SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
274
+ f"STOP SEARCHING. Delegate to ReportAgent NOW for final synthesis. "
275
+ f"Reasoning: {assessment.reasoning}"
276
+ ),
277
+ metadata={"sufficient": True, "confidence": assessment.confidence},
278
+ )
279
+
280
+ # Insufficient - continue the loop
281
+ return ChatMessage(
282
+ content=(
283
+ f"❌ INSUFFICIENT: {assessment.reasoning}. "
284
+ f"Confidence: {assessment.confidence:.0%}. "
285
+ f"Suggested refinements: {', '.join(assessment.next_search_queries[:2])}"
286
+ )
287
+ )
288
+ ```
289
+
290
+ #### 5. Update Manager System Prompt (Solution B)
291
+
292
+ ```python
293
+ # src/orchestrators/advanced.py - _build_workflow()
294
+
295
+ MANAGER_SYSTEM_PROMPT = """You are a medical research workflow manager.
296
+
297
+ ## CRITICAL RULE
298
+ When JudgeAgent says "SUFFICIENT EVIDENCE" or "STOP SEARCHING":
299
+ → IMMEDIATELY delegate to ReportAgent for synthesis
300
+ → Do NOT continue searching or gathering more evidence
301
+ → The Judge has determined evidence quality is adequate
302
+
303
+ ## Standard Workflow
304
+ 1. SearchAgent → finds evidence from PubMed, ClinicalTrials, etc.
305
+ 2. HypothesisAgent → generates testable hypotheses
306
+ 3. JudgeAgent → evaluates evidence sufficiency
307
+ 4. IF sufficient → ReportAgent (DONE)
308
+ 5. IF insufficient → SearchAgent with refined queries (CONTINUE)
309
+
310
+ ## Your Role
311
+ - Coordinate agents efficiently
312
+ - Respect the Judge's termination signal
313
+ - Prioritize completing the task over perfection
314
+ """
315
+ ```
316
+
317
+ ---
318
+
319
+ ## Test Plan
320
+
321
+ ### Unit Tests
322
+
323
+ ```python
324
+ # tests/unit/orchestrators/test_advanced_orchestrator.py
325
+
326
+ import os
327
+ from unittest.mock import patch
328
+
329
+ import pytest
330
+
331
+ from src.orchestrators.advanced import AdvancedOrchestrator
332
+
333
+
334
+ class TestAdvancedOrchestratorConfig:
335
+ """Tests for configuration options."""
336
+
337
+ def test_default_max_rounds_is_five(self) -> None:
338
+ """Default max_rounds should be 5 for faster demos."""
339
+ with patch.dict(os.environ, {}, clear=True):
340
+ # Clear any existing env var
341
+ os.environ.pop("ADVANCED_MAX_ROUNDS", None)
342
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
343
+ orch.__init__()
344
+ assert orch._max_rounds == 5
345
+
346
+ def test_max_rounds_from_env(self) -> None:
347
+ """max_rounds should be configurable via environment."""
348
+ with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
349
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
350
+ orch.__init__()
351
+ assert orch._max_rounds == 3
352
+
353
+ def test_explicit_max_rounds_overrides_env(self) -> None:
354
+ """Explicit parameter should override environment."""
355
+ with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
356
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
357
+ orch.__init__(max_rounds=7)
358
+ assert orch._max_rounds == 7
359
+
360
+ def test_timeout_default_is_five_minutes(self) -> None:
361
+ """Default timeout should be 300s (5 min) for faster failure."""
362
+ orch = AdvancedOrchestrator.__new__(AdvancedOrchestrator)
363
+ orch.__init__()
364
+ assert orch._timeout_seconds == 300.0
365
+ ```
366
+
367
+ ### Integration Test (Manual)
368
+
369
+ ```bash
370
+ # Run advanced mode with reduced rounds
371
+ ADVANCED_MAX_ROUNDS=3 uv run python -c "
372
+ import asyncio
373
+ from src.orchestrators.advanced import AdvancedOrchestrator
374
+
375
+ async def test():
376
+ orch = AdvancedOrchestrator()
377
+ print(f'Max rounds: {orch._max_rounds}') # Should be 3
378
+
379
+ async for event in orch.run('sildenafil mechanism'):
380
+ print(f'{event.type}: {event.message[:100]}...')
381
+
382
+ asyncio.run(test())
383
+ "
384
+ ```
385
+
386
+ ### Timing Benchmark
387
+
388
+ Create a benchmark script to measure actual performance:
389
+
390
+ ```python
391
+ # examples/benchmark_advanced.py
392
+ """Benchmark Advanced mode with different max_rounds settings."""
393
+
394
+ import asyncio
395
+ import os
396
+ import time
397
+
398
+
399
+ async def benchmark(max_rounds: int) -> float:
400
+ """Run benchmark with specified rounds, return elapsed time."""
401
+ os.environ["ADVANCED_MAX_ROUNDS"] = str(max_rounds)
402
+
403
+ # Import after setting env
404
+ from src.orchestrators.advanced import AdvancedOrchestrator
405
+
406
+ orch = AdvancedOrchestrator()
407
+ start = time.time()
408
+
409
+ async for event in orch.run("sildenafil erectile dysfunction"):
410
+ if event.type == "complete":
411
+ break
412
+
413
+ return time.time() - start
414
+
415
+
416
+ async def main() -> None:
417
+ """Run benchmarks for different configurations."""
418
+ for rounds in [3, 5, 7, 10]:
419
+ elapsed = await benchmark(rounds)
420
+ print(f"max_rounds={rounds}: {elapsed:.1f}s ({elapsed/60:.1f}min)")
421
+
422
+
423
+ if __name__ == "__main__":
424
+ asyncio.run(main())
425
+ ```
426
+
427
+ ---
428
+
429
+ ## Files to Modify
430
+
431
+ | File | Change |
432
+ |------|--------|
433
+ | `src/orchestrators/advanced.py` | Add env-configurable `max_rounds`, reduce default to 5, add progress estimation, update Manager prompt |
434
+ | `src/agents/magentic_agents.py` | Add early termination signal in JudgeAgent |
435
+ | `tests/unit/orchestrators/test_advanced_orchestrator.py` | Add config tests |
436
+ | `tests/unit/agents/test_magentic_judge_termination.py` | Add termination signal tests (sketch below) |
437
+ | `examples/benchmark_advanced.py` | Add timing benchmark (optional) |
438
+
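+ The termination tests listed above do not exist yet. The sketch below (for the planned `tests/unit/agents/test_magentic_judge_termination.py`) pins down only the decision rule, `sufficient=True` with confidence ≥70%; `FakeAssessment` and `should_terminate` are hypothetical stand-ins because the real handler is nested inside `create_judge_agent()`.
+
+ ```python
+ # Sketch only: FakeAssessment and should_terminate are hypothetical stand-ins
+ # that mirror the proposed rule in handle_judge_message.
+ from dataclasses import dataclass
+
+
+ @dataclass
+ class FakeAssessment:
+     sufficient: bool
+     confidence: float
+
+
+ def should_terminate(assessment: FakeAssessment, threshold: float = 0.70) -> bool:
+     """Early-termination rule: sufficient AND confidence >= threshold."""
+     return assessment.sufficient and assessment.confidence >= threshold
+
+
+ def test_terminates_when_sufficient_and_confident() -> None:
+     assert should_terminate(FakeAssessment(sufficient=True, confidence=0.85))
+
+
+ def test_does_not_terminate_when_insufficient() -> None:
+     assert not should_terminate(FakeAssessment(sufficient=False, confidence=0.95))
+
+
+ def test_does_not_terminate_below_confidence_threshold() -> None:
+     assert not should_terminate(FakeAssessment(sufficient=True, confidence=0.50))
+ ```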
439
+ ---
440
+
441
+ ## Acceptance Criteria
442
+
443
+ ### Solution A: Configuration
444
+ - [ ] Default `max_rounds` is 5 (not 10)
445
+ - [ ] `max_rounds` configurable via `ADVANCED_MAX_ROUNDS` env var
446
+ - [ ] Explicit `max_rounds` parameter overrides env var
447
+ - [ ] Default timeout is 5 minutes (300s, not 600s)
448
+
449
+ ### Solution B: Early Termination
450
+ - [ ] JudgeAgent returns "SUFFICIENT EVIDENCE" message when `sufficient=True` and confidence ≥70%
451
+ - [ ] JudgeAgent returns "STOP SEARCHING" in termination signal
452
+ - [ ] Manager system prompt includes explicit termination instructions
453
+ - [ ] Workflow terminates early when Judge signals sufficiency (observed in logs)
454
+
455
+ ### Solution C: Progress Indication
456
+ - [ ] Progress events show current round / max rounds
457
+ - [ ] Progress events show estimated time remaining
458
+ - [ ] Initial "thinking" message shows estimated total time
459
+
460
+ ### Overall
461
+ - [ ] Demo completes in <5 minutes with useful output
462
+ - [ ] Quality of output is maintained (no degradation from early termination)
463
+
464
+ ---
465
+
466
+ ## Rollback Plan
467
+
468
+ If reduced rounds cause quality issues:
469
+ 1. Increase the `ADVANCED_MAX_ROUNDS` environment variable (see the sketch below)
470
+ 2. No code changes needed
471
+
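+ A minimal rollback sketch, assuming the env-configurable default from Solution A is in place. In deployment the variable would be set in the process environment (e.g., Space settings); Python is used here only for a quick local check, and `_max_rounds` follows the constructor shown earlier.
+
+ ```python
+ # Rollback sketch: restore deeper multi-round search without code changes.
+ import os
+
+ os.environ["ADVANCED_MAX_ROUNDS"] = "10"  # back to the previous default
+
+ from src.orchestrators.advanced import AdvancedOrchestrator
+
+ orch = AdvancedOrchestrator()  # constructor reads ADVANCED_MAX_ROUNDS
+ assert orch._max_rounds == 10
+ ```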
472
+ ---
473
+
474
+ ## References
475
+
476
+ - GitHub Issue #65
477
+ - Microsoft Agent Framework: https://github.com/microsoft/agent-framework
478
+ - MagenticBuilder docs: Round configuration
docs/specs/{SPEC_01_DEMO_TERMINATION.md → archive/SPEC_01_DEMO_TERMINATION.md} RENAMED
File without changes
docs/specs/{SPEC_02_E2E_TESTING.md → archive/SPEC_02_E2E_TESTING.md} RENAMED
File without changes
docs/specs/{SPEC_03_OPENALEX_INTEGRATION.md → archive/SPEC_03_OPENALEX_INTEGRATION.md} RENAMED
File without changes
docs/specs/{SPEC_04_MAGENTIC_UX.md → archive/SPEC_04_MAGENTIC_UX.md} RENAMED
File without changes
docs/specs/{SPEC_05_ORCHESTRATOR_CLEANUP.md → archive/SPEC_05_ORCHESTRATOR_CLEANUP.md} RENAMED
File without changes
docs/specs/{SPEC_06_SIMPLE_MODE_SYNTHESIS.md → archive/SPEC_06_SIMPLE_MODE_SYNTHESIS.md} RENAMED
File without changes
docs/specs/{SPEC_07_LANGGRAPH_MEMORY_ARCH.md → archive/SPEC_07_LANGGRAPH_MEMORY_ARCH.md} RENAMED
File without changes
docs/specs/{SPEC_08_INTEGRATE_MEMORY_LAYER.md → archive/SPEC_08_INTEGRATE_MEMORY_LAYER.md} RENAMED
File without changes
docs/specs/{SPEC_09_LLAMAINDEX_INTEGRATION.md → archive/SPEC_09_LLAMAINDEX_INTEGRATION.md} RENAMED
File without changes
docs/specs/{SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md → archive/SPEC_10_DOMAIN_AGNOSTIC_REFACTOR.md} RENAMED
File without changes
docs/specs/{SPEC_11_SEXUAL_HEALTH_FOCUS.md → archive/SPEC_11_SEXUAL_HEALTH_FOCUS.md} RENAMED
File without changes
docs/specs/{SPEC_12_NARRATIVE_SYNTHESIS.md → archive/SPEC_12_NARRATIVE_SYNTHESIS.md} RENAMED
File without changes
src/tools/openalex.py CHANGED
@@ -1,5 +1,6 @@
1
  """OpenAlex search tool - citation-aware scholarly search."""
2
 
 
3
  from typing import Any
4
 
5
  import httpx
@@ -104,6 +105,16 @@ class OpenAlexTool:
104
  openalex_id = work.get("id", "")
105
  url = openalex_id if openalex_id else "https://openalex.org"
106

107
  # Prepend citation badge to content
108
  citation_badge = f"[Cited by {cited_by_count}] " if cited_by_count > 0 else ""
109
  content = f"{citation_badge}{abstract[:1900]}"
@@ -127,6 +138,7 @@ class OpenAlexTool:
127
  "concepts": concepts,
128
  "is_open_access": is_oa,
129
  "pdf_url": pdf_url,
 
130
  },
131
  )
132
 
 
1
  """OpenAlex search tool - citation-aware scholarly search."""
2
 
3
+ import re
4
  from typing import Any
5
 
6
  import httpx
 
105
  openalex_id = work.get("id", "")
106
  url = openalex_id if openalex_id else "https://openalex.org"
107
 
108
+ # NEW: Extract PMID from ids object for deduplication
109
+ ids_obj = work.get("ids", {})
110
+ pmid_url = ids_obj.get("pmid") # "https://pubmed.ncbi.nlm.nih.gov/29456894"
111
+ pmid = None
112
+ if pmid_url and isinstance(pmid_url, str) and "pubmed.ncbi.nlm.nih.gov" in pmid_url:
113
+ # Extract numeric PMID from URL
114
+ pmid_match = re.search(r"/(\d+)/?$", pmid_url)
115
+ if pmid_match:
116
+ pmid = pmid_match.group(1)
117
+
118
  # Prepend citation badge to content
119
  citation_badge = f"[Cited by {cited_by_count}] " if cited_by_count > 0 else ""
120
  content = f"{citation_badge}{abstract[:1900]}"
 
138
  "concepts": concepts,
139
  "is_open_access": is_oa,
140
  "pdf_url": pdf_url,
141
+ "pmid": pmid, # NEW: Store PMID for deduplication
142
  },
143
  )
144
 
src/tools/search_handler.py CHANGED
@@ -1,7 +1,8 @@
1
  """Search handler - orchestrates multiple search tools."""
2
 
3
  import asyncio
4
- from typing import cast
 
5
 
6
  import structlog
7
 
@@ -9,9 +10,124 @@ from src.tools.base import SearchTool
9
  from src.utils.exceptions import SearchError
10
  from src.utils.models import Evidence, SearchResult, SourceName
11
 
 
 
 
12
  logger = structlog.get_logger()
13
 
14

15
  class SearchHandler:
16
  """Orchestrates parallel searches across multiple tools."""
17
 
@@ -66,6 +182,19 @@ class SearchHandler:
66
  sources_searched.append(tool_name)
67
  logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
68

69
  return SearchResult(
70
  query=query,
71
  evidence=all_evidence,
 
1
  """Search handler - orchestrates multiple search tools."""
2
 
3
  import asyncio
4
+ import re
5
+ from typing import TYPE_CHECKING, cast
6
 
7
  import structlog
8
 
 
10
  from src.utils.exceptions import SearchError
11
  from src.utils.models import Evidence, SearchResult, SourceName
12
 
13
+ if TYPE_CHECKING:
14
+ from src.utils.models import Evidence
15
+
16
  logger = structlog.get_logger()
17
 
18
 
19
+ def extract_paper_id(evidence: "Evidence") -> str | None:
20
+ """Extract unique paper identifier from Evidence.
21
+
22
+ Strategy:
23
+ 1. Check metadata.pmid first (OpenAlex provides this)
24
+ 2. Fall back to URL pattern matching
25
+
26
+ Supports:
27
+ - PubMed: https://pubmed.ncbi.nlm.nih.gov/12345678/
28
+ - Europe PMC MED: https://europepmc.org/article/MED/12345678
29
+ - Europe PMC PMC: https://europepmc.org/article/PMC/PMC1234567
30
+ - Europe PMC PPR: https://europepmc.org/article/PPR/PPR123456
31
+ - Europe PMC PAT: https://europepmc.org/article/PAT/WO8601415
32
+ - DOI: https://doi.org/10.1234/...
33
+ - OpenAlex: https://openalex.org/W1234567890
34
+ - ClinicalTrials: https://clinicaltrials.gov/study/NCT12345678
35
+ - ClinicalTrials (legacy): https://clinicaltrials.gov/ct2/show/NCT12345678
36
+ """
37
+ url = evidence.citation.url
38
+ metadata = evidence.metadata or {}
39
+
40
+ # Strategy 1: Check metadata.pmid (from OpenAlex)
41
+ if pmid := metadata.get("pmid"):
42
+ return f"PMID:{pmid}"
43
+
44
+ # Strategy 2: URL pattern matching
45
+
46
+ # PubMed URL pattern
47
+ pmid_match = re.search(r"pubmed\.ncbi\.nlm\.nih\.gov/(\d+)", url)
48
+ if pmid_match:
49
+ return f"PMID:{pmid_match.group(1)}"
50
+
51
+ # Europe PMC MED pattern (same as PMID)
52
+ epmc_med_match = re.search(r"europepmc\.org/article/MED/(\d+)", url)
53
+ if epmc_med_match:
54
+ return f"PMID:{epmc_med_match.group(1)}"
55
+
56
+ # Europe PMC PMC pattern (PubMed Central ID - different from PMID!)
57
+ epmc_pmc_match = re.search(r"europepmc\.org/article/PMC/(PMC\d+)", url)
58
+ if epmc_pmc_match:
59
+ return f"PMCID:{epmc_pmc_match.group(1)}"
60
+
61
+ # Europe PMC PPR pattern (Preprint ID - unique per preprint)
62
+ epmc_ppr_match = re.search(r"europepmc\.org/article/PPR/(PPR\d+)", url)
63
+ if epmc_ppr_match:
64
+ return f"PPRID:{epmc_ppr_match.group(1)}"
65
+
66
+ # Europe PMC PAT pattern (Patent ID - e.g., WO8601415, EP1234567)
67
+ epmc_pat_match = re.search(r"europepmc\.org/article/PAT/([A-Z]{2}\d+)", url)
68
+ if epmc_pat_match:
69
+ return f"PATID:{epmc_pat_match.group(1)}"
70
+
71
+ # DOI pattern (normalize trailing slash/characters)
72
+ doi_match = re.search(r"doi\.org/(10\.\d+/[^\s\]>]+)", url)
73
+ if doi_match:
74
+ doi = doi_match.group(1).rstrip("/")
75
+ return f"DOI:{doi}"
76
+
77
+ # OpenAlex ID pattern (fallback if no PMID in metadata)
78
+ openalex_match = re.search(r"openalex\.org/(W\d+)", url)
79
+ if openalex_match:
80
+ return f"OAID:{openalex_match.group(1)}"
81
+
82
+ # ClinicalTrials NCT ID (modern format)
83
+ nct_match = re.search(r"clinicaltrials\.gov/study/(NCT\d+)", url)
84
+ if nct_match:
85
+ return f"NCT:{nct_match.group(1)}"
86
+
87
+ # ClinicalTrials NCT ID (legacy format)
88
+ nct_legacy_match = re.search(r"clinicaltrials\.gov/ct2/show/(NCT\d+)", url)
89
+ if nct_legacy_match:
90
+ return f"NCT:{nct_legacy_match.group(1)}"
91
+
92
+ return None
93
+
94
+
95
+ def deduplicate_evidence(evidence_list: list["Evidence"]) -> list["Evidence"]:
96
+ """Remove duplicate evidence based on paper ID.
97
+
98
+ Deduplication priority:
99
+ 1. PubMed (authoritative source)
100
+ 2. Europe PMC (full text links)
101
+ 3. OpenAlex (citation data)
102
+ 4. ClinicalTrials (unique, never duplicated)
103
+
104
+ Returns:
105
+ Deduplicated list preserving source priority order.
106
+ """
107
+ seen_ids: set[str] = set()
108
+ unique: list[Evidence] = []
109
+
110
+ # Sort by source priority (PubMed first)
111
+ source_priority = {"pubmed": 0, "europepmc": 1, "openalex": 2, "clinicaltrials": 3}
112
+ sorted_evidence = sorted(
113
+ evidence_list, key=lambda e: source_priority.get(e.citation.source, 99)
114
+ )
115
+
116
+ for evidence in sorted_evidence:
117
+ paper_id = extract_paper_id(evidence)
118
+
119
+ if paper_id is None:
120
+ # Can't identify - keep it (conservative)
121
+ unique.append(evidence)
122
+ continue
123
+
124
+ if paper_id not in seen_ids:
125
+ seen_ids.add(paper_id)
126
+ unique.append(evidence)
127
+
128
+ return unique
129
+
130
+
131
  class SearchHandler:
132
  """Orchestrates parallel searches across multiple tools."""
133
 
 
182
  sources_searched.append(tool_name)
183
  logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
184
 
185
+ # DEDUPLICATION STEP
186
+ original_count = len(all_evidence)
187
+ all_evidence = deduplicate_evidence(all_evidence)
188
+ dedup_count = original_count - len(all_evidence)
189
+
190
+ if dedup_count > 0:
191
+ logger.info(
192
+ "Deduplicated evidence",
193
+ original=original_count,
194
+ unique=len(all_evidence),
195
+ removed=dedup_count,
196
+ )
197
+
198
  return SearchResult(
199
  query=query,
200
  evidence=all_evidence,
tests/unit/tools/test_openalex.py CHANGED
@@ -38,6 +38,30 @@ SAMPLE_OPENALEX_RESPONSE = {
38
  ]
39
  }
40

41
 
42
  @pytest.mark.unit
43
  class TestOpenAlexTool:
@@ -144,6 +168,26 @@ class TestOpenAlexTool:
144
  assert "sildenafil" in params["search"]
145
  assert params["per_page"] == 3
146

147
 
148
  @pytest.mark.integration
149
  class TestOpenAlexIntegration:
 
38
  ]
39
  }
40
 
41
+ # Sample response WITH PMID (for deduplication testing)
42
+ SAMPLE_OPENALEX_WITH_PMID = {
43
+ "results": [
44
+ {
45
+ "id": "https://openalex.org/W98765",
46
+ "doi": "https://doi.org/10.1038/nature12345",
47
+ "display_name": "Paper with PMID for deduplication",
48
+ "publication_year": 2023,
49
+ "cited_by_count": 50,
50
+ "abstract_inverted_index": {"Test": [0], "abstract": [1]},
51
+ "concepts": [],
52
+ "authorships": [],
53
+ "open_access": {"is_oa": False},
54
+ "best_oa_location": None,
55
+ # CRITICAL: ids object with PMID for cross-source deduplication
56
+ "ids": {
57
+ "openalex": "https://openalex.org/W98765",
58
+ "doi": "https://doi.org/10.1038/nature12345",
59
+ "pmid": "https://pubmed.ncbi.nlm.nih.gov/29456894",
60
+ },
61
+ }
62
+ ]
63
+ }
64
+
65
 
66
  @pytest.mark.unit
67
  class TestOpenAlexTool:
 
168
  assert "sildenafil" in params["search"]
169
  assert params["per_page"] == 3
170
 
171
+ @pytest.mark.asyncio
172
+ async def test_extracts_pmid_from_ids_object(self, tool: OpenAlexTool, mock_client) -> None:
173
+ """PMID should be extracted from ids.pmid for cross-source deduplication."""
174
+ mock_client.get.return_value.json.return_value = SAMPLE_OPENALEX_WITH_PMID
175
+
176
+ results = await tool.search("test", max_results=1)
177
+
178
+ assert len(results) == 1
179
+ # PMID should be extracted from URL and stored as numeric string
180
+ assert results[0].metadata["pmid"] == "29456894"
181
+
182
+ @pytest.mark.asyncio
183
+ async def test_pmid_is_none_when_not_present(self, tool: OpenAlexTool, mock_client) -> None:
184
+ """PMID should be None when ids.pmid is not in response."""
185
+ # SAMPLE_OPENALEX_RESPONSE has no ids.pmid field
186
+ results = await tool.search("sildenafil ED", max_results=1)
187
+
188
+ assert len(results) == 1
189
+ assert results[0].metadata["pmid"] is None
190
+
191
 
192
  @pytest.mark.integration
193
  class TestOpenAlexIntegration:
tests/unit/tools/test_search_handler.py CHANGED
@@ -5,43 +5,203 @@ from unittest.mock import AsyncMock, create_autospec
5
  import pytest
6
 
7
  from src.tools.base import SearchTool
8
- from src.tools.search_handler import SearchHandler
9
  from src.utils.exceptions import SearchError
10
  from src.utils.models import Citation, Evidence
11
 
12

13
  class TestSearchHandler:
14
  """Tests for SearchHandler."""
15
 
16
  @pytest.mark.asyncio
17
- async def test_execute_aggregates_results(self):
18
- """SearchHandler should aggregate results from all tools."""
19
  # Setup
20
  mock_tool1 = AsyncMock(spec=SearchTool)
21
  mock_tool1.name = "pubmed"
22
  mock_tool1.search.return_value = [
23
- Evidence(
24
- content="C1",
25
- citation=Citation(source="pubmed", title="T1", url="u1", date="2024"),
26
- )
27
  ]
28
 
29
  mock_tool2 = AsyncMock(spec=SearchTool)
30
- mock_tool2.name = "clinicaltrials"
 
31
  mock_tool2.search.return_value = [
32
- Evidence(
33
- content="C2",
34
- citation=Citation(source="clinicaltrials", title="T2", url="u2", date="2024"),
35
- )
36
  ]
37
 
38
  handler = SearchHandler(tools=[mock_tool1, mock_tool2])
39
 
40
  # Execute
41
- result = await handler.execute("testosterone libido", max_results_per_tool=3)
42
- assert result.total_found == 2
 
 
 
 
43
  assert "pubmed" in result.sources_searched
44
- assert "clinicaltrials" in result.sources_searched
45
 
46
  @pytest.mark.asyncio
47
  async def test_execute_handles_tool_failure(self):
@@ -49,16 +209,11 @@ class TestSearchHandler:
49
  mock_tool_ok = create_autospec(SearchTool, instance=True)
50
  mock_tool_ok.name = "pubmed"
51
  mock_tool_ok.search = AsyncMock(
52
- return_value=[
53
- Evidence(
54
- content="Good result",
55
- citation=Citation(source="pubmed", title="T", url="u", date="2024"),
56
- )
57
- ]
58
  )
59
 
60
  mock_tool_fail = create_autospec(SearchTool, instance=True)
61
- mock_tool_fail.name = "pubmed" # Mocking a second pubmed instance failing
62
  mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))
63
 
64
  handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
@@ -67,22 +222,4 @@ class TestSearchHandler:
67
  assert result.total_found == 1
68
  assert "pubmed" in result.sources_searched
69
  assert len(result.errors) == 1
70
- # The error message format is "{tool.name}: {error!s}"
71
- assert "pubmed: API down" in result.errors[0]
72
-
73
- @pytest.mark.asyncio
74
- async def test_search_handler_pubmed_only(self):
75
- """SearchHandler should work with only PubMed tool."""
76
- # This is the specific test requested in Phase 9 spec
77
- from src.tools.pubmed import PubMedTool
78
-
79
- mock_pubmed = AsyncMock(spec=PubMedTool)
80
- mock_pubmed.name = "pubmed"
81
- mock_pubmed.search.return_value = []
82
-
83
- handler = SearchHandler(tools=[mock_pubmed], timeout=30.0)
84
- result = await handler.execute("testosterone libido", max_results_per_tool=3)
85
-
86
- assert result.sources_searched == ["pubmed"]
87
- assert "web" not in result.sources_searched
88
- assert len(result.errors) == 0
 
5
  import pytest
6
 
7
  from src.tools.base import SearchTool
8
+ from src.tools.search_handler import SearchHandler, deduplicate_evidence, extract_paper_id
9
  from src.utils.exceptions import SearchError
10
  from src.utils.models import Citation, Evidence
11
 
12
 
13
+ def _make_evidence(source: str, url: str, metadata: dict | None = None) -> Evidence:
14
+ """Helper to create Evidence objects for testing."""
15
+ return Evidence(
16
+ content="Test content",
17
+ citation=Citation(
18
+ source=source,
19
+ title="Test",
20
+ url=url,
21
+ date="2024",
22
+ authors=[],
23
+ ),
24
+ metadata=metadata or {},
25
+ )
26
+
27
+
28
+ class TestExtractPaperId:
29
+ """Tests for paper ID extraction from Evidence objects."""
30
+
31
+ def test_extracts_pubmed_id(self) -> None:
32
+ evidence = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
33
+ assert extract_paper_id(evidence) == "PMID:12345678"
34
+
35
+ def test_extracts_europepmc_med_id(self) -> None:
36
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
37
+ assert extract_paper_id(evidence) == "PMID:12345678"
38
+
39
+ def test_extracts_europepmc_pmc_id(self) -> None:
40
+ """Europe PMC PMC articles have different ID format."""
41
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PMC/PMC7654321")
42
+ assert extract_paper_id(evidence) == "PMCID:PMC7654321"
43
+
44
+ def test_extracts_europepmc_ppr_id(self) -> None:
45
+ """Europe PMC preprints have PPR IDs."""
46
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR123456")
47
+ assert extract_paper_id(evidence) == "PPRID:PPR123456"
48
+
49
+ def test_extracts_europepmc_pat_id(self) -> None:
50
+ """Europe PMC patents have PAT IDs (WIPO format)."""
51
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/WO8601415")
52
+ assert extract_paper_id(evidence) == "PATID:WO8601415"
53
+
54
+ def test_extracts_europepmc_pat_id_eu_format(self) -> None:
55
+ """European patent format should also work."""
56
+ evidence = _make_evidence("europepmc", "https://europepmc.org/article/PAT/EP1234567")
57
+ assert extract_paper_id(evidence) == "PATID:EP1234567"
58
+
59
+ def test_extracts_doi(self) -> None:
60
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345")
61
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
62
+
63
+ def test_extracts_doi_with_trailing_slash(self) -> None:
64
+ """DOIs should be normalized (trailing slash removed)."""
65
+ evidence = _make_evidence("pubmed", "https://doi.org/10.1038/nature12345/")
66
+ assert extract_paper_id(evidence) == "DOI:10.1038/nature12345"
67
+
68
+ def test_extracts_openalex_id_from_url(self) -> None:
69
+ """OpenAlex ID from URL (fallback when no PMID in metadata)."""
70
+ evidence = _make_evidence("openalex", "https://openalex.org/W1234567890")
71
+ assert extract_paper_id(evidence) == "OAID:W1234567890"
72
+
73
+ def test_extracts_openalex_pmid_from_metadata(self) -> None:
74
+ """OpenAlex PMID from metadata takes priority over URL."""
75
+ evidence = _make_evidence(
76
+ "openalex",
77
+ "https://openalex.org/W1234567890",
78
+ metadata={"pmid": "98765432"},
79
+ )
80
+ assert extract_paper_id(evidence) == "PMID:98765432"
81
+
82
+ def test_extracts_nct_id_modern(self) -> None:
83
+ evidence = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT12345678")
84
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
85
+
86
+ def test_extracts_nct_id_legacy(self) -> None:
87
+ """Legacy ClinicalTrials.gov URL format should also work."""
88
+ evidence = _make_evidence(
89
+ "clinicaltrials", "https://clinicaltrials.gov/ct2/show/NCT12345678"
90
+ )
91
+ assert extract_paper_id(evidence) == "NCT:NCT12345678"
92
+
93
+ def test_returns_none_for_unknown_url(self) -> None:
94
+ evidence = _make_evidence("web", "https://example.com/unknown")
95
+ assert extract_paper_id(evidence) is None
96
+
97
+
98
+ class TestDeduplicateEvidence:
99
+ """Tests for evidence deduplication."""
100
+
101
+ def test_removes_pubmed_europepmc_duplicate(self) -> None:
102
+ """Same paper from PubMed and Europe PMC should dedupe to PubMed."""
103
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
104
+ europepmc = _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
105
+
106
+ result = deduplicate_evidence([pubmed, europepmc])
107
+
108
+ assert len(result) == 1
109
+ assert result[0].citation.source == "pubmed"
110
+
111
+ def test_removes_pubmed_openalex_duplicate_via_metadata(self) -> None:
112
+ """OpenAlex with PMID in metadata should dedupe against PubMed."""
113
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
114
+ openalex = _make_evidence(
115
+ "openalex",
116
+ "https://openalex.org/W9999999",
117
+ metadata={"pmid": "12345678", "cited_by_count": 100},
118
+ )
119
+
120
+ result = deduplicate_evidence([pubmed, openalex])
121
+
122
+ assert len(result) == 1
123
+ assert result[0].citation.source == "pubmed"
124
+
125
+ def test_preserves_unique_evidence(self) -> None:
126
+ """Different papers should not be deduplicated."""
127
+ e1 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/11111111/")
128
+ e2 = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/22222222/")
129
+
130
+ result = deduplicate_evidence([e1, e2])
131
+
132
+ assert len(result) == 2
133
+
134
+ def test_preserves_openalex_without_pmid(self) -> None:
135
+ """OpenAlex papers without PMID should NOT be deduplicated against PubMed."""
136
+ pubmed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
137
+ openalex_no_pmid = _make_evidence(
138
+ "openalex",
139
+ "https://openalex.org/W9999999",
140
+ metadata={"cited_by_count": 100}, # No pmid key
141
+ )
142
+
143
+ result = deduplicate_evidence([pubmed, openalex_no_pmid])
144
+
145
+ assert len(result) == 2 # Both preserved (different IDs)
146
+
147
+ def test_keeps_unidentifiable_evidence(self) -> None:
148
+ """Evidence with unrecognized URLs should be preserved."""
149
+ unknown = _make_evidence("web", "https://example.com/paper/123")
150
+
151
+ result = deduplicate_evidence([unknown])
152
+
153
+ assert len(result) == 1
154
+
155
+ def test_clinicaltrials_unique_per_nct(self) -> None:
156
+ """ClinicalTrials entries have unique NCT IDs."""
157
+ trial1 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT11111111")
158
+ trial2 = _make_evidence("clinicaltrials", "https://clinicaltrials.gov/study/NCT22222222")
159
+
160
+ result = deduplicate_evidence([trial1, trial2])
161
+
162
+ assert len(result) == 2
163
+
164
+ def test_preprints_preserved_separately(self) -> None:
165
+ """Preprints (PPR IDs) should not dedupe against peer-reviewed papers."""
166
+ peer_reviewed = _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
167
+ preprint = _make_evidence("europepmc", "https://europepmc.org/article/PPR/PPR999999")
168
+
169
+ result = deduplicate_evidence([peer_reviewed, preprint])
170
+
171
+ assert len(result) == 2 # Both preserved (different ID types)
172
+
173
+
174
  class TestSearchHandler:
175
  """Tests for SearchHandler."""
176
 
177
  @pytest.mark.asyncio
178
+ async def test_execute_aggregates_and_deduplicates(self):
179
+ """SearchHandler should aggregate results and deduplicate them."""
180
  # Setup
181
  mock_tool1 = AsyncMock(spec=SearchTool)
182
  mock_tool1.name = "pubmed"
183
  mock_tool1.search.return_value = [
184
+ _make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")
 
 
 
185
  ]
186
 
187
  mock_tool2 = AsyncMock(spec=SearchTool)
188
+ mock_tool2.name = "europepmc"
189
+ # Duplicate of the pubmed result
190
  mock_tool2.search.return_value = [
191
+ _make_evidence("europepmc", "https://europepmc.org/article/MED/12345678")
 
 
 
192
  ]
193
 
194
  handler = SearchHandler(tools=[mock_tool1, mock_tool2])
195
 
196
  # Execute
197
+ result = await handler.execute("test")
198
+
199
+ # Should only have 1 result after deduplication
200
+ assert result.total_found == 1
201
+ assert len(result.evidence) == 1
202
+ assert result.evidence[0].citation.source == "pubmed" # Priority source kept
203
  assert "pubmed" in result.sources_searched
204
+ assert "europepmc" in result.sources_searched
205
 
206
  @pytest.mark.asyncio
207
  async def test_execute_handles_tool_failure(self):
 
209
  mock_tool_ok = create_autospec(SearchTool, instance=True)
210
  mock_tool_ok.name = "pubmed"
211
  mock_tool_ok.search = AsyncMock(
212
+ return_value=[_make_evidence("pubmed", "https://pubmed.ncbi.nlm.nih.gov/12345678/")]
 
 
 
 
 
213
  )
214
 
215
  mock_tool_fail = create_autospec(SearchTool, instance=True)
216
+ mock_tool_fail.name = "clinicaltrials"
217
  mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))
218
 
219
  handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
 
222
  assert result.total_found == 1
223
  assert "pubmed" in result.sources_searched
224
  assert len(result.errors) == 1
225
+ assert "clinicaltrials: API down" in result.errors[0]