DeepBoner Context
Project Overview
DeepBoner is an AI-native Sexual Health Research Agent. Goal: To accelerate research into sexual health, wellness, and reproductive medicine by intelligently searching biomedical literature (PubMed, ClinicalTrials.gov, Europe PMC), evaluating evidence, and synthesizing findings.
Architecture: The project follows a Vertical Slice Architecture (Search -> Judge -> Orchestrator) and adheres to Strict TDD (Test-Driven Development).
Current Status: Phases 1-14 COMPLETE (Foundation through Demo Submission).
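The Search -> Judge -> Orchestrator flow can be pictured as a small loop. What follows is a minimal sketch, not the project's actual orchestrator: `Verdict`, `run_loop`, `fake_search`, and `fake_judge` are hypothetical names introduced for illustration, and the real slices call biomedical APIs and an LLM rather than stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Hypothetical judge output: is the evidence enough, and what to search next."""

    sufficient: bool
    next_query: Optional[str] = None


def run_loop(
    query: str,
    search: Callable[[str], list[str]],
    judge: Callable[[str, list[str]], Verdict],
    max_iterations: int = 10,
) -> tuple[list[str], int]:
    """Alternate search and judge until the judge is satisfied or the budget runs out."""
    evidence: list[str] = []
    current = query
    for iteration in range(1, max_iterations + 1):
        # Search slice: gather new evidence, skipping anything already seen.
        for item in search(current):
            if item not in evidence:
                evidence.append(item)
        # Judge slice: decide whether to stop or to refine the query.
        verdict = judge(query, evidence)
        if verdict.sufficient:
            return evidence, iteration
        current = verdict.next_query or current
    return evidence, max_iterations


# Stub slices so the sketch runs without network access or an LLM.
def fake_search(q: str) -> list[str]:
    return [f"paper about {q}"]


def fake_judge(question: str, evidence: list[str]) -> Verdict:
    return Verdict(sufficient=len(evidence) >= 2, next_query=f"{question} review")


evidence, iterations = run_loop("testosterone therapy", fake_search, fake_judge)
```

Passing `search` and `judge` in as callables keeps each slice testable on its own, which fits the Strict TDD approach.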
Tech Stack & Tooling
- Language: Python 3.11 (pinned)
- Package Manager: `uv` (Rust-based, extremely fast)
- Frameworks: `pydantic`, `pydantic-ai`, `httpx`, `gradio[mcp]`
- Vector DB: `chromadb` with `sentence-transformers` for semantic search
- Code Execution: `modal` for secure sandboxed Python execution
- Testing: `pytest`, `pytest-asyncio`, `respx` (for mocking)
- Quality: `ruff` (linting/formatting), `mypy` (strict type checking), `pre-commit`
Building & Running
| Command | Description |
|---|---|
| `make install` | Install dependencies and pre-commit hooks. |
| `make test` | Run unit tests. |
| `make lint` | Run Ruff linter. |
| `make format` | Run Ruff formatter. |
| `make typecheck` | Run Mypy static type checker. |
| `make check` | The Golden Gate: runs lint, typecheck, and test. Must pass before committing. |
| `make clean` | Clean up cache and artifacts. |
Directory Structure
- `src/`: Source code
  - `utils/`: Shared utilities (`config.py`, `exceptions.py`, `models.py`)
  - `tools/`: Search tools (`pubmed.py`, `clinicaltrials.py`, `europepmc.py`, `code_execution.py`)
  - `services/`: Services (`embeddings.py`, `statistical_analyzer.py`)
  - `agents/`: Magentic multi-agent mode agents
  - `agent_factory/`: Agent definitions (judges, prompts)
  - `mcp_tools.py`: MCP tool wrappers for Claude Desktop integration
  - `app.py`: Gradio UI with MCP server
- `tests/`: Test suite
  - `unit/`: Isolated unit tests (mocked)
  - `integration/`: Real API tests (marked as slow/integration)
- `docs/`: Documentation and implementation specs
- `examples/`: Working demos for each phase
Key Components
- `src/orchestrators/` - Orchestrator package (simple, advanced, langgraph modes)
  - `simple.py` - Main search-and-judge loop
  - `advanced.py` - Multi-agent Magentic mode
  - `langgraph_orchestrator.py` - LangGraph-based workflow
- `src/tools/pubmed.py` - PubMed E-utilities search
- `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- `src/tools/europepmc.py` - Europe PMC search
- `src/tools/code_execution.py` - Modal sandbox execution
- `src/tools/search_handler.py` - Scatter-gather orchestration
- `src/services/embeddings.py` - Local embeddings (sentence-transformers, in-memory)
- `src/services/llamaindex_rag.py` - Premium embeddings (OpenAI, persistent ChromaDB)
- `src/services/embedding_protocol.py` - Protocol interface for embedding services
- `src/services/research_memory.py` - Shared memory layer for research state
- `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- `src/utils/service_loader.py` - Tiered service selection (free vs premium)
- `src/mcp_tools.py` - MCP tool wrappers
- `src/app.py` - Gradio UI (HuggingFace Spaces) with MCP server
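To illustrate what scatter-gather orchestration means for the search tools, here is a minimal sketch using `asyncio.gather`. It is not the contents of `search_handler.py`: the `scatter_gather` function and the three `fake_*` tools are hypothetical stand-ins, and the real tools make HTTP requests.

```python
import asyncio
from typing import Awaitable, Callable

# A search tool is assumed here to be an async callable: query -> list of hits.
SearchTool = Callable[[str], Awaitable[list[str]]]


async def scatter_gather(
    query: str, tools: dict[str, SearchTool]
) -> tuple[list[str], dict[str, str]]:
    """Query every tool concurrently; collect merged results and per-tool errors."""
    names = list(tools)
    # return_exceptions=True keeps one failing source from discarding the others.
    outcomes = await asyncio.gather(
        *(tools[name](query) for name in names), return_exceptions=True
    )
    results: list[str] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = str(outcome)
        else:
            results.extend(outcome)
    return results, errors


# Stub tools standing in for the PubMed, ClinicalTrials.gov, and Europe PMC clients.
async def fake_pubmed(query: str) -> list[str]:
    return [f"pubmed:{query}"]


async def fake_clinicaltrials(query: str) -> list[str]:
    raise TimeoutError("simulated timeout")


async def fake_europepmc(query: str) -> list[str]:
    return [f"europepmc:{query}"]


results, errors = asyncio.run(
    scatter_gather(
        "sildenafil",
        {
            "pubmed": fake_pubmed,
            "clinicaltrials": fake_clinicaltrials,
            "europepmc": fake_europepmc,
        },
    )
)
```

Collecting errors per source instead of raising lets a search return partial results when one upstream API is slow or down.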
Configuration
Settings via pydantic-settings from .env:
- `LLM_PROVIDER`: "openai" or "anthropic"
- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- `MAX_ITERATIONS`: 1-50, default 10
- `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
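The constraints listed above can be expressed as validation rules. The sketch below deliberately uses only the standard library so it runs anywhere; the project itself loads settings with pydantic-settings, so this is an illustration of the rules, not the real `config.py`. The `Settings` and `load_settings` names, the `"openai"` provider default, and the `"INFO"` log-level default are assumptions; only the `MAX_ITERATIONS` range and default come from the list above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"openai", "anthropic"}
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass(frozen=True)
class Settings:
    """Stdlib stand-in for a pydantic-settings model (hypothetical)."""

    llm_provider: str = "openai"  # default is an assumption
    max_iterations: int = 10  # documented: 1-50, default 10
    log_level: str = "INFO"  # default is an assumption
    ncbi_api_key: Optional[str] = None  # optional per the list above

    def __post_init__(self) -> None:
        if self.llm_provider not in VALID_PROVIDERS:
            raise ValueError(f"LLM_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")
        if not 1 <= self.max_iterations <= 50:
            raise ValueError("MAX_ITERATIONS must be between 1 and 50")
        if self.log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"LOG_LEVEL must be one of {sorted(VALID_LOG_LEVELS)}")


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping such as os.environ."""
    return Settings(
        llm_provider=env.get("LLM_PROVIDER", "openai"),
        max_iterations=int(env.get("MAX_ITERATIONS", "10")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        ncbi_api_key=env.get("NCBI_API_KEY"),
    )


settings = load_settings({"LLM_PROVIDER": "anthropic", "MAX_ITERATIONS": "5"})
```

Validating at load time means a bad `.env` value fails at startup rather than partway through a research run.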
LLM Model Defaults (November 2025)
As of November 29, 2025, DeepBoner uses the following default LLM models, set in `src/utils/config.py`:
- OpenAI: `gpt-5` - Current flagship model (November 2025). Requires Tier 5 access.
- Anthropic: `claude-sonnet-4-5-20250929` - The mid-range Claude 4.5 model, released on September 29, 2025.
  - The flagship Claude Opus 4.5 (released November 24, 2025) is also available and can be configured by advanced users for enhanced capabilities.
- HuggingFace (Free Tier): `meta-llama/Llama-3.1-70B-Instruct` - Remains the default for the free tier, subject to quota limits.

Keep these defaults updated as the LLM landscape evolves.
Development Conventions
- Strict TDD: Write failing tests in `tests/unit/` before implementing logic in `src/`.
- Type Safety: All code must pass `mypy --strict`. Use Pydantic models for data exchange.
- Linting: Zero tolerance for Ruff errors.
- Mocking: Use `respx` or `unittest.mock` for all external API calls in unit tests.
- Vertical Slices: Implement features end-to-end rather than layer-by-layer.
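As an illustration of the mocking convention, here is a minimal sketch of a unit test that replaces the HTTP client with `unittest.mock.Mock`, so no request leaves the machine. The `search_ids` function is hypothetical and is not taken from `pubmed.py`, and the JSON payload is a simplified stand-in for a PubMed ESearch response.

```python
from unittest.mock import Mock

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def search_ids(client, term: str) -> list[str]:
    """Hypothetical function under test: return PubMed IDs matching a term."""
    response = client.get(
        ESEARCH_URL, params={"db": "pubmed", "term": term, "retmode": "json"}
    )
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]


def test_search_ids_returns_id_list() -> None:
    # Arrange: a fake response and a fake client, so no network call is made.
    fake_response = Mock()
    fake_response.json.return_value = {"esearchresult": {"idlist": ["111", "222"]}}
    client = Mock()
    client.get.return_value = fake_response

    # Act and assert: the parsed IDs come back and the query term was forwarded.
    assert search_ids(client, "libido") == ["111", "222"]
    client.get.assert_called_once()
    assert client.get.call_args.kwargs["params"]["term"] == "libido"
    fake_response.raise_for_status.assert_called_once()


test_search_ids_returns_id_list()
```

Under Strict TDD the test would be written first and fail until `search_ids` exists. With `httpx` clients, `respx` can play the same role by intercepting the request at the transport level.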
Git Workflow
- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Remote `origin`: GitHub (source of truth for PRs/code review)
- Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
HuggingFace Spaces Collaboration:
- Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- DO NOT push directly to `main` or `dev` on HuggingFace; these can be overwritten easily
- GitHub is the source of truth; HuggingFace is for deployment/demo
- Consider using git hooks to prevent accidental pushes to protected branches
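One way to follow the git-hook suggestion is a `pre-push` hook. Git calls that hook with the remote name and URL as arguments and writes one line per pushed ref to stdin, in the form `<local ref> <local sha> <remote ref> <remote sha>`. The sketch below is an assumption about how such a hook could look, not a hook shipped with the project; `blocked_refs` and `main` are hypothetical names.

```python
import sys

# Remote and branches to protect, taken from the workflow described above.
PROTECTED_REMOTE = "huggingface-upstream"
PROTECTED_REFS = {"refs/heads/main", "refs/heads/dev"}


def blocked_refs(remote_name: str, lines: list[str]) -> list[str]:
    """Return the protected remote refs that this push would update."""
    if remote_name != PROTECTED_REMOTE:
        return []
    blocked = []
    for line in lines:
        parts = line.split()
        # Each stdin line: <local ref> <local sha> <remote ref> <remote sha>
        if len(parts) == 4 and parts[2] in PROTECTED_REFS:
            blocked.append(parts[2])
    return blocked


def main(argv: list[str], stdin_lines: list[str]) -> int:
    """Hook entry point: a non-zero return value makes git abort the push."""
    remote_name = argv[1] if len(argv) > 1 else ""
    blocked = blocked_refs(remote_name, stdin_lines)
    for ref in blocked:
        print(f"Refusing to push to {ref} on {remote_name}", file=sys.stderr)
    return 1 if blocked else 0


# To use as a hook, save this as .git/hooks/pre-push (executable, with a
# Python shebang line) and end the file with:
#     sys.exit(main(sys.argv, sys.stdin.readlines()))
```

Pushes to personal branches such as `vcms-dev` pass through untouched, and pushes to `origin` are never blocked. Hooks are local to each clone, so every contributor has to install the hook themselves.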