DeepBoner Context

Project Overview

DeepBoner is an AI-native Sexual Health Research Agent. Its goal is to accelerate research into sexual health, wellness, and reproductive medicine by intelligently searching biomedical literature (PubMed, ClinicalTrials.gov, Europe PMC), evaluating the evidence, and synthesizing findings.

Architecture: The project follows a Vertical Slice Architecture (Search -> Judge -> Orchestrator) and adheres to Strict TDD (Test-Driven Development).

Current Status: Phases 1-14 COMPLETE (Foundation through Demo Submission).

Tech Stack & Tooling

  • Language: Python 3.11 (Pinned)
  • Package Manager: uv (Rust-based, extremely fast)
  • Frameworks: pydantic, pydantic-ai, httpx, gradio[mcp]
  • Vector DB: chromadb with sentence-transformers for semantic search
  • Code Execution: modal for secure sandboxed Python execution
  • Testing: pytest, pytest-asyncio, respx (for mocking)
  • Quality: ruff (linting/formatting), mypy (strict type checking), pre-commit

Building & Running

Command          Description
make install     Install dependencies and pre-commit hooks.
make test        Run unit tests.
make lint        Run Ruff linter.
make format      Run Ruff formatter.
make typecheck   Run Mypy static type checker.
make check       The Golden Gate: Runs lint, typecheck, and test. Must pass before committing.
make clean       Clean up cache and artifacts.

Directory Structure

  • src/: Source code
    • utils/: Shared utilities (config.py, exceptions.py, models.py)
    • tools/: Search tools (pubmed.py, clinicaltrials.py, europepmc.py, code_execution.py)
    • services/: Services (embeddings.py, statistical_analyzer.py)
    • agents/: Magentic multi-agent mode agents
    • agent_factory/: Agent definitions (judges, prompts)
    • mcp_tools.py: MCP tool wrappers for Claude Desktop integration
    • app.py: Gradio UI with MCP server
  • tests/: Test suite
    • unit/: Isolated unit tests (Mocked)
    • integration/: Real API tests (Marked as slow/integration)
  • docs/: Documentation and Implementation Specs
  • examples/: Working demos for each phase

Key Components

  • src/orchestrators/ - Orchestrator package (simple, advanced, langgraph modes)
    • simple.py - Main search-and-judge loop
    • advanced.py - Multi-agent Magentic mode
    • langgraph_orchestrator.py - LangGraph-based workflow
  • src/tools/pubmed.py - PubMed E-utilities search
  • src/tools/clinicaltrials.py - ClinicalTrials.gov API
  • src/tools/europepmc.py - Europe PMC search
  • src/tools/code_execution.py - Modal sandbox execution
  • src/tools/search_handler.py - Scatter-gather orchestration
  • src/services/embeddings.py - Local embeddings (sentence-transformers, in-memory)
  • src/services/llamaindex_rag.py - Premium embeddings (OpenAI, persistent ChromaDB)
  • src/services/embedding_protocol.py - Protocol interface for embedding services
  • src/services/research_memory.py - Shared memory layer for research state
  • src/services/statistical_analyzer.py - Statistical analysis via Modal
  • src/utils/service_loader.py - Tiered service selection (free vs premium)
  • src/mcp_tools.py - MCP tool wrappers
  • src/app.py - Gradio UI (HuggingFace Spaces) with MCP server

Configuration

Settings are loaded from .env via pydantic-settings:

  • LLM_PROVIDER: "openai" or "anthropic"
  • OPENAI_API_KEY / ANTHROPIC_API_KEY: LLM keys
  • NCBI_API_KEY: Optional; raises the PubMed E-utilities rate limit from 3 to 10 requests per second
  • MODAL_TOKEN_ID / MODAL_TOKEN_SECRET: For Modal sandbox (optional)
  • MAX_ITERATIONS: 1-50, default 10
  • LOG_LEVEL: DEBUG, INFO, WARNING, ERROR

LLM Model Defaults (November 2025)

As of November 29, 2025, DeepBoner uses the following default LLM models, configured in src/utils/config.py:

  • OpenAI: gpt-5
    • Current flagship model (November 2025). Requires Tier 5 access.
  • Anthropic: claude-sonnet-4-5-20250929
    • This is the mid-range Claude 4.5 model, released on September 29, 2025.
    • The flagship Claude Opus 4.5 (released November 24, 2025) is also available; advanced users can configure it in place of the default.
  • HuggingFace (Free Tier): meta-llama/Llama-3.1-70B-Instruct
    • This remains the default for the free tier, subject to quota limits.

It is crucial to keep these defaults updated as the LLM landscape evolves.

Development Conventions

  1. Strict TDD: Write failing tests in tests/unit/ before implementing logic in src/.
  2. Type Safety: All code must pass mypy --strict. Use Pydantic models for data exchange.
  3. Linting: Zero tolerance for Ruff errors.
  4. Mocking: Use respx or unittest.mock for all external API calls in unit tests.
  5. Vertical Slices: Implement features end-to-end rather than layer-by-layer.
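A unit test in this style mocks the HTTP layer so that nothing touches the network. The project lists respx for httpx mocking, but the sketch below uses only unittest.mock to stay dependency-free.

Several parts are illustrative stand-ins:

- search_pubmed_ids is a hypothetical function written for this example.
- RuntimeError stands in for httpx.HTTPStatusError.
- asyncio.run replaces the pytest-asyncio support a real suite would use.

The ESearch URL and the esearchresult/idlist JSON shape follow NCBI's E-utilities response format.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock, MagicMock

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


# Hypothetical code under test (would live in src/). `client` is duck-typed,
# so an httpx.AsyncClient or a mock both work.
async def search_pubmed_ids(client: Any, term: str, retmax: int = 20) -> list[str]:
    response = await client.get(
        ESEARCH_URL,
        params={"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax},
    )
    response.raise_for_status()
    return list(response.json()["esearchresult"]["idlist"])


# Unit tests (would live in tests/unit/): the client is mocked, no network access.
def test_search_pubmed_ids_parses_idlist() -> None:
    response = MagicMock()
    response.json.return_value = {"esearchresult": {"idlist": ["111", "222"]}}
    client = AsyncMock()  # child attributes such as client.get are AsyncMocks too
    client.get.return_value = response

    ids = asyncio.run(search_pubmed_ids(client, "testosterone"))

    assert ids == ["111", "222"]
    client.get.assert_awaited_once()
    assert client.get.await_args.kwargs["params"]["term"] == "testosterone"


def test_search_pubmed_ids_propagates_http_errors() -> None:
    response = MagicMock()
    response.raise_for_status.side_effect = RuntimeError("HTTP 429")  # stand-in error type
    client = AsyncMock()
    client.get.return_value = response

    try:
        asyncio.run(search_pubmed_ids(client, "x"))
    except RuntimeError as exc:
        assert "429" in str(exc)
    else:
        raise AssertionError("expected the HTTP error to propagate")
```

Under strict TDD, these two tests would be written first and fail until search_pubmed_ids exists.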

Git Workflow

  • main: Production-ready (GitHub)
  • dev: Development integration (GitHub)
  • Remote origin: GitHub (source of truth for PRs/code review)
  • Remote huggingface-upstream: HuggingFace Spaces (deployment target)

HuggingFace Spaces Collaboration:

  • Each contributor should use their own dev branch: yourname-dev (e.g., vcms-dev, mario-dev)
  • DO NOT push directly to main or dev on HuggingFace; these branches can easily be overwritten
  • GitHub is the source of truth; HuggingFace is for deployment/demo
  • Consider installing a git pre-push hook to prevent accidental pushes to protected branches
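One way to act on the git-hook suggestion is a pre-push hook. Git runs .git/hooks/pre-push with the remote name and URL as arguments. It writes one line per ref to stdin, in the form "<local ref> <local sha> <remote ref> <remote sha>", and a non-zero exit aborts the push.

The sketch below assumes the HuggingFace remote is named huggingface-upstream, as listed above, and blocks updates to main and dev on that remote only. It is a suggestion, not a hook the repository ships.

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/pre-push: block pushes to main/dev on the HuggingFace remote."""
import sys
from collections.abc import Iterable

PROTECTED_REMOTE = "huggingface-upstream"
PROTECTED_REFS = {"refs/heads/main", "refs/heads/dev"}


def blocked_refs(remote_name: str, stdin_lines: Iterable[str]) -> list[str]:
    """Return the protected remote refs this push would update."""
    if remote_name != PROTECTED_REMOTE:
        return []
    blocked: list[str] = []
    for line in stdin_lines:
        parts = line.split()  # <local ref> <local sha> <remote ref> <remote sha>
        if len(parts) == 4 and parts[2] in PROTECTED_REFS:
            blocked.append(parts[2])
    return blocked


def main(argv: list[str]) -> int:
    remote_name = argv[1] if len(argv) > 1 else ""  # git passes: <remote name> <remote url>
    blocked = blocked_refs(remote_name, sys.stdin)
    for ref in blocked:
        print(
            f"pre-push: refusing to push to {ref} on {remote_name}; use your own <name>-dev branch.",
            file=sys.stderr,
        )
    return 1 if blocked else 0


if __name__ == "__main__":
    status = main(sys.argv)
    if status:  # a script that ends normally already exits 0
        sys.exit(status)
```

To install it, save the file as .git/hooks/pre-push and make it executable with chmod +x. Hooks in .git/hooks are not versioned with the repository, so each contributor installs the hook locally.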