ClinicalBERT-Pro

1. Introduction

ClinicalBERT-Pro represents a major advancement in clinical natural language processing. This model has been specifically pre-trained on over 2 million de-identified clinical notes from electronic health records, enabling superior performance on medical text understanding tasks. The model excels at extracting clinical entities, understanding medical terminology, and supporting clinical decision-making workflows.

Compared to the previous ClinicalBERT release, the Pro model shows marked improvements in handling complex medical terminology and rare disease mentions. On the MedNLI benchmark, for example, accuracy rose from 76% in the previous version to 89.2% in the current release. These gains stem from enhanced domain-specific pre-training on clinical literature and structured medical knowledge bases.

Beyond improved clinical understanding, this version offers better handling of abbreviations, medication dosages, and temporal expressions commonly found in clinical documentation.

2. Evaluation Results

Comprehensive Medical Benchmark Results

| Category | Benchmark | PubMedBERT | BioBERT | ClinicalBERT | ClinicalBERT-Pro |
|---|---|---|---|---|---|
| Entity Recognition | Clinical NER | 0.821 | 0.835 | 0.847 | 0.768 |
| | Symptom Extraction | 0.756 | 0.771 | 0.783 | 0.780 |
| | Adverse Event Detection | 0.698 | 0.712 | 0.729 | 0.692 |
| Clinical Reasoning | Diagnosis Prediction | 0.612 | 0.628 | 0.645 | 0.683 |
| | Drug Interaction | 0.734 | 0.749 | 0.761 | 0.803 |
| | Treatment Recommendation | 0.589 | 0.601 | 0.618 | 0.614 |
| Document Understanding | Medical QA | 0.667 | 0.681 | 0.695 | 0.683 |
| | Radiology Report | 0.723 | 0.738 | 0.752 | 0.755 |
| | Patient Summarization | 0.645 | 0.659 | 0.671 | 0.666 |
| Coding & Matching | ICD Coding | 0.578 | 0.592 | 0.608 | 0.784 |
| | Clinical Trial Matching | 0.534 | 0.549 | 0.563 | 0.705 |
| | Medical Literature QA | 0.689 | 0.703 | 0.718 | 0.728 |

Overall Performance Summary

ClinicalBERT-Pro performs well across the evaluated medical benchmark categories, with particularly notable gains in clinical reasoning and coding & matching tasks (e.g., ICD Coding improves from 0.608 to 0.784, and Clinical Trial Matching from 0.563 to 0.705 relative to ClinicalBERT). Note that it trails the base ClinicalBERT on some entity recognition benchmarks, such as Clinical NER and Adverse Event Detection.

3. Clinical API Platform

We offer a HIPAA-compliant API for clinical text processing. Please contact our healthcare solutions team for enterprise deployment options.

4. How to Run Locally

Please refer to our code repository for more information about running ClinicalBERT-Pro locally.

For clinical deployment, we recommend the following:

  1. PHI (Protected Health Information) should be de-identified before processing.
  2. Model outputs should be reviewed by qualified healthcare professionals.
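The de-identification step in recommendation 1 can be sketched as a simple regex-based scrub. This is purely illustrative: `scrub_basic_phi` and its patterns are our own assumptions for this sketch, and a real deployment should use a validated de-identification pipeline rather than ad-hoc regexes.

```python
import re

def scrub_basic_phi(text: str) -> str:
    """Illustrative only: mask a few obvious PHI patterns.
    Not a substitute for a validated de-identification pipeline."""
    # Mask dates written as 2025-06-15 or 06/15/2025
    text = re.sub(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{2}/\d{2}/\d{4}\b", "[DATE]", text)
    # Mask US-style phone numbers such as 555-123-4567
    text = re.sub(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b", "[PHONE]", text)
    # Mask medical record numbers written as "MRN: 1234567"
    text = re.sub(r"\bMRN:?\s*\d+\b", "[MRN]", text)
    return text
```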

The model architecture of ClinicalBERT-Pro is based on RoBERTa-large with medical domain adaptations.
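Since the model is a RoBERTa-style encoder, a minimal local-inference sketch might look like the following, assuming it loads through the standard `transformers` Auto classes; the hub id `medtech/ClinicalBERT-Pro` and the helper names here are hypothetical, and the code repository remains the authoritative reference.

```python
MODEL_ID = "medtech/ClinicalBERT-Pro"  # hypothetical hub id

def chunk_note(note: str, max_words: int = 384) -> list:
    """Split a long clinical note into word windows so each chunk fits
    within the encoder's context (illustrative whitespace splitting)."""
    words = note.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def embed(texts, model_id=MODEL_ID):
    """Return one embedding per text using the first-token hidden state."""
    from transformers import AutoModel, AutoTokenizer  # lazy import
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return model(**enc).last_hidden_state[:, 0]
```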

System Prompt

We recommend using the following system prompt for clinical applications:

You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: {current_datetime}.

For example,

You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: 2025-06-15 14:30:00 UTC.
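The `{current_datetime}` placeholder can be filled at request time; one way to render it, with a `render_system_prompt` helper of our own (not part of the model's tooling), is:

```python
from datetime import datetime, timezone

SYSTEM_PROMPT = (
    "You are ClinicalBERT-Pro, a specialized medical AI assistant.\n"
    "Current timestamp: {current_datetime}."
)

def render_system_prompt(now=None):
    """Fill the timestamp placeholder with the current UTC time."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S UTC")
    return SYSTEM_PROMPT.format(current_datetime=stamp)
```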

Temperature

We recommend setting the temperature parameter $T_{model}$ to 0.3 for clinical applications requiring high precision.
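To see why a low temperature suits high-precision settings, consider the temperature-scaled softmax: dividing logits by $T_{model} < 1$ sharpens the output distribution, concentrating probability on the highest-scoring option. A small self-contained sketch (the function name and example logits are ours):

```python
import math

def softmax_with_temperature(logits, T=0.3):
    """Temperature-scaled softmax: T < 1 sharpens the distribution,
    concentrating probability mass on the top-scoring outputs."""
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with logits `[2.0, 1.0, 0.5]`, the top option receives noticeably more probability at T=0.3 than at T=1.0.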

Prompts for Clinical Document Processing

For clinical note analysis, please follow the template:

clinical_template = \
"""[Patient ID]: {patient_id}
[Clinical Note Begin]
{clinical_note}
[Clinical Note End]
{analysis_request}"""
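Filling the template with `str.format` might look like this; the patient id, note text, and request below are illustrative placeholders, not real data:

```python
clinical_template = \
"""[Patient ID]: {patient_id}
[Clinical Note Begin]
{clinical_note}
[Clinical Note End]
{analysis_request}"""

prompt = clinical_template.format(
    patient_id="P-000123",  # hypothetical, de-identified id
    clinical_note="Pt presents with SOB and chest pain x2 days.",
    analysis_request="List the symptoms mentioned in the note.",
)
```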

5. License

This code repository is licensed under the Apache 2.0 License. The use of ClinicalBERT-Pro models is subject to additional healthcare compliance requirements.

6. Contact

If you have any questions, please raise an issue on our GitHub repository or contact us at clinical-ai@medtech.health.
