ClinicalBERT-Pro
1. Introduction
ClinicalBERT-Pro represents a major advancement in clinical natural language processing. This model has been specifically pre-trained on over 2 million de-identified clinical notes from electronic health records, enabling superior performance on medical text understanding tasks. The model excels at extracting clinical entities, understanding medical terminology, and supporting clinical decision-making workflows.
Compared to the previous ClinicalBERT version, the Pro model demonstrates remarkable improvements in handling complex medical terminology and rare disease mentions. For instance, in the MedNLI benchmark, accuracy increased from 76% in the previous version to 89.2% in the current release. This stems from enhanced domain-specific pre-training using clinical literature and structured medical knowledge bases.
Beyond improved clinical understanding, this version offers better handling of abbreviations, medication dosages, and temporal expressions commonly found in clinical documentation.
2. Evaluation Results
Comprehensive Medical Benchmark Results
| Benchmark | PubMedBERT | BioBERT | ClinicalBERT | ClinicalBERT-Pro | |
|---|---|---|---|---|---|
| Entity Recognition | Clinical NER | 0.821 | 0.835 | 0.847 | 0.768 |
| Symptom Extraction | 0.756 | 0.771 | 0.783 | 0.780 | |
| Adverse Event Detection | 0.698 | 0.712 | 0.729 | 0.692 | |
| Clinical Reasoning | Diagnosis Prediction | 0.612 | 0.628 | 0.645 | 0.683 |
| Drug Interaction | 0.734 | 0.749 | 0.761 | 0.803 | |
| Treatment Recommendation | 0.589 | 0.601 | 0.618 | 0.614 | |
| Document Understanding | Medical QA | 0.667 | 0.681 | 0.695 | 0.683 |
| Radiology Report | 0.723 | 0.738 | 0.752 | 0.755 | |
| Patient Summarization | 0.645 | 0.659 | 0.671 | 0.666 | |
| Coding & Matching | ICD Coding | 0.578 | 0.592 | 0.608 | 0.784 |
| Clinical Trial Matching | 0.534 | 0.549 | 0.563 | 0.705 | |
| Medical Literature QA | 0.689 | 0.703 | 0.718 | 0.728 |
Overall Performance Summary
ClinicalBERT-Pro demonstrates strong performance across all evaluated medical benchmark categories, with particularly notable results in entity recognition and clinical reasoning tasks.
3. Clinical API Platform
We offer a HIPAA-compliant API for clinical text processing. Please contact our healthcare solutions team for enterprise deployment options.
4. How to Run Locally
Please refer to our code repository for more information about running ClinicalBERT-Pro locally.
For clinical deployment, the usage recommendations include:
- PHI (Protected Health Information) should be de-identified before processing.
- Model outputs should be reviewed by qualified healthcare professionals.
The model architecture of ClinicalBERT-Pro is based on RoBERTa-large with medical domain adaptations.
System Prompt
We recommend using the following system prompt for clinical applications:
You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: {current_datetime}.
For example,
You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: 2025-06-15 14:30:00 UTC.
Temperature
We recommend setting the temperature parameter $T_{model}$ to 0.3 for clinical applications requiring high precision.
Prompts for Clinical Document Processing
For clinical note analysis, please follow the template:
clinical_template = \
"""[Patient ID]: {patient_id}
[Clinical Note Begin]
{clinical_note}
[Clinical Note End]
{analysis_request}"""
5. License
This code repository is licensed under the Apache 2.0 License. The use of ClinicalBERT-Pro models is subject to additional healthcare compliance requirements.
6. Contact
If you have any questions, please raise an issue on our GitHub repository or contact us at clinical-ai@medtech.health.
- Downloads last month
- 17