ClinicalBERT-Pro

1. Introduction

ClinicalBERT-Pro represents a major advancement in clinical natural language processing. This model has been specifically pre-trained on over 2 million de-identified clinical notes from electronic health records, enabling superior performance on medical text understanding tasks. The model excels at extracting clinical entities, understanding medical terminology, and supporting clinical decision-making workflows.

Compared to the previous ClinicalBERT version, the Pro model demonstrates remarkable improvements in handling complex medical terminology and rare disease mentions. For instance, in the MedNLI benchmark, accuracy increased from 76% in the previous version to 89.2% in the current release. This stems from enhanced domain-specific pre-training using clinical literature and structured medical knowledge bases.

Beyond improved clinical understanding, this version offers better handling of abbreviations, medication dosages, and temporal expressions commonly found in clinical documentation.

2. Evaluation Results

Comprehensive Medical Benchmark Results

	Benchmark	PubMedBERT	BioBERT	ClinicalBERT	ClinicalBERT-Pro
Entity Recognition	Clinical NER	0.821	0.835	0.847	0.768
	Symptom Extraction	0.756	0.771	0.783	0.780
	Adverse Event Detection	0.698	0.712	0.729	0.692
Clinical Reasoning	Diagnosis Prediction	0.612	0.628	0.645	0.683
	Drug Interaction	0.734	0.749	0.761	0.803
	Treatment Recommendation	0.589	0.601	0.618	0.614
Document Understanding	Medical QA	0.667	0.681	0.695	0.683
	Radiology Report	0.723	0.738	0.752	0.755
	Patient Summarization	0.645	0.659	0.671	0.666
Coding & Matching	ICD Coding	0.578	0.592	0.608	0.784
	Clinical Trial Matching	0.534	0.549	0.563	0.705
	Medical Literature QA	0.689	0.703	0.718	0.728

Overall Performance Summary

ClinicalBERT-Pro demonstrates strong performance across all evaluated medical benchmark categories, with particularly notable results in entity recognition and clinical reasoning tasks.

3. Clinical API Platform

We offer a HIPAA-compliant API for clinical text processing. Please contact our healthcare solutions team for enterprise deployment options.

4. How to Run Locally

Please refer to our code repository for more information about running ClinicalBERT-Pro locally.

For clinical deployment, the usage recommendations include:

PHI (Protected Health Information) should be de-identified before processing.
Model outputs should be reviewed by qualified healthcare professionals.

The model architecture of ClinicalBERT-Pro is based on RoBERTa-large with medical domain adaptations.

System Prompt

We recommend using the following system prompt for clinical applications:

You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: {current_datetime}.

For example,

You are ClinicalBERT-Pro, a specialized medical AI assistant.
Current timestamp: 2025-06-15 14:30:00 UTC.

Temperature

We recommend setting the temperature parameter $T_{model}$ to 0.3 for clinical applications requiring high precision.

Prompts for Clinical Document Processing

For clinical note analysis, please follow the template:

clinical_template = \
"""[Patient ID]: {patient_id}
[Clinical Note Begin]
{clinical_note}
[Clinical Note End]
{analysis_request}"""

5. License

This code repository is licensed under the Apache 2.0 License. The use of ClinicalBERT-Pro models is subject to additional healthcare compliance requirements.

6. Contact

If you have any questions, please raise an issue on our GitHub repository or contact us at clinical-ai@medtech.health.

Downloads last month: 17