BioBBC: BERT-BiLSTM-CRF for Biomarker Recognition

Model Description

BioBBC is a Named Entity Recognition (NER) model trained to identify biomarker mentions in biomedical text. It combines the following components:

  • BERT: BioBERT for contextual embeddings
  • Character CNN: Character-level features
  • POS Embeddings: Part-of-speech information
  • Domain Embeddings: Biomedical word embeddings
  • BiLSTM: Bidirectional sequence modeling
  • CRF: Conditional Random Fields for sequence labeling
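The CRF layer chooses the whole label sequence jointly instead of picking each token's label independently. Below is a minimal pure-Python sketch of standard Viterbi decoding, the usual way a linear-chain CRF produces its prediction. It assumes the BiLSTM supplies per-token emission scores and that the CRF holds a label-transition matrix plus start scores. The label indices, transition values, and emission scores are all hypothetical, and this is not the repository's implementation.

```python
import math

# Hypothetical label order; the card lists these labels but not their indices.
LABELS = ["O", "B-BIOMARKER", "I-BIOMARKER"]
NEG_INF = -math.inf


def viterbi_decode(emissions, transitions, start_scores):
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][j]:   score of label j at token t (e.g. from the BiLSTM)
    transitions[i][j]: score of moving from label i to label j
    start_scores[j]:   score of starting the sentence with label j
    """
    n_labels = len(start_scores)
    # best[j]: score of the best path that ends in label j at the current token
    best = [start_scores[j] + emissions[0][j] for j in range(n_labels)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # Score of reaching label j from every possible previous label i
            scores = [best[i] + transitions[i][j] for i in range(n_labels)]
            i_best = scores.index(max(scores))
            pointers.append(i_best)
            new_best.append(scores[i_best] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Start from the best final label and follow the backpointers
    path = [best.index(max(best))]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [LABELS[j] for j in path]


# Toy transition scores: O -> I is forbidden, as is starting a sentence with I.
transitions = [
    [0.0, 0.0, NEG_INF],  # from O
    [0.0, 0.0, 0.0],      # from B-BIOMARKER
    [0.0, 0.0, 0.0],      # from I-BIOMARKER
]
start_scores = [0.0, 0.0, NEG_INF]

# Made-up emission scores for the tokens "Elevated", "IL-6", "levels".
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [2.0, 0.0, 0.0]]

greedy = [LABELS[row.index(max(row))] for row in emissions]
print(greedy)                                               # ['O', 'I-BIOMARKER', 'O']
print(viterbi_decode(emissions, transitions, start_scores))  # ['O', 'B-BIOMARKER', 'O']
```

Taking the per-token argmax yields the ill-formed sequence O, I-BIOMARKER, O, whereas the transition constraints let Viterbi return a valid BIO sequence from the same emission scores.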

Model Architecture

Input Text
    ↓
BioBERT Embeddings (768d)
    ↓
+ Character CNN (150d)
+ POS Embeddings (25d)
+ Domain Embeddings (200d)
    ↓
BiLSTM (512d × 2)
    ↓
CRF Layer
    ↓
Predicted Biomarkers
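The diagram's dimensions can be tallied as follows. This sketch rests on two assumptions the card does not state outright: that "+" means the four per-token vectors are concatenated, and that the BiLSTM's 512d figure is the hidden size of each of its two directions. The final projection to one score per label is also an assumption about how the CRF receives its inputs, and the sentence length is hypothetical.

```python
# Per-token feature widths taken from the architecture diagram.
BIOBERT_DIM = 768
CHAR_CNN_DIM = 150
POS_DIM = 25
DOMAIN_DIM = 200
LSTM_HIDDEN = 512  # assumption: hidden size of each LSTM direction
NUM_LABELS = 3     # O, B-BIOMARKER, I-BIOMARKER

# Assumption: "+" means the four vectors are concatenated for every token.
lstm_input_dim = BIOBERT_DIM + CHAR_CNN_DIM + POS_DIM + DOMAIN_DIM
# A BiLSTM concatenates its forward and backward hidden states.
lstm_output_dim = 2 * LSTM_HIDDEN

seq_len = 40  # hypothetical sentence length in tokens
shapes = [
    ("BiLSTM input", (seq_len, lstm_input_dim)),
    ("BiLSTM output", (seq_len, lstm_output_dim)),
    ("CRF emissions", (seq_len, NUM_LABELS)),  # assumption: linear projection per token
]
for name, shape in shapes:
    print(f"{name}: {shape}")
# BiLSTM input: (40, 1143)
# BiLSTM output: (40, 1024)
# CRF emissions: (40, 3)
```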

Performance

  • F1 Score: 0.9734
  • Training Epochs: 5
  • Best Epoch: 5
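The card does not say whether the F1 score is computed per token or per entity, nor on which data split. For reference, the common exact-match entity-level definition (CoNLL-style) is sketched below: a predicted entity counts only if its boundaries match a gold entity exactly. This illustrates the metric and is not the evaluation script that produced 0.9734; the spans are made up.

```python
def entity_f1(gold, predicted):
    """Exact-match entity-level F1 over collections of (sentence_id, start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: (sentence index, first token, last token), inclusive.
gold = [(0, 1, 1), (0, 3, 3), (1, 0, 1)]
predicted = [(0, 1, 1), (0, 3, 3), (1, 0, 0)]  # last span misses one token
print(round(entity_f1(gold, predicted), 4))    # 0.6667
```

The partially matching span (1, 0, 0) earns no credit, so precision and recall are both 2/3.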

Labels

  • O: Outside (not a biomarker)
  • B-BIOMARKER: Beginning of biomarker entity
  • I-BIOMARKER: Inside biomarker entity
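To turn per-token labels into biomarker strings, consecutive B-BIOMARKER/I-BIOMARKER tokens are grouped into one entity. The sketch below is an illustration under assumptions: the helper `bio_to_entities` is hypothetical, tokens are word-level and rejoined with single spaces, and a stray I-BIOMARKER with no open entity is treated as the start of a new one (a common lenient convention the card does not specify).

```python
def bio_to_entities(tokens, labels):
    """Group BIO-tagged tokens into entity strings (hypothetical helper)."""
    entities, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B-BIOMARKER" or (label == "I-BIOMARKER" and not current):
            # Start a new entity, closing any entity that is still open
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif label == "I-BIOMARKER":
            # Continue the open entity
            current.append(token)
        else:
            # "O" closes any open entity
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


tokens = ["Elevated", "IL-6", "and", "TNF-alpha", "levels", "were", "observed", "."]
labels = ["O", "B-BIOMARKER", "O", "B-BIOMARKER", "O", "O", "O", "O"]
print(bio_to_entities(tokens, labels))  # ['IL-6', 'TNF-alpha']

print(bio_to_entities(["serum", "C-reactive", "protein", "was", "high"],
                      ["O", "B-BIOMARKER", "I-BIOMARKER", "O", "O"]))  # ['C-reactive protein']
```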

Usage

Installation

pip install transformers torch huggingface_hub

Load Model

from huggingface_hub import hf_hub_download
import torch

REPO_ID = "postlyt/biobbc-biomarker-ner"

# Download the model weights, configuration, and vocabularies
model_path = hf_hub_download(repo_id=REPO_ID, filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")
vocab_path = hf_hub_download(repo_id=REPO_ID, filename="vocabularies.bin")

# Load model (see full loading script in repository)
# model = load_biobbc_model(model_path, config_path, vocab_path)

Predict Biomarkers

# See predict_from_huggingface.py in the repository for a complete example.
# `model` must first be loaded with the repository's loading script (see Load Model).
text = "Elevated IL-6 and TNF-alpha levels were observed."
biomarkers = model.predict(text)
print(biomarkers)  # ['IL-6', 'TNF-alpha']

Training Details

  • Base Model: dmis-lab/biobert-base-cased-v1.2
  • Training Data: Custom biomarker dataset
  • Batch Size: 256 (optimized for A100 GPU)
  • Mixed Precision: FP16 enabled
  • Optimization: AdamW with linear warmup
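The card mentions linear warmup but not the warmup length, the total number of steps, or what happens after warmup. The sketch below assumes linear warmup followed by linear decay to zero, which mirrors the multiplier used by `get_linear_schedule_with_warmup` in Hugging Face Transformers; the step counts are hypothetical.

```python
def linear_warmup_decay(step, warmup_steps, total_steps):
    """Learning-rate multiplier: ramps 0 -> 1 during warmup, then decays linearly to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical schedule: 100 warmup steps out of 1,000 total.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_decay(step, warmup_steps=100, total_steps=1000))
# 0 0.0
# 50 0.5
# 100 1.0
# 550 0.5
# 1000 0.0
```

In PyTorch, a function like this is typically passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR` wrapping a `torch.optim.AdamW` optimizer, so the effective learning rate is the base rate times this multiplier.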

Citation

If you use this model, please cite:

@misc{biobbc_model,
  author = {postlyt},
  title = {BioBBC: BERT-BiLSTM-CRF for Biomarker Recognition},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/postlyt/biobbc-biomarker-ner}
}

Model Card Authors

postlyt

Model Card Contact

For questions or issues, please open an issue in the repository.
