---
license: apache-2.0
datasets:
  - honicky/hdfs-logs-encoded-blocks
  - Kingslayer5437/BGL
language:
  - en
metrics:
  - f1
  - precision
  - recall
  - roc_auc
base_model:
  - distilbert/distilbert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
  - log-analysis
  - anomaly-detection
  - bert
  - huggingface
model-index:
  - name: CloudOpsBERT (distributed-storage)
    results:
      - task:
          type: text-classification
          name: Anomaly Detection
        dataset:
          name: HDFS
          type: honicky/hdfs-logs-encoded-blocks
          split: test
        metrics:
          - type: f1
            value: 0.571
          - type: precision
            value: 0.992
          - type: recall
            value: 0.401
          - type: auroc
            value: 0.73
          - type: threshold
            value: 0.5
  - name: CloudOpsBERT (HPC)
    results:
      - task:
          type: text-classification
          name: Anomaly Detection
        dataset:
          name: BGL
          type: Kingslayer5437/BGL
          split: test
        metrics:
          - type: f1
            value: 1.00
          - type: precision
            value: 1.00
          - type: recall
            value: 1.00
          - type: auroc
            value: 1.00
          - type: threshold
            value: 0.05
---
|
|
|
|
# CloudOpsBERT: Domain-Specific Language Models for Cloud Operations

CloudOpsBERT is an open-source project exploring **domain-adapted transformer models** for **cloud operations log analysis**: anomaly detection, reliability monitoring, and cost optimization.

This project fine-tunes lightweight BERT variants (e.g., DistilBERT) on large-scale system log datasets (HDFS, BGL) and provides ready-to-use models for researchers and practitioners.

---
|
|
|
|
|
## Motivation

Modern cloud platforms generate massive volumes of logs. Detecting anomalies in these logs is crucial for:

- Ensuring **reliability** (catching failures early),
- Improving **cost efficiency** (identifying waste or misconfigurations),
- Supporting **autonomous operations** (AIOps).

Generic LLMs and BERT models are not optimized for this domain. CloudOpsBERT bridges that gap by:

- Training on **real log datasets** (HDFS, BGL),
- Addressing **imbalanced anomaly detection** with class weighting,
- Publishing **open-source checkpoints** for reproducibility.
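The class-weighting idea can be sketched with inverse-frequency weights. This is a minimal illustration of one common formula, `w_c = N / (K * n_c)`, not the project's exact training code:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: w_c = N / (K * n_c), where N is the
    total number of examples, K the number of classes, and n_c the
    count of class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# HDFS-style imbalance: anomalies are rare
labels = ["normal"] * 97 + ["anomaly"] * 3
weights = class_weights(labels)
# the rare "anomaly" class receives a much larger weight, so
# misclassifying anomalies costs more during training
```

In a PyTorch setup, weights like these would typically be passed to `torch.nn.CrossEntropyLoss(weight=...)`.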
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Inference (Pretrained)

Predict the anomaly probability for a single log line:

```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --text "ERROR dfs.DataNode: Lost connection to namenode"
```

Batch inference (file with one log line per row):

```bash
python src/predict.py \
  --model_dir vaibhav2507/cloudops-bert \
  --subfolder distributed-storage \
  --file samples/sample_logs.txt \
  --threshold 0.5 \
  --jsonl_out predictions.jsonl
```
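The JSONL output can then be post-processed with the standard library. A minimal sketch, assuming each record carries `text`, `score`, and `label` fields (the exact field names depend on predict.py and are assumptions here):

```python
import json

# hypothetical records, shaped like predict.py's JSONL output
jsonl_text = "\n".join([
    json.dumps({"text": "INFO dfs.DataNode: block served",
                "score": 0.02, "label": "normal"}),
    json.dumps({"text": "ERROR dfs.DataNode: Lost connection to namenode",
                "score": 0.91, "label": "anomaly"}),
])

# keep only records whose anomaly score clears the threshold
threshold = 0.5
anomalies = [
    record for record in map(json.loads, jsonl_text.splitlines())
    if record["score"] >= threshold
]
```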
|
|
|
|
|
## Results

- **HDFS (in-domain, test set)**
  - F1: 0.571
  - Precision: 0.992
  - Recall: 0.401
  - AUROC: 0.730
  - Threshold: 0.50 (tunable)
- **Cross-domain (HDFS → BGL)**
  - Performance degrades significantly due to dataset/domain shift (see paper).
- **BGL (training in progress)**
  - Will be released as cloudops-bert (subfolder bgl) once full training is complete.
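Because the decision threshold is tunable, precision and recall can be traded off by sweeping it over held-out scores. A self-contained sketch with made-up numbers (not the reported HDFS results):

```python
def precision_recall(scores, labels, threshold):
    """Compute precision/recall, treating label 1 as 'anomaly'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# hypothetical anomaly scores and ground-truth labels
scores = [0.1, 0.4, 0.6, 0.9, 0.3]
labels = [0,   0,   1,   1,   1]

# a high threshold favors precision, a low one favors recall
high = precision_recall(scores, labels, 0.5)   # (1.0, 2/3)
low  = precision_recall(scores, labels, 0.25)  # (0.75, 1.0)
```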
|
|
|
|
|
## Models

- vaibhav2507/cloudops-bert (Hugging Face Hub)
  - subfolder="distributed-storage" → HDFS-trained CloudOpsBERT
  - subfolder="hpc" → BGL-trained CloudOpsBERT
- Each export includes:
  - Model weights (pytorch_model.bin)
  - Config with label mappings (normal, anomaly)
  - Tokenizer files
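The subfolder layout means the exports can also be loaded directly with transformers. A sketch, not the project's official API: it requires network access to the Hub, and assumes the config's label mapping puts "anomaly" at index 1 if the `label2id` lookup misses:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "vaibhav2507/cloudops-bert"
tok = AutoTokenizer.from_pretrained(repo, subfolder="distributed-storage")
model = AutoModelForSequenceClassification.from_pretrained(
    repo, subfolder="distributed-storage"
)

# score a single log line
inputs = tok("ERROR dfs.DataNode: Lost connection to namenode",
             return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
anomaly_prob = probs[model.config.label2id.get("anomaly", 1)].item()
```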
|
|
|
|
|
## Quickstart (Scripts)

1) Set up folders

```bash
bash scripts/setup_dirs.sh
```

2) (Optional) Download a local copy of a submodel from the Hugging Face Hub

```bash
bash scripts/fetch_pretrained.sh                # downloads 'hdfs' by default
SUBFOLDER=bgl bash scripts/fetch_pretrained.sh  # downloads 'bgl'
```

3) Single-line prediction (directly from the Hugging Face Hub)

```bash
bash scripts/predict_line.sh "ERROR dfs.DataNode: Lost connection to namenode" hdfs
```

4) Batch prediction (using a local model folder)

```bash
bash scripts/make_sample_logs.sh
bash scripts/predict_file.sh samples/sample_logs.txt hdfs models/cloudops-bert-hdfs preds/preds_hdfs.jsonl
```
|
|
|
|
|
## Related Work

Several prior works have explored using BERT for log anomaly detection:

- **Leveraging BERT and Hugging Face Transformers for Log Anomaly Detection**
  - Tutorial-style blog post demonstrating how to fine-tune BERT on log data with Hugging Face. Useful as an introduction, but not intended as a reproducible research artifact.
- **LogBERT** (HelenGuohx/logbert)
  - Academic prototype from ~2019–2020 focusing on modeling log sequences with BERT. Demonstrates feasibility, but is limited to in-domain experiments and lacks integration with modern Hugging Face tooling.
- **AnomalyBERT** (Jhryu30/AnomalyBERT)
  - Another exploratory repository showing BERT-based anomaly detection on logs, with dataset-specific preprocessing. Similar limitations in generalization and reproducibility.
|
|
|
|
|
## How CloudOpsBERT is different

- **Domain-specific adaptation:** explicitly trained on cloud operations logs (HDFS, BGL) with a class-weighted loss.
- **Cross-domain evaluation:** includes in-domain and cross-domain benchmarks, highlighting generalization challenges.
- **Reproducibility & usability:** clean repo, scripts, and ready-to-use Hugging Face exports.
- **Future directions:** introduces MicroLM, compressed micro-language models for efficient edge/cloud hybrid inference.

In short: previous work showed that "BERT can work for logs." CloudOpsBERT operationalizes this idea into reproducible benchmarks, public models, and deployable tools for both researchers and practitioners.
|
|
|
|
|
## Citation

If you use CloudOpsBERT in your research or tools, please cite:

```bibtex
@misc{pandey2025cloudopsbert,
  title={CloudOpsBERT: Domain-Specific Transformer Models for Cloud Operations Anomaly Detection},
  author={Pandey, Vaibhav},
  year={2025},
  howpublished={GitHub, Hugging Face},
  url={https://github.com/vaibhav-research/cloudops-bert}
}
```
|
|
|