AventIQ-AI
/

Bert-Disaster-SOS-Message-Classifier

	# BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification

	This repository hosts a quantized (Float16) version of BERT Base Uncased, fine-tuned for Disaster SOS Message Classification. The model classifies emergency messages related to disasters so that urgent cases can be prioritized. It is optimized for deployment in resource-constrained environments, with a reported accuracy of 0.85 and F1 score of 0.83.

	## Model Details

	- Model Architecture: BERT Base Uncased
	- Task: Disaster SOS Message Classification
	- Dataset: Disaster Response Messages Dataset
	- Quantization: Float16
	- Fine-tuning Framework: Hugging Face Transformers

	## Usage

	### Installation

	```sh
	pip install transformers torch
	```

	### Loading the Model

	```python
	from transformers import BertForSequenceClassification, BertTokenizer
	import torch

	# Load the quantized model (replace this path with the location of your model files)
	quantized_model_path = "/kaggle/working/bert_finetuned_fp16"
	quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
	quantized_model.eval()  # Set to evaluation mode

	# FP16 inference is best supported on GPU; stay in FP32 when only a CPU is available
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	if device.type == "cuda":
	    quantized_model.half()  # Convert model weights to FP16
	quantized_model.to(device)

	# Load tokenizer
	tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

	# Define a test SOS message
	test_message = "There is a massive earthquake, and people need help immediately!"

	# Tokenize input and move the tensors to the same device as the model
	inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128)
	inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

	# Make prediction
	with torch.no_grad():
	    outputs = quantized_model(**inputs)

	# Get predicted categories (multi-label: an independent sigmoid score per category)
	probs = torch.sigmoid(outputs.logits.float()).cpu().numpy().flatten()
	predictions = (probs > 0.5).astype(int)

	# Category mapping (example only; it must match the model's label order and count)
	category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
	predicted_labels = [name for name, flag in zip(category_names, predictions) if flag == 1]

	print(f"Message: {test_message}")
	print(f"Predicted Categories: {predicted_labels}")
	print(f"Confidence Scores: {probs}")
	```
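The post-processing at the end of the example (sigmoid, 0.5 threshold, label lookup) can be factored into a small helper. The following is a minimal sketch in plain Python; the function names `sigmoid` and `decode_multilabel` are hypothetical, the logits are made up, and the category list is the same illustrative one used in the example, not a confirmed label set for this model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(
    logits: Sequence[float],
    category_names: Sequence[str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (category, probability) pairs whose sigmoid score exceeds threshold.

    Hypothetical helper: assumes one logit per category, in the same order.
    """
    if len(logits) != len(category_names):
        raise ValueError(
            f"got {len(logits)} logits for {len(category_names)} categories"
        )
    selected = []
    for name, logit in zip(category_names, logits):
        prob = sigmoid(logit)
        if prob > threshold:
            selected.append((name, prob))
    return selected


# Illustrative category list, as in the loading example
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]

# Made-up logits, for demonstration only
example_logits = [2.0, -3.0, 0.5, -0.2, 1.0]
for name, prob in decode_multilabel(example_logits, category_names):
    print(f"{name}: {prob:.2f}")
```

Raising on a length mismatch makes a wrong label list fail loudly instead of silently attaching scores to the wrong categories.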

	## Performance Metrics

	- Accuracy: 0.85
	- F1 Score: 0.83
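
The card does not state how these figures were computed. As one possible reading, the sketch below computes element-wise accuracy and micro-averaged F1 over binary multi-label predictions. The averaging choice, the function name `multilabel_scores`, and the toy data are all assumptions for illustration, not the evaluation actually used for this model.

```python
from typing import List, Tuple


def multilabel_scores(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Return (element-wise accuracy, micro-averaged F1) for 0/1 label matrices."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return accuracy, f1


# Toy data: two messages, three categories each (made up for this sketch)
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]

accuracy, f1 = multilabel_scores(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(f"F1 Score: {f1:.2f}")
```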

	## Fine-Tuning Details

	### Dataset

	The dataset is the Disaster Response Messages Dataset, which contains real-life messages from various disaster scenarios.

	### Training

	- Number of epochs: 3
	- Batch size: 8
	- Evaluation strategy: epoch
	- Learning rate: 2e-5
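
To make the hyperparameters above concrete, the sketch below records them in a small dataclass and derives the number of optimizer steps. It assumes a single device and no gradient accumulation, neither of which the card states; the class name `FineTuneConfig` and the dataset size of 20,000 examples are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this card (hypothetical container)."""

    num_epochs: int = 3
    batch_size: int = 8
    learning_rate: float = 2e-5
    evaluation_strategy: str = "epoch"  # evaluate once at the end of each epoch

    def steps_per_epoch(self, num_examples: int) -> int:
        # The last batch of an epoch may be smaller than batch_size
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        return self.num_epochs * self.steps_per_epoch(num_examples)


config = FineTuneConfig()
num_examples = 20_000  # hypothetical training-set size, not taken from the card

print(f"Steps per epoch: {config.steps_per_epoch(num_examples)}")
print(f"Total optimizer steps: {config.total_steps(num_examples)}")
```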

	### Quantization

	Post-training quantization to Float16 was applied with PyTorch. Storing weights in 16 bits instead of 32 roughly halves the model size and can improve inference speed on hardware with FP16 support.
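
The size and precision trade-off of FP16 can be seen without loading the model. The sketch below uses only Python's standard `struct` module to store one value in 32-bit and 16-bit floating point. It illustrates the number format, not the conversion procedure actually used for this model; the helper name `roundtrip` and the example value are hypothetical.

```python
import struct


def roundtrip(value: float, fmt: str) -> float:
    """Pack value into the given struct float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


weight = 0.1  # arbitrary example value, not an actual model weight

fp32_bytes = struct.calcsize("<f")  # IEEE 754 single precision
fp16_bytes = struct.calcsize("<e")  # IEEE 754 half precision

fp32_error = abs(roundtrip(weight, "<f") - weight)
fp16_error = abs(roundtrip(weight, "<e") - weight)

print(f"FP32: {fp32_bytes} bytes per value, rounding error {fp32_error:.2e}")
print(f"FP16: {fp16_bytes} bytes per value, rounding error {fp16_error:.2e}")
```

Halving the bytes per weight is what halves the model size, and the larger rounding error is the source of the minor accuracy degradation noted under Limitations.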

	## Repository Structure

	```
	.
	├── model/              # Quantized model files
	├── tokenizer_config/   # Tokenizer configuration and vocabulary files
	├── model.safetensors   # Fine-tuned model weights
	└── README.md           # Model documentation
	```

	## Limitations

	- The model may not generalize well to unseen disaster types outside the training data.
	- Quantization to Float16 may cause minor accuracy degradation relative to the full-precision model.

	## Contributing

	Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.

	---