| ### **BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification** | |
| This repository hosts a float16-quantized version of BERT Base Uncased, fine-tuned for **Disaster SOS Message Classification**. The model classifies disaster-related emergency messages to help prioritize urgent cases. It is optimized for deployment in resource-constrained environments at the cost of only minor accuracy loss (see Limitations). | |
| ## **Model Details** | |
| - **Model Architecture:** BERT Base Uncased | |
| - **Task:** Disaster SOS Message Classification | |
| - **Dataset:** Disaster Response Messages Dataset | |
| - **Quantization:** Float16 | |
| - **Fine-tuning Framework:** Hugging Face Transformers | |
| ## **Usage** | |
| ### **Installation** | |
| ```sh | |
| pip install transformers torch | |
| ``` | |
| ### **Loading the Model** | |
| ```python | |
| from transformers import BertForSequenceClassification, BertTokenizer | |
| import torch | |
| # Load quantized model | |
| quantized_model_path = "/kaggle/working/bert_finetuned_fp16" | |
| quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path) | |
| quantized_model.eval() # Set to evaluation mode | |
| quantized_model.half() # Convert model to FP16 | |
| # Load tokenizer | |
| tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") | |
| # Define a test SOS message | |
| test_message = "There is a massive earthquake, and people need help immediately!" | |
| # Tokenize input | |
| inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128) | |
| # Ensure input tensors are in correct dtype | |
| inputs["input_ids"] = inputs["input_ids"].long() | |
| inputs["attention_mask"] = inputs["attention_mask"].long() | |
| # Make prediction | |
| with torch.no_grad(): | |
|     outputs = quantized_model(**inputs) | |
| # Get predicted categories | |
| probs = torch.sigmoid(outputs.logits).cpu().numpy().flatten() | |
| predictions = (probs > 0.5).astype(int) | |
| # Category mapping (example only; names and order must match the labels used during fine-tuning) | |
| category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"] | |
| predicted_labels = [category_names[i] for i in range(len(predictions)) if predictions[i] == 1] | |
| print(f"Message: {test_message}") | |
| print(f"Predicted Categories: {predicted_labels}") | |
| print(f"Confidence Scores: {probs}") | |
| ``` | |
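The decoding step in the snippet above (a sigmoid per label, then a 0.5 threshold) can be sketched without PyTorch to make the multi-label logic explicit. This is a minimal sketch under stated assumptions: the helper `decode_multilabel`, the example logits, and the category list are illustrative and are not outputs of the hosted model.

```python
import math

def decode_multilabel(logits, category_names, threshold=0.5):
    """Return the category names whose sigmoid probability exceeds the threshold."""
    if len(logits) != len(category_names):
        raise ValueError("Expected exactly one logit per category name")
    # Each label is scored independently, so several labels can fire at once
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    labels = [name for name, p in zip(category_names, probs) if p > threshold]
    return labels, probs

# Hypothetical values for illustration only
example_categories = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
example_logits = [2.0, -3.0, 0.5, -0.2, 1.0]

labels, probs = decode_multilabel(example_logits, example_categories)
print(labels)
```

Checking that the number of logits matches the number of category names guards against a silent mismatch between the example mapping and the labels the model was actually trained on.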
| ## **Performance Metrics** | |
| - **Accuracy:** 0.85 | |
| - **F1 Score:** 0.83 | |
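The README does not state how the F1 score was averaged across labels. As one possible reading, the sketch below computes a micro-averaged F1 for multi-label predictions in plain Python. The function name `micro_f1` and the toy label matrices are assumptions for illustration and do not reproduce the reported numbers.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary multi-label rows (lists of 0/1)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Toy data for illustration only: 2 messages, 3 labels each
toy_true = [[1, 0, 1], [0, 1, 0]]
toy_pred = [[1, 0, 0], [0, 1, 1]]
print(f"Micro F1: {micro_f1(toy_true, toy_pred):.3f}")
```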
| ## **Fine-Tuning Details** | |
| ### **Dataset** | |
| The dataset is the **Disaster Response Messages Dataset**, which contains real-life messages from various disaster scenarios. | |
| ### **Training** | |
| - Number of epochs: 3 | |
| - Batch size: 8 | |
| - Evaluation strategy: epoch | |
| - Learning rate: 2e-5 | |
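To put the batch size and epoch count above in context, the following sketch estimates the number of optimizer steps for a run. It assumes a single device, no gradient accumulation, and that the last partial batch is kept. The training-set size is not stated in this README, so `hypothetical_num_examples` is a placeholder.

```python
import math

def total_training_steps(num_examples, batch_size=8, num_epochs=3):
    """Estimate optimizer steps: batches per epoch times number of epochs."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs

# Placeholder dataset size, not taken from the README
hypothetical_num_examples = 20_000
print(total_training_steps(hypothetical_num_examples))
```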
| ### **Quantization** | |
| Post-training quantization was applied using PyTorch's built-in quantization framework, reducing model size and improving inference speed. | |
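As a rough illustration of the size reduction, the sketch below estimates weight storage at 4 bytes per parameter (float32) versus 2 bytes per parameter (float16). The parameter count is an approximation for BERT Base (about 110 million) and is an assumption here. The actual checkpoint size also depends on the classification head and file format.

```python
def estimated_weight_size_mb(num_parameters, bytes_per_parameter):
    """Estimate raw weight storage in MiB for a given numeric precision."""
    return num_parameters * bytes_per_parameter / (1024 ** 2)

# Approximate parameter count for BERT Base (assumption, not measured from this checkpoint)
approx_bert_base_params = 110_000_000

fp32_mb = estimated_weight_size_mb(approx_bert_base_params, 4)  # float32: 4 bytes each
fp16_mb = estimated_weight_size_mb(approx_bert_base_params, 2)  # float16: 2 bytes each
print(f"FP32: ~{fp32_mb:.0f} MB, FP16: ~{fp16_mb:.0f} MB")
```

Halving the bytes per parameter halves the raw weight storage. Inference speed gains depend on whether the target hardware has native float16 support.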
| ## **Repository Structure** | |
| ``` | |
| . | |
| ├── model/ # Contains the quantized model files | |
| ├── tokenizer_config/ # Tokenizer configuration and vocabulary files | |
| ├── model.safetensors # Fine-tuned model weights | |
| └── README.md # Model documentation | |
| ``` | |
| ## **Limitations** | |
| - The model may not generalize well to unseen disaster types outside the training data. | |
| - Quantization to float16 may cause minor accuracy degradation. | |
| ## **Contributing** | |
| Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements. | |
| --- |