---
tags:
- audio-classification
- sound-event-detection
- wav2vec2
- urban-acoustics
- deep-learning
datasets:
- UrbanSoundscape_EventDetection_Metadata
license: apache-2.0
model-index:
- name: UrbanSound_EventDetection_Wav2Vec2
  results:
  - task:
      name: Audio Classification
      type: audio-classification
    metrics:
    - type: accuracy
      value: 0.945
      name: Event Detection Accuracy
    - type: f1_macro
      value: 0.938
      name: Macro F1 Score
---

# UrbanSound_EventDetection_Wav2Vec2
## 📌 Overview
UrbanSound_EventDetection_Wav2Vec2 is a fine-tuned variant of the pre-trained Wav2Vec2 architecture for classifying momentary and continuous sound events in urban environments. It processes raw audio waveforms and assigns each clip one of eight high-priority urban sound classes, focusing on high-impact and potentially anomalous events.
## 🔧 Model Architecture
This model uses the standard Wav2Vec2 pipeline, which operates directly on raw audio without manual feature extraction (such as MFCCs); a minimal inference sketch follows the component list below.
- Base Model: facebook/wav2vec2-base
- Feature Extractor: A stack of 1D convolutional layers extracts local features from the raw waveform.
- Transformer Encoder: 12 layers of Transformer blocks capture long-range dependencies and global context within the audio clip.
- Classification Head: A task-specific linear layer is placed on top of the contextualized representations to predict one of the 8 event labels.
- Target Classes: Car_Horn, Children_Playing, Dog_Barking, Machinery_Hum, Siren_Emergency, Train_Whistle, Tire_Screech, and Glass_Shattering.
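
As a quick end-to-end check, the sketch below loads the checkpoint with the generic `transformers` audio-classification classes and scores one clip. The checkpoint id and the `street_clip.wav` filename are placeholders, not artifacts of this card:

```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Placeholder: a local path or Hub id where this fine-tuned model is stored.
CHECKPOINT = "UrbanSound_EventDetection_Wav2Vec2"

feature_extractor = AutoFeatureExtractor.from_pretrained(CHECKPOINT)
model = AutoModelForAudioClassification.from_pretrained(CHECKPOINT)
model.eval()

# Load a clip, downmix to mono, and resample to the 16 kHz rate Wav2Vec2 expects.
waveform, sr = torchaudio.load("street_clip.wav")  # placeholder filename
waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[int(logits.argmax(dim=-1))])  # e.g. "Siren_Emergency"
```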
## 🎯 Intended Use
This model is intended for smart city, safety, and acoustic monitoring systems:
- Acoustic Surveillance: Real-time detection of emergency sounds (Siren, Glass Shattering, Tire Screech) for public safety alerting.
- Noise Pollution Monitoring: Quantifying the occurrence and frequency of specific noise sources (Car Horn, Machinery Hum) in different city zones (see the windowed counting sketch after this list).
- Urban Planning: Analyzing soundscape composition to inform policy on zoning and noise mitigation strategies.
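
For the monitoring use cases above, a long recording can be scored in fixed overlapping windows and the predictions tallied per zone. This is a minimal sketch reusing the `model` and `feature_extractor` from the previous example; the 4 s window and 2 s hop are illustrative, untuned choices:

```python
from collections import Counter
import torch

def count_events(waveform, model, feature_extractor,
                 window_s=4.0, hop_s=2.0, sr=16_000):
    """Classify overlapping windows of a 1-D 16 kHz array and tally labels."""
    win, hop = int(window_s * sr), int(hop_s * sr)
    counts = Counter()
    for start in range(0, max(len(waveform) - win, 0) + 1, hop):
        chunk = waveform[start:start + win]
        inputs = feature_extractor(chunk, sampling_rate=sr, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        counts[model.config.id2label[int(logits.argmax(-1))]] += 1
    return counts

# e.g. count_events(waveform.numpy(), model, feature_extractor)
```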
## ⚠️ Limitations
- Event Overlap: The current setup is trained for single-label classification. If multiple sounds occur simultaneously (e.g., Siren + Dog Barking), the model will only output the single most probable event, potentially ignoring others (the probability-threshold sketch after this list shows one partial mitigation).
- Domain Shift: The model's performance may degrade in environments whose background noise profiles differ significantly from the training data (e.g., quiet suburbs vs. dense, loud open-air markets).
- Localization: This model performs event detection but does not inherently provide sound localization (Direction-of-Arrival or DOA), which would require specialized input features (like ambisonic audio) and a different model head.
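
One partial workaround for the single-label limitation is to inspect the full softmax distribution rather than only the argmax, and flag every class above a probability threshold. The 0.25 threshold below is an illustrative, untuned value:

```python
import torch

def candidate_events(logits, id2label, threshold=0.25):
    """Return every label whose softmax probability clears the threshold."""
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    return [(id2label[i], float(p)) for i, p in enumerate(probs) if p >= threshold]

# e.g. candidate_events(logits, model.config.id2label)
```

Because the softmax forces classes to compete, this remains a heuristic; genuinely concurrent events call for a multi-label (sigmoid) head and retraining.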
# MODEL 2: MedicalChatbot_IntentClassifier_RoBERTa
This is a RoBERTa-based classifier for multi-class detection of user intent in medical dialogue transcripts.
## config.json

```json
{
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "hidden_size": 768,
  "model_type": "roberta",
  "num_hidden_layers": 12,
  "vocab_size": 50265,
  "id2label": {
    "0": "Symptom_Reporting",
    "1": "Advice_Seeking",
    "2": "Medication_Query",
    "3": "Appointment_Scheduling",
    "4": "Billing_Query",
    "5": "Causal_Query",
    "6": "Record_Retrieval",
    "7": "Urgency_Assessment"
  },
  "label2id": {
    "Symptom_Reporting": 0,
    "Advice_Seeking": 1,
    "Medication_Query": 2,
    "Appointment_Scheduling": 3,
    "Billing_Query": 4,
    "Causal_Query": 5,
    "Record_Retrieval": 6,
    "Urgency_Assessment": 7
  },
  "num_labels": 8,
  "problem_type": "single_label_classification",
  "transformers_version": "4.36.0"
}
```
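
A minimal usage sketch for this classifier, assuming the fine-tuned weights are published under a checkpoint name like the one below (the checkpoint id and example utterance are placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder: a local path or Hub id where the fine-tuned weights live.
CHECKPOINT = "MedicalChatbot_IntentClassifier_RoBERTa"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

text = "Can I take ibuprofen together with my blood pressure medication?"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[int(logits.argmax(-1))])  # e.g. "Medication_Query"
```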