WavesFM v1.0
WavesFM is a multimodal wireless foundation model that processes raw IQ streams and image-like wireless modalities (spectrograms and CSI). It uses a single Vision Transformer (ViT) backbone with modality-specific input embeddings and a masked wireless modeling pretraining objective. The model is evaluated across RF fingerprinting, interference detection/classification, human activity sensing, RF signal classification, and 5G NR positioning.
This card summarizes the model as described in:
"Multimodal Wireless Foundation Models" (Aboulfotouh, Abou-Zeid), arXiv:2511.15162.
Model details
- Architecture: ViT encoder with modality-specific input embeddings and lightweight task heads.
- Modalities: raw IQ streams and image-like inputs (spectrograms, CSI).
- Encoder config (paper): patch size 16x16, IQ segment length 16, 8 blocks, embed dim 256.
- Params (paper): ~6.32M encoder, ~0.79M decoder (decoder used only in pretraining).
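A minimal sketch of how the encoder configuration above turns inputs into tokens. The `EncoderConfig` class, the helper names, and the example input sizes (a 224x224 spectrogram, a 1024-sample IQ stream) are illustrative assumptions, not part of the released code. It also assumes non-overlapping patches and segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Values taken from the card; the class itself is hypothetical."""
    patch_size: int = 16      # image-like inputs are cut into 16x16 patches
    iq_segment_len: int = 16  # IQ streams are cut into 16-sample segments
    depth: int = 8            # number of transformer blocks
    embed_dim: int = 256      # token embedding dimension


def num_image_tokens(height: int, width: int, cfg: EncoderConfig) -> int:
    """Token count for a spectrogram or CSI map, assuming non-overlapping patches."""
    if height % cfg.patch_size or width % cfg.patch_size:
        raise ValueError("height and width must be multiples of the patch size")
    return (height // cfg.patch_size) * (width // cfg.patch_size)


def num_iq_tokens(num_samples: int, cfg: EncoderConfig) -> int:
    """Token count for one IQ stream, assuming non-overlapping segments."""
    if num_samples % cfg.iq_segment_len:
        raise ValueError("stream length must be a multiple of the segment length")
    return num_samples // cfg.iq_segment_len


cfg = EncoderConfig()
print(num_image_tokens(224, 224, cfg))  # 196 tokens (14 x 14 patches)
print(num_iq_tokens(1024, cfg))         # 64 tokens
```

Both modalities end up as sequences of 256-dimensional tokens, which is what lets a single ViT backbone serve them.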
Pretraining
- Objective: masked wireless modeling (MAE-style reconstruction).
- Datasets:
  - Spectrogram dataset: 3,200 samples from over-the-air SDR captures across multiple signal types (WiFi, LTE, Bluetooth, 5G NR, ISM).
  - IQ dataset: 3,200 samples from a 4-antenna MIMO indoor testbed with varied modulations/technologies and TX/RX configurations.
- Setup (paper): 800 epochs, 40-epoch warmup, 70% masking ratio for both modalities, Adam with lr 1e-3 and cosine annealing.
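The setup above can be sketched as follows. This is an illustration under stated assumptions, not the authors' training code. The card does not give the warmup shape or the final learning rate, so the linear warmup and the decay to zero are assumptions. The rounding of the masked-token count and the function names are assumptions as well.

```python
import math
import random

BASE_LR = 1e-3      # from the card
EPOCHS = 800        # from the card
WARMUP_EPOCHS = 40  # from the card
MASK_RATIO = 0.70   # from the card, used for both modalities


def lr_at_epoch(epoch, base_lr=BASE_LR, warmup=WARMUP_EPOCHS,
                total=EPOCHS, min_lr=0.0):
    """Linear warmup (assumed) followed by cosine annealing to min_lr (assumed 0)."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def random_mask(num_tokens, mask_ratio=MASK_RATIO, rng=None):
    """Split token indices into (visible, masked) sets, MAE-style.

    Only the visible tokens would be fed to the encoder; the decoder would
    reconstruct the masked ones.
    """
    rng = rng or random.Random()
    num_masked = int(num_tokens * mask_ratio)  # rounding down is an assumption
    order = list(range(num_tokens))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


visible, masked = random_mask(196, rng=random.Random(0))
print(len(visible), len(masked))  # 59 137
print(lr_at_epoch(0), lr_at_epoch(39))
```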
Fine-tuning regimes
- LP (Linear probing): encoder frozen; train task head + input projections.
- FT2 (Partial fine-tuning): last 2 encoder blocks unfrozen.
- LoRA: rank 32, alpha 32, adapters in attention projections; encoder frozen.
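A rough sketch of what the three regimes leave trainable. The parameter names (`input_proj.`, `blocks.<i>.`, `head.`, `lora_A`) are hypothetical and do not come from the released checkpoints. Training the task head and input projections under FT2 and LoRA is also an assumption, since the card states it only for LP. The helpers show the standard LoRA scaling `alpha / rank` and the per-adapter parameter count.

```python
LORA_RANK = 32   # from the card
LORA_ALPHA = 32  # from the card
DEPTH = 8        # encoder blocks, from the card
EMBED_DIM = 256  # from the card


def lora_scale(rank=LORA_RANK, alpha=LORA_ALPHA):
    """Scale on the low-rank update: W_eff = W + (alpha / rank) * B @ A."""
    return alpha / rank


def lora_extra_params(d_in, d_out, rank=LORA_RANK):
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def is_trainable(name, regime, depth=DEPTH):
    """Decide whether a (hypothetically named) parameter is updated in a regime."""
    # Assumption: the head and input projections train in every regime.
    if name.startswith(("head.", "input_proj.")):
        return True
    if regime == "lp":
        return False  # encoder fully frozen
    if regime == "ft2":
        parts = name.split(".")
        in_block = parts[0] == "blocks" and len(parts) > 1 and parts[1].isdigit()
        return in_block and int(parts[1]) >= depth - 2  # last two blocks only
    if regime == "lora":
        return ".lora_" in name  # adapters only; base encoder weights frozen
    raise ValueError(f"unknown regime: {regime}")


names = [
    "input_proj.iq.weight",
    "blocks.0.attn.qkv.weight",
    "blocks.0.attn.qkv.lora_A",
    "blocks.7.attn.qkv.weight",
    "head.weight",
]
for regime in ("lp", "ft2", "lora"):
    print(regime, [n for n in names if is_trainable(n, regime)])

print(lora_scale())                             # 1.0
print(lora_extra_params(EMBED_DIM, EMBED_DIM))  # 16384
```

With rank and alpha both 32, the scale is 1, so the adapter output is added to the frozen projection unscaled.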
Downstream tasks
- RF Fingerprinting (RFP) - IQ, device identification (mean per-class accuracy).
- Interference Detection (INTD) / Classification (INTC) - IQ (mean per-class accuracy).
- Human Activity Sensing (HAS) - CSI (mean per-class accuracy).
- RF Signal Classification (RFS) - spectrograms (mean per-class accuracy).
- 5G NR Positioning (POS) - CSI (mean localization error in meters).
- DeepMIMO LOS/NLOS Classification - CSI (mean per-class accuracy).
- DeepMIMO Beam Prediction - CSI (mean per-class accuracy).
- RADCOM Signal & Modulation Classification - IQ (mean per-class accuracy).
- UWB Indoor Positioning and Tracking - CIR (mean position error).
- UWB Industrial Localization - CIR (mean position error).
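The two metric families listed above can be computed as in this sketch. The function names are illustrative, and treating position error as Euclidean distance is an assumption. The official evaluation protocol may differ in details such as averaging across folds or handling classes absent from the test set.

```python
import math
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


def mean_position_error(true_pos, pred_pos):
    """Mean Euclidean distance (assumed) between true and predicted positions."""
    errors = [math.dist(t, p) for t, p in zip(true_pos, pred_pos)]
    return sum(errors) / len(errors)


# Imbalanced toy example: always predicting class 0 gives 80% plain accuracy
# but only 50% mean per-class accuracy.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
print(mean_per_class_accuracy(y_true, y_pred))  # 0.5

print(mean_position_error([(0.0, 0.0), (1.0, 1.0)],
                          [(3.0, 4.0), (1.0, 1.0)]))  # 2.5
```

Unlike plain accuracy, mean per-class accuracy is not inflated by a model that favors the majority class.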
For up-to-date reproduction commands and dataset protocols, see:
- Benchmarks: to be added
- Reproduce: to be added