MobileFetalCLIP
Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis
MobileFetalCLIP is an efficient foundation model for fetal ultrasound analysis on low-resource, point-of-care devices such as smartphones. It distills knowledge from a 427M-parameter teacher model into a compact student with an 11.4M-parameter visual encoder, using a novel technique called Selective Repulsive Knowledge Distillation.
Despite being 26× smaller and 24× faster, MobileFetalCLIP surpasses its teacher on standard validity benchmarks (HC18) and retains 97–98% of the teacher's linear-probing performance across tasks.
Model Details
- Architecture: FastViT (Student) distilled from ViT-L/14 (Teacher)
- Parameters: 11.4M in the visual encoder (75M total)
- Modality: Ultrasound Image / Text
- License: CC BY-NC 4.0 (Non-Commercial Research Use Only)
Key Contributions
- Selective Repulsive KD: A novel methodology that explicitly pushes apart non-matching image-text embeddings during distillation, improving representation geometry.
- Mobile Deployment: Runs inference in 1.6 ms on an iPhone 16 Pro, whereas the teacher runs out of memory (OOM) on the same device.
- SOTA Performance: Establishes a new efficiency-accuracy Pareto frontier for prenatal ultrasound AI.
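To make the first contribution more concrete, the following is a minimal NumPy sketch of what a distillation loss with a selective repulsive term could look like. It is an illustration under stated assumptions, not the formulation used in the paper: the function name `selective_repulsive_kd_loss`, the squared-error alignment term, the hinge-style repulsion term, and the `repulsion_weight` parameter are all hypothetical choices made for this example. Consult the paper and repository for the actual objective.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def selective_repulsive_kd_loss(student_img, student_txt,
                                teacher_img, teacher_txt,
                                repulsion_weight=0.5):
    """Illustrative loss (assumed form, not the paper's exact objective).

    Row i of each image array is paired with row i of the matching text
    array. Student and teacher embedding widths may differ, because only
    the (batch x batch) similarity matrices are compared.
    """
    student_sim = l2_normalize(student_img) @ l2_normalize(student_txt).T
    teacher_sim = l2_normalize(teacher_img) @ l2_normalize(teacher_txt).T

    batch = student_sim.shape[0]
    if batch < 2:
        raise ValueError("need at least two pairs to have non-matching pairs")

    matched = np.eye(batch, dtype=bool)  # diagonal = matching image-text pairs

    # Alignment term: matched pairs should reproduce the teacher's similarity.
    align = np.mean((student_sim[matched] - teacher_sim[matched]) ** 2)

    # Repulsive term, applied selectively to non-matching pairs only:
    # any positive student similarity between them is penalized.
    repel = np.mean(np.maximum(student_sim[~matched], 0.0))

    return align + repulsion_weight * repel, align, repel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for a batch of 8 pairs; widths 256 and 768 are arbitrary.
    s_img, s_txt = rng.normal(size=(8, 256)), rng.normal(size=(8, 256))
    t_img, t_txt = rng.normal(size=(8, 768)), rng.normal(size=(8, 768))
    total, align, repel = selective_repulsive_kd_loss(s_img, s_txt, t_img, t_txt)
    print(f"total={total:.4f} align={align:.4f} repel={repel:.4f}")
```

Splitting the similarity matrix into its diagonal and off-diagonal parts is what makes the penalty selective in this sketch: matched pairs are pulled toward the teacher, while only non-matching pairs are pushed apart.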
Usage
Please refer to the official GitHub repository, numanai/MobileFetalCLIP, for installation instructions, dataset preparation, and inference scripts.
Citation
If you find this model or codebase useful for your research, please cite the paper:
@article{saeed2026mobilefetalclip,
  title   = {MobileFetalCLIP: Selective Repulsive Knowledge Distillation
             for Mobile Fetal Ultrasound Analysis},
  author  = {Saeed, Numan and Maani, Fadillah Adamsyah and Yaqub, Mohammad},
  journal = {arXiv preprint arXiv:2603.05421},
  year    = {2026}
}