Article Transformers v5: Simple model definitions powering the AI ecosystem • 5 days ago • 223
E-MM1 Collection Multimodal embedding model, supporting datasets, and a paper describing the process of building both the datasets and the models 🤗 • 6 items • Updated 15 days ago • 10
Gaperon: A Peppered English-French Generative Language Model Suite Paper • 2510.25771 • Published Oct 29 • 15
Article ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases • about 1 month ago • 52
ViDoRe Benchmark V3 Collection ViDoRe V3 is our latest benchmark, engineered to set a new industry gold standard for multi-modal, enterprise document retrieval evaluation. • 8 items • Updated about 1 month ago • 11
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning • Oct 27 • 69
ImpossibleBench Datasets Collection Datasets constructed in ImpossibleBench https://arxiv.org/abs/2510.20270 • 2 items • Updated Oct 24 • 1
Nigeria Energy Sector Collection A collection of datasets across Nigeria's energy sector. • 35 items • Updated Oct 11 • 8
Amon Dîn Collection A collection of datasets from the Nigerian telecommunications sector. • 34 items • Updated Oct 6 • 1
Article `LeRobotDataset:v3.0`: Bringing large-scale datasets to `lerobot` • Sep 16 • 47
Tiny Language Model Datasets Collection A collection of synthetic datasets that can be used to pretrain any of the Tiny Language Models. • 14 items • Updated Sep 21 • 29