🧠 Reasoning datasets (Collection): Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19
Article: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15, 2024
Article: Docmatix - a huge dataset for Document Visual Question Answering • Jul 18, 2024
Article: Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20, 2024
Article: Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality • Jun 24, 2024
Article: Experimenting with Automatic PII Detection on the Hub using Presidio • Jul 10, 2024
Article: How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o • May 31, 2024
Article: Synthetic dataset generation techniques: generating custom sentence similarity data • May 23, 2024