Hackathon Humans in the Loop - Desafío ALIA ✍ Create and manage labeled text datasets with Argilla
BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data Paper • 2510.10159 • Published Oct 11, 2025
Measuring what Matters: Construct Validity in Large Language Model Benchmarks Paper • 2511.04703 • Published Nov 3, 2025
La Leaderboard 🌸 Evaluate open LLMs in the languages of LATAM and Spain.
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17, 2025
Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings Paper • 2509.14405 • Published Sep 17, 2025
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans Paper • 2506.22439 • Published May 29, 2025
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America Paper • 2507.00999 • Published Jul 1, 2025