Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework 3 days ago • 11
🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do about 13 hours ago • 14
MARL: Runtime Middleware That Reduces LLM Hallucination Without Fine-Tuning 2 days ago • 13
Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism? 15 days ago • 17