view article Article IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST 21 days ago • 18
view article Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output Feb 7 • 22
view article Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 Feb 4 • 88
view article Article Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model Feb 4 • 28
view article Article Llasa Goes RL: Training LLaSA with GRPO for Improved Prosody and Expressiveness Nov 5, 2025 • 12
Nemotron ColEmbed V2 Collection State-of-the-Art Late Interaction Vision-Language Embedding Models • 3 items • Updated 5 days ago • 10
view article Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality Jan 21 • 31
view article Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Jan 19 • 87
LightOnOCR-2 🦉 Collection LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • 12 items • Updated 8 days ago • 22