ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models • Paper • 2603.13033 • Published 11 days ago
Cosmos-Tokenizer • Collection • A suite of image and video tokenizers • 12 items • Updated about 6 hours ago