Qwen/Qwen3-VL-235B-A22B-Instruct Image-Text-to-Text • 236B • Updated 15 days ago • 110k • 329
PAI-Bench: A Comprehensive Benchmark For Physical AI Paper • 2512.01989 • Published 10 days ago • 5
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning Paper • 2512.02425 • Published 9 days ago • 22
RedHatAI/Qwen2.5-VL-72B-Instruct-FP8-dynamic Image-to-Text • 73B • Updated Apr 25 • 8.65k • 14
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising Paper • 2511.08633 • Published Nov 9 • 53
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published about 1 month ago • 40
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published Oct 17 • 48
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs Paper • 2510.09201 • Published Oct 10 • 49
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs Paper • 2510.07499 • Published Oct 8 • 48
ACON: Optimizing Context Compression for Long-horizon LLM Agents Paper • 2510.00615 • Published Oct 1 • 32
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models Paper • 2508.09968 • Published Aug 13 • 15