Apply for community grant: Academic project (GPU and storage) #1, opened by yyyou
In this work, we introduce LLaDA-V, a purely diffusion-based Multimodal Large Language Model (MLLM) that integrates visual instruction tuning with masked diffusion models, a departure from the autoregressive paradigm dominant in current multimodal approaches.