francesco visconti
biohiphop
·
AI & ML interests
None yet
Organizations
None yet
MultimodalGen
-
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Paper • 2505.10046 • Published • 9 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 89 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 250 -
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
Paper • 2507.10787 • Published • 13
MLOps
PromptOpt
MultimodalGen
-
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Paper • 2505.10046 • Published • 9 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 89 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 250 -
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
Paper • 2507.10787 • Published • 13
Llm benchmarking