Jiakui committed on
Commit 1e9a2ee · verified · 1 Parent(s): 46f096f

Update README.md

Files changed (1): README.md (+71 -3)

README.md CHANGED
---
license: cc-by-nc-4.0
---

# Auto-Regressively Generating Multi-View Consistent Images

[JiaKui Hu](https://jkhu29.github.io/)\*, [Yuxiao Yang](https://yuxiaoyang23.github.io/)\*, [Jialun Liu](https://scholar.google.com/citations?user=OkMMP2AAAAAJ), [Jinbo Wu](https://scholar.google.com/citations?user=9OecN2sAAAAJ), [Chen Zhao](), [Yanye Lu](https://scholar.google.com/citations?user=WSFToOMAAAAJ)
<br>PKU, BaiduVis, THU<br>

## Introduction

![overview](https://raw.githubusercontent.com/MILab-PKU/MVAR/refs/heads/main/assets/MVAR_overview.png)

Diffusion-based multi-view image generation methods condition every predicted view on a fixed reference view, which becomes problematic when the overlap between the reference view and the predicted view is small, hurting both image quality and multi-view consistency. Our MV-AR addresses this by conditioning each view on its preceding view, which shares substantial overlap with it.

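Conceptually, generation proceeds as a simple autoregressive loop in which the most recently generated view replaces a fixed reference as the image condition for the next view. The sketch below is purely illustrative and is not the released MVAR code; `generate_next_view` is a hypothetical placeholder for one decoding step.

```python
# Illustrative sketch of preceding-view conditioning (not the released MVAR code).
# `generate_next_view` is a hypothetical stand-in for one autoregressive decoding step.
from typing import List, Optional


def generate_next_view(prompt: str, prev_view: Optional[str], camera_pose: float) -> str:
    """Placeholder: decode one view conditioned on the text prompt,
    the previous view (if any), and the target camera pose."""
    return f"view(pose={camera_pose}, conditioned_on={prev_view})"


def generate_multiview(prompt: str, camera_poses: List[float],
                       ref_view: Optional[str] = None) -> List[str]:
    views = []
    prev = ref_view  # optional input image for i2mv; None for pure t2mv
    for pose in camera_poses:
        view = generate_next_view(prompt, prev, pose)
        views.append(view)
        prev = view  # the freshly generated view conditions the next one
    return views


print(generate_multiview("a wooden chair", [0.0, 45.0, 90.0, 135.0]))
```
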
## Results

### Text to Multi-view Images

![t2mv](https://raw.githubusercontent.com/MILab-PKU/MVAR/refs/heads/main/assets/t2mv_compare.png)

### Image to Multi-view Images

![i2mv](https://raw.githubusercontent.com/MILab-PKU/MVAR/refs/heads/main/assets/i2mv_compare.png)

### Text + Geometry to Multi-view Images

![ts2mv](https://raw.githubusercontent.com/MILab-PKU/MVAR/refs/heads/main/assets/ts2mv_cases.png)

## Quick Start

### Requirements

> Please follow the instructions in the [code repository](https://github.com/MILab-PKU/MVAR).

### Reproduce

1. Download [flan-t5-xl](https://huggingface.co/google/flan-t5-xl) into `./pretrained_models`;
2. Download [Cap3D_automated_Objaverse_full.csv](https://huggingface.co/datasets/tiange/Cap3D/blob/main/Cap3D_automated_Objaverse_full.csv) into `dataset/captions`;
3. Download the models here and put them in `./pretrained_models`;
4. Run:

```shell
# For t2mv on Objaverse
sh sample_tcam2i.sh
# For t2mv on GSO
sh sample_icam2i_gso.sh
# For i2mv on GSO
sh sample_icam2i_gso.sh
```

The generated images will be saved to `samples_objaverse_nv_ray/`.

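If you want to sanity-check step 1 before running the sampling scripts, a snippet along the following lines should load the text encoder; the `./pretrained_models/flan-t5-xl` path and the use of `transformers` here are assumptions for illustration, not part of the official scripts.

```python
# Optional sanity check (illustrative, not part of the official MVAR scripts):
# verify that the flan-t5-xl text encoder from step 1 loads from the local folder.
from transformers import AutoTokenizer, T5EncoderModel

model_dir = "./pretrained_models/flan-t5-xl"  # assumed local layout; adjust if yours differs
tokenizer = AutoTokenizer.from_pretrained(model_dir)
encoder = T5EncoderModel.from_pretrained(model_dir)

tokens = tokenizer("a wooden chair", return_tensors="pt")
text_features = encoder(**tokens).last_hidden_state  # flan-t5-xl hidden size is 2048
print(text_features.shape)
```
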
## Acknowledgement

This repository is heavily based on [LlamaGen](https://github.com/FoundationVision/LlamaGen). We would like to thank the authors of that work for publicly releasing their code.

For help or issues using this repository, please feel free to submit a GitHub issue.

For other communications related to this repository, please contact `jkhu29@stu.pku.edu.cn`.

### Citation

```bibtex
@article{hu2025mvar,
  title={Auto-Regressively Generating Multi-View Consistent Images},
  author={Hu, JiaKui and Yang, Yuxiao and Liu, Jialun and Wu, Jinbo and Zhao, Chen and Lu, Yanye},
  journal={arXiv preprint arXiv:2506.18527},
  year={2025}
}
```