Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation
Introduction
This repository contains the inference code for the Minecraft autoregressive world model.
This project is an open-source reproduction of the DreamerV4 architecture, tailored to high-fidelity simulation of the Minecraft environment. The model uses an MAE (Masked Autoencoder) for efficient video compression and a DiT (Diffusion Transformer) that autoregressively predicts future game frames in latent space, conditioned on the frame history and action inputs.
This codebase is streamlined for deployment and generation, supporting long-context inference and real-time interaction.
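The generation loop described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `rollout`, `encode`, `predict_next`, `decode`, and `context_len` are hypothetical names, plain Python lists stand in for latent tensors, and the toy functions at the bottom only exist so the sketch runs end to end.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Latent = List[float]  # stand-in for a latent tensor (assumption for this sketch)


def rollout(
    encode: Callable[[Sequence[float]], Latent],
    predict_next: Callable[[Sequence[Latent], Sequence[int]], Latent],
    decode: Callable[[Latent], List[float]],
    first_frame: Sequence[float],
    actions: Sequence[Sequence[int]],
    context_len: int,
) -> List[List[float]]:
    """Generate one frame per action, conditioning on at most context_len latents."""
    # A bounded deque keeps memory constant no matter how long the rollout runs.
    history: Deque[Latent] = deque([encode(first_frame)], maxlen=context_len)
    frames: List[List[float]] = []
    for action in actions:
        z_next = predict_next(list(history), action)  # dynamics step in latent space
        history.append(z_next)  # the oldest latent is dropped once the window is full
        frames.append(decode(z_next))  # map the latent back to pixel space
    return frames


# Toy stand-ins for the tokenizer and the dynamics model (purely illustrative).
def toy_encode(frame: Sequence[float]) -> Latent:
    return [x / 2 for x in frame]


def toy_predict(history: Sequence[Latent], action: Sequence[int]) -> Latent:
    return [v + action[0] for v in history[-1]]


def toy_decode(z: Latent) -> List[float]:
    return [v * 2 for v in z]


frames = rollout(toy_encode, toy_predict, toy_decode,
                 first_frame=[0.0, 2.0], actions=[[1], [1], [0]], context_len=2)
print(frames)
```

Because the history is capped at `context_len`, each step costs the same regardless of how many frames have already been generated, which is the property that long rollouts depend on.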
Key Features
- Inference Only: Lightweight codebase focused on generation, stripped of complex training logic.
- Long Context Support: Can load long-context models that recall events from 12 seconds earlier.
- Fast Inference Backend: Built-in optimized inference pipeline designed for high-performance, real-time next-frame prediction.
- Infinite Generation: Supports infinite generation without image quality degradation during long-term rollouts.
- Complex Interaction: Supports a variety of interactions within the Minecraft world, such as eating food, collecting water, and using weapons.
Model Zoo
Please download the pre-trained weights and place them in the checkpoints/ directory before running the code.
| Model Name | Params | VRAM Req. | Description |
| --- | --- | --- | --- |
| MAE-Tokenizer | 430M | >2GB | Handles video encoding and decoding. |
| Dynamic Model | 1.7B | 9GB | Generates the next frame based on history and action. |

Download: HuggingFace Collection
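
The VRAM figures in the table can be turned into a quick pre-flight check. The sketch below assumes both models are resident on the same GPU at once, treats the table's GB as GiB, and adds a hypothetical 1 GiB safety margin; `has_enough_vram` and `headroom_gib` are illustrative names, not part of this codebase.

```python
GIB = 1024 ** 3

# Figures taken from the Model Zoo table; the tokenizer entry is a lower bound (">2GB").
VRAM_GIB = {"MAE-Tokenizer": 2, "Dynamic Model": 9}


def has_enough_vram(free_bytes: int, headroom_gib: float = 1.0) -> bool:
    """Return True if free_bytes covers both models plus an assumed safety margin."""
    required = (sum(VRAM_GIB.values()) + headroom_gib) * GIB
    return free_bytes >= required


# With PyTorch installed, free memory can be queried with:
#   free_bytes, total_bytes = torch.cuda.mem_get_info()
print(has_enough_vram(16 * GIB))
print(has_enough_vram(8 * GIB))
```

Actual usage depends on context length and resolution, so treat the result as a rough lower bound rather than a guarantee.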
Installation
We recommend using Python 3.10+ and CUDA 12.1+.
git clone https://github.com/IamCreateAI/Dreamerv4-MC.git
cd Dreamerv4-MC
conda create -n dreamer python=3.12 -y
conda activate dreamer
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip install -e .
Quick-Start
python ui/inference_ui.py --dynamic_path=/path/to/dynamic_model \
--tokenizer_path=/path/to/tokenizer/ \
--record_video_output_path=output/
Controls
| Key | Action |
| --- | --- |
| W / A / S / D | Move |
| Space | Jump |
| Left Click | Attack / Destroy |
| Right Click | Place / Use Item |
| E | Open / Close Inventory (Simulation) |
| 1 - 9 | Select Hotbar Slot |
| R | Start / Stop Video Recording |
| V | Refresh to a New Scene |
| Left Shift | Sneak |
| Left Ctrl | Sprint |
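
As an illustration of how these bindings could be represented in code, the sketch below maps pressed keys to named actions. All identifiers (`KEY_TO_ACTION`, `UI_COMMANDS`, `active_actions`, and the action names) are hypothetical and not taken from this repository, and the W/A/S/D directions follow the usual Minecraft convention, which is an assumption here.

```python
from typing import Iterable, List

# Game actions fed to the world model (names are illustrative assumptions).
KEY_TO_ACTION = {
    "w": "forward", "a": "left", "s": "back", "d": "right",
    "space": "jump",
    "left_click": "attack",
    "right_click": "use",
    "e": "inventory",
    "left_shift": "sneak",
    "left_ctrl": "sprint",
}
# Hotbar slots 1 through 9.
KEY_TO_ACTION.update({str(n): f"hotbar_{n}" for n in range(1, 10)})

# Keys that control the UI rather than the simulated player.
UI_COMMANDS = {"r": "toggle_recording", "v": "new_scene"}


def active_actions(pressed: Iterable[str]) -> List[str]:
    """Return the sorted game actions for the currently pressed keys."""
    return sorted({KEY_TO_ACTION[k] for k in pressed if k in KEY_TO_ACTION})


print(active_actions(["w", "left_ctrl", "space", "r"]))
# → ['forward', 'jump', 'sprint']
```

Keeping UI commands in a separate table means keys such as R never leak into the action vector passed to the model.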
Citation
If you use this codebase in your research, please consider citing us as:
@article{hafner2025dreamerv4,
title = {Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation},
author = {Ming Gao and Yan Yan and ShengQu Xi and Yu Duan and ShengQian Li and Feng Wang},
year = {2026},
url = {https://findlamp.github.io/dreamer-mc.github.io/}
}
as well as the original Dreamer 4 paper:
@misc{Hafner2025TrainingAgents,
title={Training Agents Inside of Scalable World Models},
author={Danijar Hafner and Wilson Yan and Timothy Lillicrap},
year={2025},
eprint={2509.24527},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.24527},
}
References
This project is built upon the following foundational works: