---
language:
- en
base_model:
- Qwen/Qwen-Image
pipeline_tag: text-to-image
tags:
- '360'
- 360°
- 360-degree
- 360-image
- equirectangular
- equirectangular-projection
- image-generation
- text-to-image
datasets:
- CaptionEmporium/pexels-568k-internvl2
license: mit
library_name: diffusers
---

# Qwen 360 Diffusion

![](https://huggingface.co/ProGamerGov/qwen-360-diffusion/resolve/main/example_grid.jpg)

## General

Qwen 360 Diffusion is a rank-128 LoRA built on top of a 20B-parameter MMDiT (Multimodal Diffusion Transformer) model, designed to generate 360-degree equirectangular-projection images from text descriptions.

The model was trained from the [Qwen Image model](https://huggingface.co/Qwen/Qwen-Image) on an extremely diverse dataset of tens of thousands of equirectangular images depicting landscapes, interiors, humans, animals, and objects. All images were resized to 2048×1024 before training.

The model was also trained on a diverse dataset of ordinary (non-panoramic) photos for regularization, which makes it behave as a realism finetune when prompted accordingly.

Based on extensive testing, the model's capabilities vastly exceed those of all other currently available T2I 360 image generation models. Given the right prompt, the model should be capable of producing almost anything you want.

The model is also designed to produce equirectangular images for non-VR purposes such as general imagery, photography, artwork, architecture, portraiture, and many other subjects.

### Training Details

The training dataset consists of 32k unique 360 degree equirectangular images. Each image was randomly rotated horizontally 3 times for data augmentation (original + 3 rotations), providing a total of 128k training images. All 32k original 360 images were manually checked by humans for seams, polar artifacts, incorrect distortions, and other problems before their inclusion in the dataset.

For regularization, 64k images were randomly selected from the [pexels-568k-internvl2](https://huggingface.co/datasets/CaptionEmporium/pexels-568k-internvl2) dataset and added to the training set.

**Training timeline:** 3 months and 23 days

Training was first performed using nf4 quantization for 32 epochs (8 epochs counting the original + augmentations as a single epoch):
- `qwen-360-diffusion-int4-bf16-v1.safetensors` was trained for 28 epochs (1,344,000 steps)
- `qwen-360-diffusion-int4-bf16-v1-b.safetensors` was trained for 32 epochs (1,536,000 steps)

Training then continued at int8 quantization for another 16 epochs (4 epochs counting the original + augmentations as a single epoch):
- `qwen-360-diffusion-int8-bf16-v1.safetensors` was trained for a total of 48 epochs (2,304,000 steps)

---


## Usage

To activate panoramic generation, include one of the following **trigger phrases**, or some variation of them, in your prompt:

> `"equirectangular"`, `"360 image"`, `"360 panorama"`, or `"360 degree panorama with equirectangular projection"`


Note that even viewing a 360 image in a viewer on a flat 2D screen can create the feeling of actually being inside the scene, known in psychology as a sense of 'presence'.

### Recommended Settings

- **Aspect ratio:** For best results, use the `2:1` resolution `2048×1024`. Lower 2:1 resolutions such as `1024×512` and `1536×768` may cause the model to struggle to generate proper horizons.
- **Prompt tips:** Include desired **medium or style**, such as _photograph_, _oil painting_, _illustration_, or _digital art_.
- **360-specific considerations:** Remember that 360 images wrap around with no borders: the left edge connects to the right edge, while the top and bottom edges each collapse to a single point at the poles of the sphere (a quick seam check is sketched after this list).
- **Human subject considerations:** For full body shots, specify the head/face and footwear (e.g., "wearing boots") or lack thereof to avoid incomplete or incorrectly distorted outputs.
- **Equirectangular distortion:** Outputs show increasing horizontal stretching as you move vertically away from the center. These distortions are not visible when viewed in a 360 viewer.
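
Because the left and right edges must meet exactly, a quick way to inspect a generated panorama for a visible seam is to roll it horizontally by half its width so the seam lands in the middle of the frame. A minimal sketch, assuming a generated panorama saved as `panorama.png` (a hypothetical filename):

```python
import numpy as np
from PIL import Image

# Roll the panorama by half its width so the original left/right seam
# lands in the middle of the frame, where any discontinuity is easy to spot.
pano = np.array(Image.open("panorama.png"))
rolled = np.roll(pano, shift=pano.shape[1] // 2, axis=1)
Image.fromarray(rolled).save("panorama_seam_check.png")
```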

Once generated, you can upscale your panoramas for use as **photographs**, **artwork**, **skyboxes**, **virtual environments**, **VR experiences**, **VR therapy**, or **3D scene backgrounds**, or as part of a **text-to-image-to-video-to-3D-world pipeline**. The model is also designed to produce equirectangular images for non-VR usage.
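
Putting these recommendations together, here is a minimal text-to-image sketch with [Diffusers](https://github.com/huggingface/diffusers) using a trigger phrase and the recommended 2:1 resolution. The prompt and sampler settings below are illustrative; see the full example scripts linked further down for complete nf4/int8 setups.

```python
import torch
from diffusers import DiffusionPipeline

# Load the Qwen-Image base pipeline (bf16 shown here; quantized variants also work).
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
# Load the 360 LoRA from this repository.
pipe.load_lora_weights(
    "ProGamerGov/qwen-360-diffusion",
    weight_name="qwen-360-diffusion-int8-bf16-v1.safetensors",
)
pipe.to("cuda")

# Trigger phrase plus the recommended 2:1 resolution.
prompt = (
    "360 degree panorama with equirectangular projection, photograph of a "
    "sunlit mountain lake surrounded by pine forest, clear sky"
)
image = pipe(
    prompt=prompt,
    negative_prompt=" ",
    width=2048,
    height=1024,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("panorama.png")
```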

---

### Notes

#### FP8 inference

For maximum visual fidelity when running a quantized Qwen Image transformer, it's **strongly recommended** to use the GGUF Q8 or int8 quantized versions rather than FP8.

If you are using transformer models in `fp8_e4m3fn` or `fp8_e5m2` precision, or low-precision models produced with "accuracy-fixing" methods (e.g., `ostris/ai-toolkit`), they may cause **patch or grid artifacts** when used with the int8-trained LoRA. Some users have found this issue to be caused by downcasting directly from fp16 to fp8 without proper scaling and calibration.
  → To avoid this, use the **lower-accuracy full-precision versions** of the LoRA (the ones trained with nf4 quantization):  
  `qwen-360-diffusion-int4-bf16-v1.safetensors` or `qwen-360-diffusion-int4-bf16-v1-b.safetensors`.

  - **Low-Precision Artifact Mitigation**  
    If artifacts still appear when using the int4-trained LoRA on an `fp8_e4m3fn` or `fp8_e5m2` transformer quant, they can often be reduced by:  
     - Adjusting the **LoRA weight** (see the sketch after this list), and/or refining both the **positive and negative prompts**.
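
A minimal sketch of adjusting the LoRA weight in Diffusers, assuming the `pipe` object from the Usage example above and that the Qwen Image pipeline exposes the standard diffusers adapter API (`adapter_name` / `set_adapters`):

```python
# Load the int4-trained LoRA under a named adapter so its weight can be tuned.
pipe.load_lora_weights(
    "ProGamerGov/qwen-360-diffusion",
    weight_name="qwen-360-diffusion-int4-bf16-v1.safetensors",
    adapter_name="qwen360",  # hypothetical adapter name
)
# Lowering the adapter weight below 1.0 can soften patch/grid artifacts.
pipe.set_adapters(["qwen360"], adapter_weights=[0.75])
```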

---

## Additional Tools

### HTML 360 Viewer

To make viewing and sharing 360 images & videos easier, I built a browser-based HTML 360 viewer that runs locally on your device. It works on desktop and mobile browsers, and has optional VR headset support.

* You can try it out here on GitHub Pages: https://progamergov.github.io/html-360-viewer/
    * GitHub code: https://github.com/ProGamerGov/html-360-viewer
* You can append `?url=` followed by a link to your image to automatically load it into the viewer, making it extremely easy to share your 360 creations (a small Python helper for building such links is sketched after this list).
* Example: https://progamergov.github.io/html-360-viewer/?url=https://upload.wikimedia.org/wikipedia/commons/7/76/Dauderi.jpg
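
For instance, a share link can be built programmatically; a small sketch (the image URL here is simply the Wikimedia example above):

```python
from urllib.parse import quote

VIEWER = "https://progamergov.github.io/html-360-viewer/"
image_url = "https://upload.wikimedia.org/wikipedia/commons/7/76/Dauderi.jpg"

# Percent-encode the image URL so any query parameters inside it
# don't get confused with the viewer's own ?url= parameter.
share_link = f"{VIEWER}?url={quote(image_url, safe='')}"
print(share_link)
```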


### Recommended ComfyUI Nodes

If you are a user of [ComfyUI](https://github.com/comfyanonymous/ComfyUI), these node packs can be useful for working with 360 images & videos.

* ComfyUI_preview360panorama
    * For viewing 360s inside ComfyUI (may be slower than the browser-based viewer above).
    * Link: https://github.com/ProGamerGov/ComfyUI_preview360panorama

* ComfyUI_pytorch360convert
    * For editing 360s, seam fixing, view rotation, and masking potential artifacts.
    * Link: https://github.com/ProGamerGov/ComfyUI_pytorch360convert

* ComfyUI_pytorch360convert_video
    * For generating sweep videos that rotate around the scene.
    * Link: https://github.com/ProGamerGov/ComfyUI_pytorch360convert_video

If you are using Diffusers or other libraries, the [pytorch360convert](https://github.com/ProGamerGov/pytorch360convert) library is useful when working with 360 media.
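
As a rough sketch of pulling a flat perspective view out of a panorama with pytorch360convert: the `e2p` call below follows the py360convert-style interface the library mirrors, but the exact function and argument names are an assumption here, so check the library's README before use.

```python
import numpy as np
import torch
from PIL import Image
from pytorch360convert import e2p  # assumed API, mirrors py360convert

# Load the equirectangular panorama as a float tensor in [H, W, C] layout.
pano = torch.from_numpy(np.array(Image.open("panorama.png"))).float() / 255.0

# Extract a 90-degree field-of-view perspective crop looking at the horizon.
view = e2p(pano, fov_deg=90.0, h_deg=0.0, v_deg=0.0, out_hw=(1024, 1024))

Image.fromarray((view.clamp(0, 1) * 255).byte().numpy()).save("view.png")
```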

---

### Diffusers Example

Example scripts using [Diffusers](https://github.com/huggingface/diffusers) can be found for running Qwen-Image with nf4 [here](https://huggingface.co/ProGamerGov/qwen-360-diffusion/blob/main/run_qwen_image_nf4.py) and int8 [here](https://huggingface.co/ProGamerGov/qwen-360-diffusion/blob/main/run_qwen_image_int8.py).

---

## Limitations

A large portion of the training data has the camera level with the horizon (at 90 degrees to the direction of gravity), so rotating outputs may be required to achieve different vertical viewing angles.

---

## Example Gallery

We have uploaded over 320 images with full metadata and prompts to the CivitAI gallery for inspiration, including all the images in the grid above. You can find the [gallery here](https://civitai.com/models/2209835/qwen-360-diffusion).

---

## Contributors

- [Ben Egan](https://github.com/ProGamerGov)
- [XWAVE](https://twitter.com/XWAVEart)
- [Jimmy Carter](https://huggingface.co/jimmycarter)


## Citation Information

BibTeX

```
@software{Egan_Qwen_360_Diffusion_2025,
  author = {Egan, Ben and {XWAVE} and {Jimmy Carter}},
  license = {MIT},
  month = dec,
  title = {{Qwen 360 Diffusion}},
  url = {https://huggingface.co/ProGamerGov/qwen-360-diffusion},
  year = {2025}
}
```

APA

```
Egan, B., XWAVE, & Jimmy Carter. (2025). Qwen 360 Diffusion [Computer software]. https://huggingface.co/ProGamerGov/qwen-360-diffusion
```

Please refer to the [CITATION.cff](https://huggingface.co/ProGamerGov/qwen-360-diffusion/blob/main/CITATION.cff) for more information on how to cite this model.