---
pipeline_tag: text-to-image
inference: false
library_name: tensorrt
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-medium
- text-to-image
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  What do you intend to use the model for?:
    type: select
    options:
    - Research
    - Personal use
    - Creative Professional
    - Startup
    - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---

# Stable Diffusion 3.5 Medium TensorRT

## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Medium**, developed jointly by [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). It uses NVIDIA's TensorRT deep learning inference library to speed up inference while preserving the image quality of the original model.

Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with improved image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities practical for production deployment and real-time applications.

## Model Details

### Model Description

This repository holds the ONNX exports of the T5, MMDiT and VAE models in BF16 precision.

## Performance using TensorRT 10.13

#### Timings for 30 steps at 1024x1024

| Accelerator | Precision | CLIP-G   | CLIP-L  | T5      | MMDiT x 30 | VAE Decoder | Total      |
|-------------|-----------|----------|---------|---------|------------|-------------|------------|
| H100        | BF16      | 16.52 ms | 6.83 ms | 8.46 ms | 2358.34 ms | 72.58 ms    | 2496.63 ms |
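
As a rough reading of the H100 row above, the MMDiT column can be divided by the step count to get an average per-step latency, and the per-stage columns can be summed and compared with the reported total. The sketch below does this with `awk`. The function names `per_step_ms` and `sum_ms` are hypothetical helpers, not part of the TensorRT demo, and the numbers are copied from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the timing table; values come from the H100 row.

# Average latency of one denoising step: total MMDiT time / number of steps.
per_step_ms() {
  awk -v total="$1" -v steps="$2" 'BEGIN { printf "%.2f\n", total / steps }'
}

# Sum of any number of per-stage timings in milliseconds.
sum_ms() {
  awk 'BEGIN { s = 0; for (i = 1; i < ARGC; i++) s += ARGV[i]; printf "%.2f\n", s }' "$@"
}

per_step_ms 2358.34 30                   # MMDiT time per denoising step
sum_ms 16.52 6.83 8.46 2358.34 72.58     # CLIP-G + CLIP-L + T5 + MMDiT + VAE Decoder
```

With the table values this gives about 78.61 ms per MMDiT step and a per-stage sum of 2462.73 ms, roughly 34 ms below the reported total of 2496.63 ms. The table does not say what accounts for the remainder.
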

## Usage Example

1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) to launch a TensorRT NGC container.

```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```

2. Install the required libraries:

```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```

3. Generate a Hugging Face user access token

To download the Stable Diffusion 3.5 model checkpoints, request access on the [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) page.

Then obtain a `read` access token for the Hugging Face Hub (see the [instructions](https://huggingface.co/docs/hub/security-tokens)) and export it as shown below.

```bash
export HF_TOKEN=<your access token>
```
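
Before launching the demo, it can help to confirm that the token is actually exported in the current shell. The check below is a minimal sketch. `require_hf_token` is a hypothetical helper name, and it only tests that the variable is non-empty, not that Hugging Face accepts the token.

```shell
# Hypothetical helper: fail early if HF_TOKEN is missing or empty.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export it before running the demo." >&2
    return 1
  fi
  # Report only the length so the token itself never lands in logs.
  echo "HF_TOKEN is set (${#HF_TOKEN} characters)."
}

require_hf_token || echo "Skipping inference until a token is provided." >&2
```
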

4. Perform TensorRT-optimized inference:

- **Stable Diffusion 3.5 Medium in BF16 precision**

```
python3 demo_txt2img_sd35.py \
  "a beautiful photograph of Mt. Fuji during cherry blossom" \
  --version=3.5-medium \
  --bf16 \
  --download-onnx-models \
  --denoising-steps=30 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```