---
pipeline_tag: text-to-image
inference: false
library_name: tensorrt
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-medium
- text-to-image
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---
# Stable Diffusion 3.5 Medium TensorRT
## Introduction
This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Medium**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.
Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.
## Model Details
### Model Description
This repository holds the ONNX exports of the CLIP-G, CLIP-L, T5, MMDiT, and VAE models in BF16 precision.
## Performance using TensorRT 10.13
### Timings for 30 steps at 1024x1024
| Accelerator | Precision | CLIP-G | CLIP-L | T5 | MMDiT x 30 | VAE Decoder | Total |
|-------------|-----------|------------|--------------|--------------|-----------------------|---------------------|------------------------|
| H100 | BF16 | 16.52 ms | 6.83 ms | 8.46 ms | 2358.34 ms | 72.58 ms | 2496.63 ms |
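The MMDiT denoising loop dominates end-to-end latency. A small back-of-the-envelope sketch in Python, using only the H100 numbers from the table above, shows the per-step transformer cost and its share of the summed component timings:

```python
# Component timings from the H100 BF16 row above (milliseconds).
timings = {
    "CLIP-G": 16.52,
    "CLIP-L": 6.83,
    "T5": 8.46,
    "MMDiT x 30": 2358.34,
    "VAE Decoder": 72.58,
}

# Per-step cost of the 30-step denoising loop.
mmdit_per_step_ms = timings["MMDiT x 30"] / 30

# Share of the summed component latency spent in the transformer.
component_sum_ms = sum(timings.values())
mmdit_share = timings["MMDiT x 30"] / component_sum_ms

print(f"MMDiT per step: {mmdit_per_step_ms:.2f} ms")       # ~78.6 ms
print(f"MMDiT share of component sum: {mmdit_share:.1%}")  # ~95.8%
```

Because the transformer accounts for roughly 95% of the measured component time, reducing step count (or accelerating the MMDiT engine) has far more impact on total latency than optimizing the text encoders or VAE.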
## Usage Example
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) to launch a TensorRT NGC container.
```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```
2. Install libraries and requirements
```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```
3. Generate a HuggingFace user access token
To download the Stable Diffusion 3.5 Medium checkpoints, please request access on the [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) page.
You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See [instructions](https://huggingface.co/docs/hub/security-tokens).
```bash
export HF_TOKEN=<your access token>
```
4. Perform TensorRT-optimized inference:
- **Stable Diffusion 3.5 Medium in BF16 precision**
```shell
python3 demo_txt2img_sd35.py \
"a beautiful photograph of Mt. Fuji during cherry blossom" \
--version=3.5-medium \
--bf16 \
--download-onnx-models \
--denoising-steps=30 \
--guidance-scale 3.5 \
--build-static-batch \
--use-cuda-graph \
--hf-token=$HF_TOKEN
```
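The `--guidance-scale 3.5` flag sets the strength of classifier-free guidance, which combines the conditional and unconditional noise predictions at every denoising step. A minimal numeric sketch of the standard CFG formula, with toy values rather than the demo's internal code:

```python
def apply_cfg(uncond, cond, scale):
    """Classifier-free guidance: push the conditional noise prediction away
    from the unconditional one by `scale`. scale=1.0 recovers the conditional
    prediction; larger values follow the prompt more strongly."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Toy noise predictions for two latent elements.
uncond = [0.10, -0.20]
cond = [0.30, 0.10]

print(apply_cfg(uncond, cond, 1.0))  # scale=1.0: matches the conditional prediction
print(apply_cfg(uncond, cond, 3.5))  # scale=3.5 amplifies: roughly [0.80, 0.85]
```

Lower scales give the sampler more freedom; higher scales adhere more tightly to the prompt at some cost to image diversity, which is why the example above uses a moderate value of 3.5.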