cfchase committed on
Commit 098ac30 · verified · 1 Parent(s): 06d699c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +127 -209
README.md CHANGED
@@ -1,235 +1,153 @@
- # OpenShift AI Demo: Text-to-Image Generation
-
- This demonstration showcases the complete machine learning workflow in Red Hat OpenShift AI, taking you from initial experimentation to production deployment. Using Stable Diffusion for text-to-image generation, you'll learn how to experiment with models, fine-tune them with custom data, create automated pipelines, and deploy models as scalable services.
-
- ## What You'll Learn
-
- - **Data Science Projects**: Creating and managing ML workspaces in OpenShift AI
- - **GPU-Accelerated Workbenches**: Leveraging NVIDIA GPUs for model training and inference
- - **Model Experimentation**: Working with pre-trained models from Hugging Face
- - **Fine-Tuning**: Customizing models with your own data using Dreambooth
- - **Pipeline Automation**: Building repeatable ML workflows with Data Science Pipelines
- - **Model Serving**: Deploying models as REST APIs using KServe
- - **Production Integration**: Connecting served models to applications
-
- ## Prerequisites
-
- ### Platform Requirements
- - Red Hat OpenShift cluster (4.12+)
- - Red Hat OpenShift AI installed (2.9+)
-   - For managed service: Available as add-on for OpenShift Dedicated or ROSA
-   - For self-managed: Install from OperatorHub
- - GPU node with at least 45GB memory (NVIDIA L40S recommended, A10G minimum for smaller models)
-
- ### Storage Requirements
- - S3-compatible object storage (MinIO, AWS S3, or Ceph)
- - Two buckets configured:
-   - `pipeline-artifacts`: For pipeline execution artifacts
-   - `models`: For storing trained models
-
- ### Access Requirements
- - OpenShift AI Dashboard access
- - Ability to create Data Science Projects
- - (Optional) Hugging Face account with API token for model downloads
-
- ## Quick Start
-
- 1. **Access OpenShift AI Dashboard**
-    - Navigate to your OpenShift console
-    - Click the application launcher (9-dot grid)
-    - Select "Red Hat OpenShift AI"
-
- 2. **Create a Data Science Project**
-    - Click "Data Science Projects"
-    - Create a new project named `image-generation`
-
- 3. **Set Up Storage**
-    - Import `setup/setup-s3.yaml` to create local S3 storage (for demos)
-    - Or configure your own S3-compatible storage connections
-
- 4. **Create a Workbench**
-    - Select PyTorch notebook image
-    - Allocate GPU resources
-    - Add environment variables (including `HF_TOKEN` if available)
-    - Attach data connections
-
- 5. **Clone This Repository**
-    ```bash
-    git clone https://github.com/cfchase/text-to-image-demo.git
-    cd text-to-image-demo
-    ```
-
- 6. **Follow the Notebooks**
-    - `1_experimentation.ipynb`: Initial model testing
-    - `2_fine_tuning.ipynb`: Training with custom data
-    - `3_remote_inference.ipynb`: Testing deployed models
-
- ## Key Components
-
- - **Workbenches**: Jupyter notebook environments for development
- - **Pipelines**: Automated ML workflows
- - **Model Serving**: Deploy models as REST APIs
- - **Storage**: S3-compatible object storage for data and models
-
- ## Detailed Setup Instructions
-
- ### 1. Storage Configuration
-
- #### Option A: Demo Setup (Local S3)
- ```bash
- oc apply -f setup/setup-s3.yaml
  ```

- This creates:
- - MinIO deployment for S3-compatible storage
- - Two PVCs for buckets
- - Data connections for workbench and pipeline access

- #### Option B: Production Setup (External S3)
- Create data connections with your S3 credentials:
- - Connection 1: "My Storage" - for workbench access
- - Connection 2: "Pipeline Artifacts" - for pipeline server

- ### 2. Workbench Configuration

- When creating your workbench:
-
- **Notebook Image**: Choose based on your needs
- - Standard Data Science: Basic Python environment
- - PyTorch: Includes PyTorch, CUDA support (recommended for this demo)
- - TensorFlow: For TensorFlow-based workflows
- - Custom: Use your own image with specific dependencies

- **Resources**:
- - Small: 2 CPUs, 8Gi memory
- - Medium: 7 CPUs, 24Gi memory
- - Large: 14 CPUs, 56Gi memory
- - GPU: Add 1-2 NVIDIA GPUs (required for this demo)

- **Environment Variables**:
- ```
- HF_TOKEN=<your-huggingface-token> # For model downloads
- AWS_S3_ENDPOINT=<s3-endpoint-url> # Auto-configured if using data connections
- AWS_ACCESS_KEY_ID=<access-key> # Auto-configured if using data connections
- AWS_SECRET_ACCESS_KEY=<secret-key> # Auto-configured if using data connections
- AWS_S3_BUCKET=<bucket-name> # Auto-configured if using data connections
- ```
-
- ### 3. Pipeline Server Setup
-
- 1. In your Data Science Project, go to "Pipelines" → "Create pipeline server"
- 2. Select the "Pipeline Artifacts" data connection
- 3. Wait for the server to be ready (2-3 minutes)
-
- ### 4. Model Serving Configuration
-
- After training your model:
-
- 1. Deploy the custom Diffusers runtime:
- ```bash
- cd diffusers-runtime
- make build
- make push
- oc apply -f templates/serving-runtime.yaml
- ```

- 2. Create a model server in the OpenShift AI dashboard:
-    - Model framework: "Custom"
-    - Model location: S3 path to your trained model
-    - Select the Diffusers serving runtime

- ## Project Structure

- ```
- text-to-image-demo/
- ├── README.md # This file
- ├── ARCHITECTURE.md # Technical architecture details
- ├── PIPELINES.md # Pipeline automation guide
- ├── SERVING.md # Model serving guide
- ├── DEMO_SCRIPT.md # Step-by-step demo script
-
- ├── 1_experimentation.ipynb # Initial model testing
- ├── 2_fine_tuning.ipynb # Custom training workflow
- ├── 3_remote_inference.ipynb # Testing served models
-
- ├── requirements-base.txt # Base Python dependencies
- ├── requirements-gpu.txt # GPU-specific packages
-
- ├── finetuning_pipeline/ # Kubeflow pipeline components
- │   ├── Dreambooth.pipeline # Pipeline definition
- │   ├── get_data.ipynb # Data preparation step
- │   ├── train.ipynb # Training execution step
- │   └── upload.ipynb # Model upload step
-
- ├── diffusers-runtime/ # Custom KServe runtime
- │   ├── Dockerfile # Runtime container definition
- │   ├── model.py # KServe predictor implementation
- │   └── templates/ # Kubernetes manifests
-
- └── setup/ # Deployment configurations
-     └── setup-s3.yaml # Demo S3 storage setup
- ```
-
- ## Workflow Overview
-
- ### 1. Experimentation Phase
- - Load pre-trained Stable Diffusion model
- - Test basic text-to-image generation
- - Identify limitations with generic models

- ### 2. Training Phase
- - Prepare custom training data (images of "Teddy")
- - Fine-tune model using Dreambooth technique
- - Save trained weights to S3 storage

- ### 3. Pipeline Automation
- - Convert notebooks to pipeline steps
- - Create repeatable training workflow
- - Enable parameter tuning and experimentation

- ### 4. Model Serving
- - Deploy custom KServe runtime
- - Create inference service
- - Expose REST API endpoint

- ### 5. Application Integration
- - Test model via REST API
- - Integrate with applications
- - Monitor performance

- ## Troubleshooting

- ### GPU Issues
- - **No GPU detected**: Ensure your node has GPU support and correct drivers
- - **Out of memory**: Reduce batch size or use gradient checkpointing
- - **CUDA errors**: Verify PyTorch and CUDA versions match

- ### Storage Issues
- - **S3 connection failed**: Check credentials and endpoint URL
- - **Permission denied**: Verify bucket policies and access keys
- - **Upload timeouts**: Check network connectivity and proxy settings

- ### Pipeline Issues
- - **Pipeline server not starting**: Check data connection configuration
- - **Pipeline runs failing**: Review logs in pipeline run details
- - **Missing artifacts**: Verify S3 bucket permissions

- ### Serving Issues
- - **Model not loading**: Check S3 path and model format
- - **Inference errors**: Review KServe pod logs
- - **Timeout errors**: Increase resource limits or timeout values

- ## Additional Resources

- - [Red Hat OpenShift AI Documentation](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed)
- - [OpenShift AI Learning Resources](https://developers.redhat.com/products/red-hat-openshift-ai/overview)
- - [KServe Documentation](https://kserve.github.io/website/)
- - [Hugging Face Diffusers](https://huggingface.co/docs/diffusers)

- ## Contributing

- Contributions are welcome! Please feel free to submit issues or pull requests to improve this demo.

- ## License

- This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
+ ---
+ license: other
+ base_model: stabilityai/stable-diffusion-3.5-medium
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ - text-to-image
+ - diffusers
+ - dreambooth
+ - redhat
+ - corporate-branding
+ - fine-tuned
+ library_name: diffusers
+ pipeline_tag: text-to-image
+ ---
+
+ # RedHat Dog SD3 - Fine-tuned Stable Diffusion 3.5 Model
+
+ ## Model Description
+
+ This is a fine-tuned version of [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) trained using the Dreambooth technique to generate images of a specific Red Hat branded dog character ("rhteddy").
+
+ ## Model Details
+
+ - **Base Model**: stabilityai/stable-diffusion-3.5-medium
+ - **Fine-tuning Method**: Dreambooth
+ - **Training Data**: 5-10 images of the Red Hat dog character
+ - **Training Steps**: 800 steps
+ - **Resolution**: 512x512 pixels
+ - **Hardware**: NVIDIA A10G GPU (23GB memory)
+
+ ## Intended Use
+
+ This model is designed for:
+ - Generating images of the Red Hat dog character in various contexts
+ - Educational demonstrations of Dreambooth fine-tuning
+ - Corporate branding and marketing content creation
+ - Research into personalized diffusion models
+
+ ## Usage
+
+ ### Basic Usage
+
+ ```python
+ from diffusers import StableDiffusion3Pipeline
+ import torch
+
+ # Load the model
+ pipe = StableDiffusion3Pipeline.from_pretrained(
+     "cfchase/redhat-dog-sd3",
+     torch_dtype=torch.float16
+ )
+ pipe = pipe.to("cuda")
+
+ # Generate an image
+ prompt = "photo of a rhteddy dog in a park"
+ image = pipe(prompt).images[0]
+ image.save("redhat_dog_park.png")
  ```

+ ### Recommended Prompts
+
+ The model works best with prompts that include the trigger phrase `rhteddy dog`:
+
+ - `"photo of a rhteddy dog"`
+ - `"rhteddy dog sitting in an office"`
+ - `"rhteddy dog wearing a Red Hat"`
+ - `"rhteddy dog in a technology conference"`
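+
+ As a quick illustration (not part of the original card), the sketch below loops over these prompts with a fixed seed so repeated runs reproduce the same images; the output filenames are arbitrary:
+
+ ```python
+ import torch
+ from diffusers import StableDiffusion3Pipeline
+
+ pipe = StableDiffusion3Pipeline.from_pretrained(
+     "cfchase/redhat-dog-sd3", torch_dtype=torch.float16
+ ).to("cuda")
+
+ prompts = [
+     "photo of a rhteddy dog",
+     "rhteddy dog sitting in an office",
+     "rhteddy dog wearing a Red Hat",
+     "rhteddy dog in a technology conference",
+ ]
+
+ for i, prompt in enumerate(prompts):
+     # Fix the seed per prompt so repeated runs produce the same image
+     generator = torch.Generator(device="cuda").manual_seed(42)
+     image = pipe(prompt, generator=generator).images[0]
+     image.save(f"rhteddy_{i}.png")  # arbitrary output filename
+ ```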
 
+ ## Training Details
+
+ ### Training Configuration
+
+ - **Instance Prompt**: "photo of a rhteddy dog"
+ - **Class Prompt**: "a photo of dog"
+ - **Learning Rate**: 5e-6
+ - **Batch Size**: 1
+ - **Gradient Accumulation Steps**: 2
+ - **Optimizer**: 8-bit Adam
+ - **Scheduler**: Constant
+ - **Prior Preservation**: Enabled with 200 class images
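+
+ For context, here is a rough sketch of how these settings could map onto a DreamBooth launch. The script name (`train_dreambooth_sd3.py` from the upstream diffusers examples), its flags, and all paths are assumptions for illustration only; the exact command used to train this model is not recorded in this card:
+
+ ```python
+ import subprocess
+
+ # Hypothetical invocation of the upstream diffusers DreamBooth example script
+ # (examples/dreambooth/train_dreambooth_sd3.py). The script path, data
+ # directories, and output directory are placeholders, not values from this card.
+ subprocess.run([
+     "accelerate", "launch", "train_dreambooth_sd3.py",
+     "--pretrained_model_name_or_path", "stabilityai/stable-diffusion-3.5-medium",
+     "--instance_data_dir", "data/rhteddy",        # placeholder
+     "--class_data_dir", "data/dog_class",         # placeholder
+     "--output_dir", "redhat-dog-sd3",             # placeholder
+     "--instance_prompt", "photo of a rhteddy dog",
+     "--class_prompt", "a photo of dog",
+     "--with_prior_preservation",
+     "--num_class_images", "200",
+     "--resolution", "512",
+     "--train_batch_size", "1",
+     "--gradient_accumulation_steps", "2",
+     "--learning_rate", "5e-6",
+     "--lr_scheduler", "constant",
+     "--max_train_steps", "800",
+     "--use_8bit_adam",
+     "--gradient_checkpointing",
+ ], check=True)
+ ```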
 
+ ### Training Environment
+
+ - **Platform**: Red Hat OpenShift AI (RHODS)
+ - **Framework**: Hugging Face Diffusers
+ - **Acceleration**: xFormers, gradient checkpointing
+ - **Storage**: S3-compatible object storage
+
+ ## Model Architecture
+
+ This model inherits the architecture of Stable Diffusion 3.5 Medium:
+ - **Transformer**: SD3Transformer2DModel
+ - **VAE**: AutoencoderKL
+ - **Text Encoders**:
+   - 2x CLIPTextModelWithProjection
+   - 1x T5EncoderModel
+ - **Scheduler**: FlowMatchEulerDiscreteScheduler
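+
+ A small sketch for checking these components on the loaded pipeline; the attribute names are the ones the diffusers `StableDiffusion3Pipeline` exposes (loading still downloads the full weights):
+
+ ```python
+ import torch
+ from diffusers import StableDiffusion3Pipeline
+
+ pipe = StableDiffusion3Pipeline.from_pretrained(
+     "cfchase/redhat-dog-sd3", torch_dtype=torch.float16
+ )
+
+ # Print the class backing each component listed above
+ for name in ["transformer", "vae", "text_encoder",
+              "text_encoder_2", "text_encoder_3", "scheduler"]:
+     print(f"{name}: {type(getattr(pipe, name)).__name__}")
+ ```
+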
+ ## Limitations and Bias
+
+ - The model is specifically trained on Red Hat branded imagery and may not generalize well to other contexts
+ - Training data was limited to a small dataset, which may result in overfitting
+ - The model inherits any biases present in the base Stable Diffusion 3.5 model
+ - Performance is optimized for the specific "rhteddy dog" concept and may struggle with significant variations
+
+ ## Training Data
+
+ The training data consists of approximately 5-10 high-quality images of the Red Hat dog character, featuring:
+ - Various poses and angles
+ - Consistent visual style and branding
+ - Professional photography quality
+ - Clear subject focus
+
+ ## Ethical Considerations
+
+ This model is intended for educational and corporate branding purposes. Users should:
+ - Respect Red Hat's trademark and branding guidelines
+ - Avoid generating misleading or inappropriate content
+ - Consider the environmental impact of inference computations
+ - Use responsibly in accordance with AI ethics best practices
+
+ ## Technical Specifications
+
+ - **Model Size**: ~47GB (full precision weights)
+ - **Inference Requirements**:
+   - GPU with 8GB+ VRAM recommended
+   - CUDA-compatible device
+   - Python 3.8+
+   - PyTorch 2.0+
+   - Diffusers library
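+
+ On GPUs near the lower end of that range, standard diffusers memory options help; the minimal sketch below assumes the `accelerate` package is installed for offloading and trades some prompt fidelity by dropping the large T5 text encoder:
+
+ ```python
+ import torch
+ from diffusers import StableDiffusion3Pipeline
+
+ # Load without the T5 text encoder and offload submodules to the CPU
+ # between forward passes to reduce peak VRAM usage.
+ pipe = StableDiffusion3Pipeline.from_pretrained(
+     "cfchase/redhat-dog-sd3",
+     text_encoder_3=None,
+     tokenizer_3=None,
+     torch_dtype=torch.float16,
+ )
+ pipe.enable_model_cpu_offload()
+
+ image = pipe("photo of a rhteddy dog").images[0]
+ image.save("rhteddy_low_vram.png")  # arbitrary output filename
+ ```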
 
+ ## Citation
+
+ If you use this model in your research or applications, please cite:
+
+ ```bibtex
+ @misc{redhat-dog-sd3,
+   title={RedHat Dog SD3: Fine-tuned Stable Diffusion 3.5 for Corporate Branding},
+   author={Red Hat AI},
+   year={2025},
+   howpublished={Hugging Face Model Hub},
+   url={https://huggingface.co/cfchase/redhat-dog-sd3}
+ }
+ ```
+
+ ## License
+
+ This model is based on Stable Diffusion 3.5 Medium and is subject to the same licensing terms. Please refer to the [original model license](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) for details.
+
+ ## Contact
+
+ For questions about this model or the training process, please refer to the [Red Hat OpenShift AI documentation](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed) or the associated training notebooks.