stable-diffusion-image-generation
State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, or inpainting.
Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.
Use Stable Diffusion when:
- Generating images from text prompts
- Transforming existing images with text guidance (image-to-image)
- Filling masked regions of an image (inpainting)
- You need spatial control (ControlNet) or custom styles (LoRA)
Key features:
- Unified pipeline API covering SD 1.x/2.x, SDXL, SD 3.0, and Flux models
- Swappable schedulers for speed/quality trade-offs
- Built-in memory optimizations (CPU offload, attention slicing, VAE tiling)
- LoRA and ControlNet support for styles and spatial conditioning
Use alternatives instead:
```bash
pip install diffusers transformers accelerate torch
pip install xformers  # Optional: memory-efficient attention
```
```python
from diffusers import DiffusionPipeline
import torch

# Load pipeline (auto-detects model type)
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Generate image
image = pipe(
    "A serene mountain landscape at sunset, highly detailed",
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]
image.save("output.png")
```
```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)

# Enable memory optimization (handles device placement itself,
# so pipe.to("cuda") is not needed and should be skipped)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A futuristic city with flying cars, cinematic lighting",
    height=1024,
    width=1024,
    num_inference_steps=30
).images[0]
```
Diffusers is built around three core components:
```
Pipeline (orchestration)
├── Model (neural networks)
│   ├── UNet / Transformer (noise prediction)
│   ├── VAE (latent encoding/decoding)
│   └── Text Encoder (CLIP/T5)
└── Scheduler (denoising algorithm)
```
```
Text Prompt → Text Encoder → Text Embeddings
                                    ↓
Random Noise → [Denoising Loop] ← Scheduler
                      ↓
               Predicted Noise
                      ↓
        VAE Decoder → Final Image
```
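A loaded pipeline exposes each of these components as an attribute, which is handy for inspecting or swapping parts. A minimal sketch (attribute names follow the standard SD 1.x pipeline layout):

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
)

# The core components from the diagram above
print(type(pipe.unet).__name__)          # noise prediction network
print(type(pipe.text_encoder).__name__)  # CLIP text encoder
print(type(pipe.vae).__name__)           # latent encoder/decoder
print(type(pipe.scheduler).__name__)     # denoising algorithm
```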
Pipelines orchestrate complete workflows:
| Pipeline | Purpose |
|---|---|
| `StableDiffusionPipeline` | Text-to-image (SD 1.x/2.x) |
| `StableDiffusionXLPipeline` | Text-to-image (SDXL) |
| `StableDiffusion3Pipeline` | Text-to-image (SD 3.0) |
| `FluxPipeline` | Text-to-image (Flux models) |
| `AutoPipelineForImage2Image` | Image-to-image |
| `AutoPipelineForInpainting` | Inpainting |
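If one of these pipelines is already in memory, recent Diffusers versions can rewrap its components for another task with `from_pipe`, avoiding a second checkpoint download. A sketch, assuming `from_pipe` is available in your installed version:

```python
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Reuse the already-loaded weights for image-to-image
img2img = AutoPipelineForImage2Image.from_pipe(pipe)
```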
Schedulers control the denoising process:
| Scheduler | Steps | Quality | Use Case |
|---|---|---|---|
| `EulerDiscreteScheduler` | 20-50 | Good | Default choice |
| `EulerAncestralDiscreteScheduler` | 20-50 | Good | More variation |
| `DPMSolverMultistepScheduler` | 15-25 | Excellent | Fast, high quality |
| `DDIMScheduler` | 50-100 | Good | Deterministic |
| `LCMScheduler` | 4-8 | Good | Very fast |
| `UniPCMultistepScheduler` | 15-25 | Excellent | Fast convergence |
```python
from diffusers import DPMSolverMultistepScheduler

# Swap for faster generation
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Now generate with fewer steps
image = pipe(prompt, num_inference_steps=20).images[0]
```
| Parameter | Default | Description |
|---|---|---|
| `prompt` | Required | Text description of desired image |
| `negative_prompt` | None | What to avoid in the image |
| `num_inference_steps` | 50 | Denoising steps (more = better quality) |
| `guidance_scale` | 7.5 | Prompt adherence (7-12 typical) |
| `height`, `width` | 512/1024 | Output dimensions (multiples of 8) |
| `generator` | None | Torch generator for reproducibility |
| `num_images_per_prompt` | 1 | Batch size |
```python
import torch

generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt="A cat wearing a top hat",
    generator=generator,
    num_inference_steps=50
).images[0]
```
```python
image = pipe(
    prompt="Professional photo of a dog in a garden",
    negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
    guidance_scale=7.5
).images[0]
```
Transform existing images with text guidance:
```python
from diffusers import AutoPipelineForImage2Image
from PIL import Image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.jpg").resize((512, 512))

image = pipe(
    prompt="A watercolor painting of the scene",
    image=init_image,
    strength=0.75,  # How much to transform (0-1)
    num_inference_steps=50
).images[0]
```
Fill masked regions:
```python
from diffusers import AutoPipelineForInpainting
from PIL import Image
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg")
mask = Image.open("mask.png")  # White = inpaint region

result = pipe(
    prompt="A red car parked on the street",
    image=image,
    mask_image=mask,
    num_inference_steps=50
).images[0]
```
Add spatial conditioning for precise control:
```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

def get_canny_image(image: Image.Image) -> Image.Image:
    # One common way to build the control image: OpenCV Canny edges,
    # stacked to 3 channels for the pipeline
    edges = cv2.Canny(np.array(image.convert("L")), 100, 200)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))

# Load ControlNet for edge conditioning
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Use Canny edge image as control
input_image = Image.open("input.jpg")  # source photo (illustrative path)
control_image = get_canny_image(input_image)

image = pipe(
    prompt="A beautiful house in the style of Van Gogh",
    image=control_image,
    num_inference_steps=30
).images[0]
```
| ControlNet | Input Type | Use Case |
|---|---|---|
| Canny | Edge maps | Preserve structure |
| OpenPose | Pose skeletons | Human poses |
| Depth | Depth maps | 3D-aware generation |
| Normal | Normal maps | Surface details |
| MLSD | Line segments | Architectural lines |
| Scribble | Rough sketches | Sketch-to-image |
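Switching conditioning types means loading the matching ControlNet checkpoint and supplying a control image of that type. A sketch using the OpenPose variant (the checkpoint id and `pose.png` path are illustrative assumptions):

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose",
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# The control image must match the ControlNet type: a pose skeleton here
pose_image = Image.open("pose.png")

image = pipe(
    prompt="A dancer mid-leap on stage, dramatic lighting",
    image=pose_image,
    num_inference_steps=30
).images[0]
```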
Load fine-tuned style adapters:
```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA weights
pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")

# Generate with LoRA style
image = pipe("A portrait in the trained style").images[0]

# Adjust LoRA strength
pipe.fuse_lora(lora_scale=0.8)

# Unload LoRA
pipe.unload_lora_weights()
```
```python
# Load multiple LoRAs
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="character")

# Set weights for each
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe("A portrait").images[0]
```
```python
# Model CPU offload - moves models to CPU when not in use
pipe.enable_model_cpu_offload()

# Or: sequential CPU offload - more aggressive, slower
pipe.enable_sequential_cpu_offload()
```
```python
# Reduce memory by computing attention in chunks
pipe.enable_attention_slicing()

# Or specific chunk size
pipe.enable_attention_slicing("max")
```
```python
# Requires xformers package
pipe.enable_xformers_memory_efficient_attention()
```
```python
# Decode latents one image at a time (saves memory for batches)
pipe.enable_vae_slicing()

# Decode latents in tiles for large images
pipe.enable_vae_tiling()
```
```python
# FP16 (recommended for GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.float16,
    variant="fp16"
)

# BF16 (better precision, requires Ampere+ GPU)
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.bfloat16
)
```
```python
from diffusers import AutoencoderKL, DiffusionPipeline
import torch

# Load custom VAE (match the pipeline's dtype to avoid mismatches)
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse",
    torch_dtype=torch.float16
)

# Use with pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    vae=vae,
    torch_dtype=torch.float16
)
```
Generate multiple images efficiently:
```python
# Multiple prompts
prompts = [
    "A cat playing piano",
    "A dog reading a book",
    "A bird painting a picture"
]
images = pipe(prompts, num_inference_steps=30).images

# Multiple images per prompt
images = pipe(
    "A beautiful sunset",
    num_images_per_prompt=4,
    num_inference_steps=30
).images
```
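To review a batch at a glance, a small helper can tile the results into one contact sheet. A minimal sketch (`make_grid` is a hypothetical helper, not part of Diffusers):

```python
from PIL import Image

def make_grid(images, cols=2):
    """Tile equally sized PIL images into a single grid image."""
    rows = (len(images) + cols - 1) // cols
    w, h = images[0].size
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, ((i % cols) * w, (i // cols) * h))
    return grid

images = pipe("A beautiful sunset", num_images_per_prompt=4).images
make_grid(images, cols=2).save("grid.png")
```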
```python
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import torch

# 1. Load SDXL with optimizations
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()  # handles device placement; no pipe.to("cuda") needed

# 2. Generate with quality settings
image = pipe(
    prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
    negative_prompt="blurry, low quality, cartoon, anime, sketch",
    num_inference_steps=30,
    guidance_scale=7.5,
    height=1024,
    width=1024
).images[0]
```
```python
from diffusers import AutoPipelineForText2Image, LCMScheduler
import torch

# Use LCM for 4-8 step generation
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# Load LCM LoRA for fast generation
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.fuse_lora()

# Generate in ~1 second
image = pipe(
    "A beautiful landscape",
    num_inference_steps=4,
    guidance_scale=1.0
).images[0]
```
CUDA out of memory:
```python
# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Or use lower precision
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
```
Black/noise images:
```python
# Check VAE configuration and ensure dtype consistency
# (fp16 VAE overflow is a common cause of black images)
pipe = pipe.to(dtype=torch.float16)

# Bypass the safety checker if it is blanking outputs (SD 1.x pipelines)
pipe.safety_checker = None
```
Slow generation:
```python
# Use faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]
```
License: MIT
```bash
mkdir -p ~/.hermes/skills/mlops/stable-diffusion && curl -o ~/.hermes/skills/mlops/stable-diffusion/SKILL.md https://raw.githubusercontent.com/NousResearch/hermes-agent/main/optional-skills/mlops/stable-diffusion/SKILL.md
```