LCM (Latent Consistency Model) extends earlier diffusion models by learning to predict the solution of the underlying ordinary differential equation directly, rather than iterating a numerical solver, which sharply reduces the number of denoising steps and makes image generation significantly faster.
https://github.com/luosiallen/latent-consistency-model
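Roughly how this works (my paraphrase of the consistency-models idea, not taken from the repo's docs): the model learns a consistency function f_θ that maps any point x_t on the probability-flow ODE trajectory straight back to the trajectory's origin, so that

    f_θ(x_t, t) = f_θ(x_t', t')   for any t, t' on the same trajectory,

which is why one, or only a few, network evaluations suffice instead of dozens of solver steps.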
It can also be run through Automatic1111, but I hit an error there, so I'll run it directly from the console instead.
I'll prepare both CPU and GPU environments and compute in float32.
*Since I haven't fully verified whether the CPU or the GPU is actually doing the work, I'll standardize this trial on a CPU + H100 GPU environment for now.
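As a quick sanity check (my own sketch, not part of the original run), PyTorch can report which device is available before the pipeline is moved:

import torch

# Prefer the GPU when CUDA is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
if device == "cuda":
    print(torch.cuda.get_device_name(0))  # should report the H100 here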
from diffusers import DiffusionPipeline
import torch

# Load the LCM-distilled Dreamshaper v7 weights from the Hugging Face Hub
pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
To save GPU memory, torch.float16 can be used, but it may compromise image quality.
# Move the pipeline to the GPU and compute in float32
pipe.to(torch_device="cuda", torch_dtype=torch.float32)
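The memory-saving float16 variant mentioned above would look like this (a variant I didn't run in this trial):

# Half precision: lower VRAM usage, possibly lower image quality
pipe.to(torch_device="cuda", torch_dtype=torch.float16)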
I used a standard prompt. First, I'll try with 4 inference steps.
prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour, style by Dan Winters, Russell James, Steve McCurry, centered, extremely detailed, Nikon D850, award winning photography"
num_inference_steps can be set from 1 to 50. LCM supports fast inference even with <= 4 steps; 1~8 steps are recommended.
num_inference_steps = 4
# guidance_scale and lcm_origin_steps follow the repo's example settings
images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
images[0].save("output.png")
[00:06<00:00, 1.52s/it]
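The [elapsed<remaining, s/it] lines here are tqdm's progress readouts. To also get an overall wall-clock number, a minimal sketch:

import time

start = time.perf_counter()
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
print(f"total: {time.perf_counter() - start:.2f}s")  # includes pipeline overhead, not just the denoising loop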
Fast, even at 768x768.
Let's vary the number of denoising steps.
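A minimal loop covers the sweep (my own sketch, reusing pipe and prompt from above; the filenames are hypothetical):

# Sweep the step counts tested below and save one image per setting
for steps in [1, 2, 4, 8]:
    images = pipe(prompt=prompt, num_inference_steps=steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images
    images[0].save(f"output_{steps}steps.png")

The per-setting timings came out as follows: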
1 step:  [00:01<00:00, 1.26s/it]
2 steps: [00:02<00:00, 1.37s/it]
4 steps: [00:05<00:00, 1.27s/it]
8 steps: [00:10<00:00, 1.28s/it]
Next, let's try 1024x1024 images.
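Like most diffusers pipelines, this one should accept height/width overrides (an assumption on my part; check the pipeline's signature in the repo), so the call would look like:

images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0, lcm_origin_steps=50, height=1024, width=1024, output_type="pil").images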
4 steps / 1024x1024: [00:12<00:00, 3.17s/it]
Finally, let's increase the number of images generated at once.
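Assuming the pipeline exposes the usual diffusers num_images_per_prompt parameter (worth verifying against the repo), a batch of four would be:

images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0, lcm_origin_steps=50, num_images_per_prompt=4, output_type="pil").images
# Save each image in the batch under a hypothetical filename
for i, img in enumerate(images):
    img.save(f"output_batch{i}.png")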
4 steps / 768x768 / 4 images per batch: [00:21<00:00, 5.45s/it]
LCM is fast enough for practical use; 4 denoising steps looks like a good setting.