
Benchmarking Latent Consistency Models with Stable Diffusion on CPUs and H100.

Yuichiro Minato

2024/03/20 03:00

#Featured

LCM (Latent Consistency Model) builds on earlier diffusion models. Instead of denoising over many small steps, it learns to predict the solution of the underlying ordinary differential equation directly, which reduces the number of sampling steps required. This leads to significantly faster image generation.
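As a rough illustration of what "learning to solve the ODE" means, the idea behind consistency models can be written as follows. The notation is an assumption added here and does not appear in the original post: z_t is the noisy latent at time t, c is the prompt conditioning, and f_θ is the learned network.

```latex
\begin{align*}
f_\theta(z_t, c, t) &\approx z_0
  && \text{(map a noisy latent directly to the ODE solution)} \\
f_\theta(z_t, c, t) &= f_\theta(z_{t'}, c, t')
  && \text{(self-consistency for any } t, t' \text{ on one trajectory)}
\end{align*}
```

Because the network jumps toward the clean latent in one evaluation, a handful of steps can be enough where a conventional sampler needs dozens.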

https://github.com/luosiallen/latent-consistency-model

LCM can also be run through Automatic1111, but I hit an error there, so I'll run it directly from the console instead.

I prepared both a CPU and a GPU, and compute in float32.

*I have not properly verified whether the computation is actually running on the CPU or the GPU, so for now I treat this trial as a single CPU + H100 GPU environment.
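One way to reduce this uncertainty is to ask PyTorch which device it can see before loading the pipeline. The following is a minimal sketch and was not part of the original run; the helper name `describe_device` is hypothetical, and it deliberately falls back to a message when `torch` is not installed.

```python
import importlib.util


def describe_device() -> str:
    """Return a short label for the device PyTorch would use."""
    # Guard the import so the helper still runs where torch is missing.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if torch.cuda.is_available():
        # get_device_name(0) reports the first visible GPU.
        return "cuda: " + torch.cuda.get_device_name(0)
    return "cpu"


print(describe_device())
```

Once a pipeline has been loaded and moved, its `device` property (for example `pipe.device`) should report where the weights actually live.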

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")

To save GPU memory, torch.float16 can be used, but it may compromise image quality.

pipe.to(torch_device="cuda", torch_dtype=torch.float32)

I used a standard prompt. First, I'll try with 4 inference steps.

prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour, style by Dan Winters, Russell James, Steve McCurry, centered, extremely detailed, Nikon D850, award winning photography"

num_inference_steps can be set anywhere from 1 to 50. LCM supports fast inference even with 4 or fewer steps; the recommended range is 1 to 8 steps.

num_inference_steps = 4

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images

images[0].save("output.png")

[00:06<00:00, 1.52s/it]

This is fast for a 768x768 image.
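The progress bar covers the denoising loop, so it does not include everything that happens in a full call. To time the whole call with repeats and a warm-up, a small wall-clock harness can help. This is a sketch under assumptions: `time_call` is a hypothetical helper, the warm-up and repeat counts are arbitrary, and on a GPU you would also want to call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously.

```python
import statistics
import time
from typing import Callable, List


def time_call(fn: Callable[[], object], warmup: int = 1, repeats: int = 3) -> float:
    """Return the median wall-clock seconds of fn() over `repeats` runs."""
    for _ in range(warmup):
        fn()  # discard warm-up runs (lazy initialization, caches)
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload; in a real benchmark this would wrap the pipe(...) call.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"median: {elapsed:.4f} s")
```

Taking the median over several runs makes the number less sensitive to a slow first call, which may explain why this first 4-step run shows 1.52 s/it while the later 4-step run shows 1.27 s/it.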

Next, let's vary the number of denoising steps.

1 step

[00:01<00:00, 1.26s/it]

2 steps

[00:02<00:00, 1.37s/it]

4 steps

[00:05<00:00, 1.27s/it]

8 steps

[00:10<00:00, 1.28s/it]
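The four runs above can be summarized with a little arithmetic: the time spent in the denoising loop is roughly the step count times the seconds per iteration. The snippet below is only a sketch that reuses the numbers printed by the progress bars; `loop_seconds` is a hypothetical helper name.

```python
# (steps, seconds per iteration) copied from the progress bars above
runs = [(1, 1.26), (2, 1.37), (4, 1.27), (8, 1.28)]


def loop_seconds(steps: int, sec_per_it: float) -> float:
    """Approximate time spent in the denoising loop."""
    return steps * sec_per_it


for steps, sec_per_it in runs:
    print(f"{steps} steps: ~{loop_seconds(steps, sec_per_it):.2f} s")
```

The per-step cost stays close to 1.3 s regardless of the step count, so total time grows almost linearly with the number of steps.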

Next, let's try 1024x1024 images.

4 steps / 1024x1024

[00:12<00:00, 3.17s/it]
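To put the 1024x1024 result in context, it helps to compare how much the pixel count grew with how much the per-step time grew. This is a back-of-the-envelope sketch using only the numbers reported above (1.27 s/it at 768x768 and 3.17 s/it at 1024x1024, both with 4 steps); the helper names are hypothetical.

```python
def pixel_ratio(side_small: int, side_large: int) -> float:
    """Ratio of pixel counts between two square resolutions."""
    return (side_large * side_large) / (side_small * side_small)


def time_ratio(sec_small: float, sec_large: float) -> float:
    """Ratio of per-step times between the two runs."""
    return sec_large / sec_small


pixels = pixel_ratio(768, 1024)
seconds = time_ratio(1.27, 3.17)
print(f"pixels x{pixels:.2f}, time per step x{seconds:.2f}")
```

In this run the per-step time grew faster than the pixel count. One plausible contributor is self-attention, whose cost grows faster than linearly with the number of latent positions, although the post does not measure this.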

Finally, let's increase the number of images generated at the same time.

4 steps / 768x768 / 4 images

[00:21<00:00, 5.45s/it]
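A quick way to read the batched result is to divide the seconds per iteration by the batch size and compare it with the single-image run. The sketch below uses only the numbers reported above (1.27 s/it for one image and 5.45 s/it for four images, both at 768x768 with 4 steps); `per_image_sec_per_step` is a hypothetical helper.

```python
def per_image_sec_per_step(sec_per_it: float, batch_size: int) -> float:
    """Seconds per denoising step, normalized per generated image."""
    return sec_per_it / batch_size


single = per_image_sec_per_step(1.27, 1)
batched = per_image_sec_per_step(5.45, 4)
print(f"single: {single:.2f} s, batched: {batched:.2f} s per image per step")
```

In this run the per-image cost of the batch is slightly higher than that of a single image, so batching did not improve throughput here. That may be worth checking against the caveat above about which device was actually in use.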

LCM is fast enough. Four denoising steps give good-looking results.

© 2025, blueqat Inc. All rights reserved