Using LCM allows the number of inference steps to be reduced, and LCM-LoRA pushes the step count down even further.
This time, we combine LCM-LoRA with a conventional style LoRA for diffusion models.
LCM-LoRA-SDv1.5 (speedup LCM-LoRA)
https://huggingface.co/latent-consistency/lcm-lora-sdv1-5
PixelArtRedmond 1.5V - Pixel Art LoRAs for SD 1.5! (pixel-art style LoRA)
https://huggingface.co/artificialguybr/pixelartredmond-1-5v-pixel-art-loras-for-sd-1-5
!pip install -q diffusers accelerate peft
For the base model, we use Dreamshaper v7 distilled with LCM, loaded in FP32.
import torch
from diffusers import LCMScheduler, StableDiffusionPipeline

# LCM-distilled Dreamshaper v7 checkpoint
model_id = "SimianLuo/LCM_Dreamshaper_v7"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)

# swap in the LCM scheduler, which few-step sampling relies on
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")
Next, we apply the LoRAs. Both are available on the Hugging Face Hub, so we can load them directly by repository ID.
# load the pixel-art style LoRA and the LCM-LoRA
pipe.load_lora_weights("artificialguybr/pixelartredmond-1-5v-pixel-art-loras-for-sd-1-5")
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
# fuse the loaded LoRA weights into the base model
pipe.fuse_lora()
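Calling load_lora_weights twice like this relies on diffusers' default adapter naming. If you want explicit control over how the two LoRAs are combined, the PEFT integration in diffusers lets you name each adapter and set per-adapter weights before fusing. A minimal sketch; the adapter names "pixel" and "lcm" and the 1.0 weights are my own choices, not part of the original setup:

# load each LoRA under an explicit adapter name (names are arbitrary)
pipe.load_lora_weights(
    "artificialguybr/pixelartredmond-1-5v-pixel-art-loras-for-sd-1-5",
    adapter_name="pixel",
)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm")
# activate both adapters with equal weights, then bake them into the base weights
pipe.set_adapters(["pixel", "lcm"], adapter_weights=[1.0, 1.0])
pipe.fuse_lora()

Naming the adapters also makes it easy to switch one of them off later.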
prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour, style by Dan Winters, Russell James, Steve McCurry, centered, extremely detailed, Nikon D850, award winning photography"
We'll try running it with 2 inference steps.
%%time
# disable classifier-free guidance by passing guidance_scale=0
image = pipe(prompt=prompt, num_inference_steps=2, guidance_scale=0).images[0]
image
It's incredibly fast on an RTX 6000 Ada, finishing in about 0.5 seconds.
CPU times: user 752 ms, sys: 102 ms, total: 853 ms
Wall time: 497 ms
For comparison, I tried inference with LCM-LoRA turned off.
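One way to do this, assuming the LoRAs were loaded under the explicit adapter names from the sketch above (again my own naming, not the original setup), is to unfuse the weights and reactivate only the style adapter:

# undo the fusion, then keep only the pixel-art style LoRA active
pipe.unfuse_lora()
pipe.set_adapters(["pixel"])
# the base model is itself LCM-distilled, so 2 steps still runs
image = pipe(prompt=prompt, num_inference_steps=2, guidance_scale=0).images[0]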
The style of the image changed significantly, which confirmed that the multiple LoRAs really were being applied. That's all.