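LCM (Latent Consistency Models) aims to cut the number of denoising steps in a diffusion model drastically by training a neural network to denoise as if it were solving an ordinary differential equation (ODE).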
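As a toy illustration of the idea (not the LCM training procedure), consider the ODE dx/dt = x, whose trajectories are x(t) = x0 * exp(t). A numerical solver has to walk back from time t to 0 in many small steps, whereas a consistency function maps any point on the trajectory straight to its origin in one evaluation. Here that function is known in closed form; in LCM a neural network is trained to play this role. All names below are illustrative assumptions.

```python
import math


def euler_solve_back(x_t, t, n_steps):
    """Integrate dx/dt = x backward from time t to 0 using n_steps Euler steps."""
    dt = t / n_steps
    x = x_t
    for _ in range(n_steps):
        x = x - dt * x  # one explicit Euler step backward in time
    return x


def consistency_fn(x_t, t):
    """Map a point (x_t, t) on the trajectory directly to its origin x0."""
    return x_t * math.exp(-t)


x0 = 2.0                      # origin of the trajectory (the "clean" sample)
T = 1.0                       # final time (the "noisy" end)
x_T = x0 * math.exp(T)        # point on the trajectory at time T

one_shot = consistency_fn(x_T, T)          # single evaluation
many_steps = euler_solve_back(x_T, T, 50)  # 50 solver steps, close to x0
few_steps = euler_solve_back(x_T, T, 2)    # 2 solver steps, far from x0
```

With only two solver steps the Euler result drifts far from the origin, while the single consistency evaluation recovers it. That gap is what LCM tries to close for diffusion sampling.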

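Building on LCM, a model called LCM-LoRA has been released. It extends the usual role of LoRA, a low-rank adapter, so that the adapter itself provides the speedup that reduces the number of denoising steps.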
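To make "low-rank adapter" concrete, here is a minimal sketch of the parameter arithmetic behind LoRA, which replaces a full update of a d x k weight matrix with the product of a d x r matrix and an r x k matrix. The layer size and rank below are illustrative assumptions, not values taken from LCM-LoRA.

```python
def full_update_params(d: int, k: int) -> int:
    """Number of parameters in a full update of a d x k weight matrix."""
    return d * k


def lora_update_params(d: int, k: int, r: int) -> int:
    """Number of parameters in a rank-r update B @ A, with B: d x r and A: r x k."""
    return r * (d + k)


# Hypothetical layer shape and rank, chosen only for illustration
d, k, r = 768, 768, 8

full = full_update_params(d, k)
lora = lora_update_params(d, k, r)
ratio = lora / full  # fraction of parameters the adapter needs
```

Because the adapter is so much smaller than the layer it modifies, it can be distributed and loaded separately from the base weights.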

Because LCM needs so few steps, it runs at a practical speed without an expensive GPU.

Here I run LCM and LCM-LoRA on a Google Colab T4 GPU.

https://github.com/luosiallen/latent-consistency-model

!pip install -q diffusers accelerate peft

Either fp16 or fp32 works here; I use fp16 because it is faster.

The weights come from the published LCM checkpoint.

from diffusers import StableDiffusionPipeline, LCMScheduler
import torch

# Load the LCM-distilled Dreamshaper v7 weights in half precision (fp16) and move them to the GPU
pipe = StableDiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16).to("cuda")

The prompt is the one used in the example.

prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour, style by Dan Winters, Russell James, Steve McCurry, centered, extremely detailed, Nikon D850, award winning photography"

First, without the LoRA. I try 2, 4, and 8 steps.

%%time

image = pipe(prompt=prompt, num_inference_steps=2, guidance_scale=8.0, lcm_origin_steps=50).images[0]

Check the image.

image.save("image.png")
image

LCM without LoRA / steps 2 / 768x768 / 1.48s

LCM without LoRA / steps 4 / 768x768 / 1.94s

LCM without LoRA / steps 8 / 768x768 / 2.94s
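The three runs above can be scripted rather than repeated by hand. The sketch below times an arbitrary generation callable for each step count. The helper name `benchmark_steps` is hypothetical, and the commented usage assumes the `pipe` and `prompt` defined earlier. Note that the first call on a GPU often includes warm-up cost, so a throwaway run beforehand may give fairer numbers.

```python
import time


def benchmark_steps(generate, step_counts=(2, 4, 8)):
    """Call generate(steps) once per step count and return {steps: seconds}."""
    timings = {}
    for steps in step_counts:
        start = time.perf_counter()
        generate(steps)
        timings[steps] = time.perf_counter() - start
    return timings


# Usage with the pipeline from this article (commented out because it needs a GPU):
# timings = benchmark_steps(
#     lambda s: pipe(prompt=prompt, num_inference_steps=s, guidance_scale=8.0).images[0]
# )
# print(timings)
```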

Next, I attach the LCM-LoRA linked below.

A LoRA is normally used to change the style of the output, but this one reduces the number of inference steps.

https://huggingface.co/latent-consistency/lcm-lora-sdv1-5

The model card says to set the guidance scale to 0 or to use a value between 1 and 2, so I set it to 0.

pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
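The call used after loading the LoRA is not shown here. A plausible sketch follows, assuming the same `pipe` and `prompt` as above and the standard `prompt`, `num_inference_steps`, and `guidance_scale` arguments of the diffusers pipeline call. The helper name `generate_with_lcm_lora` and the output filename are hypothetical.

```python
def generate_with_lcm_lora(pipe, prompt, steps, guidance_scale=0.0):
    """Run one generation with the guidance scale chosen for LCM-LoRA (0 by default)."""
    result = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
    )
    # The pipeline output exposes the generated images as a list
    return result.images[0]


# Usage (commented out because it needs the loaded pipeline and a GPU):
# image = generate_with_lcm_lora(pipe, prompt, steps=2)
# image.save("image_lora.png")
```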

LCM with LCM-LoRA / steps 2 / 768x768 / 1.41s

The style changed slightly. Even at 2 steps, the image came out reasonably sharp.

LCM with LCM-LoRA / steps 4 / 768x768 / 2.02s

LCM with LCM-LoRA / steps 8 / 768x768 / 3.13s

It is nice that even 2 steps produce a proper image.
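For a rough reading of the timings reported above, the marginal cost per extra step can be estimated from the 2-step and 8-step measurements. This is a back-of-the-envelope sketch on single measurements, so the figures are indicative only. The function name `per_step_cost` is hypothetical.

```python
# Timings in seconds, copied from the measurements in this article (768x768, T4)
timings_without_lora = {2: 1.48, 4: 1.94, 8: 2.94}
timings_with_lora = {2: 1.41, 4: 2.02, 8: 3.13}


def per_step_cost(timings):
    """Estimate seconds per extra step from the smallest and largest step counts."""
    lo, hi = min(timings), max(timings)
    return (timings[hi] - timings[lo]) / (hi - lo)


cost_without = per_step_cost(timings_without_lora)  # about 0.24 s per step
cost_with = per_step_cost(timings_with_lora)        # about 0.29 s per step
```

On these numbers, each additional step costs roughly a quarter of a second in both configurations. The LoRA therefore does not make an individual step cheaper; its benefit is image quality at low step counts.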

LCM / Latent Consistency Models and LCM-LoRA Benchmark

Yuichiro Minato
