Benchmarking the latent consistency model with Stable Diffusion + LCM, on CPU, H100, and so on.


Yuichiro Minato

2024/03/18 02:15

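The model known as LCM (Latent Consistency Model) extends earlier diffusion models: by training the model to learn the solution of an ordinary differential equation (ODE), it succeeds in reducing the number of diffusion steps.

It is said to generate images considerably faster, so let's try it.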

https://github.com/luosiallen/latent-consistency-model

It can also be run through Automatic1111, but I hit an error there, so I'll run it directly from the console instead.

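I prepared a CPU and a GPU and ran the computation in float32.

*Note: I have not sufficiently verified whether the CPU or the GPU was actually being used, so for now I'll treat everything here uniformly as trials in a CPU + GPU (H100) environment.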

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.

pipe.to(torch_device="cuda", torch_dtype=torch.float32)
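Since the note above says it was not verified which device actually did the work, here is a minimal sketch of one way to check where a model's weights live. It only assumes the object exposes a torch-style `parameters()` iterator; the helper name `summarize_placement` and the component names in the commented usage (`unet`, `vae`, `text_encoder`) are my assumptions, not something shown in this article.

```python
from collections import Counter


def summarize_placement(module):
    """Count a module's parameters by (device, dtype).

    Works with any object that exposes a torch-style parameters() iterator.
    """
    counts = Counter()
    for p in module.parameters():
        counts[(str(p.device), str(p.dtype))] += 1
    return dict(counts)


# Hypothetical usage with the pipeline loaded above (component names assumed):
# for name in ("unet", "vae", "text_encoder"):
#     print(name, summarize_placement(getattr(pipe, name)))
```

If every entry reports a `cuda` device, the weights at least sit on the GPU; this does not by itself prove that all computation ran there.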

I used a standard prompt. First, let's try 4 inference steps.

prompt = "portrait photo of a girl, photograph, highly detailed face, depth of field, moody light, golden hour, style by Dan Winters, Russell James, Steve McCurry, centered, extremely detailed, Nikon D850, award winning photography"

num_inference_steps = 4

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0, lcm_origin_steps=50, output_type="pil").images

images[0].save("output.png")

[00:06<00:00, 1.52s/it]

At 768x768 this is plenty fast.

Next, let's change the number of inference steps.

1 step

[00:01<00:00, 1.26s/it]

The image comes out blurry.

2 steps

[00:02<00:00, 1.37s/it]

Even at about 2 seconds, we get a good image.

4 steps

[00:05<00:00, 1.27s/it]

8 steps

[00:10<00:00, 1.28s/it]

Looks good.
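To put the step sweep side by side, the time spent in the denoising loop can be estimated as steps multiplied by seconds per iteration. The sketch below only uses the figures printed in the progress bars above; `estimated_total` is a helper name I made up for illustration.

```python
# (steps, seconds per iteration) as reported by the progress bars above
measurements = [(1, 1.26), (2, 1.37), (4, 1.27), (8, 1.28)]


def estimated_total(steps, sec_per_it):
    """Denoising-loop time implied by the progress bar: steps x s/it."""
    return round(steps * sec_per_it, 2)


totals = {steps: estimated_total(steps, spi) for steps, spi in measurements}
for steps, total in totals.items():
    print(f"{steps} step(s): ~{total:.2f} s")
```

The per-step cost stays roughly constant (about 1.3 s/it), so total time grows close to linearly with the step count, which matches the elapsed times shown in the bars.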

Next, let's increase the image size from the default 768x768 to 1024x1024.

4 steps / 1024x1024

[00:12<00:00, 3.17s/it]
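As a rough sanity check, the slowdown can be compared with the increase in pixel count. This is a small sketch using only the numbers above; I take the 1.27 s/it from the 4-step 768x768 run in the sweep as the baseline, which is my choice (the first run reported 1.52 s/it).

```python
def pixel_ratio(size_a, size_b):
    """Ratio of pixel counts between two square image sizes."""
    return (size_b * size_b) / (size_a * size_a)


px = pixel_ratio(768, 1024)   # how many times more pixels at 1024x1024
slowdown = 3.17 / 1.27        # how many times slower per step (4-step runs)

print(f"pixels: {px:.2f}x, time per step: {slowdown:.2f}x")
```

In this single measurement the per-step time grows faster than the pixel count (about 2.5x versus about 1.8x), so larger images cost more than proportionally.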

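Finally, let's adjust the number of images generated at once. So far I have generated one image at a time;

now let's generate four at once. The step count is fixed at 4 and the image size at 768x768.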
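The article does not show how the size and the image count were passed to the pipeline, so the following is only a sketch. The keys `height`, `width`, and `num_images_per_prompt` are assumed parameter names for this pipeline's call; the other keys mirror the call shown earlier. `build_call_kwargs` is a hypothetical helper.

```python
def build_call_kwargs(prompt, steps=4, size=768, num_images=1):
    """Assemble keyword arguments for one benchmark run.

    height, width and num_images_per_prompt are assumed parameter names;
    the remaining keys mirror the call shown earlier in the article.
    """
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": 8.0,
        "lcm_origin_steps": 50,
        "height": size,
        "width": size,
        "num_images_per_prompt": num_images,
        "output_type": "pil",
    }


kwargs = build_call_kwargs("portrait photo of a girl", steps=4, size=768, num_images=4)

# Hypothetical usage with the pipeline loaded above:
# images = pipe(**kwargs).images
# for i, image in enumerate(images):
#     image.save(f"output_{i}.png")
```

Keeping the settings in one dictionary makes it easy to vary a single knob (steps, size, or image count) while holding the others fixed.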

4 steps / 768x768 / num_images 4

[00:21<00:00, 5.45s/it]

The result shown here is scaled down somewhat, but four 768x768 images came out.
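To see whether batching helped, the batch figure can be divided by the number of images and compared with the single-image run. This sketch uses only the s/it values reported above, with the 4-step single-image run (1.27 s/it) as the baseline.

```python
single_spi = 1.27   # s/it: 1 image, 4 steps, 768x768
batch_spi = 5.45    # s/it: 4 images, 4 steps, 768x768
batch_size = 4

per_image_spi = batch_spi / batch_size    # per-step cost per image in the batch
overhead = per_image_spi / single_spi     # relative to generating one at a time

print(f"per image: {per_image_spi:.2f} s/it ({overhead:.2f}x the single-image run)")
```

In this one measurement the per-image cost in the batch is about the same as, or slightly higher than, generating images one at a time, so batching did not improve throughput here. A single run is of course a weak basis for a firm conclusion.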

It is fast enough that experimenting at around 4 steps seems like a good approach.

That's all.


Note on num_inference_steps: it can be set to 1~50 steps. LCM supports fast inference even at <= 4 steps; 1~8 steps is recommended.