Compared to large language models (LLMs), AI for image generation typically requires relatively little VRAM (GPU memory). In this article, we'll use consumer-grade machines with an RTX 3060 (12 GB VRAM) and an RTX 3090 (24 GB VRAM) to look at typical image-generation AI workloads.
The model we'll use this time is Realistic Vision V6.0, which is based on Stable Diffusion 1.5.
First, we'll download the safetensors file locally.
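One way to fetch the weights is with huggingface_hub, which is installed alongside the libraries in the next step. The repo id and filename below are assumptions, so adjust them to match wherever you actually obtained the checkpoint:

from huggingface_hub import hf_hub_download

# Hypothetical repo id / filename -- replace with the actual source of your checkpoint.
path = hf_hub_download(
    repo_id="SG161222/Realistic_Vision_V6.0_B1_noVAE",
    filename="Realistic_Vision_V6.0_NV_B1.safetensors",
)
print(path)  # local path to the downloaded safetensors file

The code later in this post assumes the file is saved as real.safetensors, so rename it or adjust the path accordingly.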
Install the necessary libraries, such as diffusers:
pip install --quiet diffusers transformers accelerate gradio
I set up the code like this; the loaded safetensors checkpoint is used for inference.
from diffusers import StableDiffusionPipeline
import torch

# Load the single safetensors checkpoint in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_single_file(
    "real.safetensors",
    load_safety_checker=True,
    extract_ema=True,
    torch_dtype=torch.float16,
).to("cuda")

def txt2img(prompt):
    # Generate one image from the prompt with 20 denoising steps.
    return pipe(prompt, num_inference_steps=20).images[0]
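For repeatable comparisons between the two cards, it can also help to fix the random seed and save the result to disk. A minimal sketch, assuming the pipeline above is already loaded (the filename out.png is arbitrary):

import torch

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed for reproducible output
image = pipe(
    "closeup face photo of man in black clothes, night city street, bokeh",
    num_inference_steps=20,
    generator=generator,
).images[0]
image.save("out.png")  # the pipeline returns PIL images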
First, the RTX 3060. I'll feed it an actual prompt and time the generation.
import time
start = time.time()
txt2img("closeup face photo of man in black clothes, night city street, bokeh")
print(time.time()-start)
2.92753529548645
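Note that a single measurement like this can include one-off warmup costs (CUDA initialization, memory allocation). A sketch that warms up once and then averages a few runs, assuming the txt2img function above:

import time

prompt = "closeup face photo of man in black clothes, night city street, bokeh"
txt2img(prompt)  # warmup run, not timed

times = []
for _ in range(3):
    start = time.time()
    txt2img(prompt)
    times.append(time.time() - start)
print(sum(times) / len(times))  # average seconds per image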
To run it in Gradio, I'll reuse the txt2img function defined above.
import gradio as gr

# Text prompt in, generated image out.
app = gr.Interface(
    fn=txt2img,
    inputs="text",
    outputs=gr.Image(),
)
app.launch(share=True)  # share=True exposes a temporary public URL
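If you want to experiment from the browser, the interface can expose more of the pipeline's parameters. A sketch (a hypothetical UI, not part of the original setup) that adds a negative prompt and a step-count slider:

import gradio as gr

def txt2img_ui(prompt, negative_prompt, steps):
    # Pass the extra controls straight through to the pipeline.
    return pipe(
        prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
    ).images[0]

app = gr.Interface(
    fn=txt2img_ui,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Textbox(label="Negative prompt"),
        gr.Slider(1, 50, value=20, step=1, label="Steps"),
    ],
    outputs=gr.Image(),
)
app.launch(share=True)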
It went very smoothly.
Next, I'll run the same code on the RTX 3090.
import time
start = time.time()
txt2img("closeup face photo of man in black clothes, night city street, bokeh")
print(time.time()-start)
1.0783894062042236
That's roughly three times faster. For single images, though, the difference in actual experience didn't feel as large as the numbers suggest.
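Where the faster card and its larger 24 GB of VRAM are more likely to pay off is when generating several images per prompt, since the per-image cost compounds. A sketch (illustrative only, not something measured above) using num_images_per_prompt:

# Generate a batch of four images in one call; more VRAM allows larger batches.
images = pipe(
    "closeup face photo of man in black clothes, night city street, bokeh",
    num_inference_steps=20,
    num_images_per_prompt=4,
).images
for i, img in enumerate(images):
    img.save(f"out_{i}.png")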