Compared to large language models (LLMs), AI for image generation typically requires relatively little VRAM (GPU memory). In this article, we'll use consumer-grade machines with an RTX 3060 (12 GB VRAM) and an RTX 3090 (24 GB VRAM) to look at typical image-generation AI workloads.
The model we'll use this time is Realistic Vision V6.0, which is based on Stable Diffusion 1.5.
First, we'll download the safetensors file locally.
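One way to fetch the weights is with huggingface_hub, which is installed alongside the libraries in the next step. The repo id and filename below are assumptions, so adjust them to match wherever you actually obtained the checkpoint:

from huggingface_hub import hf_hub_download

# Hypothetical repo id / filename -- replace with the actual source of your checkpoint.
path = hf_hub_download(
    repo_id="SG161222/Realistic_Vision_V6.0_B1_noVAE",
    filename="Realistic_Vision_V6.0_NV_B1.safetensors",
)
print(path)  # local path to the downloaded safetensors file

The code later in this post assumes the file is saved as real.safetensors, so rename it or adjust the path accordingly.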
Install the necessary libraries, such as diffusers:
pip install --quiet diffusers transformers accelerate gradio
I set up the code like this; the loaded safetensors checkpoint is used for inference.
from diffusers import StableDiffusionPipeline
import torch

# Load the single safetensors checkpoint in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_single_file(
    "real.safetensors",
    load_safety_checker=True,
    extract_ema=True,
    torch_dtype=torch.float16,
).to("cuda")

def txt2img(prompt):
    # Generate one image from the prompt with 20 denoising steps.
    return pipe(prompt, num_inference_steps=20).images[0]
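For repeatable comparisons between the two cards, it can also help to fix the random seed and save the result to disk. A minimal sketch, assuming the pipeline above is already loaded (the filename out.png is arbitrary):

import torch

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed for reproducible output
image = pipe(
    "closeup face photo of man in black clothes, night city street, bokeh",
    num_inference_steps=20,
    generator=generator,
).images[0]
image.save("out.png")  # the pipeline returns PIL images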
First, the RTX 3060. I'll feed it an actual prompt and time the generation.
import time
start = time.time()
txt2img("closeup face photo of man in black clothes, night city street, bokeh")
print(time.time()-start)
2.92753529548645
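Note that a single measurement like this can include one-off warmup costs (CUDA initialization, memory allocation). A sketch that warms up once and then averages a few runs, assuming the txt2img function above:

import time

prompt = "closeup face photo of man in black clothes, night city street, bokeh"
txt2img(prompt)  # warmup run, not timed

times = []
for _ in range(3):
    start = time.time()
    txt2img(prompt)
    times.append(time.time() - start)
print(sum(times) / len(times))  # average seconds per image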
To run it in Gradio, I'll reuse the txt2img function defined above.
import gradio as gr

# Text prompt in, generated image out.
app = gr.Interface(
    fn=txt2img,
    inputs="text",
    outputs=gr.Image(),
)
app.launch(share=True)  # share=True exposes a temporary public URL
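If you want to experiment from the browser, the interface can expose more of the pipeline's parameters. A sketch (a hypothetical UI, not part of the original setup) that adds a negative prompt and a step-count slider:

import gradio as gr

def txt2img_ui(prompt, negative_prompt, steps):
    # Pass the extra controls straight through to the pipeline.
    return pipe(
        prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=int(steps),
    ).images[0]

app = gr.Interface(
    fn=txt2img_ui,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Textbox(label="Negative prompt"),
        gr.Slider(1, 50, value=20, step=1, label="Steps"),
    ],
    outputs=gr.Image(),
)
app.launch(share=True)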
It went very smoothly.
Next, I'll run the same code on the RTX 3090.
import time
start = time.time()
txt2img("closeup face photo of man in black clothes, night city street, bokeh")
print(time.time()-start)
1.0783894062042236
That's roughly three times faster. For single images, though, the difference in actual experience didn't feel as large as the numbers suggest.
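Where the faster card and its larger 24 GB of VRAM are more likely to pay off is when generating several images per prompt, since the per-image cost compounds. A sketch (illustrative only, not something measured above) using num_images_per_prompt:

# Generate a batch of four images in one call; more VRAM allows larger batches.
images = pipe(
    "closeup face photo of man in black clothes, night city street, bokeh",
    num_inference_steps=20,
    num_images_per_prompt=4,
).images
for i, img in enumerate(images):
    img.save(f"out_{i}.png")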