NVIDIA CUDA-Q Tutorial & H100 Benchmark Part 2: Quantum Circuit Generation with Generative AI Diffusion Models
Compiling Unitary Matrices into Quantum Circuits Using Diffusion Models
A new approach to quantum circuit synthesis with CUDA-Q and genQC
Recently, diffusion models, a popular machine learning technique, have begun to be applied to one of the most important tasks in quantum computing: compiling unitary matrices into quantum circuits.
In this tutorial, we’ll walk through a method to convert an arbitrary 3-qubit unitary matrix into a quantum circuit, based on the 2024 paper:
"Quantum circuit synthesis with diffusion models" by Fürrutter et al.
We’ll be using NVIDIA’s CUDA-Q quantum SDK and the circuit generation tool genQC, which uses a diffusion model.
Why Diffusion Models?
Diffusion models have shown excellent performance at generating complex structured data such as images and folded protein structures.
This research explores how they can also be applied to quantum circuit generation.
How Does the Circuit Generation Work? (High-Level Overview)
The overall pipeline works as follows:
1. Circuit Encoding
Quantum circuits are sequences of discrete gates. For use in machine learning,
each gate is converted into a continuous-valued vector, so a whole circuit becomes a 3D tensor.
This makes it possible to model discrete gate sequences using neural networks.
The generated tensors can then be decoded back into gate sequences (quantum circuits).
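To make this concrete, here is a toy sketch of the idea. It is only an illustration in the spirit of the paper, not genQC's exact encoding: gates become integer tokens laid out on a qubit-by-time grid, and the discrete tokens are then lifted to continuous vectors that a diffusion model can work with.

import numpy as np

# Toy encoding (illustrative only): rows are qubits, columns are time steps.
# Integers index gates in a small vocabulary; 0 means "no gate", and a negative
# token marks the control qubit of a two-qubit gate.
toy_vocab = {1: "h", 2: "cx"}                      # hypothetical token ids
circuit_tokens = np.array([
    [1, -2, 0],    # qubit 0: H, then control of CX, then idle
    [0,  2, 0],    # qubit 1: idle, then target of CX, then idle
    [0,  0, 1],    # qubit 2: idle, idle, H
])

# Lift discrete tokens to continuous vectors (here a trivial one-hot embedding),
# giving a 3D tensor of shape (qubits, time steps, embedding dimension).
num_tokens = 2 * len(toy_vocab) + 1                # tokens -2..2
one_hot = np.eye(num_tokens)[circuit_tokens + len(toy_vocab)]
print(one_hot.shape)                               # (3, 3, 5)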
2. Conditioning
The model is conditioned using two pieces of information:
- Gate set to use (e.g., “build a circuit using ['x', 'h']”) → This is encoded using a language model.
- Target unitary matrix → Encoded using another neural network.
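In code, these two conditions are lightweight. The gate-set constraint is just a text string, and the target unitary is handed to the model as a two-channel tensor holding its real and imaginary parts. We do exactly this later in the walkthrough; the values here are placeholders for illustration:

import numpy as np
import torch

prompt = "Compile using: ['x', 'h']"        # gate-set condition, encoded by a text model
U = np.eye(8, dtype=np.complex128)          # placeholder 3-qubit unitary, encoded by a second network
U_tensor = torch.stack([torch.Tensor(np.real(U)),
                        torch.Tensor(np.imag(U))], dim=0)
print(U_tensor.shape)                       # torch.Size([2, 8, 8])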
3. Circuit Generation (Unitary Compilation)
Now the diffusion model takes over.
Starting from a noise tensor, it iteratively removes noise, using the allowed gate set and the target unitary as guidance,
until the result decodes into a runnable quantum circuit that implements the target.
Image source: https://arxiv.org/pdf/2311.02041
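Under the hood this is standard conditional diffusion sampling. The sketch below is a heavily simplified illustration of the idea, with hypothetical model and scheduler objects, not genQC's actual internals:

import torch

def sample_circuit_tensor(model, scheduler, condition, shape, guidance=10.0):
    """Classifier-free-guidance denoising loop (illustrative sketch only)."""
    x = torch.randn(shape)                              # start from pure Gaussian noise
    for t in scheduler.timesteps:                       # e.g. 40 reverse-diffusion steps
        eps_cond = model(x, t, condition)               # noise prediction with conditioning
        eps_uncond = model(x, t, None)                  # unconditioned noise prediction
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)
        x = scheduler.step(eps, t, x)                   # one denoising step
    return x                                            # decode x back into a gate sequence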
This entire process is already packaged and can be tried using CUDA-Q and genQC.
Let’s Dive Into the Code
We haven’t even read the full paper yet — let’s just follow the CUDA-Q tutorial and give it a try.
Our goal: find a sequence of quantum gates (a circuit) that implements a given target unitary matrix.
Install the necessary tools:
pip install genQC==0.1.0 cudaq matplotlib
Import the libraries and fix the random seeds:
import genQC
from genQC.imports import *   # wildcard import providing common utilities (including tqdm, used below)
from genQC.pipeline.diffusion_pipeline import DiffusionPipeline
from genQC.inference.export_cudaq import genqc_to_cudaq
import genQC.inference.infer_compilation as infer_comp
import genQC.util as util
import numpy as np
import torch

# Fix random seeds for reproducibility and pick the best available torch device.
torch.manual_seed(0)
np.random.seed(0)

device = util.infer_torch_device()   # 'cuda' if a GPU is available, otherwise 'cpu'
util.MemoryCleaner.purge_mem()       # free cached (GPU) memory before loading the model
print(device)
The pretrained model weights are available on Hugging Face and can be loaded directly:
pipeline = DiffusionPipeline.from_pretrained("Floki00/qc_unitary_3qubit", device)
pipeline.scheduler.set_timesteps(40)   # number of denoising steps used at inference time
Build the gate vocabulary from the model's gate pool, and set the number of qubits and the maximum number of gates per circuit:
vocab = {i + 1: gate for i, gate in enumerate(pipeline.gate_pool)}   # token id -> gate

num_of_qubits = 3
max_gates = 12
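If you are curious which gates the pretrained checkpoint can emit, print the vocabulary (the exact gate pool depends on the checkpoint):
print(vocab)   # e.g. {1: 'h', 2: 'cx', ...}, depending on the checkpoint's gate pool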
Define the target unitary matrix and check that it really is unitary (U†U = UU† = I):
U = np.matrix([...], dtype=np.complex128)   # fill in your 8x8 (3-qubit) target unitary

assert np.allclose(U.H @ U, np.identity(2**num_of_qubits)) and \
       np.allclose(U @ U.H, np.identity(2**num_of_qubits))
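If you do not have a specific target in mind, here is one simple way to construct a valid unitary to experiment with. This is only an illustrative choice, not the matrix used in the NVIDIA tutorial: the GHZ-state-preparation circuit (H on qubit 0, then CNOT 0→1 and CNOT 1→2), built from standard gate matrices and using only gates allowed by the prompt below:

# Illustrative example target (not the unitary from the NVIDIA tutorial).
# Qubit 0 is the most significant bit; matrices are applied right to left.
H_gate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]])
U = np.matrix(np.kron(I2, CX) @ np.kron(CX, I2) @ np.kron(np.kron(H_gate, I2), I2),
              dtype=np.complex128)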
Define the prompt — for example, excluding the X gate:
prompt = "Compile using: ['h', 'cx', 'z', 'ccx', 'swap']"
Process the unitary matrix and sample candidate circuits:
samples = 128   # number of candidate circuits to sample from the model

# The unitary condition is passed to the model as a two-channel tensor: real and imaginary parts.
U_r, U_i = torch.Tensor(np.real(U)), torch.Tensor(np.imag(U))
U_tensor = torch.stack([U_r, U_i], dim=0)

out_tensors = infer_comp.generate_comp_tensors(
    pipeline=pipeline,
    prompt=prompt,
    U=U_tensor,
    samples=samples,
    system_size=num_of_qubits,
    num_of_qubits=num_of_qubits,
    max_gates=max_gates,
    g=10   # guidance scale for the conditioned sampling
)
Each sample is an encoded circuit: rows correspond to qubits, columns to positions in the circuit, and each nonzero entry indexes a gate in the vocabulary (negative entries mark control qubits). You’ll get tensors like:
tensor([[ 5,  3,  0,  0, -5,  3,  3,  0, -5,  0,  0,  0],
        [-5,  0,  0,  6,  5,  0,  0,  3,  5,  0,  0,  0],
        [-5,  0,  1,  6, -5,  0,  0,  0, -5,  0,  0,  0]])
Convert the output to CUDA-Q kernels:
import cudaq

cudaq.set_target('qpp-cpu')   # CPU simulator is plenty for 3 qubits; use 'nvidia' to run on a GPU

kernel_list = []
valid_tensors = []
invalid_tensors = 0

for out_tensors_i in tqdm(out_tensors):
    # Some generated tensors do not decode into valid circuits; skip those.
    try:
        kernel = genqc_to_cudaq(out_tensors_i, vocab)
    except Exception:
        kernel = None

    if kernel:
        kernel_list.append(kernel)
        valid_tensors.append(out_tensors_i)
    else:
        invalid_tensors += 1

print(f"The model generated {invalid_tensors} invalid tensors.")
Pick one of the generated kernels, prepare the all-zeros input state, and draw the circuit:
input_state = [0] * (2**num_of_qubits)
input_state[0] = 1   # amplitude 1 on |000>, i.e. the all-zeros basis state

print(cudaq.draw(kernel_list[0], input_state))
Check the Fidelity
Let’s verify how close the generated circuits get to the target unitary. First, reconstruct each circuit’s unitary by applying its kernel to every computational basis state:
N = 2**num_of_qubits

got_unitaries = np.zeros((len(kernel_list), N, N), dtype=np.complex128)

for i, kernel in tqdm(enumerate(kernel_list), total=got_unitaries.shape[0]):
    for j in range(N):
        basis_state_j = np.zeros((N), dtype=np.complex128)
        basis_state_j[j] = 1
        got_unitaries[i, :, j] = np.array(cudaq.get_state(kernel, basis_state_j), copy=False)
Check the result:
np.set_printoptions(linewidth=1000)
print(np.round(got_unitaries[0], 4))
Output:
[[ 0.7071+0.j 0. +0.j ... 0.7071+0.j]
...
[ 0. +0.j 0. +0.j ... 0.7071+0.j]]
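To turn these matrices into a score, one common choice is the global-phase-invariant unitary fidelity F = |Tr(U†V)|² / N², which equals 1 exactly when the generated unitary V matches the target U up to a global phase. This is a standard metric (not necessarily the exact one used in NVIDIA's notebook); here we also count how many circuits exceed, say, 0.99:

def unitary_fidelity(U_target, V):
    """Return |Tr(U^dagger V)|^2 / N^2; equals 1.0 iff V matches U_target up to a global phase."""
    N = U_target.shape[0]
    return np.abs(np.trace(np.asarray(U_target).conj().T @ V))**2 / N**2

fidelities = np.array([unitary_fidelity(U, V) for V in got_unitaries])
print(f"{np.count_nonzero(fidelities > 0.99)} of {len(fidelities)} circuits have fidelity > 0.99")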
For more extensive benchmarks and fidelity tests, see the CUDA-Q tutorial. In our run, nearly 30 of the 128 generated circuits achieved high fidelity, matching NVIDIA's reported results.
Multiple circuits can be proposed with this method, and gate sets can be customized to match specific quantum hardware.
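With the fidelities in hand (using the fidelities array from the sketch above), you can pick the best candidate and draw it, or keep several high-fidelity circuits and choose whichever maps best onto your hardware:

best = int(np.argmax(fidelities))
print(f"Best fidelity: {fidelities[best]:.4f}")
print(cudaq.draw(kernel_list[best], input_state))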
The Future is Here
The combination of machine learning and quantum computing is advancing rapidly.
Let’s keep building and experimenting!