NVIDIA CUDA-Q Tutorial & H100 Benchmark Part 2: Quantum Circuit Generation with Generative AI Diffusion Models
Compiling Unitary Matrices into Quantum Circuits Using Diffusion Models
A new approach to quantum circuit synthesis with CUDA-Q and genQC
Recently, diffusion models, a popular machine learning technique, have begun to be applied to one of the most important tasks in quantum computing: compiling unitary matrices into quantum circuits.
In this tutorial, we’ll walk through a method to convert an arbitrary 3-qubit unitary matrix into a quantum circuit, based on the 2024 paper:
"Quantum circuit synthesis with diffusion models" by Fürrutter et al.
We’ll be using NVIDIA’s CUDA-Q quantum SDK and the circuit generation tool genQC, which uses a diffusion model.
Why Diffusion Models?
Diffusion models have shown excellent performance at generating complex structured data such as images and folded protein structures.
This research explores how they can also be applied to quantum circuit generation.
How Does the Circuit Generation Work? (High-Level Overview)
The overall pipeline works as follows:
1. Circuit Encoding
Quantum circuits are sequences of discrete gates. For use in machine learning,
each gate is converted into a continuous-valued vector, so a whole circuit becomes a 3D tensor.
This makes it possible to model discrete gate sequences using neural networks.
The generated tensors can then be decoded back into gate sequences (quantum circuits).
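To make this concrete, here is a toy sketch of the idea. It is only an illustration in the spirit of the paper, not genQC's exact encoding: gates become integer tokens laid out on a qubit-by-time grid, and the discrete tokens are then lifted to continuous vectors that a diffusion model can work with.

import numpy as np

# Toy encoding (illustrative only): rows are qubits, columns are time steps.
# Integers index gates in a small vocabulary; 0 means "no gate", and a negative
# token marks the control qubit of a two-qubit gate.
toy_vocab = {1: "h", 2: "cx"}                      # hypothetical token ids
circuit_tokens = np.array([
    [1, -2, 0],    # qubit 0: H, then control of CX, then idle
    [0,  2, 0],    # qubit 1: idle, then target of CX, then idle
    [0,  0, 1],    # qubit 2: idle, idle, H
])

# Lift discrete tokens to continuous vectors (here a trivial one-hot embedding),
# giving a 3D tensor of shape (qubits, time steps, embedding dimension).
num_tokens = 2 * len(toy_vocab) + 1                # tokens -2..2
one_hot = np.eye(num_tokens)[circuit_tokens + len(toy_vocab)]
print(one_hot.shape)                               # (3, 3, 5)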
2. Conditioning
The model is conditioned using two pieces of information:
- Gate set to use (e.g., “build a circuit using ['x', 'h']”) → This is encoded using a language model.
- Target unitary matrix → Encoded using another neural network.
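In code, these two conditions are lightweight. The gate-set constraint is just a text string, and the target unitary is handed to the model as a two-channel tensor holding its real and imaginary parts. We do exactly this later in the walkthrough; the values here are placeholders for illustration:

import numpy as np
import torch

prompt = "Compile using: ['x', 'h']"        # gate-set condition, encoded by a text model
U = np.eye(8, dtype=np.complex128)          # placeholder 3-qubit unitary, encoded by a second network
U_tensor = torch.stack([torch.Tensor(np.real(U)),
                        torch.Tensor(np.imag(U))], dim=0)
print(U_tensor.shape)                       # torch.Size([2, 8, 8])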
3. Circuit Generation (Unitary Compilation)
Now the diffusion model takes over.
Starting from a noise tensor, it iteratively removes noise, using the allowed gate set and the target unitary as guidance,
until the result decodes into a runnable quantum circuit that implements the target.
Image source: https://arxiv.org/pdf/2311.02041
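Under the hood this is standard conditional diffusion sampling. The sketch below is a heavily simplified illustration of the idea, with hypothetical model and scheduler objects, not genQC's actual internals:

import torch

def sample_circuit_tensor(model, scheduler, condition, shape, guidance=10.0):
    """Classifier-free-guidance denoising loop (illustrative sketch only)."""
    x = torch.randn(shape)                              # start from pure Gaussian noise
    for t in scheduler.timesteps:                       # e.g. 40 reverse-diffusion steps
        eps_cond = model(x, t, condition)               # noise prediction with conditioning
        eps_uncond = model(x, t, None)                  # unconditioned noise prediction
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)
        x = scheduler.step(eps, t, x)                   # one denoising step
    return x                                            # decode x back into a gate sequence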
This entire process is already packaged and can be tried using CUDA-Q and genQC.
Let’s Dive Into the Code
We haven’t even read the full paper yet — let’s just follow the CUDA-Q tutorial and give it a try.
Our goal: find a sequence of quantum gates (a circuit) that implements a given target unitary matrix.
Install the necessary tools:
pip install genQC==0.1.0 cudaq matplotlib
Import the libraries and fix the random seeds:
import genQC
from genQC.imports import *   # wildcard import providing common utilities (including tqdm, used below)
from genQC.pipeline.diffusion_pipeline import DiffusionPipeline
from genQC.inference.export_cudaq import genqc_to_cudaq
import genQC.inference.infer_compilation as infer_comp
import genQC.util as util
import numpy as np
import torch

# Fix random seeds for reproducibility and pick the best available torch device.
torch.manual_seed(0)
np.random.seed(0)

device = util.infer_torch_device()   # 'cuda' if a GPU is available, otherwise 'cpu'
util.MemoryCleaner.purge_mem()       # free cached (GPU) memory before loading the model
print(device)
The pretrained model weights are available on Hugging Face and can be loaded directly:
pipeline = DiffusionPipeline.from_pretrained("Floki00/qc_unitary_3qubit", device)
pipeline.scheduler.set_timesteps(40)   # number of denoising steps used at inference time
Build the gate vocabulary from the model's gate pool, and set the number of qubits and the maximum number of gates per circuit:
vocab = {i + 1: gate for i, gate in enumerate(pipeline.gate_pool)}   # token id -> gate

num_of_qubits = 3
max_gates = 12
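If you are curious which gates the pretrained checkpoint can emit, print the vocabulary (the exact gate pool depends on the checkpoint):
print(vocab)   # e.g. {1: 'h', 2: 'cx', ...}, depending on the checkpoint's gate pool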
Define the target unitary matrix and check that it really is unitary (U†U = UU† = I):
U = np.matrix([...], dtype=np.complex128)   # fill in your 8x8 (3-qubit) target unitary

assert np.allclose(U.H @ U, np.identity(2**num_of_qubits)) and \
       np.allclose(U @ U.H, np.identity(2**num_of_qubits))
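If you do not have a specific target in mind, here is one simple way to construct a valid unitary to experiment with. This is only an illustrative choice, not the matrix used in the NVIDIA tutorial: the GHZ-state-preparation circuit (H on qubit 0, then CNOT 0→1 and CNOT 1→2), built from standard gate matrices and using only gates allowed by the prompt below:

# Illustrative example target (not the unitary from the NVIDIA tutorial).
# Qubit 0 is the most significant bit; matrices are applied right to left.
H_gate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]])
U = np.matrix(np.kron(I2, CX) @ np.kron(CX, I2) @ np.kron(np.kron(H_gate, I2), I2),
              dtype=np.complex128)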
Define the prompt — for example, excluding the X gate:
prompt = "Compile using: ['h', 'cx', 'z', 'ccx', 'swap']"
Process the unitary matrix and sample candidate circuits:
samples = 128   # number of candidate circuits to sample from the model

# The unitary condition is passed to the model as a two-channel tensor: real and imaginary parts.
U_r, U_i = torch.Tensor(np.real(U)), torch.Tensor(np.imag(U))
U_tensor = torch.stack([U_r, U_i], dim=0)

out_tensors = infer_comp.generate_comp_tensors(
    pipeline=pipeline,
    prompt=prompt,
    U=U_tensor,
    samples=samples,
    system_size=num_of_qubits,
    num_of_qubits=num_of_qubits,
    max_gates=max_gates,
    g=10   # guidance scale for the conditioned sampling
)
Each sample is an encoded circuit: rows correspond to qubits, columns to positions in the circuit, and each nonzero entry indexes a gate in the vocabulary (negative entries mark control qubits). You’ll get tensors like:
tensor([[ 5,  3,  0,  0, -5,  3,  3,  0, -5,  0,  0,  0],
        [-5,  0,  0,  6,  5,  0,  0,  3,  5,  0,  0,  0],
        [-5,  0,  1,  6, -5,  0,  0,  0, -5,  0,  0,  0]])
Convert the output to CUDA-Q kernels:
import cudaq

cudaq.set_target('qpp-cpu')   # CPU simulator is plenty for 3 qubits; use 'nvidia' to run on a GPU

kernel_list = []
valid_tensors = []
invalid_tensors = 0

for out_tensors_i in tqdm(out_tensors):
    # Some generated tensors do not decode into valid circuits; skip those.
    try:
        kernel = genqc_to_cudaq(out_tensors_i, vocab)
    except Exception:
        kernel = None

    if kernel:
        kernel_list.append(kernel)
        valid_tensors.append(out_tensors_i)
    else:
        invalid_tensors += 1

print(f"The model generated {invalid_tensors} invalid tensors.")
Pick one of the generated kernels, prepare the all-zeros input state, and draw the circuit:
input_state = [0] * (2**num_of_qubits)
input_state[0] = 1   # amplitude 1 on |000>, i.e. the all-zeros basis state

print(cudaq.draw(kernel_list[0], input_state))
Check the Fidelity
Let’s verify how close the generated circuits get to the target unitary. First, reconstruct each circuit’s unitary by applying its kernel to every computational basis state:
N = 2**num_of_qubits

got_unitaries = np.zeros((len(kernel_list), N, N), dtype=np.complex128)

for i, kernel in tqdm(enumerate(kernel_list), total=got_unitaries.shape[0]):
    for j in range(N):
        basis_state_j = np.zeros((N), dtype=np.complex128)
        basis_state_j[j] = 1
        got_unitaries[i, :, j] = np.array(cudaq.get_state(kernel, basis_state_j), copy=False)
Check the result:
np.set_printoptions(linewidth=1000)
print(np.round(got_unitaries[0], 4))
Output:
[[ 0.7071+0.j 0. +0.j ... 0.7071+0.j]
...
[ 0. +0.j 0. +0.j ... 0.7071+0.j]]
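To turn these matrices into a score, one common choice is the global-phase-invariant unitary fidelity F = |Tr(U†V)|² / N², which equals 1 exactly when the generated unitary V matches the target U up to a global phase. This is a standard metric (not necessarily the exact one used in NVIDIA's notebook); here we also count how many circuits exceed, say, 0.99:

def unitary_fidelity(U_target, V):
    """Return |Tr(U^dagger V)|^2 / N^2; equals 1.0 iff V matches U_target up to a global phase."""
    N = U_target.shape[0]
    return np.abs(np.trace(np.asarray(U_target).conj().T @ V))**2 / N**2

fidelities = np.array([unitary_fidelity(U, V) for V in got_unitaries])
print(f"{np.count_nonzero(fidelities > 0.99)} of {len(fidelities)} circuits have fidelity > 0.99")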
For more extensive benchmarks and fidelity tests, see the CUDA-Q tutorial. In our run, nearly 30 of the 128 generated circuits achieved high fidelity, matching NVIDIA's reported results.
Multiple circuits can be proposed with this method, and gate sets can be customized to match specific quantum hardware.
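With the fidelities in hand (using the fidelities array from the sketch above), you can pick the best candidate and draw it, or keep several high-fidelity circuits and choose whichever maps best onto your hardware:

best = int(np.argmax(fidelities))
print(f"Best fidelity: {fidelities[best]:.4f}")
print(cudaq.draw(kernel_list[best], input_state))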
The Future is Here
The combination of machine learning and quantum computing is advancing rapidly.
Let’s keep building and experimenting!