Compiling Unitary Matrices into Quantum Circuits with Diffusion Models

~ A New Approach to Quantum Circuit Synthesis with CUDA-Q and genQC ~

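Recently, **diffusion models**, a machine learning technique, have started to be applied to an important problem in quantum computing: turning a unitary matrix into a circuit (unitary compilation).

In this post, based on the paper **"Quantum circuit synthesis with diffusion models" (Fürrutter et al.)**, published in 2024, I will show how to convert an arbitrary 3-qubit unitary matrix into a quantum circuit.

The tools are CUDA-Q, NVIDIA's quantum SDK, and genQC, a circuit generation tool built on diffusion models.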

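Why diffusion models?

Diffusion models have performed very well at generating data with complex structure, such as images and protein structures.
The point of this research is to apply the same idea to generating quantum circuits.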

How circuit generation works (roughly)

The overall flow is as follows:

1. Circuit Encoding

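A quantum circuit is a sequence of gates, but to make it easy for a machine learning model to handle,
each gate is converted into a continuous-valued vector, and the whole circuit becomes a 3-dimensional tensor.
This lets the model work with what is otherwise a discrete gate sequence.
A generated tensor can be mapped back to the original gate sequence (the quantum circuit) by the inverse transformation.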
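
As a rough illustration of the encoding idea above, here is a minimal toy sketch. It is not genQC's actual encoder: the three-gate vocabulary, the sign convention (a negative token marks a control qubit), and the orthonormal embedding table are all assumptions made for this example. It only shows the round trip from a discrete gate list to a continuous 3D tensor and back.

```python
import numpy as np

# Hypothetical vocabulary for this toy example; token 0 means "no gate here".
VOCAB = {1: "h", 2: "cx", 3: "z"}
TOKEN = {name: tok for tok, name in VOCAB.items()}
V = max(VOCAB)  # largest token id


def tokenize(gates, num_qubits, max_gates):
    """Turn a gate list into an integer grid of shape (num_qubits, max_gates).

    Each gate is (name, targets, controls). Targets get +token and
    controls get -token (an assumed convention for this sketch).
    """
    grid = np.zeros((num_qubits, max_gates), dtype=np.int64)
    for step, (name, targets, controls) in enumerate(gates):
        for q in targets:
            grid[q, step] = TOKEN[name]
        for q in controls:
            grid[q, step] = -TOKEN[name]
    return grid


def embed(grid, table):
    """Map every signed token to a continuous vector, giving a 3D tensor."""
    return table[grid + V]  # shift so that tokens -V..V index rows 0..2V


def decode(tensor, table):
    """Invert embed() by picking the nearest table row for every cell."""
    dists = np.linalg.norm(tensor[..., None, :] - table, axis=-1)
    return dists.argmin(axis=-1) - V


rng = np.random.default_rng(0)
# Orthonormal rows, so distinct tokens are well separated.
table, _ = np.linalg.qr(rng.normal(size=(2 * V + 1, 2 * V + 1)))

circuit = [("h", [0], []), ("cx", [1], [0]), ("z", [2], [])]
grid = tokenize(circuit, num_qubits=3, max_gates=4)
tensor = embed(grid, table)  # shape (qubits, steps, embedding dim)
noisy = tensor + 0.05 * rng.normal(size=tensor.shape)
recovered = decode(noisy, table)

print(grid)
print(np.array_equal(recovered, grid))
```

The nearest-neighbour decode is what makes a slightly noisy continuous tensor snap back to a valid discrete token grid, which is the property the inverse transformation relies on.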

2. Conditioning

The model is given two inputs:

The gate set to use (e.g. "build the circuit with ['x', 'h']") → converted into a continuous vector by a language model.
The target unitary matrix → encoded by a separate neural network.

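3. Circuit generation (Unitary Compilation)

This is where the diffusion model comes in.
Starting from a noisy tensor, it removes the noise step by step and generates a quantum circuit that corresponds to the target unitary matrix.

Guided by the given gate constraints and the unitary information,
it finally outputs an executable quantum circuit.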

Source: https://arxiv.org/pdf/2311.02041
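
The inference code later in this post passes a parameter `g=10`, commented as the classifier-free-guidance (CFG) scale. As a hedged sketch of what such a scale usually means, the snippet below shows the standard CFG combination of an unconditional and a conditional noise prediction. Whether genQC uses exactly this form is not stated in the source, and the function name `cfg_combine` and the toy vectors are hypothetical.

```python
import numpy as np


def cfg_combine(eps_uncond, eps_cond, g):
    """Standard classifier-free guidance mix of two noise predictions.

    g = 0 ignores the condition, g = 1 uses the conditional prediction
    as is, and g > 1 extrapolates further along the conditional direction.
    """
    return eps_uncond + g * (eps_cond - eps_uncond)


# Toy noise predictions standing in for two forward passes of a denoiser.
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])

for g in (0.0, 1.0, 10.0):
    print(g, cfg_combine(eps_uncond, eps_cond, g))
```

A larger scale makes the samples follow the prompt and the target unitary more strongly, typically at the cost of sample diversity.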

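This process has already been packaged as tooling, so you can try it with CUDA-Q and genQC.

Walking through the code

I have not actually read the paper yet, so I will simply follow the CUDA-Q tutorial as is.
The goal this time is:

"Build a unitary matrix out of quantum gates."

Installation:

pip install genQC==0.1.0 cudaq matplotlib

First, import the libraries and fix the random seeds.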

import genQC
from genQC.imports import *
from genQC.pipeline.diffusion_pipeline import DiffusionPipeline
from genQC.inference.export_cudaq import genqc_to_cudaq
import genQC.inference.infer_compilation as infer_comp
import genQC.util as util

import numpy as np
import torch

# Fixed seed for reproducibility
torch.manual_seed(0)
np.random.seed(0)

device = util.infer_torch_device()  # Use CUDA if we can
util.MemoryCleaner.purge_mem()  # Clean existing memory allocation
print(device)

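The weights for this diffusion model are apparently hosted on Hugging Face, so we download them from there.

The model page is the Floki00/qc_unitary_3qubit repository referenced in the code.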

pipeline = DiffusionPipeline.from_pretrained(
    "Floki00/qc_unitary_3qubit", device)  # Download from Hugging Face
pipeline.scheduler.set_timesteps(40)

Next we specify the available gates, the number of qubits, and the number of gates to use.
Here the target unitary is approximated with at most 12 gates.

vocab = {
    i + 1: gate for i, gate in enumerate(pipeline.gate_pool)
}  # Gateset used during training, used for decoding
num_of_qubits = 3  # Number of qubits
max_gates = 12  # Maximum number of gates

Next, we specify the unitary matrix we want to build ourselves, and then check that it really is unitary.

U = np.matrix([[0.70710678, 0., 0., 0., 0.70710678, 0., 0., 0.],
               [0., -0.70710678, 0., 0., 0., -0.70710678, 0., 0.],
               [-0.70710678, 0., 0., 0., 0.70710678, 0., 0., 0.],
               [0., 0.70710678, 0., 0., 0., -0.70710678, 0., 0.],
               [0., 0., 0.70710678, 0., 0., 0., 0., 0.70710678],
               [0., 0., 0., 0.70710678, 0., 0., 0.70710678, 0.],
               [0., 0., -0.70710678, 0., 0., 0., 0., 0.70710678],
               [0., 0., 0., -0.70710678, 0., 0., 0.70710678, 0.]],
              dtype=np.complex128)

assert np.allclose(U.H @ U, np.identity(2**num_of_qubits)) and np.allclose(
    U @ U.H, np.identity(2**num_of_qubits))  #check if unitary
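
The check above relies on `np.matrix` and its `.H` attribute. As an optional alternative, here is a small sketch of the same test for plain `ndarray` inputs. The helper name `is_unitary` and the tolerance are my own choices, not part of the tutorial.

```python
import numpy as np


def is_unitary(mat, atol=1e-6):
    """Return True if mat is square and mat^dagger mat = mat mat^dagger = I."""
    mat = np.asarray(mat, dtype=np.complex128)
    if mat.ndim != 2 or mat.shape[0] != mat.shape[1]:
        return False
    eye = np.eye(mat.shape[0])
    dagger = mat.conj().T  # conjugate transpose
    return bool(np.allclose(dagger @ mat, eye, atol=atol)
                and np.allclose(mat @ dagger, eye, atol=atol))


hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
print(is_unitary(hadamard))      # a valid single-qubit gate
print(is_unitary(2 * hadamard))  # rescaled, so no longer norm-preserving
```

The tolerance matters here because the target matrix is entered with 8 decimal digits (0.70710678), so the products only equal the identity approximately.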

Next we choose which gate types may be used. This example deliberately leaves out the X gate, so the goal is to build the unitary without it.

# Notice that the x gate is missing from the prompt: this is a restriction we set on purpose
prompt = "Compile using: ['h', 'cx', 'z', 'ccx', 'swap']"

The neural network used here handles only real numbers, so the real and imaginary parts are passed separately.

# Number of circuits to sample from the trained DM.
samples = 128

# As the neural network works only with real numbers, we first separate
# the two components and create a 2 dimensional tensor for the magnitude
# of each component:
U_r, U_i = torch.Tensor(np.real(U)), torch.Tensor(np.imag(U))
U_tensor = torch.stack([U_r, U_i], dim=0)

# Now we generate a tensor representation of the desired quantum circuit using the DM based on the prompt and U. This is also known as inference.
out_tensors = infer_comp.generate_comp_tensors(
    pipeline=pipeline,
    prompt=prompt,
    U=U_tensor,
    samples=samples,
    # Max qubit number allowed by the model (this model is only trained with 3 qubits)
    system_size=num_of_qubits,
    num_of_qubits=num_of_qubits,
    max_gates=max_gates,
    g=10  # classifier-free-guidance (CFG) scale
)

out_tensors[0]

Running this yields a tensor:

tensor([[ 5,  3,  0,  0, -5,  3,  3,  0, -5,  0,  0,  0],
        [-5,  0,  0,  6,  5,  0,  0,  3,  5,  0,  0,  0],
        [-5,  0,  1,  6, -5,  0,  0,  0, -5,  0,  0,  0]])
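
To get a feel for what this tensor encodes, here is a hand-written reading of it. This is only a sketch under two assumptions that the post does not confirm: that the gate pool order is `['h', 'cx', 'z', 'x', 'ccx', 'swap']` (check `pipeline.gate_pool` on your side), and that rows are qubits, columns are time steps, and a negative entry marks a control qubit. The real conversion is done by `genqc_to_cudaq`; the helper `read_columns` below is hypothetical.

```python
import numpy as np

# ASSUMPTION: token -> gate mapping; verify against pipeline.gate_pool.
ASSUMED_VOCAB = {1: "h", 2: "cx", 3: "z", 4: "x", 5: "ccx", 6: "swap"}

# The tensor printed above (out_tensors[0]).
tokens = np.array([[ 5,  3,  0,  0, -5,  3,  3,  0, -5,  0,  0,  0],
                   [-5,  0,  0,  6,  5,  0,  0,  3,  5,  0,  0,  0],
                   [-5,  0,  1,  6, -5,  0,  0,  0, -5,  0,  0,  0]])


def read_columns(tokens, vocab):
    """Read one gate per non-empty column as (name, targets, controls)."""
    gates = []
    for column in tokens.T:
        if not column.any():
            continue  # all zeros: no gate at this time step
        name = vocab[int(np.abs(column).max())]
        targets = [int(q) for q in np.flatnonzero(column > 0)]
        controls = [int(q) for q in np.flatnonzero(column < 0)]
        gates.append((name, targets, controls))
    return gates


gates = read_columns(tokens, ASSUMED_VOCAB)
for gate in gates:
    print(gate)
```

Under these assumptions no column uses token 4, which would be consistent with the prompt excluding the X gate.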

Since the seed is fixed, I got exactly the same tensor as in the tutorial. This tensor lays the gates out in the shape of a circuit, so next we convert it into a form CUDA-Q can run.

import cudaq
cudaq.set_target('qpp-cpu')  # Note that cpu is faster for 3 qubit kernels
# cudaq.set_target('nvidia') # Set to GPU for larger circuits

kernel_list = []
valid_tensors = []

invalid_tensors = 0
for out_tensors_i in tqdm(out_tensors):

    # Use a try-except to catch invalid tensors (if any)
    try:
        kernel = genqc_to_cudaq(out_tensors_i,
                                vocab)  # Convert out_tensors to CUDA-Q kernels
    except Exception:  # Conversion failed: treat this sample as invalid
        kernel = None

    if kernel:
        kernel_list.append(kernel)
        valid_tensors.append(out_tensors_i)
    else:
        invalid_tensors += 1

print(
    f"The model generated {invalid_tensors} invalid tensors that do not correspond to circuits."
)

Let's look at the circuit.

# Arbitrary input state to the circuit for plotting

input_state = [0] * (2**num_of_qubits)

print(cudaq.draw(kernel_list[0], input_state))

Checking the result

We now check whether the diffusion model's output is actually close to the matrix we asked for.

N = 2**num_of_qubits

got_unitaries = np.zeros((len(kernel_list), N, N), dtype=np.complex128)

for i, kernel in tqdm(enumerate(kernel_list), total=got_unitaries.shape[0]):
    for j in range(N):
        basis_state_j = np.zeros((N), dtype=np.complex128)
        basis_state_j[j] = 1

        # Column j of the unitary is the output state for basis state |j>
        got_unitaries[i, :, j] = np.array(
            cudaq.get_state(kernel, basis_state_j), copy=False)
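
The loop above rebuilds each circuit's matrix one column at a time by feeding in basis states. The following NumPy-only sketch shows the same idea without a simulator. The function `matrix_from_action` is hypothetical, and a plain matrix-vector product stands in for the `cudaq.get_state` call.

```python
import numpy as np


def matrix_from_action(apply, dim):
    """Rebuild the matrix of a linear map from its action on basis states."""
    mat = np.zeros((dim, dim), dtype=np.complex128)
    for j in range(dim):
        basis = np.zeros(dim, dtype=np.complex128)
        basis[j] = 1
        mat[:, j] = apply(basis)  # column j is the image of basis state j
    return mat


# Stand-in for a simulator call: a fixed 1-qubit Hadamard acting on a state.
H = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
rebuilt = matrix_from_action(lambda state: H @ state, dim=2)
print(np.allclose(rebuilt, H))
```

For 3 qubits this takes 8 state simulations per circuit, which is why the tutorial notes that the CPU target is fast enough here.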

For the circuit shown earlier, the resulting unitary is:

np.set_printoptions(linewidth=1000)
print(np.round(got_unitaries[0], 4))

It matches the target nicely:

[[ 0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j  0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j]
 [ 0.    +0.j -0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j -0.7071+0.j  0.    +0.j  0.    +0.j]
 [-0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j  0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j]
 [ 0.    +0.j  0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j -0.7071+0.j  0.    +0.j  0.    +0.j]
 [ 0.    +0.j  0.    +0.j  0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j  0.    +0.j  0.7071+0.j]
 [ 0.    +0.j  0.    +0.j  0.    +0.j  0.7071+0.j  0.    +0.j  0.    +0.j  0.7071+0.j  0.    +0.j]
 [ 0.    +0.j  0.    +0.j -0.7071+0.j  0.    +0.j  0.    +0.j  0.    +0.j  0.    +0.j  0.7071+0.j]
 [ 0.    +0.j  0.    +0.j  0.    +0.j -0.7071+0.j  0.    +0.j  0.    +0.j  0.7071+0.j  0.    +0.j]]

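The original tutorial goes on to run several further checks.

The first is how high the fidelity is across all tensors sampled from the diffusion model: of the 128 samples, close to 30 turned out to be good. I ran the same generation with the diffusion model on my own machine and got a similar result.

Because the model can propose multiple candidate circuits, it can reportedly also be adapted to hardware, for example by using a model that includes specific quantum gates.

Combinations of machine learning and quantum computing have been showing up more and more lately. Let's keep at it!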
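
One common way to score each sample is the infidelity between the target unitary and the generated one. The sketch below uses the definition 1 - |Tr(U_want† U_got) / N|², which is zero when the two matrices agree up to a global phase. This is a minimal illustration under that assumed definition; the exact metric and threshold behind the "close to 30 of 128" figure are not given in this post.

```python
import numpy as np


def infidelity(want, got):
    """1 - |Tr(want^dagger got) / N|^2; zero when equal up to global phase."""
    want = np.asarray(want, dtype=np.complex128)
    got = np.asarray(got, dtype=np.complex128)
    n = want.shape[0]
    overlap = np.trace(want.conj().T @ got) / n
    return float(1.0 - np.abs(overlap) ** 2)


identity = np.eye(2)
pauli_x = np.array([[0, 1], [1, 0]])

print(infidelity(identity, 1j * identity))  # differs only by a global phase
print(infidelity(identity, pauli_x))        # a genuinely different gate
```

With the arrays from this post, one could evaluate `infidelity(U, got_unitaries[i])` for every `i` and count how many values are close to zero.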

NVIDIA CUDA-Q Tutorial & H100 Benchmark 2: Quantum Circuit Generation with Diffusion-Model Generative AI

Yuichiro Minato
