NVIDIA CUDA-Q チュートリアル＆H100ベンチマーク９：ショアのアルゴリズムによる素因数分解

量子コンピュータで最も有名な応用のひとつが、Shorのアルゴリズムによる整数の素因数分解です。
このアルゴリズムが特に重要視されている理由は、効率的な素因数分解アルゴリズムが現代のRSAなどの公開鍵暗号方式を破る可能性があるからです。

小さな整数であれば、この量子アルゴリズムを古典コンピュータ上でシミュレーションすることが可能です。
ただし、古典的な実装における主な課題は、**order-finding（位数探索）**の部分にあります。

まずは前段として、このorder-findingアルゴリズムの古典的な実装を紹介し、その後で量子アルゴリズムによる位数探索の説明に進みます。

!pip install contfrac==1.0.0 -q

from math import gcd, log2, ceil
import numpy as np
import random
import cudaq
from cudaq import *
import fractions
import matplotlib.pyplot as plt
import contfrac

Shorのアルゴリズムは、整数の素因数分解を効率的に行う量子アルゴリズムです。RSA暗号を破る可能性があることで有名です。

このアルゴリズムのポイントは2つ：

素因数分解の問題を「位数を見つける問題」に変える
位数（ある数を何乗すると1になるか）を量子計算で求める

流れは以下の通り：

$N$ が偶数なら終了（2が因数）
ランダムな数 $ a $ を選ぶ
$a$ と $N$ に共通の因数があれば、それが答え
$a$ の位数 $r$ を見つける
$a^{r/2} \pm 1$ と $N$ に共通因数があれば、それが答え
ダメなら繰り返す

また、「位数を見つける」ことは「周期を見つける」ことと同じ意味で使われます。

def shors_algorithm(N, initial, quantum):
    """ Factor N using Shor's algorithm with initial starting value and choice of
    using either classical or quantum approach for the period finding step
    Parameters
    ----------
    N: int
        Integer to be factored (assumed to be non-prime and >1)
    initial: int
        Initial choice of the random integer between 2 and N-1
    quantum: boolean
        True if we will call the quantum order-finding algorithm, and false if we call the classical one for step 3.

    Returns
    -------
    a1, a2: int
        Non-trivial factors of N
    """

    # 0. Check to see if N is even.
    if N % 2 == 0:
        divisor1 = 2
        divisor2 = N // 2
        print("Found factors:", divisor1, divisor2)
        return (divisor1, divisor2)

    attempts = [initial]
    while (True):
        # 1. Select a random integer between 2 and N-1
        if len(attempts) == 1:
            a = initial
        else:
            a = random.choice(
                [n for n in range(N - 1) if n not in attempts and n != 1])

        # 2. See if the integer selected in step 1 happens to factor N
        print("Trying a =", a)
        divisor1 = gcd(a, N)
        if divisor1 != 1:
            divisor2 = N // divisor1
            print("Found factors of N={} by chance: {} and {}".format(
                N, divisor1, divisor2))
            return (divisor1, divisor2)

        # 3. Find the order of a mod N (i.e., r, where a^r = 1 (mod N))
        if quantum == True:
            r = find_order_quantum(a, N)
        else:
            r = find_order_classical(a, N)
        print("The order of a = {} is {}".format(a, r))

        # 4. If the order of a is found and it is
        # * even and
        # * not a^(r/2) = -1 (mod N),
        # then test a^(r/2)-1 and a^(r/2)+1 to see if they share factors with N.
        # We also want to rule out the case of finding the trivial factors: 1 and N.
        divisor1, divisor2 = test_order(a, r, N)
        if (divisor1 != 0):  # test_order will return a 0 if no factor is found
            print("Found factors of N = {}: {} and {}".format(
                N, divisor1, divisor2))
            return divisor1, divisor2

        # 5. Repeat
        print("retrying...")
        attempts.append(a)

まず、素因数分解の問題を「位数探索問題」に変換する考え方を見てみましょう。これは先ほどのステップ4に該当します。

ここでは、選んだ整数 $a$ と $N$ が互いに素（共通因数が1）であり、 $a$ の位数 $r$ が見つかっているとします。このとき、以下の式が成り立ちます：

a^r \equiv 1 \mod N

もし $r$ が偶数なら、 $r = 2k$ と書けるので：

a^r - 1 = a^{2k} - 1 = (a^k - 1)(a^k + 1)

この式を因数分解の形で書き直せます。さらに、次の条件が成り立つ場合：

a^k \not\equiv \pm1 \mod N

このとき、 $a^k - 1$ または $a^k + 1$ のいずれかが $N$ と非自明な共通因数を持っており、それが素因数となる可能性があります。

Peter Shorは、ランダムに選んだ $a$ がこの条件を満たす確率は 50%以上であることを示しました。

以下のコードでは、 $r$ が偶数か、上記の条件を満たしているかを確認し、非自明な因数を探す関数が定義されています。

def test_order(a, r, N):
    """Checks whether or not a^(r/2)+1 or a^(r/2)-1 share a non-trivial common factor with N
    Parameters
    ----------
    a: int
    r: int
    N: int

    Returns
    -------
    int, int factors of N, if found; 0,0 otherwise
    """

    if r != None and (r % 2 == 0) and a**r % N == 1:
        if (a**(int(r / 2))) % N != -1:
            possible_factors = [gcd(r - 1, N), gcd(r + 1, N)]
            for test_factor in possible_factors:
                if test_factor != 1 and test_factor != N:
                    return test_factor, N // test_factor
    # period did not produce a factor
    else:
        print('No non-trivial factor found')
        return 0, 0

Shorのアルゴリズムの重要な部分は、ある数 $a$ の「位数」 $r$ を効率的に見つける量子アルゴリズムです。

古典的なアルゴリズムでも位数を求めることは可能ですが、計算量が多く非常に非効率です。

def find_order_classical(a, N):
    """A naive classical method to find the order r of a (mod N).
    Parameters
    ----------
    a: int
        an integer in the interval [2,N-1]
    N: int

    Returns
    -------
    r: int
        Period of a^x (mod N)
    """
    assert 1 < a and a < N
    r = 1
    y = a
    while y != 1:
        y = y * a % N
        r += 1
    return r

いくつかの例を使って、Shorのアルゴリズムが古典的な位数探索を用いてどのように動作するかを見てみましょう。

ここで注目すべき点は、初期に選んだ数値がどれだけ頻繁に $N$ の因数を見つける結果につながるか、ということです。

今回はQuantumがFalseになっているので古典的に探します。

my_integer = 123  #edit this value to try out a few examples
# Edit the value in the line below to try out different initial guesses for a.
# What happens when you choose a = 42 for the integer 123?
# What happens when you choose a = 100 for the integer 123?
initial_value_to_start = 42  # edit this value; it should be less than my_integer

shors_algorithm(my_integer, initial_value_to_start, False)

答えは、3と42が見つかりました。

Trying a = 42
Found factors of N=123 by chance: 3 and 41
(3, 41)

量子アルゴリズムによる位数探索問題の解法では、「量子フーリエ変換（QFT）」が中心的な役割を果たします。フーリエ変換は古典的にも周期を見つけるための効率的な手法ですが、量子版ではさらに高速に周期（= 位数）を特定することができます。

Shorのアルゴリズムでは、量子フーリエ変換を使って、複数の量子ビットの測定から位数を導き出します。

図には、Shorのアルゴリズムで使われる回路が示されています。この回路では、最後のステップで**逆量子フーリエ変換（Inverse QFT）**が適用されます。

引用：https://nvidia.github.io/cuda-quantum/latest/applications/python/shors.html

この回路では以下のような構成が必要です：

逆量子フーリエ変換を実装するカーネル（QFT → 逆QFT）
**モジュラー乗算（modular multiplication）**を行うカーネル：これは $a^x \mod N$ を実現するため、何度も繰り返し適用されます

このセクションの目標は、上記の量子回路を phase_kernel という名前のカーネルとしてコーディングすることです。その準備として、まず量子フーリエ変換とその逆変換のカーネルを実装します。

# Define kernels for the quantum Fourier transform and the inverse quantum Fourier transform
@cudaq.kernel
def quantum_fourier_transform(qubits: cudaq.qview):
    qubit_count = len(qubits)
    # Apply Hadamard gates and controlled rotation gates.
    for i in range(qubit_count):
        h(qubits[i])
        for j in range(i + 1, qubit_count):
            angle = (2 * np.pi) / (2**(j - i + 1))
            cr1(angle, [qubits[j]], qubits[i])


@cudaq.kernel
def inverse_qft(qubits: cudaq.qview):
    cudaq.adjoint(quantum_fourier_transform, qubits)

Shorのアルゴリズムは理論的に高速な素因数分解を可能にしますが、実装には多くの量子ビットやゲートが必要で、現在の量子ハードウェアでは現実的ではありません。

特に「モジュラーべき乗（ $a^x \mod N$ ）」の量子カーネルの最適化が重要で、リソース削減の研究が進められています。

このデモでは、簡単な例（例： $N = 21, a = 5$ ）に絞って、位数探索の部分のみを量子回路で確認します。

@cudaq.kernel
def modular_mult_5_21(work: cudaq.qview):
    """"Kernel for multiplying by 5 mod 21
    based off of the circuit diagram in
    https://physlab.org/wp-content/uploads/2023/05/Shor_s_Algorithm_23100113_Fin.pdf
    Modifications were made to change the ordering of the qubits"""
    x(work[0])
    x(work[2])
    x(work[4])

    swap(work[0], work[4])
    swap(work[0], work[2])


@cudaq.kernel
def modular_exp_5_21(exponent: cudaq.qview, work: cudaq.qview,
                     control_size: int):
    """ Controlled modular exponentiation kernel used in Shor's algorithm
    |x> U^x |y> = |x> |(5^x)y mod 21>
    """
    x(work[0])
    for exp in range(control_size):
        ctrl_qubit = exponent[exp]
        for _ in range(2**(exp)):
            cudaq.control(modular_mult_5_21, ctrl_qubit, work)

以下のコードブロックでは、上で定義した量子カーネルが、指定した $a$ と $N$ に対して正しく乗算および**べき乗（累乗）**を実行しているかを確認します。

確認するには、下の iterations 変数の値を変更して、さまざまなべき指数で正しく動作するかを試してみてください。

# Demonstrate iterated application of 5y mod 21 where y = 1
@cudaq.kernel
def demonstrate_mod_exponentiation(iterations: int):
    qubits = cudaq.qvector(5)
    x(qubits[0]
     )  # initalizes the qubits in the state for y = 1 which is |10000>
    for _ in range(iterations):
        modular_mult_5_21(qubits)


shots = 200

# The iterations variable determines the exponent in 5^x mod 21.
# Change this value to verify that the demonstrate_mod_exponentiation
# kernel carries out the desired calculation.
iterations = 1

print(cudaq.draw(demonstrate_mod_exponentiation, iterations))

results = cudaq.sample(demonstrate_mod_exponentiation,
                       iterations,
                       shots_count=shots)

print("Measurement results from sampling:", results)

# Reverse the order of the most probable measured bit string
# and convert the binary string to an integer
integer_result = int(results.most_probable()[::-1], 2)

print("For x = {}, 5^x mod 21 = {}".format(iterations, (5**iterations) % 21))
print("For x = {}, the computed result of the circuit is {}".format(
    iterations, integer_result))

ケース： $N = 21$ , $a = 4$

以下の例では、モジュラー乗算を単純に実装した場合と比べて、必要なゲート数と量子ビット数を削減するようにワークレジスタ（作業用レジスタ）が最適化されています。これは前のケースと比較して、より効率的な回路設計になっています。

@cudaq.kernel
def modular_exp_4_21(exponent: cudaq.qview, work: cudaq.qview):
    """ Controlled modular exponentiation kernel used in Shor's algorithm
     |x> U^x |y> = |x> |(4^x)y mod 21>
     based off of the circuit diagram in https://arxiv.org/abs/2103.13855
     Modifications were made to account for qubit ordering differences"""
    swap(exponent[0], exponent[2])
    # x = 1
    x.ctrl(exponent[2], work[1])

    # x = 2
    x.ctrl(exponent[1], work[1])
    x.ctrl(work[1], work[0])
    x.ctrl([exponent[1], work[0]], work[1])
    x.ctrl(work[1], work[0])

    # x = 4
    x(work[1])
    x.ctrl([exponent[0], work[1]], work[0])
    x(work[1])
    x.ctrl(work[1], work[0])
    x.ctrl([exponent[0], work[0]], work[1])
    x.ctrl(work[1], work[0])
    swap(exponent[0], exponent[2])

それでは、このセクションの最初に示した回路図の指示に従って処理を実行するための phase_kernel を定義する準備が整いました。

@cudaq.kernel
def phase_kernel(control_register_size: int, work_register_size: int, a: int,
                 N: int):
    """
    Kernel to estimate the phase of the modular multiplication gate |x> U |y> = |x> |a*y mod 21> for a = 4 or 5
    """

    qubits = cudaq.qvector(control_register_size + work_register_size)
    control_register = qubits[0:control_register_size]
    work_register = qubits[control_register_size:control_register_size +
                           work_register_size]

    h(control_register)

    if a == 4 and N == 21:
        modular_exp_4_21(control_register, work_register)
    if a == 5 and N == 21:
        modular_exp_5_21(control_register, work_register, control_register_size)

    inverse_qft(control_register)

    # Measure only the control_register and not the work_register
    mz(control_register)

control_register_size = 3
work_register_size = 5
values_for_a = [4, 5]
idx = 1  # change to 1 to select 5
N = 21
shots = 15000

print(
    cudaq.draw(phase_kernel, control_register_size, work_register_size,
               values_for_a[idx], N))

results = cudaq.sample(phase_kernel,
                       control_register_size,
                       work_register_size,
                       values_for_a[idx],
                       N,
                       shots_count=shots)
print(
    "Measurement results for a={} and N={} with {} qubits in the control register "
    .format(values_for_a[idx], N, control_register_size))
print(results)

実行時間：
0.006780862808227539秒

Measurement results for a=5 and N=21 with 3 qubits in the control register
{ 000:2819 001:1877 010:963 011:1933 100:2793 101:1883 110:948 111:1784 }

実行も早いですね。

phase_kernel の測定結果から、演算子の「位相」を推定することができます。ここでは、最も出現頻度の高いゼロ以外の測定結果に注目します。

そのため、よく現れる結果だけを抽出するための関数 top_results を作成します。

def top_results(sample_results, zeros, threshold):
    """Function to output the non-zero results whose counts are above the given threshold
    Returns
    -------
        dict[str, int]: keys are bit-strings and values are the respective counts
    """
    results_dictionary = {k: v for k, v in sample_results.items()}
    if zeros in results_dictionary.keys():
        results_dictionary.pop(zeros)
    sorted_results = {
        k: v for k, v in sorted(
            results_dictionary.items(), key=lambda item: item[1], reverse=True)
    }
    top_key = next(iter(sorted_results))
    max_value = sorted_results[top_key]
    top_results_dictionary = {top_key: max_value}

    for key in sorted_results:
        if results_dictionary[key] > min(threshold, max_value):
            top_results_dictionary[key] = results_dictionary[key]
    return top_results_dictionary

先ほど計算した、制御レジスタに3量子ビットを使った $a = 4$ , $N = 21$ のケースについて、top_results 関数が測定結果から出現頻度の高い結果をどのように抽出するかを確認します。

これらの上位結果のうちの1つが、演算子の位相の推定値である可能性が高いです。

# Example
top_results(results, '000', 750)

{'100': 2793,
 '011': 1933,
 '101': 1883,
 '001': 1877,
 '111': 1784,
 '010': 963,
 '110': 948}

演算子の位相推定値から ( a ) の位数（周期）を求めるには、「連分数展開（continued fractions）」を使うアルゴリズムが必要です。

この計算は find_order_quantum 関数で実行されます。

def get_order_from_phase(phase, phase_nbits, a, N):
    """Uses continued fractions to find the order of a mod N
    Parameters
    ----------
    phase: int
        Integer result from the phase estimate of U|x> = ax mod N
    phase_nbits: int
        Number of qubits used to estimate the phase
    a: int
        For this demonstration a is either 4 or 5
    N: int
        For this demonstration N = 21
    Returns
    -------
    int: period of a mod N if found, otherwise returns None
    """

    assert phase_nbits > 0
    assert a > 0
    assert N > 0

    eigenphase = float(phase) / 2**(phase_nbits)

    f = fractions.Fraction.from_float(eigenphase).limit_denominator(N)

    if f.numerator == 1:
        return None
    eigenphase = float(f.numerator / f.denominator)
    print('eigenphase is ', eigenphase)
    coefficients_continued_fraction = list(
        contfrac.continued_fraction(eigenphase))

    convergents_continued_fraction = list(contfrac.convergents(eigenphase))
    print('convergent sequence of fractions for this eigenphase is',
          convergents_continued_fraction)
    for r in convergents_continued_fraction:
        print(
            'using the denominators of the fractions in the convergent sequence, testing order =',
            r[1])
        if a**r[1] % N == 1:
            print('Found order:', r[1])
            return (r[1])
    return None

これまでに説明したすべての要素を組み合わせて、( a ) の位数（order）を求める関数を作成し、それを素因数分解アルゴリズムの中でテストする準備が整いました。

def find_order_quantum(a, N):
    """The quantum algorithm to find the order of a mod N, when x = 4 or x =5 and N = 21
    Parameters
    ----------
    a: int
        For this demonstration a will be either 4 or 5
    N: int
        For this demonstration N will be 21
    Returns
    r: int the period if it is found, or None if no period is found
    -------

    """

    if (a == 4 and N == 21) or (a == 5 and N == 21):
        shots = 15000
        if a == 4 and N == 21:
            control_register_size = 3
            work_register_size = 2
        if a == 5 and N == 21:
            control_register_size = 5
            work_register_size = 5

        #cudaq.set_random_seed(123)
        results = cudaq.sample(phase_kernel,
                               control_register_size,
                               work_register_size,
                               a,
                               N,
                               shots_count=shots)
        print("Measurement results:")
        print(results)

        # We will want to ignore the all zero result
        zero_result = ''.join(
            [str(elem) for elem in [0] * control_register_size])
        # We'll only consider the top results from the sampling
        threshold = shots * (.1)
        most_probable_bitpatterns = top_results(results, zero_result, threshold)

        for key in most_probable_bitpatterns:
            # Convert the key bit string into an integer
            # This integer divided by 8 is an estimate for the phase
            reverse_result = key[::-1]
            phase = int(reverse_result, 2)

            print("Trying nonzero bitpattern from the phase estimation:", key,
                  "=", phase)
            r = get_order_from_phase(phase, control_register_size, a, N)
            if r == None:
                print('No period found.')

                continue

            return r
            break
    else:
        print(
            "A different quantum kernel is required for this choice of a and N."
        )

my_integer = 21
initial_value_to_start = 5  # Try replacing 5 with 4
quantum = True
shors_algorithm(my_integer, initial_value_to_start, quantum)

Trying a = 5
Measurement results:
{ 00000:2495 00001:16 00010:28 00011:50 00100:95 00101:1723 00110:444 00111:85 01000:60 01001:86 01010:435 01011:1742 01100:113 01101:48 01110:23 01111:23 10000:2557 10001:21 10010:18 10011:46 10100:128 10101:1713 10110:409 10111:109 11000:64 11001:85 11010:471 11011:1715 11100:122 11101:40 11110:23 11111:13 }

Trying nonzero bitpattern from the phase estimation: 10000 = 1
No period found.
Trying nonzero bitpattern from the phase estimation: 01011 = 26
eigenphase is 0.8125
convergent sequence of fractions for this eigenphase is [(0, 1), (1, 1), (4, 5), (13, 16)]
using the denominators of the fractions in the convergent sequence, testing order = 1
using the denominators of the fractions in the convergent sequence, testing order = 1
using the denominators of the fractions in the convergent sequence, testing order = 5
using the denominators of the fractions in the convergent sequence, testing order = 16
No period found.
Trying nonzero bitpattern from the phase estimation: 00101 = 20
eigenphase is 0.625
convergent sequence of fractions for this eigenphase is [(0, 1), (1, 1), (1, 2), (2, 3), (5, 8)]
using the denominators of the fractions in the convergent sequence, testing order = 1
using the denominators of the fractions in the convergent sequence, testing order = 1
using the denominators of the fractions in the convergent sequence, testing order = 2
using the denominators of the fractions in the convergent sequence, testing order = 3
using the denominators of the fractions in the convergent sequence, testing order = 8
No period found.
Trying nonzero bitpattern from the phase estimation: 11011 = 27
eigenphase is 0.8421052631578947
convergent sequence of fractions for this eigenphase is [(0, 1), (1, 1), (5, 6), (16, 19)]
using the denominators of the fractions in the convergent sequence, testing order = 1
using the denominators of the fractions in the convergent sequence, testing order = 1
using the denominators of the fractions in the convergent sequence, testing order = 6
Found order: 6
The order of a = 5 is 6
Found factors of N = 21: 7 and 3
(7, 3)

NVIDIA CUDA-Q チュートリアル＆H100ベンチマーク９： ショアのアルゴリズムによる素因数分解

Yuichiro Minato

NVIDIA CUDA-Q チュートリアル＆H100ベンチマーク９：ショアのアルゴリズムによる素因数分解