Today I ran some benchmarks of Torch Tytan. To keep the number of cases from getting out of hand, I timed moderately sized problems of 5,000 and 10,000 qubits and compared the results across different GPUs.
from tytan import *
import random
import time

N = 5000

# qubits
q = symbols_list(N, 'q{}')

# hamiltonian
H = 0

# biases
for i in range(N):
    H += random.randint(-10, 10) * q[i]

# Jij: set only N interaction terms, since setting all pairwise connections takes too long to finish
for i in range(N):
    H += random.choice([-1, 1]) * q[random.randint(0, N - 1)] * q[random.randint(0, N - 1)]

# compile
qubo, offset = Compile(H).get_qubo()

# sampler
solver = sampler.ArminSampler(seed=None, mode='GPU', device='cuda:0', verbose=1)

start = time.time()

# sampling
result = solver.run(qubo, shots=1)

print(time.time() - start)
In this way, the 5,000-qubit problem gets a moderate 5,000 interaction terms and the 10,000-qubit problem gets 10,000 interaction terms. All measurements were taken on a single GPU.
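For reference, the two problem sizes can be generated and timed from a single parameterized loop. The sketch below is my own restructuring, not part of the original script; the build_random_qubo helper and the loop over sizes are my additions, and it reuses only the tytan calls shown above.

from tytan import *
import random
import time

def build_random_qubo(N):
    # N qubits with random biases and N random pairwise interactions,
    # mirroring the benchmark setup above
    q = symbols_list(N, 'q{}')
    H = 0
    for i in range(N):
        H += random.randint(-10, 10) * q[i]
    for i in range(N):
        H += random.choice([-1, 1]) * q[random.randint(0, N - 1)] * q[random.randint(0, N - 1)]
    qubo, offset = Compile(H).get_qubo()
    return qubo

for N in (5000, 10000):
    qubo = build_random_qubo(N)
    solver = sampler.ArminSampler(seed=None, mode='GPU', device='cuda:0', verbose=1)
    start = time.time()
    result = solver.run(qubo, shots=1)
    print(N, 'qubits:', time.time() - start, 's')

As in the original script, only the sampling step is inside the timed interval; QUBO construction and compilation happen before the timer starts, so the numbers below reflect solver time only.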
5,000 qubits
H100 : 3.7187023162841797 s
RTX 6000 Ada : 3.443608045578003 s
T4 : 14.257462501525879 s

10,000 qubits
H100 : 12.083187103271484 s
RTX 6000 Ada : 11.568817615509033 s
T4 : 62.8189423084259 s
It went quite well. The H100 could probably go even faster, but I'll leave that for another time.