
Benchmarking Quantum Fourier Transform with CUDA-Q: Single-GPU Performance Comparison Including H200

Yuichiro Minato

2025/03/29 22:52

~ Evaluating Real Performance Beyond 22 to 30+ Qubits ~

CUDA-Q is NVIDIA's toolkit for hybrid quantum-classical computing, including GPU-accelerated simulation of quantum circuits. At blueqat, we used it to benchmark Quantum Fourier Transform (QFT) circuits.

The goal of this benchmark is to measure how efficiently circuits with many qubits can be simulated on current single-GPU systems, including the H200. We focus on circuits of 22 qubits and more, which is where CUDA-Q's capabilities are really put to the test.

🧪 Benchmark Environment

Tool used: CUDA-Q
Circuit: Quantum Fourier Transform (QFT)
GPUs tested (all single-GPU setups):

  • NVIDIA RTX 4090
  • NVIDIA RTX 5090
  • NVIDIA L40S
  • NVIDIA H100 SXM
  • NVIDIA H100 NVL
  • NVIDIA H200
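The CUDA-Q kernel used for the benchmark is not reproduced in this post. As a reference for what a QFT circuit computes, here is a minimal pure-Python sketch of the textbook construction (Hadamard gates, controlled phase rotations, and a final qubit reversal) applied to a plain state vector. It is not CUDA-Q code; the function names and the convention that qubit k is bit k of the amplitude index are assumptions made for illustration.

```python
import cmath
import math

# Hypothetical helper names; this illustrates the textbook QFT, not the benchmark code.

def apply_hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` of the state vector, in place."""
    step = 1 << target
    scale = 1.0 / math.sqrt(2.0)
    for base in range(len(state)):
        if base & step == 0:
            a, b = state[base], state[base | step]
            state[base] = (a + b) * scale
            state[base | step] = (a - b) * scale

def apply_controlled_phase(state, control, target, angle):
    """Multiply every amplitude whose control and target bits are both 1 by exp(i*angle)."""
    mask = (1 << control) | (1 << target)
    phase = cmath.exp(1j * angle)
    for index in range(len(state)):
        if index & mask == mask:
            state[index] *= phase

def apply_swap(state, q0, q1):
    """Exchange qubits q0 and q1, in place."""
    flip = (1 << q0) | (1 << q1)
    for index in range(len(state)):
        if (index >> q0) & 1 == 0 and (index >> q1) & 1 == 1:
            partner = index ^ flip
            state[index], state[partner] = state[partner], state[index]

def qft(state, num_qubits):
    """Textbook QFT circuit; qubit k is bit k of the index (bit 0 is least significant)."""
    for target in reversed(range(num_qubits)):
        apply_hadamard(state, target)
        for control in reversed(range(target)):
            # Rotation by 2*pi / 2**(distance + 1), written as pi / 2**distance.
            angle = math.pi / (1 << (target - control))
            apply_controlled_phase(state, control, target, angle)
    for q in range(num_qubits // 2):
        apply_swap(state, q, num_qubits - 1 - q)
    return state

def reference_dft(state):
    """Direct definition: amplitude y becomes sum_x a_x * exp(2*pi*i*x*y/N) / sqrt(N)."""
    n = len(state)
    return [
        sum(state[x] * cmath.exp(2j * cmath.pi * x * y / n) for x in range(n)) / math.sqrt(n)
        for y in range(n)
    ]

if __name__ == "__main__":
    start = [0j] * 8
    start[5] = 1 + 0j  # basis state |5> on 3 qubits
    gate_level = qft(list(start), 3)
    direct = reference_dft(start)
    print(max(abs(a - b) for a, b in zip(gate_level, direct)))
```

Each gate here loops over all 2^n amplitudes, which is why state-vector simulation becomes expensive as qubits are added and why a GPU helps.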

📈 Execution Time Beyond 22 Qubits (Log Scale)

The graph below visualizes GPU execution times beyond 22 qubits on a logarithmic scale, enabling clear comparisons of subtle performance differences.

[Figure: GPU execution time for each GPU at 22 qubits and above, logarithmic scale]

Notably, both the H200 and H100 NVL continue to scale smoothly beyond 30 qubits, which shows that CUDA-Q works well with NVIDIA's Hopper-generation data center GPUs.
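A rough cost model helps explain why a logarithmic axis suits these measurements. The sketch below counts the gates in a textbook n-qubit QFT and multiplies by the 2^n amplitudes of the state vector. It assumes every gate is applied as one full pass over the state vector with no gate fusion or other optimization, which real simulators, including CUDA-Q backends, may not do. The function names are hypothetical, and the result is an estimate of work, not a measured time.

```python
def qft_gate_counts(num_qubits):
    """Gate counts for the textbook QFT: (Hadamards, controlled phases, swaps)."""
    hadamards = num_qubits
    controlled_phases = num_qubits * (num_qubits - 1) // 2
    swaps = num_qubits // 2
    return hadamards, controlled_phases, swaps

def amplitude_updates(num_qubits):
    """Upper-bound work estimate, assuming each gate sweeps all 2**n amplitudes."""
    return sum(qft_gate_counts(num_qubits)) * (1 << num_qubits)

if __name__ == "__main__":
    for n in range(22, 35):
        growth = amplitude_updates(n + 1) / amplitude_updates(n)
        print(n, qft_gate_counts(n), f"growth to next size: {growth:.3f}x")
```

Under this model, adding one qubit slightly more than doubles the work, so the curve is close to a straight line on a logarithmic time axis.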

📈 Execution Time Beyond 30 Qubits (Linear Scale)

The following graph compares execution times beyond 30 qubits on a linear scale. Absolute values provide a clearer picture for real-world responsiveness and resource estimation.

[Figure: GPU execution time for each GPU at 30 qubits and above, linear scale]

  • On the H100 and H200, 30-qubit circuits run in anywhere from a few seconds to about a dozen seconds.
  • The RTX 4090 (24GB VRAM) and RTX 5090 (32GB VRAM) reach their qubit limit sooner.
  • The L40S, with 48GB VRAM, supports slightly more qubits but is slower.
  • The H100 and H200 are similar in speed. Because the state vector doubles in size with every added qubit, it is VRAM capacity that determines the maximum number of qubits each can simulate.
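The link between VRAM and qubit count in the points above can be made concrete with a small calculation. The sketch below assumes a full state-vector simulation that stores one single-precision complex amplitude (8 bytes) per basis state; pass 16 for double precision. It treats the capacities quoted above as GiB for simplicity and ignores workspace and other overhead, so it gives only an upper bound. The function names are hypothetical.

```python
GIB = 1 << 30  # bytes per GiB

def state_vector_bytes(num_qubits, bytes_per_amplitude=8):
    """Memory needed to hold 2**n complex amplitudes."""
    return (1 << num_qubits) * bytes_per_amplitude

def max_qubits(vram_gib, bytes_per_amplitude=8):
    """Largest n whose state vector fits in the given memory (upper bound only)."""
    amplitudes = int(vram_gib * GIB) // bytes_per_amplitude
    return amplitudes.bit_length() - 1

if __name__ == "__main__":
    # Capacities taken from the list above; other cards can be added the same way.
    for name, vram_gib in [("RTX 4090", 24), ("RTX 5090", 32), ("L40S", 48)]:
        n = max_qubits(vram_gib)
        used = state_vector_bytes(n) / GIB
        print(f"{name}: at most {n} qubits ({used:.0f} GiB state vector)")
```

At single precision a 30-qubit state vector takes 8 GiB, and each additional qubit doubles that. A state vector that exactly fills the card cannot run in practice, so the real limit can be one qubit below this bound.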

✅ Insights and Conclusion

  • CUDA-Q is highly effective for simulating quantum circuits on existing GPU resources.
  • On the H200 and H100 in particular, CUDA-Q delivers practical performance for mid- to large-scale circuits exceeding 30 qubits.
  • Even in single-GPU configurations, CUDA-Q environments can accelerate quantum algorithm research in both cloud and on-premises settings.

At blueqat, we will continue exploring practical quantum computing solutions using GPUs, especially with CUDA-Q.

📌 Note

This benchmark measures GPU-based simulation of quantum circuits, not execution on actual quantum computers. Even so, simulation provides a practical and realistic foundation for developing, evaluating, and verifying quantum algorithms, and we expect its adoption to grow across industry and research institutions.
