Benchmarking Quantum Fourier Transform with CUDA-Q: Single-GPU Performance Comparison Including H200

Yuichiro Minato

2025/03/29 22:52

~ Evaluating Real Performance from 22 to 30+ Qubits ~

CUDA-Q is NVIDIA's platform for hybrid quantum-classical computing, and it includes GPU-accelerated simulators for quantum circuits. At blueqat, we benchmarked it using Quantum Fourier Transform (QFT) circuits.

The goal of this benchmark is to measure how efficiently circuits with many qubits can be simulated on current single-GPU systems, including the H200. The focus is on 22 qubits and above, where the simulator and the GPU are genuinely stressed.

🧪 Benchmark Environment

Tool used: CUDA-Q
Circuit: Quantum Fourier Transform (QFT)
GPUs tested (all single-GPU setups):

  • NVIDIA RTX 4090
  • NVIDIA RTX 5090
  • NVIDIA L40S
  • NVIDIA H100 SXM
  • NVIDIA H100 NVL
  • NVIDIA H200
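
For readers who want to see what the benchmarked circuit computes, here is a minimal sketch of the textbook QFT: a Hadamard on each qubit, controlled phase rotations from every later qubit, and a final qubit-order reversal by swaps. It is plain Python running on the CPU, not CUDA-Q code and not the benchmark script. The function names (`qft_gates`, `apply_gate`, `qft_statevector`, `dft_reference`) and the choice of qubit 0 as the most significant bit are assumptions made for this illustration.

```python
import cmath
import math


def qft_gates(n):
    """Textbook QFT gate list for n qubits; qubit 0 is the most significant."""
    gates = []
    for target in range(n):
        gates.append(("h", target))
        for control in range(target + 1, n):
            # Controlled phase rotation by 2*pi / 2**k with k = control - target + 1.
            angle = 2 * math.pi / 2 ** (control - target + 1)
            gates.append(("cphase", control, target, angle))
    # Reverse the qubit order so the output matches the usual DFT ordering.
    for q in range(n // 2):
        gates.append(("swap", q, n - 1 - q))
    return gates


def apply_gate(state, gate, n):
    """Apply one gate in place to a list of 2**n complex amplitudes."""
    def mask(q):
        # Bit of the basis-state index that corresponds to qubit q.
        return 1 << (n - 1 - q)

    kind = gate[0]
    if kind == "h":
        m = mask(gate[1])
        s = 1 / math.sqrt(2)
        for i in range(len(state)):
            if not i & m:
                a, b = state[i], state[i | m]
                state[i], state[i | m] = s * (a + b), s * (a - b)
    elif kind == "cphase":
        mc, mt = mask(gate[1]), mask(gate[2])
        phase = cmath.exp(1j * gate[3])
        for i in range(len(state)):
            if i & mc and i & mt:
                state[i] *= phase
    elif kind == "swap":
        ma, mb = mask(gate[1]), mask(gate[2])
        for i in range(len(state)):
            if i & ma and not i & mb:
                j = (i ^ ma) | mb
                state[i], state[j] = state[j], state[i]


def qft_statevector(n, x):
    """Run the QFT gate list on the basis state |x> and return the amplitudes."""
    state = [0j] * (1 << n)
    state[x] = 1 + 0j
    for gate in qft_gates(n):
        apply_gate(state, gate, n)
    return state


def dft_reference(n, x):
    """Amplitudes the QFT should produce for |x>, from the DFT definition."""
    size = 1 << n
    return [cmath.exp(2j * math.pi * x * k / size) / math.sqrt(size)
            for k in range(size)]


if __name__ == "__main__":
    n, x = 3, 5
    sim, ref = qft_statevector(n, x), dft_reference(n, x)
    print("gates:", len(qft_gates(n)))
    print("max error vs DFT:", max(abs(a - b) for a, b in zip(sim, ref)))
```

In this textbook form, an n-qubit QFT uses n Hadamards, n(n-1)/2 controlled phase gates, and n/2 (rounded down) swaps, so a 30-qubit circuit has 30 + 435 + 15 gates. Whether the benchmark circuit included the final swaps, or which input state it used, is not stated in this article.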

📈 Execution Time Beyond 22 Qubits (Log Scale)

The graph below visualizes GPU execution times beyond 22 qubits on a logarithmic scale, enabling clear comparisons of subtle performance differences.

[Figure: execution time by GPU for 22 qubits and above, logarithmic scale]

Notably, both the H200 and H100 NVL continue to scale smoothly beyond 30 qubits, which suggests that CUDA-Q is well matched to these Hopper-generation data center GPUs.

📈 Execution Time Beyond 30 Qubits (Linear Scale)

The following graph compares execution times beyond 30 qubits on a linear scale. Absolute values provide a clearer picture for real-world responsiveness and resource estimation.

[Figure: execution time by GPU for 30 qubits and above, linear scale]

  • On the H100 and H200, 30-qubit circuits complete in a few seconds to roughly a dozen seconds.
  • The RTX 4090 (24 GB VRAM) and RTX 5090 (32 GB VRAM) reach their qubit limit sooner.
  • The L40S, with 48 GB of VRAM, supports slightly more qubits but is slower.
  • The H100 and H200 are similar in speed; because the state vector doubles in size with every added qubit, VRAM capacity is what determines the maximum number of qubits that can be simulated.
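
The VRAM limits in the list above can be estimated with simple arithmetic: a full state vector for n qubits holds 2^n complex amplitudes, at 8 bytes each in single precision or 16 bytes each in double precision. The sketch below computes an upper bound on the qubit count per GPU. It rests on several assumptions: the helper names are made up for this example, VRAM sizes are treated as GiB, the 80/94/141 GB figures for the H100 SXM, H100 NVL, and H200 come from vendor specifications rather than from this article, and simulator workspace is ignored, so real limits may be lower. The article does not state which precision the benchmark used.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=8):
    """Memory for a full state vector: 2**n amplitudes of the given size."""
    return (1 << num_qubits) * bytes_per_amplitude


def max_qubits(vram_gib, bytes_per_amplitude=8):
    """Largest n whose state vector alone fits in vram_gib (upper bound)."""
    budget = int(vram_gib * 2**30)
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= budget:
        n += 1
    return n


# VRAM in GiB. The first three values appear in the article; the last three
# are assumed from vendor specifications.
GPUS = {
    "RTX 4090": 24,
    "RTX 5090": 32,
    "L40S": 48,
    "H100 SXM": 80,
    "H100 NVL": 94,
    "H200": 141,
}

if __name__ == "__main__":
    for name, vram in GPUS.items():
        single = max_qubits(vram, bytes_per_amplitude=8)   # complex64
        double = max_qubits(vram, bytes_per_amplitude=16)  # complex128
        print(f"{name:9s} {vram:4d} GiB  fp32 <= {single} qubits  fp64 <= {double} qubits")
```

Because each added qubit doubles the memory requirement, even a large jump in VRAM buys only one or two extra qubits, which is consistent with capacity rather than speed setting the ceiling.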

✅ Insights and Conclusion

  • CUDA-Q is highly effective for simulating quantum circuits on existing GPU resources.
  • On the H200 and H100 in particular, CUDA-Q delivers practical performance for mid- to large-scale circuits exceeding 30 qubits.
  • Even in single-GPU configurations, CUDA-Q can accelerate quantum algorithm research in both cloud and on-premises settings.

At blueqat, we will continue exploring practical quantum computing solutions using GPUs, especially with CUDA-Q.

📌 Note

This benchmark measures GPU-based simulation of quantum circuits, not execution on actual quantum computers. Even so, such simulation provides a practical and realistic foundation for developing, evaluating, and verifying quantum algorithms, and we expect it to see increasing adoption across industries and research institutions.

© 2025, blueqat Inc. All rights reserved