Benchmarking Quantum Fourier Transform with CUDA-Q:
Single-GPU Performance Comparison Including H200
~ Evaluating Real Performance Beyond 22 to 30+ Qubits ~
CUDA-Q is NVIDIA's toolkit for hybrid quantum-classical computing that uses GPUs for acceleration. At blueqat, we benchmarked it using Quantum Fourier Transform (QFT) circuits.
The goal of this benchmark is to measure how efficiently circuits with many qubits can be simulated on current single-GPU systems, including the H200. The range of 22 qubits and above is where CUDA-Q's capabilities are really put to the test.
🧪 Benchmark Environment
Tool used: CUDA-Q
Circuit: Quantum Fourier Transform (QFT)
GPUs tested (all single-GPU setups):
- NVIDIA RTX 4090
- NVIDIA RTX 5090
- NVIDIA L40S
- NVIDIA H100 SXM
- NVIDIA H100 NVL
- NVIDIA H200
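For a sense of the workload, the textbook QFT on n qubits uses n Hadamard gates, n(n-1)/2 controlled phase rotations, and floor(n/2) swaps for the final bit reversal. The sketch below tallies those counts. It assumes the textbook decomposition, which may differ from the exact circuit used in this benchmark, and the function name `qft_gate_counts` is illustrative only.

```python
def qft_gate_counts(n_qubits: int) -> dict:
    """Gate counts for the textbook QFT decomposition.

    Assumption: the benchmarked circuit may differ, for example by
    omitting the final swaps or dropping very small rotations.
    """
    if n_qubits < 1:
        raise ValueError("n_qubits must be at least 1")
    hadamards = n_qubits                                 # one H per qubit
    controlled_phases = n_qubits * (n_qubits - 1) // 2   # one per qubit pair
    swaps = n_qubits // 2                                # bit-reversal swaps
    return {
        "hadamard": hadamards,
        "controlled_phase": controlled_phases,
        "swap": swaps,
        "total": hadamards + controlled_phases + swaps,
    }


if __name__ == "__main__":
    for n in (22, 26, 30, 34):
        print(n, qft_gate_counts(n))
```

The gate count grows only quadratically with the number of qubits. In a state-vector simulation, the dominant cost is that every gate has to touch a state of 2^n amplitudes.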
📈 Execution Time Beyond 22 Qubits (Log Scale)
The graph below visualizes GPU execution times beyond 22 qubits on a logarithmic scale, enabling clear comparisons of subtle performance differences.
Notably, both the H200 and H100 NVL continue to scale smoothly beyond 30 qubits, showing that CUDA-Q works well with Hopper-generation GPUs.
📈 Execution Time Beyond 30 Qubits (Linear Scale)
The following graph compares execution times beyond 30 qubits on a linear scale. Absolute values provide a clearer picture for real-world responsiveness and resource estimation.
- On the H100 and H200, 30-qubit circuits run in roughly a few seconds to a dozen seconds.
- The RTX 4090 (24 GB VRAM) and RTX 5090 (32 GB VRAM) hit their qubit limit sooner.
- The L40S, with 48 GB of VRAM, supports slightly more qubits but is slower.
- The H100 and H200 are similar in speed; VRAM capacity is what determines the maximum number of qubits that can be simulated.
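The link between VRAM and qubit count can be estimated with simple arithmetic: a full state vector holds 2^n complex amplitudes, so memory doubles with every added qubit. The sketch below is a rough upper bound only. It assumes single-precision complex amplitudes (8 bytes each; double precision would need 16), decimal gigabytes, and no simulator workspace. The 80, 94, and 141 GB figures for the H100 SXM, H100 NVL, and H200 are taken from NVIDIA's published specifications rather than from this benchmark, and the helper names are illustrative.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Bytes needed to hold 2**n_qubits complex amplitudes."""
    return (1 << n_qubits) * bytes_per_amplitude


def max_qubits(vram_bytes: int, bytes_per_amplitude: int = 8) -> int:
    """Largest n whose state vector fits in vram_bytes (upper bound only)."""
    amplitudes = vram_bytes // bytes_per_amplitude
    if amplitudes < 1:
        raise ValueError("not enough memory for a single amplitude")
    # bit_length() - 1 is floor(log2(amplitudes)) for positive integers
    return amplitudes.bit_length() - 1


GB = 10**9  # assumption: decimal gigabytes, as in marketing specs

# VRAM per card in GB; the last three values are assumed from public specs
VRAM_GB = {
    "RTX 4090": 24,
    "RTX 5090": 32,
    "L40S": 48,
    "H100 SXM": 80,
    "H100 NVL": 94,
    "H200": 141,
}

if __name__ == "__main__":
    for name, gb in VRAM_GB.items():
        print(f"{name}: at most {max_qubits(gb * GB)} qubits (fp32 amplitudes)")
```

Under these assumptions the bound is 31 qubits for a 24 GB card and 34 qubits for a 141 GB card. Practical limits are lower, because the simulator also needs working memory beyond the state vector itself.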
✅ Insights and Conclusion
- CUDA-Q is highly effective for executing quantum circuits using existing GPU resources.
- Especially on H200 and H100, CUDA-Q enables practical performance for mid- to large-scale circuits exceeding 30 qubits.
- Even in single-GPU configurations, CUDA-Q environments can accelerate quantum algorithm research in both cloud and on-premises settings.
At blueqat, we will continue exploring practical quantum computing solutions using GPUs, especially with CUDA-Q.
📌 Note
This benchmark involves GPU-based quantum circuit simulation, not actual quantum computers. However, it provides a practical and realistic foundation for developing, evaluating, and verifying quantum algorithms, and is expected to gain increasing adoption across industries and research institutions.