Hello. Previously, I benchmarked several multi-GPU machines.
This time, I measured performance on a single GPU. For comparison, the results also include the earlier multi-GPU configurations:
four V100 GPUs, eight V100 GPUs, and eight A100 GPUs.
I ran the benchmarks with NVIDIA cuQuantum/cuStateVec and Qiskit-Aer-GPU on two cards: the T4 with 16GB of VRAM and the consumer-grade RTX 4090 with 24GB of VRAM. The workload was Quantum Volume (QV) circuits with a depth of 10. Here are the results:
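The RTX 4090 is impressively fast. The T4 is a solid entry-level option, but the 4090 clearly outperformed it. Still, a single card is no match for the multi-GPU setups. Also, with only 24GB of VRAM, the RTX 4090 topped out at around 30 qubits. Assuming the default double precision, a state vector needs 16 bytes per amplitude, so 30 qubits take 2^30 × 16 B = 16 GiB, while 31 qubits would need 32 GiB.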
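To see where the ~30-qubit ceiling comes from, here is a minimal back-of-the-envelope sketch. It assumes a dense state vector of 2^n complex amplitudes (16 bytes each in double precision, 8 in single) and ignores CUDA context, cuStateVec workspace, and other overhead. The function names are hypothetical helpers for illustration, not part of Qiskit or cuQuantum, and real limits will be somewhat lower than this estimate.

```python
# Hypothetical helpers: estimate the largest state vector that fits in GPU memory.
# Assumes dense storage only; CUDA context and simulator workspace are ignored.
BYTES_PER_AMPLITUDE = {"double": 16, "single": 8}  # complex128, complex64

GIB = 1024 ** 3  # binary gigabyte
GB = 10 ** 9     # decimal gigabyte, as used in VRAM marketing figures


def statevector_bytes(num_qubits: int, precision: str = "double") -> int:
    """Bytes needed to hold 2**num_qubits complex amplitudes."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE[precision]


def max_qubits(vram_bytes: int, precision: str = "double") -> int:
    """Largest qubit count whose state vector alone fits in vram_bytes."""
    n = 0
    while statevector_bytes(n + 1, precision) <= vram_bytes:
        n += 1
    return n


if __name__ == "__main__":
    for n in (29, 30, 31):
        print(f"{n} qubits: {statevector_bytes(n) / GIB:.0f} GiB (double)")
    print("24 GB card, double:", max_qubits(24 * GB))
    print("24 GB card, single:", max_qubits(24 * GB, "single"))
    print("16 GB card, double:", max_qubits(16 * GB))
```

Doubling the qubit count by one doubles the memory, which is why adding VRAM buys qubits only logarithmically, and why spreading the state vector across several GPUs is the main way to go beyond a single card's limit.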