Hello, I previously conducted benchmarks on various GPU machines. This time, the measurement is done on a single GPU.
In addition to the earlier multi-GPU configurations, I had also set up configurations with four V100s (16 GB VRAM each), eight GPUs with 32 GB VRAM each, and eight A100s (40 GB VRAM each).
This time, I benchmarked NVIDIA's cuQuantum/cuStateVec + Qiskit-Aer-GPU on a T4 (16 GB VRAM) and on the consumer-grade RTX 4090 (24 GB VRAM).
The circuits for the Quantum Volume (QV) had a depth of 10.
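For reference, a run like this can be sketched roughly as follows. This is a minimal sketch, not the exact script I used: the shot count, the seed, and the helper names `gpu_backend_options` and `time_qv` are assumptions for illustration, and it assumes a Qiskit version that still ships the `QuantumVolume` circuit class plus a qiskit-aer-gpu build with cuStateVec support.

```python
import time


def gpu_backend_options(use_custatevec=True):
    """Backend options for a GPU state-vector simulation (hypothetical helper)."""
    return {
        "method": "statevector",
        "device": "GPU",
        "cuStateVec_enable": use_custatevec,
    }


def time_qv(num_qubits, depth=10, shots=1, seed=1234):
    """Time one Quantum Volume circuit on the GPU simulator.

    shots and seed are assumed values, not taken from the measurements above.
    Qiskit is imported lazily so this file loads on machines without it.
    """
    from qiskit import transpile
    from qiskit.circuit.library import QuantumVolume
    from qiskit_aer import AerSimulator

    sim = AerSimulator(**gpu_backend_options())

    # Quantum Volume circuit with the depth used in this post (10).
    qc = QuantumVolume(num_qubits, depth, seed=seed)
    qc.measure_all()
    circ = transpile(qc, sim)

    # Measure wall-clock time of the simulation only, not the transpile step.
    start = time.perf_counter()
    result = sim.run(circ, shots=shots, seed_simulator=seed).result()
    elapsed = time.perf_counter() - start

    if not result.success:
        raise RuntimeError(result.status)
    return elapsed
```

Calling `time_qv(n)` for increasing `n` gives one timing per qubit count; passing `use_custatevec=False` to `gpu_backend_options` would compare against Aer's own GPU kernels.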
The results are as follows.
The RTX 4090 is impressively fast. I had thought the T4 was fast too, but the 4090 was faster. As expected, though, a single card can't really compete with the multi-GPU configurations. With 24 GB of VRAM, 30 qubits was the limit.
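The 30-qubit limit matches a quick back-of-the-envelope estimate. The sketch below assumes a full state vector stored in double precision (complex128, 16 bytes per amplitude) and ignores any extra workspace the simulator needs, so it gives an upper bound rather than a guarantee. The function names are made up for this illustration.

```python
BYTES_PER_AMPLITUDE = 16  # complex128: 8 bytes real + 8 bytes imaginary (assumed precision)
GIB = 2 ** 30


def statevector_bytes(num_qubits, bytes_per_amplitude=BYTES_PER_AMPLITUDE):
    """Memory needed to hold the 2**n amplitudes of an n-qubit state vector."""
    return (2 ** num_qubits) * bytes_per_amplitude


def max_qubits(vram_gib, bytes_per_amplitude=BYTES_PER_AMPLITUDE):
    """Largest n whose state vector fits in vram_gib GiB, ignoring overhead."""
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= vram_gib * GIB:
        n += 1
    return n


if __name__ == "__main__":
    for n in (29, 30, 31):
        print(n, "qubits:", statevector_bytes(n) / GIB, "GiB")
    print("24 GiB card, upper bound:", max_qubits(24), "qubits")
```

A 30-qubit state vector takes 16 GiB and a 31-qubit one takes 32 GiB, so on a 24 GB card 30 qubits is the most that can fit.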