Hello. Up to now, I have been benchmarking cuQuantum's cuStateVec on 4-card and 8-card V100 configurations, running quantum volume calculations at depth 10. This time I benchmarked an 8-card configuration of the A100 40G (not the 80G model), so I would like to share the results. The problem is the same as before: I used Qiskit + cuStateVec to run the quantum circuit simulations on the GPUs. The results were roughly what I expected. Since per-card VRAM has increased from 32 GB on the V100 to 40 GB, I was able to simulate one more qubit, and overall the eight A100 cards are faster.
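For context on why the extra VRAM buys exactly one more qubit: an n-qubit state vector needs 2^n amplitudes at 16 bytes each in double precision, so the memory footprint doubles with every added qubit, and the simulator also needs working buffers on top of that. Below is a minimal sketch of the kind of setup I mean, assuming Qiskit Aer is built with cuQuantum support; the `cuStateVec_enable`, `blocking_enable`, and `blocking_qubits` options are Aer's GPU statevector settings, and the qubit count, chunk size, and shot count are illustrative values, not the exact parameters of this benchmark.

```python
# Minimal sketch: quantum volume (depth 10) on a multi-GPU statevector
# simulator with cuStateVec enabled. Assumes Qiskit Aer built with cuQuantum.
from qiskit import transpile
from qiskit.circuit.library import QuantumVolume
from qiskit_aer import AerSimulator

n_qubits = 34   # illustrative: 2**34 amplitudes * 16 bytes = 256 GB in double precision
depth = 10      # quantum volume depth used in the benchmark

sim = AerSimulator(
    method="statevector",
    device="GPU",
    cuStateVec_enable=True,   # route gate application through cuStateVec
    blocking_enable=True,     # split the state vector into chunks across GPUs
    blocking_qubits=25,       # chunk size in qubits; tune to fit each card's VRAM
)

qc = QuantumVolume(n_qubits, depth=depth, seed=42)
qc.measure_all()
qc = transpile(qc, sim)

result = sim.run(qc, shots=100).result()
print(result.time_taken)
```

With this kind of setup, the only things that change between the V100 and A100 runs are the GPU count/VRAM and the largest `n_qubits` that fits, which makes the timing comparison straightforward.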
However, to be honest, as of 2023 obtaining eight A100 cards for quantum computing is extremely difficult. They are practically impossible to get hold of, so it may be wiser to give up on that idea; you will probably have no choice but to use the V100 or H100. Next, if I can get access to an H100 environment, I will benchmark that as well. That's all.