NVIDIA has updated its quantum computing SDK, cuQuantum, to version 23.10. In this post, we review the enhancements made to the SDK in this release.
The SDK is primarily divided into three parts. The first is cuStateVec, a state vector simulator that follows the traditional simulation approach and forms the core of cuQuantum. The second is cuTensorNet, a newer tensor network simulator that takes a different approach from state vector simulation and may take some skill to use effectively. The third is cuQuantum Python, a Python wrapper for operating these two simulators. All three components have been updated, so let's review the changes to each.
Release Note
https://docs.nvidia.com/cuda/cuquantum/latest/cuquantum_sdk_release_notes.html#cuquantum-sdk-v23-10
cuStateVec is now 1.5.0
https://docs.nvidia.com/cuda/cuquantum/latest/custatevec/release_notes.html#custatevec-v1-5-0
cuStateVec v1.5.0
- Added new API:
  - Migration of sub state vectors (see Host state vector migration and custatevecSubSVMigratorMigrate())
- Improved performance/functionality:
  - Improved the performance of custatevecApplyPauliRotation().
- Resolved issues:
  - Fixed an issue where custatevecMultiDeviceSwapIndexBits() accepted wrong index bit positions specified in the indexBitSwaps argument. If the indexBitSwaps argument is properly given, the function works properly.
First, let's talk about cuStateVec. The first item is a new API, which is the important one; the second is a performance improvement; and the third is a bug fix. We will skip the third and focus on the first two.
The new API looks like a significant addition. NVIDIA has recently entered the CPU market as well with the Grace CPU, and this feature appears to be designed with the new Grace CPU in mind.
For a detailed overview, please refer to the following page.
https://docs.nvidia.com/cuda/cuquantum/latest/custatevec/host_state_vector_migration.html
Host State Vector Migration
The cuStateVec library now provides the custatevecSubSVMigrator API, which allows users to combine host (CPU) memory and device (GPU) memory to scale up simulations. On systems where both the Grace CPU and the GPU can be used efficiently, this should enable larger and faster quantum circuit simulations.
custatevecSubSVMigrator API
The custatevecSubSVMigrator API is a utility for migrating sub state vectors between memory allocated on the CPU (host) and memory allocated on the GPU (device). By using this API, CPU memory can be leveraged to accommodate the state vector, and by utilizing both CPU and GPU memory for a single state vector, the number of qubits that can be simulated can be maximized.
The API reference is available in the documentation linked above.
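To get a feel for the idea, here is a conceptual sketch in plain NumPy/CuPy. This is not the actual custatevecSubSVMigrator C API (see the reference above for that); it only illustrates the scheme: one state vector is split into sub state vectors indexed by the top qubits, most of them live in host memory, and a small number of device slots stage them onto the GPU for gate application.

```python
# Conceptual sketch of sub state vector migration, assuming CuPy is installed.
# Illustration only; this does not call the custatevecSubSVMigrator API.
import numpy as np
import cupy as cp

n_qubits = 20        # full state vector: 2**20 amplitudes
n_local = 17         # each sub state vector holds 2**17 amplitudes
n_sub_svs = 1 << (n_qubits - n_local)   # 8 sub state vectors in total
n_device_slots = 2   # pretend only 2 sub state vectors fit on the GPU at once

# Host memory accommodates the whole state vector as a list of sub state vectors.
host_sub_svs = [np.zeros(1 << n_local, dtype=np.complex64) for _ in range(n_sub_svs)]
host_sub_svs[0][0] = 1.0  # initialize to |0...0>

# Device slots hold the working set currently being updated on the GPU.
device_slots = cp.zeros((n_device_slots, 1 << n_local), dtype=cp.complex64)

def migrate_in(slot: int, sub_sv: int) -> None:
    """Copy a sub state vector from host memory into a device slot."""
    device_slots[slot].set(host_sub_svs[sub_sv])

def migrate_out(slot: int, sub_sv: int) -> None:
    """Copy a device slot back into its sub state vector in host memory."""
    device_slots[slot].get(out=host_sub_svs[sub_sv])

# Typical loop: stage each sub state vector in, apply gates on the GPU, stage out.
for i in range(n_sub_svs):
    slot = i % n_device_slots
    migrate_in(slot, i)
    # ... apply gates to device_slots[slot] here (e.g., with cuStateVec) ...
    migrate_out(slot, i)
```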
Next
cuTensorNet is v2.3.0
https://docs.nvidia.com/cuda/cuquantum/latest/cutensornet/release_notes.html#cutensornet-v2-3-0
cuTensorNet v2.3.0
- New functionalities:
  - New high-level APIs for defining tensor network operators and computing their expectation values over user-defined tensor network states.
    - See the introduction at High-level tensor network specification and processing.
  - New high-level APIs for computing arbitrary slices of tensor network states.
  - New high-level APIs for computing the Matrix-Product-State (MPS) factorization of a given tensor network state.
  - New truncation option CUTENSORNET_TENSOR_SVD_CONFIG_DISCARDED_WEIGHT_CUTOFF for tensor SVD computation.
- Bugs fixed:
  - Fixed a bug when the automatic distributed contraction path optimization is invoked with TIME as the cost function, to ensure the optimal path is chosen.
  - Fixed a bug of potentially inconsistent library handle management when cutensornetWorkspaceComputeSVDSizes(), cutensornetWorkspaceComputeQRSizes(), and cutensornetWorkspaceComputeGateSplitSizes() are called.
  - Fixed a bug in cutensornetTensorQR() when the combined matrix row/column extent of the input tensor equals 1.
  - Fixed a performance bug when the tensor network contraction path finder is run by multiple processes on the same node.
- Other changes:
  - Complex-valued gradients computed with the experimental API cutensornetComputeGradientsBackward() are now complex-conjugated compared to what was returned in the previous release.
For this second component, setting aside the bug fixes, the items I am particularly interested in are the first (the new high-level APIs) and the last (the change to complex-valued gradients).
To simplify the specification and processing of tensor networks encountered in the quantum field and other domains, cuTensorNet now provides a set of high-level API functions that allow users to gradually build up a tensor network state and then compute its properties. A conceptual sketch of the idea follows below.
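As a rough illustration of what these high-level APIs automate (not a use of the new C APIs themselves), here is a small sketch using the existing cuquantum.contract() einsum interface to build a two-qubit tensor network state and compute an expectation value over it; the gates and operator here are just an example I chose:

```python
# Build a small tensor network state (gates applied to |00>) and contract it
# with an operator to obtain an expectation value. Assumes cuQuantum Python.
import numpy as np
from cuquantum import contract

h = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)        # Hadamard
cnot = np.eye(4, dtype=np.complex128)[[0, 1, 3, 2]].reshape(2, 2, 2, 2)  # CNOT
z = np.array([[1, 0], [0, -1]], dtype=np.complex128)                     # Pauli-Z
q0 = np.array([1, 0], dtype=np.complex128)                               # |0>
q1 = np.array([1, 0], dtype=np.complex128)                               # |0>

# |psi> = CNOT (H x I) |00> -- a Bell state, built as one tensor network.
psi = contract("abij,ic,c,j->ab", cnot, h, q0, q1)

# <psi| Z x I |psi>: the expectation value of Z on qubit 0 (0 for a Bell state).
expval = contract("ab,ax,xb->", psi.conj(), z, psi)
print(expval.real)  # ~0.0
```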
There are also new APIs for computing arbitrary slices of tensor network states and for the Matrix Product State (MPS) factorization, along with a new truncation option for singular value decomposition (SVD); a small example of the truncation option follows.
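The same truncation option surfaces in the Python wrapper as cuquantum.cutensornet.tensor.SVDMethod.discarded_weight_cutoff (mentioned again in the cuQuantum Python notes below). Here is a minimal sketch; the subscripts and input tensor are illustrative:

```python
# Truncated tensor SVD with the new discarded-weight cutoff, assuming
# cuQuantum Python v23.10 is installed. The input tensor is illustrative.
import numpy as np
from cuquantum.cutensornet.tensor import decompose, SVDMethod

# A deliberately rank-deficient 16x16 matrix so that truncation kicks in.
a = np.random.rand(16, 4)
t = (a @ a.T).astype(np.complex128)

# Keep singular values until the discarded weight (the squared norm of the
# truncated singular values relative to the total) drops below the cutoff.
method = SVDMethod(discarded_weight_cutoff=1e-8)
u, s, v, info = decompose("ij->ik,kj", t, method=method, return_info=True)

print(u.shape, s.shape, v.shape)  # the shared extent k is truncated to ~4
print(info)                       # reports the actual discarded weight
```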
The final "Other changes" are quite intriguing features. There is an experimental release for the calculation of gradients in complex numbers. This is particularly relevant in fields such as machine learning where gradients are required, and in the case of quantum computers, these are complex numbers, which may not be supported by general libraries.
Finally, let's look at cuQuantum Python.
https://docs.nvidia.com/cuda/cuquantum/latest/python/release_notes.html#cuquantum-python-v23-10-0
cuQuantum Python v23.10.0
- Added new APIs and functionalities:
  - For low-level APIs, please refer to the release notes of cuStateVec v1.5.0 and cuTensorNet v2.3.0.
  - The function cuquantum.contract() now works like a native PyTorch operator as far as autograd is concerned, if the input operands are PyTorch tensors. This is an experimental feature.
  - A new, experimental method cuquantum.Network.gradients() is added for computing the gradients of the network with respect to the input operands.
    - If the gradients are complex-valued, the convention follows that of PyTorch's.
  - Added a new attribute cuquantum.cutensornet.tensor.SVDMethod.discarded_weight_cutoff to allow SVD truncation based on discarded weight.
  - The cuquantum.Network constructor and its reset_operands() method now accept an optional stream argument.
- Bugs fixed:
  - Fixed potential data corruption when reset_operands() is called and the provided operands do not outlive the contraction operation.
  - When CPU arrays (from NumPy/PyTorch) were used as input operands for contraction, the internal streams were not properly ordered.
  - The methods autotune() and contract() and the standalone function contract() now allow passing a pointer address for the stream argument, as promised in the docs.
  - The attribute dtypes for cuquantum.cutensornet.MarginalAttribute.OPT_NUM_HYPER_SAMPLES and cuquantum.cutensornet.SamplerAttribute.OPT_NUM_HYPER_SAMPLES are fixed.
- Other changes:
  - If Python logging is enabled, cuTensorNet's run-time (instead of build-time) version is reported.
  - When passing PyTorch tensors to contraction APIs, the tensor flags .is_conj() and .requires_grad are now taken into account, unless the user explicitly overrides them with the qualifiers argument.
Looking beyond the bug fixes, the low-level API additions mirror the cuStateVec and cuTensorNet updates discussed above, and there is now much better interoperability with PyTorch: torch tensors can be passed in directly, even with automatic differentiation information attached, as in the sketch below.
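Here is a minimal sketch of the experimental autograd integration, assuming a CUDA-capable PyTorch installation; the tensor shapes and loss are just an example:

```python
# cuquantum.contract() acting as a differentiable PyTorch operator
# (experimental in v23.10). Requires PyTorch built with CUDA support.
import torch
from cuquantum import contract

a = torch.randn(4, 4, dtype=torch.complex128, device="cuda", requires_grad=True)
b = torch.randn(4, 4, dtype=torch.complex128, device="cuda", requires_grad=True)

# The contraction participates in PyTorch's autograd graph like a native op.
c = contract("ij,jk->ik", a, b)

# Backpropagate a real scalar loss; complex-valued gradients follow PyTorch's
# convention, as stated in the release notes.
loss = c.abs().sum()
loss.backward()
print(a.grad.shape, b.grad.shape)
```

Alternatively, the new experimental cuquantum.Network.gradients() method computes the gradients with respect to the input operands without going through PyTorch's autograd tape.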
How about that? The significant changes revolve around support for the Grace Hopper combination of CPU and GPU, as well as interoperability with existing machine learning frameworks like PyTorch, including differentiation of complex-valued networks. This should make gradient calculations more straightforward. That's all for now.