Abstract
Hakuhodo DY Holdings Inc. and blueqat Inc. have demonstrated compression of NeRF models using tensor network technology. The results were presented as a poster at an international workshop on tensor networks hosted by SQAI and NCTS Physics in Taiwan.
Neural Radiance Field (NeRF) is a well-known 3D reconstruction method capable of generating novel views of a target scene. A NeRF model typically employs a neural network, trained on captured images, to represent a 3D scene as a continuous function that maps a 3D coordinate and a view direction to color and density. In our work, we examine the potential of NeRF acceleration by replacing the MLP layers of a standard NeRF architecture with Matrix Product Operators (MPO). Our preliminary experiments show that tensorized-NeRF, our NeRF variant, can efficiently reduce model size with comparable performance, indicating the prospect of applying tensor networks to NeRF.
Tensor Networks
Tensor networks [1] are used to efficiently simulate quantum many-body systems on classical computers. Because the states of quantum systems, represented as vectors in Hilbert space, have exponentially large dimension, handling them directly on classical computers would require enormous resources. Tensor networks address this challenge by decomposing these states into multiple lower-rank tensors, allowing for controlled approximations.
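As a toy illustration of this idea (our own sketch, not part of the presented work), the following Python snippet splits a small two-site state vector into a matrix product state with a truncated bond dimension; the sizes and variable names are assumptions chosen for readability.

```python
import numpy as np

# Toy "state" over two 8-level sites: a 64-dimensional vector.
rng = np.random.default_rng(0)
psi = rng.normal(size=64)
psi /= np.linalg.norm(psi)

# View the state as an 8x8 matrix (one index per site) and split it with an SVD.
U, S, Vt = np.linalg.svd(psi.reshape(8, 8), full_matrices=False)

# Keeping only the largest singular values is the bond-dimension truncation.
chi = 2                                  # assumed bond dimension
A = U[:, :chi] * S[:chi]                 # left tensor,  shape (8, chi)
B = Vt[:chi, :]                          # right tensor, shape (chi, 8)

# The two tensors store 8*chi + chi*8 = 32 numbers instead of 64,
# at the cost of the approximation error measured here.
psi_approx = (A @ B).reshape(-1)
print("truncation error:", np.linalg.norm(psi - psi_approx))
```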
The field of machine learning is also grappling with a surge in computational demands driven by the expanding scale of models. Although Transformer-based models [2] have seen significant success recently, their reliance on high-performance GPUs and the associated energy consumption pose serious sustainability challenges. Furthermore, for applications with stringent response-time and security requirements, the ability to execute tasks on a local device without cloud resources is crucial. Various model compression techniques have been introduced to tackle these challenges.
In this context, the ability of tensor networks to effectively extract features from high-dimensional spaces has attracted interest within the machine learning domain. By applying tensor decomposition to the vast number of weight parameters in a neural network, features can be represented efficiently with a reduced parameter count. We employ Tensor-Train (MPO) decomposition [3], which has recently been applied to Large Language Models (LLMs), achieving a 30% reduction in model size while maintaining 90% of the original accuracy [4]. Figure 1 illustrates the replacement of a dense layer with a Tensor-Train (MPO) layer.
Figure 1. Replacement of the MLP layer with a Tensor-Train (MPO) layer (4 nodes in this figure).
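As a minimal PyTorch sketch of such a layer (our own illustration; the class name TTLinear, the mode factorization, and the uniform rank are assumptions, not the implementation used in the presented experiments), a dense weight can be stored as a chain of small cores and contracted back on demand:

```python
from math import prod

import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Dense layer whose weight is stored as a Tensor-Train (MPO) of small cores."""

    def __init__(self, in_modes, out_modes, rank):
        super().__init__()
        self.in_modes, self.out_modes = in_modes, out_modes
        ranks = [1] + [rank] * (len(in_modes) - 1) + [1]
        # Core k has shape (r_{k-1}, in_k, out_k, r_k).
        self.cores = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
             for k in range(len(in_modes))]
        )
        self.bias = nn.Parameter(torch.zeros(prod(out_modes)))

    def weight(self):
        # Contract the cores along their bond indices into one tensor.
        w = self.cores[0]
        for core in self.cores[1:]:
            w = torch.tensordot(w, core, dims=([-1], [0]))
        # Shape is now (1, i1, o1, ..., iK, oK, 1); regroup inputs and outputs.
        w = w.squeeze(0).squeeze(-1)
        K = len(self.in_modes)
        perm = [2 * k for k in range(K)] + [2 * k + 1 for k in range(K)]
        return w.permute(*perm).reshape(prod(self.in_modes), prod(self.out_modes))

    def forward(self, x):
        # Rebuilding the full weight keeps the code short; the saving here is
        # in stored parameters, not in compute.
        return x @ self.weight() + self.bias


# Stand-in for nn.Linear(256, 256) with 4 cores, as in Figure 1:
layer = TTLinear(in_modes=(4, 4, 4, 4), out_modes=(4, 4, 4, 4), rank=2)
y = layer(torch.randn(8, 256))                       # (batch, 256)
n_cores = sum(p.numel() for p in layer.cores)
print(n_cores, "core parameters vs", 256 * 256, "dense weights")
```

Factoring 256 as 4 x 4 x 4 x 4 and using a uniform rank are illustrative choices; the experiments below instead use a two-node decomposition of a single layer.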
Neural Radiance Fields
Neural Radiance Field (NeRF) is a well-known implicit 3D representation that maps a 3D coordinate and a view direction to color and density, as depicted in Figure 2. Given a NeRF model, the accumulated color of a ray is computed using classical volume rendering.
Figure 2. Schematic of image generation and rendering via NeRF [5].
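For reference, the discrete volume rendering estimate of [5] sums the sampled colors c_i along a ray r, weighted by the densities sigma_i and sample spacings delta_i:

\[
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right).
\]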
This continuous function is often implemented as a neural network [5], or as a hybrid of lightweight MLPs and other techniques such as voxel grids, hash-based encodings, or decomposed low-rank matrices [6, 7, 8]. In our work, we examine the possibility of compressing NeRF by replacing a dense layer of a vanilla NeRF architecture with a tensor network.
Experimental Results
Method
- We compare tensorized-NeRF with vanilla NeRF on two real-world datasets: "Fern" [5] and "Greek" [9] (Figure 3)
- A single dense layer was replaced with a tensorized dense layer
- The weight matrix is decomposed into a two-node MPO representation (see the sketch below)
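The snippet below is a hedged sketch of this decomposition step under assumed sizes (a 256 x 256 weight with 256 factored as 16 x 16); it is our illustration, not the exact pipeline used in the experiments.

```python
import numpy as np

W = np.random.randn(256, 256)      # stand-in for a trained NeRF dense weight
bond_dim = 2                       # bond dimension used in the results below

# Give each MPO node one (input, output) mode pair:
# (in1, in2, out1, out2) -> (in1, out1, in2, out2), with 256 = 16 * 16.
W4 = W.reshape(16, 16, 16, 16).transpose(0, 2, 1, 3)

# Split between the two mode pairs with a truncated SVD.
U, S, Vt = np.linalg.svd(W4.reshape(16 * 16, 16 * 16), full_matrices=False)
node1 = (U[:, :bond_dim] * S[:bond_dim]).reshape(16, 16, bond_dim)   # (in1, out1, r)
node2 = Vt[:bond_dim, :].reshape(bond_dim, 16, 16)                   # (r, in2, out2)

# Parameter count of the replaced layer: 65,536 -> 1,024 for these assumed sizes.
print(W.size, "->", node1.size + node2.size)

# Sanity check of the contraction; the error depends on the spectrum of the
# actual trained weight (a random stand-in is not very compressible).
W_approx = np.einsum("aor,rbp->abop", node1, node2).reshape(256, 256)
print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```

Factoring 256 as 16 x 16 is an assumption; a different factorization changes the node shapes and therefore the parameter count of the replaced layer.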
Results
- Using a bond dimension of 2, a slight speedup in the range of 0-5% is achievable (Table 1)
- Replacing a single layer yields a parameter reduction of 13.7% with nearly equal view-synthesis quality
Figure 3. Novel view rendered by the original model (left, PSNR: 25.67) and by the tensorized model (right, PSNR: 25.62).
Table 1. Number of parameters and quality of generated images
Conclusion and Future Work
We presented an application of model compression in which a neural network layer is replaced by a tensor network. Specifically, we applied Tensor-Train (MPO) decomposition to MLPs within NeRF, a widely used implicit 3D representation model. We achieved a parameter reduction while maintaining the rendering quality of novel views on real-world scenes. While we see potential for further compression, we still observe a considerable trade-off between parameter reduction and image quality, which we would like to address. Applications to other 3D representations and architectures are also an exciting avenue for further research.
References
[1] Schollwöck, Ulrich. Annals of Physics 326.1 (2011): 96-192.
[2] Vaswani, Ashish, et al. Advances in Neural Information Processing Systems 30 (2017).
[3] Novikov, Alexander, et al. Advances in Neural Information Processing Systems 28 (2015).
[4] Tomut, Andrei, et al. arXiv preprint arXiv:2401.14109 (2024).
[5] Mildenhall, Ben, et al. Communications of the ACM 65.1 (2021): 99-106.
[6] Fridovich-Keil, Sara, et al. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[7] Müller, Thomas, et al. ACM Transactions on Graphics (TOG) 41.4 (2022).
[8] Chen, Anpei, et al. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
[9] Sitzmann, Vincent, et al. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.