2021/09/06 07:46
In this series of blog posts, we shall look into different areas of classical and quantum machine learning, and talk about their applications. The aim of this series is to be a quick introduction to both ML and QML for people interested in these areas. Familiarity with some basics (stochastic gradient descent, MLPs, RNNs, etc.) is helpful, but not strictly required!
In this part, we'll discuss the classical model in the paper of Amin et al., Phys. Rev. X, 2018. The quantum part will be discussed in Part 2 of this series.
A Boltzmann machine can be thought of as a probabilistic recurrent neural network, and has been studied in the context of classical as well as quantum machine learning. Its basic architecture is a graph $G = (V, E)$, where the vertex set $V$ is partitioned into three parts $V_{\text{in}}, V_{\text{out}}, V_h$, corresponding to input, output and hidden nodes respectively. For notational convenience, let's write $V_v = V_{\text{in}} \cup V_{\text{out}}$ for the visible nodes. Each node $a$ can contain the values $z_a \in \{-1, +1\}$, and the network parameters are weights $w_{ab}$ on the edges and biases $b_a$ on the nodes.
Say we fix a node $a$; then the probability that $z_a$ contains $+1$ depends on the values of its neighbors:

$$P(z_a = +1 \mid \{z_b\}_{b \in N(a)}) = \sigma\!\left(2\left(b_a + \sum_{b \in N(a)} w_{ab} z_b\right)\right),$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid and $N(a)$ is the set of neighbors of $a$.
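As a minimal sketch of this update rule (assuming $\pm 1$-valued units and the logistic sigmoid; the function names are mine, not from the paper):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prob_plus_one(bias, weights, neighbor_values):
    """P(z_a = +1 | neighbors) for a +/-1-valued node.
    The local field is b_a + sum_b w_ab * z_b; the factor of 2
    comes from the +/-1 (rather than 0/1) encoding."""
    field = bias + sum(w * z for w, z in zip(weights, neighbor_values))
    return sigmoid(2.0 * field)

# With zero bias and neighbors whose contributions cancel, the node is unbiased:
p = prob_plus_one(0.0, [1.0, -1.0], [1.0, 1.0])
print(p)  # 0.5
```

A positive local field pushes the probability above 1/2, a negative one below; at zero field the node flips like a fair coin.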
To further simplify the training and probabilities, an additional assumption on the weights and edges is made: it is assumed that $G$ is a bipartite graph with parts $V_v$ and $V_h$ (this is the restricted Boltzmann machine). In other words, the only parameters in the network now are the weights $w_{ab}$ with $a \in V_v$ and $b \in V_h$, together with the biases. All other weights are fixed to $0$. Now, the probability distribution of the different states of the network is given as follows.
Fix a $\pm 1$-valued binary string $z$ of length $|V|$. Let

$$E(z) = -\sum_{a} b_a z_a - \sum_{(a,b) \in E} w_{ab} z_a z_b$$

denote the Ising energy of the system. We then get a probability distribution:

$$P(z) = \frac{e^{-E(z)}}{\sum_{z'} e^{-E(z')}}.$$
This distribution is known as the Boltzmann distribution in statistical physics, which is where the name of this ML model comes from! We denote the denominator, the partition function, by $Z = \sum_{z'} e^{-E(z')}$ for notational convenience.
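For a toy network small enough to enumerate, the Boltzmann distribution and the partition function $Z$ can be computed exactly by brute force (a sketch; the two-node coupling below is a made-up example, not a trained model):

```python
import itertools
import math

def ising_energy(z, biases, weights):
    """E(z) = -sum_a b_a z_a - sum_{(a,b)} w_ab z_a z_b for a +/-1 state z.
    `weights` maps index pairs (a, b) to the coupling w_ab."""
    e = -sum(b * s for b, s in zip(biases, z))
    for (a, b), w in weights.items():
        e -= w * z[a] * z[b]
    return e

def boltzmann_distribution(n, biases, weights):
    """Exact P(z) = exp(-E(z)) / Z over all 2^n states (toy sizes only)."""
    states = list(itertools.product([-1, 1], repeat=n))
    unnorm = [math.exp(-ising_energy(z, biases, weights)) for z in states]
    Z = sum(unnorm)  # the partition function
    return {z: p / Z for z, p in zip(states, unnorm)}

# Two nodes with a positive coupling: aligned states get lower energy,
# hence higher probability.
dist = boltzmann_distribution(2, [0.0, 0.0], {(0, 1): 1.0})
print(dist[(1, 1)] > dist[(1, -1)])  # True
```

Of course, the $2^{|V|}$ sum is exactly why $Z$ is intractable for networks of realistic size, which is what motivates the sampling-based training below.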
Note that here, only the vertices in $V_v$ are visible, and $V_h$ contains the vertices of the hidden layer. Therefore, we can marginalize the PDF further:

$$P(v) = \frac{1}{Z} \sum_{h} e^{-E(vh)},$$

where $vh$ denotes the concatenation of the strings $v$ and $h$.
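Continuing the brute-force picture, the marginal over the visible nodes just sums the joint distribution over every hidden configuration (again a toy enumeration for illustration; the one-line energy passed in is a hypothetical example):

```python
import itertools
import math

def marginal_visible(n_visible, n_hidden, energy):
    """P(v) = (1/Z) * sum_h exp(-E(vh)), by explicit enumeration.
    `energy` is any callable on the concatenated +/-1 tuple vh."""
    Z = 0.0
    unnorm = {}
    for v in itertools.product([-1, 1], repeat=n_visible):
        s = 0.0
        for h in itertools.product([-1, 1], repeat=n_hidden):
            s += math.exp(-energy(v + h))  # v + h is the concatenation vh
        unnorm[v] = s
        Z += s
    return {v: p / Z for v, p in unnorm.items()}

# Hypothetical energy: one visible and one hidden node, coupling w = 0.5.
# By symmetry the two visible states end up equally likely.
dist = marginal_visible(1, 1, lambda z: -0.5 * z[0] * z[1])
print(dist)
```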
We start with random weights $w_{ab}$ (and biases), and start training by minimizing the loss function

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log P(v^{(i)}),$$
where $v^{(i)}$ is the $i$'th point in the training set and $N$ is its size. We shall use stochastic gradient descent to decrease $\mathcal{L}$. To that end, computing the gradient leads us to the following final expression:

$$\frac{\partial \mathcal{L}}{\partial w_{ab}} = \langle z_a z_b \rangle_{\text{model}} - \langle z_a z_b \rangle_{\text{data}}.$$
Here, $\langle \cdot \rangle_{\text{data}}$ denotes the average over the dataset with the visible nodes clamped to the data points; it involves only the clamped partition function $Z_v = \sum_h e^{-E(vh)}$, so the first term is easy to compute given the dataset. The second term, $\langle z_a z_b \rangle_{\text{model}}$, is a mean of $z_a z_b$ over the model's Boltzmann distribution. It is therefore sampled using Markov Chain Monte Carlo (MCMC).
The gradient is used to perform gradient descent on the weights $w_{ab}$. This algorithm is known as Contrastive Divergence, proposed by Hinton, and has been used extensively to train deep neural networks successfully.
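A minimal sketch of one CD-1 update for a $\pm 1$-valued restricted Boltzmann machine (bias terms are omitted for brevity, the learning rate and network sizes are illustrative, and the single Gibbs sweep is the defining CD-1 approximation to the model average):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_given(z, W, transpose=False):
    """Gibbs-sample one layer of a +/-1 RBM given the other layer."""
    field = z @ W.T if transpose else z @ W
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(unit = +1)
    return np.where(rng.random(p_plus.shape) < p_plus, 1.0, -1.0)

def cd1_step(v_data, W, lr=0.05):
    """One Contrastive Divergence (CD-1) weight update.
    Positive phase: hidden units sampled with visibles clamped to data.
    Negative phase: one Gibbs sweep v -> h -> v' -> h' stands in for
    a full MCMC estimate of <z_a z_b>_model."""
    h_data = sample_given(v_data, W)                   # (batch, n_hidden)
    v_model = sample_given(h_data, W, transpose=True)  # reconstructed visibles
    h_model = sample_given(v_model, W)
    pos = v_data.T @ h_data / len(v_data)    # approximates <z_a z_b>_data
    neg = v_model.T @ h_model / len(v_data)  # approximates <z_a z_b>_model
    # Descend the gradient <.>_model - <.>_data, i.e. step by (pos - neg):
    return W + lr * (pos - neg)

W = rng.normal(scale=0.1, size=(4, 3))  # 4 visible, 3 hidden units
v_batch = rng.choice([-1.0, 1.0], size=(8, 4))
W = cd1_step(v_batch, W)
print(W.shape)  # (4, 3)
```

In practice one loops `cd1_step` over many minibatches; practical implementations also use the conditional probabilities rather than hard samples in the final step to reduce sampling noise.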
In the next post, we shall see the different quantum models for the Boltzmann machine!
© 2024, blueqat Inc. All rights reserved