Before moving on to low-rank approximation of tensors, let's look at dimensionality reduction for matrices as a related example. Matrix Factorization (MF) decomposes a matrix into two matrices that share a small number of latent variables, and this is what reduces the dimensionality. Reference articles: What is Matrix Factorization? <a href="https://qiita.com/ysekky/items/c81ff24da0390a74fc6c" rel="noopener noreferrer" target="_blank">https://qiita.com/ysekky/items/c81ff24da0390a74fc6c</a> Matrix Factorization: a tutorial and implementation in Python <a href="https://enjoyworks.jp/tech-blog/633" rel="noopener noreferrer" target="_blank">https://enjoyworks.jp/tech-blog/633</a> We decompose R into P and Q. See the reference articles for details; the main difference from SVD appears to be how zero or null values are handled. Here we implement it with gradient descent: P and Q are trained so that the error between the original matrix R and the MF approximation P.T@Q becomes small. <pre class="ql-syntax" spellcheck="false">import numpy as np

# original matrix
R = np.array([
[1,1,0,3],
[2,5,0,5],
[3,1,2,2],
[0,1,3,0],
[1,0,3,1]])

# get the number of rows and columns
rows, cols = R.shape

# number of latent variables
r = 2

# initialize the two matrices with random numbers
P = np.random.rand(r, rows)
Q = np.random.rand(r, cols)
</pre> First, we prepare the original matrix R and get its numbers of rows and columns, which are 5 and 4 respectively. Next we choose the number of latent variables r; this is picked arbitrarily (2 here). With those in place, we initialize P and Q with random values. P<pre class="ql-syntax" spellcheck="false">array([[0.26543184, 0.4548205 , 0.22495288, 0.98146866, 0.94311498],
 [0.67803321, 0.17750233, 0.96624364, 0.04348862, 0.39414471]])
</pre> Q<pre class="ql-syntax" spellcheck="false">array([[0.37440945, 0.49324253, 0.46005219, 0.40387909],
 [0.15421424, 0.21077136, 0.67442266, 0.25346886]])
</pre> As a quick check, let's compute P.T@Q. <pre class="ql-syntax" spellcheck="false">P.T@Q
</pre> <pre class="ql-syntax" spellcheck="false">array([[0.20394256, 0.27383226, 0.57939347, 0.27906267],
 [0.19766248, 0.26174922, 0.32895276, 0.2286838 ],
 [0.23323301, 0.31461281, 0.75514667, 0.33576643],
 [0.3741777 , 0.49326824, 0.48085651, 0.40741768],
 [0.41389388, 0.54825883, 0.69970223, 0.48080782]])
</pre> Since P and Q were initialized randomly, the product is naturally random as well. As the error against the original matrix, we start with the sum of squared differences over all elements. <pre class="ql-syntax" spellcheck="false">def loss(R1, P1, Q1):
    # sum of squared differences between R1 and its approximation P1.T @ Q1
    return np.sum(np.square(R1-P1.T@Q1))
loss(R, P, Q)
</pre> <pre class="ql-syntax" spellcheck="false">83.03131884627395
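</pre> Some preparation first. The gradient is computed with forward differences, so we choose a step size h. The learning rate used for the parameter updates was also picked arbitrarily. Finally, since we want to see how the loss evolves, we prepare a list that stores its value at each step. <pre class="ql-syntax" spellcheck="false"># step size for the forward difference
h = 0.001

# learning rate for the updates
e = 0.01

# list that stores the loss after each step of optimizing P and Q
arr_loss = []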
</pre> Then we train everything in one go, for 200 steps in total. P and Q are updated alternately; updating them simultaneously might also work. At the end of each step, the updated loss value is stored. <pre class="ql-syntax" spellcheck="false">for k in range(200):

    # first, update P
    loss_base = loss(R, P, Q)
    loss_temp_P = np.zeros((r, rows))

    # loss after nudging each element of P by h (forward difference)
    for i in range(r):
        for j in range(rows):
            P_temp = P.copy()
            P_temp[i][j] += h
            loss_temp_P[i][j] = loss(R, P_temp, Q)

    # gradient-descent update of P
    for i in range(r):
        for j in range(rows):
            P[i][j] -= (loss_temp_P[i][j] - loss_base)/h * e

    # next, update Q
    loss_base = loss(R, P, Q)
    loss_temp_Q = np.zeros((r, cols))

    # loss after nudging each element of Q by h (forward difference)
    for i in range(r):
        for j in range(cols):
            Q_temp = Q.copy()
            Q_temp[i][j] += h
            loss_temp_Q[i][j] = loss(R, P, Q_temp)

    # gradient-descent update of Q
    for i in range(r):
        for j in range(cols):
            Q[i][j] -= (loss_temp_Q[i][j] - loss_base)/h * e

    # store the loss after both updates
    arr_loss.append(loss(R, P, Q))
</pre> The result: <pre class="ql-syntax" spellcheck="false">import matplotlib.pyplot as plt

plt.plot(arr_loss)
plt.show() 
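</pre> <img src="https://assets.blueqat.com/public/uploads/us-east-2:4805ff4b-c3cc-4344-b165-86544c34d0bf/2023/04/22/Untitled4-ipynb-Colaboratory.png"/> With this little training the error does not vanish completely. The value of loss_base left over from the final step (computed just before the last Q update) is: <pre class="ql-syntax" spellcheck="false">loss_base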
</pre> <pre class="ql-syntax" spellcheck="false">6.381376041184652
</pre> Let's compare the reconstructed matrix with the original R. <pre class="ql-syntax" spellcheck="false">P.T@Q
</pre> <pre class="ql-syntax" spellcheck="false">array([[ 1.05543388, 1.86861188, 0.04396524, 2.25051587],
 [ 2.40140225, 4.41706009, -0.16455878, 5.29578759],
 [ 1.72255751, 1.57607526, 2.42840392, 2.11216977],
 [ 0.96005554, 0.07260247, 2.64209568, 0.3237063 ],
 [ 1.20498716, 0.31413576, 2.95951818, 0.64249867]])
</pre> <pre class="ql-syntax" spellcheck="false">R
</pre> <pre class="ql-syntax" spellcheck="false">array([[1, 1, 0, 3],
 [2, 5, 0, 5],
 [3, 1, 2, 2],
 [0, 1, 3, 0],
 [1, 0, 3, 1]])
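</pre> That looks fairly close. In this run the zero entries were also fitted. In MF, zeros are apparently not fitted but left as entries to be predicted, so let's try that as well. It can be done by excluding the entries where R is zero from the error computation. The loss calculation was changed as follows. <pre class="ql-syntax" spellcheck="false">def loss(R1, P1, Q1):
    # only the non-zero entries of R1 contribute to the error
    non_zero_mask = R1 != 0
    filtered_matrix = (R1-P1.T@Q1) * non_zero_mask
    return np.sum(np.square(filtered_matrix))
loss(R, P, Q)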
</pre> <img src="https://assets.blueqat.com/public/uploads/us-east-2:4805ff4b-c3cc-4344-b165-86544c34d0bf/2023/04/22/Untitled4-ipynb-Colaboratory_3.png"/> Unsurprisingly, since the zero entries are no longer evaluated, the error became smaller. <pre class="ql-syntax" spellcheck="false">loss_base
</pre> <pre class="ql-syntax" spellcheck="false">1.8002928412671269
</pre> Finally, let's compare the two matrices. <pre class="ql-syntax" spellcheck="false">P.T@Q
</pre> <pre class="ql-syntax" spellcheck="false">array([[ 1.42267875, 1.67052381, 3.97776867, 2.20223902],
 [ 1.92529184, 4.7772124 , 10.97915212, 5.15071552],
 [ 2.90018636, 0.69508485, 2.08172075, 2.15172136],
 [ 1.72443551, 1.17455089, 2.93062296, 1.93597462],
 [ 0.77376256, 1.20938216, 2.83237033, 1.45720222]])
</pre> <pre class="ql-syntax" spellcheck="false">R
</pre> <pre class="ql-syntax" spellcheck="false">array([[1, 1, 0, 3],
 [2, 5, 0, 5],
 [3, 1, 2, 2],
 [0, 1, 3, 0],
 [1, 0, 3, 1]])
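</pre> Apparently a regularization term is normally added to the error. I am not sure how much it would change things here, but for now the non-zero entries come back close to the original matrix. The zero entries were not fitted, so they hold predicted values reconstructed from the two matrices. In a rating setting, 0 apparently means "not rated" rather than "rated low", so those values can be predicted by reconstructing them from the matrix factorization. That's all.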
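As a follow-up to the regularization remark, here is a minimal sketch of the same masked factorization with an optional L2 penalty. It is not the code used for the results above. It swaps the forward-difference gradient for the analytic gradient of the squared error, and the function name `masked_mf`, the penalty weight `lam`, and the fixed `seed` are assumptions introduced only for this illustration.

```python
import numpy as np

def masked_mf(R, r=2, lr=0.01, lam=0.0, steps=200, seed=0):
    """Fit R ~= P.T @ Q on the non-zero entries of R (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    rows, cols = R.shape
    P = rng.random((r, rows))
    Q = rng.random((r, cols))
    mask = (R != 0)  # zeros are treated as "not observed"
    losses = []
    for _ in range(steps):
        # residual on observed entries, then analytic gradient step for P
        E = (R - P.T @ Q) * mask
        P -= lr * (-2 * Q @ E.T + 2 * lam * P)
        # recompute the residual with the new P, then step Q
        E = (R - P.T @ Q) * mask
        Q -= lr * (-2 * P @ E + 2 * lam * Q)
        # record the regularized loss after both updates
        E = (R - P.T @ Q) * mask
        losses.append(np.sum(E**2) + lam * (np.sum(P**2) + np.sum(Q**2)))
    return P, Q, losses

R = np.array([
    [1, 1, 0, 3],
    [2, 5, 0, 5],
    [3, 1, 2, 2],
    [0, 1, 3, 0],
    [1, 0, 3, 1]])

# lam=0.1 is an arbitrary illustrative value, not a tuned one
P, Q, losses = masked_mf(R, lam=0.1)
print(np.round(P.T @ Q, 2))
```

With `lam=0.0` this minimizes the same masked squared error as the loss defined earlier. A positive `lam` penalizes large entries in P and Q, which tends to keep the predictions at the unobserved positions from growing very large.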

Matrix Factorization / 行列因子分解

Yuichiro Minato