Quantum Computing Basics - Linear Algebra Formulas: Vector and Matrix Derivatives of Scalar Functions
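§ Purpose of This Article
This article reviews formulas for differentiating with respect to vectors and matrices, and how to derive them. These calculations appear frequently not only in quantum computing but also in the theory of machine learning and deep learning.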
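§ Summary of Differentiation Formulas
First, here is the list of formulas.
1. Vector differentiation formulas
Here, \mathbf{x} and \mathbf{a} are column vectors and A is a matrix of compatible size (square in (Eq. 2)).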
\frac{\partial \mathbf{x}^T\mathbf{a}}{\partial\mathbf{x}}=
\frac{\partial \mathbf{a}^T\mathbf{x}}{\partial\mathbf{x}}=\mathbf{a}\quad\quad\text{(Eq. 1)}\\
\frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}}=(A + A^T)\mathbf{x}\quad\quad\text{(Eq. 2)}\\
\frac{\partial tr(\mathbf{xa}^T)}{\partial \mathbf{x}} =
\frac{\partial tr(\mathbf{ax}^T)}{\partial \mathbf{x}} = \mathbf{a}\quad\quad\text{(Eq. 3)}\\
\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}-A\mathbf{x})^T(\mathbf{a}-A\mathbf{x})=-2A^T(\mathbf{a}-A\mathbf{x})\quad\quad\text{(Eq. 4)}
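As a quick numerical sanity check of (Eq. 1) to (Eq. 4), the sketch below compares each closed-form gradient with a central finite-difference approximation. It is an illustration only, assuming plain Python lists as vectors and matrices; the helper names (`numerical_grad`, `close`, and so on) and the random test data are hypothetical choices, not part of the original article.

```python
import random

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product u^T v."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def outer(u, w):
    """Outer product u w^T."""
    return [[u_i * w_j for w_j in w] for u_i in u]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def numerical_grad(f, x, h=1e-6):
    """Central-difference approximation of the column vector (df/dx_1, ..., df/dx_n)."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def close(u, v, tol=1e-5):
    return all(abs(p - q) < tol for p, q in zip(u, v))

random.seed(0)
n = 3
x = [random.uniform(-1, 1) for _ in range(n)]
a = [random.uniform(-1, 1) for _ in range(n)]
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def residual(v):
    """a - A v"""
    return [a_i - y_i for a_i, y_i in zip(a, matvec(A, v))]

results = {
    # (Eq. 1): x^T a and a^T x both have gradient a
    "Eq1": close(numerical_grad(lambda v: dot(v, a), x), a)
           and close(numerical_grad(lambda v: dot(a, v), x), a),
    # (Eq. 2): gradient of x^T A x is (A + A^T) x
    "Eq2": close(numerical_grad(lambda v: dot(v, matvec(A, v)), x),
                 [p + q for p, q in zip(matvec(A, x), matvec(transpose(A), x))]),
    # (Eq. 3): tr(x a^T) and tr(a x^T) both have gradient a
    "Eq3": close(numerical_grad(lambda v: trace(outer(v, a)), x), a)
           and close(numerical_grad(lambda v: trace(outer(a, v)), x), a),
    # (Eq. 4): gradient of (a - Ax)^T (a - Ax) is -2 A^T (a - Ax)
    "Eq4": close(numerical_grad(lambda v: dot(residual(v), residual(v)), x),
                 [-2 * g for g in matvec(transpose(A), residual(x))]),
}
print(results)
```

Every value in the printed dictionary should be `True`. Because all four functions are at most quadratic in \mathbf{x}, the central difference is exact up to floating-point rounding.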
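2. Matrix differentiation formulas
Here, \mathbf{x} and \mathbf{y} are column vectors and X and A are matrices.
|X| denotes the determinant of the square matrix X (assumed positive wherever \log|X| appears).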
\frac{\partial \mathbf{x}^TX\mathbf{y}}{\partial X}=\mathbf{x}\mathbf{y}^T\quad\quad\text{(Eq. 5)}\\
\frac{\partial \mathbf{x}^TX^{-1}\mathbf{y}}{\partial X}=-(X^{-1})^T\mathbf{x}\mathbf{y}^T(X^{-1})^T\quad\quad\text{(Eq. 6)}\\
\frac{\partial \log |X|}{\partial X}=(X^{-1})^T\quad\quad\text{(Eq. 7)}\\
\frac{\partial tr(X)}{\partial X}=I\quad\quad\text{(Eq. 8)}\\
\frac{\partial tr(XA)}{\partial X}=A^T\quad\quad\text{(Eq. 9)}\\
\frac{\partial tr(X^TA)}{\partial X}=A\quad\quad\text{(Eq. 10)}\\
\frac{\partial tr(XAX^T)}{\partial X}=X(A + A^T)\quad\quad\text{(Eq. 11)}\\
\frac{\partial}{\partial x}\log|X|=tr\bigg(X^{-1}\frac{\partial X}{\partial x}\bigg)\quad\quad\text{(Eq. 12)}
In (Eq. 12), x is a scalar and X = X(x) is a matrix that depends on it.
When a scalar y is a function y=f(X) of the matrix X, the derivative of a function g(y) with respect to X follows from the chain rule:
\frac{\partial g(y)}{\partial X} = \frac{\partial g(y)}{\partial y}\frac{\partial f(X)}{\partial X}\quad\quad\text{(Eq. 13)}
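The less obvious matrix formulas can be spot-checked numerically in the 2×2 case, where the inverse and determinant have closed forms. This is a sketch under stated assumptions: X is a fixed 2×2 matrix with |X| > 0, the matrix derivative is approximated entry by entry with central differences, and all helper names are hypothetical. It checks \partial(\mathbf{x}^TX^{-1}\mathbf{y})/\partial X = -(X^{-1})^T\mathbf{x}\mathbf{y}^T(X^{-1})^T, \partial\log|X|/\partial X = (X^{-1})^T, the scalar formula d/dt \log|X(t)| = tr(X^{-1} dX/dt) along the hypothetical path X(t) = X + tA, and the chain rule with g(y) = y^2 and y = tr(XA).

```python
import math

def det2(X):
    """Determinant |X| of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matrix_grad(f, X, h=1e-6):
    """Central-difference matrix whose (i, j) entry approximates df/dX_ij."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G

def close(P, Q, tol=1e-5):
    return all(abs(p - q) < tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

X = [[2.0, 0.5], [0.3, 1.5]]   # |X| = 2.85 > 0, so log|X| is defined
A = [[1.0, 2.0], [-1.0, 0.5]]
x = [[1.0], [-2.0]]            # column vectors stored as 2 x 1 matrices
y = [[0.5], [3.0]]
XinvT = transpose(inv2(X))

# d(x^T X^{-1} y)/dX = -(X^{-1})^T x y^T (X^{-1})^T
rhs_inv = [[-v for v in row] for row in matmul(matmul(XinvT, matmul(x, transpose(y))), XinvT)]
inv_quad_ok = close(matrix_grad(lambda Z: matmul(matmul(transpose(x), inv2(Z)), y)[0][0], X), rhs_inv)

# d log|X| / dX = (X^{-1})^T
logdet_ok = close(matrix_grad(lambda Z: math.log(det2(Z)), X), XinvT)

# d/dt log|X(t)| = tr(X^{-1} dX/dt) with X(t) = X + tA, so dX/dt = A (checked at t = 0)
def log_det_along(t):
    return math.log(det2([[X[i][j] + t * A[i][j] for j in range(2)] for i in range(2)]))

h = 1e-6
logdet_scalar_ok = abs((log_det_along(h) - log_det_along(-h)) / (2 * h)
                       - trace(matmul(inv2(X), A))) < 1e-5

# Chain rule with g(y) = y^2 and y = f(X) = tr(XA): dg/dX = 2 tr(XA) A^T
yval = trace(matmul(X, A))
rhs_chain = [[2 * yval * v for v in row] for row in transpose(A)]
chain_ok = close(matrix_grad(lambda Z: trace(matmul(Z, A)) ** 2, X), rhs_chain)

print(inv_quad_ok, logdet_ok, logdet_scalar_ok, chain_ok)
```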
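§ Deriving the Formulas
Below we derive (Eq. 1) through (Eq. 4).
These are worked examples rather than rigorous proofs.
Derivations of (Eq. 5) onward are omitted for reasons of space.
1. Derivation of (Eq. 1)
We work in three dimensions as an example; the same argument gives the same result in any dimension.
Let the column vectors \mathbf{x} and \mathbf{a} be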
\mathbf{x}=\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix},
\mathbf{a}=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix}
The vector derivative \frac{\partial }{\partial \mathbf{x}} is taken to act componentwise as follows:
\frac{\partial}{\partial \mathbf{x}} = \begin{pmatrix}\frac{\partial }{\partial x_1} \\ \frac{\partial }{\partial x_2} \\ \frac{\partial }{\partial x_3}\end{pmatrix}
The derivation goes as follows:
\begin{align}
\frac{\partial \mathbf{x}^T\mathbf{a}}{\partial \mathbf{x}} &= \frac{\partial}{\partial \mathbf{x}} \left\{(x_1\;x_2\;x_3)\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} \right\}\\
&= \frac{\partial}{\partial \mathbf{x}}(a_1x_1+a_2x_2+a_3x_3)\\
&=\begin{pmatrix}\frac{\partial }{\partial x_1} \\ \frac{\partial }{\partial x_2} \\ \frac{\partial }{\partial x_3}\end{pmatrix}(a_1x_1+a_2x_2+a_3x_3)\\
&= \begin{pmatrix} \frac{\partial}{\partial x_1}(a_1x_1+a_2x_2+a_3x_3) \\ \frac{\partial}{\partial x_2}(a_1x_1+a_2x_2+a_3x_3) \\ \frac{\partial}{\partial x_3}(a_1x_1+a_2x_2+a_3x_3) \end{pmatrix}\\
&=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} \\
&=\mathbf{a}
\end{align}
Likewise,
\begin{align}
\frac{\partial \mathbf{a}^T\mathbf{x}}{\partial \mathbf{x}} &= \frac{\partial}{\partial \mathbf{x}} \left\{(a_1\;a_2\;a_3)\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix} \right\}\\
&= \frac{\partial}{\partial \mathbf{x}}(a_1x_1+a_2x_2+a_3x_3)\\
&=\begin{pmatrix}\frac{\partial }{\partial x_1} \\ \frac{\partial }{\partial x_2} \\ \frac{\partial }{\partial x_3}\end{pmatrix}(a_1x_1+a_2x_2+a_3x_3)\\
&=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} \\
&=\mathbf{a}
\end{align}
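The componentwise step above can be mimicked with integers. For a linear function, raising one component x_k by 1 changes the value by exactly its partial derivative, so the sketch below recovers \mathbf{a} from both \mathbf{x}^T\mathbf{a} and \mathbf{a}^T\mathbf{x} without any approximation. The data and the helper `partials` are hypothetical, chosen only for illustration.

```python
# Hypothetical integer data chosen for illustration
a = [4, -1, 7]
x = [2, 5, -3]

def x_T_a(v):
    # x^T a written out: x_1 a_1 + x_2 a_2 + x_3 a_3
    return v[0] * a[0] + v[1] * a[1] + v[2] * a[2]

def a_T_x(v):
    # a^T x written out: the same sum with the factors swapped
    return a[0] * v[0] + a[1] * v[1] + a[2] * v[2]

def partials(f, v):
    """Partial derivatives of a linear function f:
    for such f, f(v + e_k) - f(v) equals df/dx_k exactly."""
    result = []
    for k in range(len(v)):
        shifted = list(v)
        shifted[k] += 1
        result.append(f(shifted) - f(v))
    return result

print(partials(x_T_a, x), partials(a_T_x, x))  # [4, -1, 7] [4, -1, 7]
```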
2. Derivation of (Eq. 2)
Let the 3×3 matrix A be
A=\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix}
The derivation goes as follows:
\begin{align}
\frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}}
&=\frac{\partial}{\partial \mathbf{x}} \left\{(x_1\;x_2\;x_3)\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\}\\
&=\frac{\partial}{\partial \mathbf{x}} \left\{(A_{11}x_1+A_{21}x_2+A_{31}x_3\quad A_{12}x_1+A_{22}x_2+A_{32}x_3\quad A_{13}x_1+A_{23}x_2+A_{33}x_3)\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\}\\
&=\frac{\partial}{\partial \mathbf{x}}(A_{11}x_1x_1+A_{21}x_1x_2+A_{31}x_1x_3+A_{12}x_1x_2+A_{22}x_2x_2+A_{32}x_2x_3+A_{13}x_1x_3+A_{23}x_2x_3+A_{33}x_3x_3)\\
&=\begin{pmatrix}2A_{11}x_1+A_{21}x_2+A_{31}x_3+A_{12}x_2+A_{13}x_3 \\
A_{21}x_1+A_{12}x_1+2A_{22}x_2+A_{32}x_3+A_{23}x_3 \\
A_{31}x_1+A_{32}x_2+A_{13}x_1+A_{23}x_2+2A_{33}x_3 \end{pmatrix}\\
&=\begin{pmatrix}A_{11}x_1+A_{21}x_2+A_{31}x_3 \\ A_{12}x_1+A_{22}x_2+A_{32}x_3 \\ A_{13}x_1+A_{23}x_2+A_{33}x_3\end{pmatrix}+
\begin{pmatrix}A_{11}x_1+A_{12}x_2+A_{13}x_3 \\ A_{21}x_1+A_{22}x_2+A_{23}x_3 \\ A_{31}x_1+A_{32}x_2+A_{33}x_3\end{pmatrix}\\
&=\begin{pmatrix}A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}+
\begin{pmatrix}A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\\
&=(A^T + A)\mathbf{x}\\
&=(A + A^T)\mathbf{x}
\end{align}
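The error-prone part of this derivation is the hand expansion of the three components. A minimal check, assuming arbitrary integer test data (hypothetical), is to type those components in exactly as written and compare them with (A + A^T)\mathbf{x} computed directly; with integers the comparison is exact.

```python
# Hypothetical integer test data
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
x = [2, -1, 3]

def componentwise_gradient(A, x):
    """The three rows written out by hand in the derivation of (Eq. 2),
    with A[i][j] standing for A_{(i+1)(j+1)}."""
    x1, x2, x3 = x
    return [
        2*A[0][0]*x1 + A[1][0]*x2 + A[2][0]*x3 + A[0][1]*x2 + A[0][2]*x3,
        A[1][0]*x1 + A[0][1]*x1 + 2*A[1][1]*x2 + A[2][1]*x3 + A[1][2]*x3,
        A[2][0]*x1 + A[2][1]*x2 + A[0][2]*x1 + A[1][2]*x2 + 2*A[2][2]*x3,
    ]

def sym_times(A, x):
    """(A + A^T) x computed directly: row i is sum_j (A_ij + A_ji) x_j."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

print(componentwise_gradient(A, x) == sym_times(A, x))  # True
```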
3. Derivation of (Eq. 3)
Consider
\begin{align}
\frac{\partial tr(\mathbf{x}\mathbf{a}^T)}{\partial \mathbf{x}}
\end{align}
Here,
\begin{align}
\mathbf{x}\mathbf{a}^T&=\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}(a_1\;a_2\;a_3)\\
&=\begin{pmatrix}
a_1x_1 & a_2x_1 & a_3x_1 \\
a_1x_2 & a_2x_2 & a_3x_2 \\
a_1x_3 & a_2x_3 & a_3x_3
\end{pmatrix}
\end{align}
Taking the trace gives
tr(\mathbf{x}\mathbf{a}^T)=a_1x_1 + a_2x_2 + a_3x_3
Therefore,
\begin{align}
\frac{\partial tr(\mathbf{x}\mathbf{a}^T)}{\partial \mathbf{x}}&=
\frac{\partial}{\partial \mathbf{x}}(a_1x_1 + a_2x_2 + a_3x_3)\\
&=\begin{pmatrix}
\frac{\partial}{\partial x_1}(a_1x_1 + a_2x_2 + a_3x_3)\\
\frac{\partial}{\partial x_2}(a_1x_1 + a_2x_2 + a_3x_3)\\
\frac{\partial}{\partial x_3}(a_1x_1 + a_2x_2 + a_3x_3)
\end{pmatrix}\\
&=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix}\\
&=\mathbf{a}
\end{align}
Likewise, consider
\begin{align}
\frac{\partial tr(\mathbf{a}\mathbf{x}^T)}{\partial \mathbf{x}}
\end{align}
Here,
\begin{align}
\mathbf{a}\mathbf{x}^T&=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix}(x_1\;x_2\;x_3)\\
&=\begin{pmatrix}
a_1x_1 & a_1x_2 & a_1x_3 \\
a_2x_1 & a_2x_2 & a_2x_3 \\
a_3x_1 & a_3x_2 & a_3x_3
\end{pmatrix}
\end{align}
Taking the trace gives
tr(\mathbf{a}\mathbf{x}^T)=a_1x_1 + a_2x_2 + a_3x_3
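This is the same expression as before, so the rest of the derivation is identical and is omitted; the result is again \mathbf{a}.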
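Both traces reduce to the inner product \mathbf{a}^T\mathbf{x}, which is why (Eq. 3) follows from (Eq. 1). A small sketch with hypothetical integer vectors confirms that the three values coincide.

```python
def outer(u, w):
    """Outer product u w^T as a list of rows."""
    return [[u_i * w_j for w_j in w] for u_i in u]

def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))

# Hypothetical integer data
x = [2, -1, 3]
a = [5, 4, 2]

t1 = trace(outer(x, a))                            # tr(x a^T)
t2 = trace(outer(a, x))                            # tr(a x^T)
inner = sum(a_i * x_i for a_i, x_i in zip(a, x))   # a^T x
print(t1, t2, inner)  # 12 12 12
```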
4. Derivation of (Eq. 4)
\begin{align}
\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}-A\mathbf{x})^T(\mathbf{a}-A\mathbf{x})&=\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}^T-\mathbf{x}^TA^T)(\mathbf{a}-A\mathbf{x})\\
&=\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}^T\mathbf{a}-\mathbf{a}^TA\mathbf{x}-\mathbf{x}^TA^T\mathbf{a}+\mathbf{x}^TA^TA\mathbf{x})
\end{align}
The first term \mathbf{a}^T\mathbf{a} does not depend on \mathbf{x}, so it vanishes under differentiation and can be dropped.
\begin{align}
\text{(above)}&=\frac{\partial}{\partial \mathbf{x}}(-\mathbf{a}^TA\mathbf{x}-\mathbf{x}^TA^T\mathbf{a}+\mathbf{x}^TA^TA\mathbf{x})\\
&=\frac{\partial}{\partial \mathbf{x}}\left\{-(a_1\;a_2\;a_3)\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}
-(x_1\;x_2\;x_3)\begin{pmatrix}A_{11} & A_{21} & A_{31}\\A_{12} & A_{22} & A_{32}\\A_{13} & A_{23} & A_{33}\end{pmatrix}\begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}
+(x_1\;x_2\;x_3)\begin{pmatrix}A_{11} & A_{21} & A_{31}\\A_{12} & A_{22} & A_{32}\\A_{13} & A_{23} & A_{33}\end{pmatrix}\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\right\}\\
&=\frac{\partial}{\partial \mathbf{x}}\left\{
-(a_1A_{11}+a_2A_{21}+a_3A_{31}\quad a_1A_{12}+a_2A_{22}+a_3A_{32}\quad a_1A_{13}+a_2A_{23}+a_3A_{33})\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}-(x_1A_{11}+x_2A_{12}+x_3A_{13}\quad x_1A_{21}+x_2A_{22}+x_3A_{23}\quad x_1A_{31}+x_2A_{32}+x_3A_{33})\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix}
+(x_1\;x_2\;x_3)\begin{pmatrix}B_{11} & B_{12} & B_{13} \\B_{21} & B_{22} & B_{23} \\B_{31} & B_{32} & B_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\}\end{align}
Here we have introduced the placeholder matrix B=A^TA:
\begin{align}
B&=\begin{pmatrix}B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33}\end{pmatrix}\\
&=\begin{pmatrix}A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31} & A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32} & A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33}\\
A_{12}A_{11}+A_{22}A_{21}+A_{32}A_{31} & A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32} & A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33}\\
A_{13}A_{11}+A_{23}A_{21}+A_{33}A_{31} & A_{13}A_{12}+A_{23}A_{22}+A_{33}A_{32} & A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33}\end{pmatrix}
\end{align}
Using this,
\begin{align}
\text{(above)}&=\frac{\partial}{\partial \mathbf{x}}\left\{
-\bigg(x_1(a_1A_{11}+a_2A_{21}+a_3A_{31})+x_2(a_1A_{12}+a_2A_{22}+a_3A_{32})+x_3(a_1A_{13}+a_2A_{23}+a_3A_{33})\bigg)
-\bigg(a_1(x_1A_{11}+x_2A_{12}+x_3A_{13})+a_2(x_1A_{21}+x_2A_{22}+x_3A_{23})+a_3(x_1A_{31}+x_2A_{32}+x_3A_{33})\bigg)
+(x_1B_{11}+x_2B_{21}+x_3B_{31}\quad x_1B_{12}+x_2B_{22}+x_3B_{32}\quad x_1B_{13}+x_2B_{23}+x_3B_{33})\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\}\\
&=\frac{\partial}{\partial \mathbf{x}}\left\{
-x_1(a_1A_{11}+a_2A_{21}+a_3A_{31})-x_2(a_1A_{12}+a_2A_{22}+a_3A_{32})-x_3(a_1A_{13}+a_2A_{23}+a_3A_{33})-x_1(a_1A_{11}+a_2A_{21}+a_3A_{31})-x_2(a_1A_{12}+a_2A_{22}+a_3A_{32})-x_3(a_1A_{13}+a_2A_{23}+a_3A_{33})
+x_1^2B_{11}+x_1x_2B_{21}+x_1x_3B_{31}+x_1x_2B_{12}+x_2^2B_{22}+x_2x_3B_{32}+x_1x_3B_{13}+x_2x_3B_{23}+x_3^2B_{33}\right\}
\end{align}
Applying the vector derivative \frac{\partial}{\partial \mathbf{x}} gives
\begin{align}
\text{(above)}&=
\begin{pmatrix}
-(a_1A_{11} + a_2A_{21}+a_3A_{31})-(a_1A_{11} + a_2A_{21}+a_3A_{31})+2x_1B_{11}+x_2B_{21}+x_3B_{31}+x_2B_{12}+x_3B_{13}\\
-(a_1A_{12} + a_2A_{22}+a_3A_{32})-(a_1A_{12} + a_2A_{22}+a_3A_{32})+x_1B_{21}+x_1B_{12}+2x_2B_{22}+x_3B_{32}+x_3B_{23}\\
-(a_1A_{13} + a_2A_{23}+a_3A_{33})-(a_1A_{13} + a_2A_{23}+a_3A_{33})+x_1B_{31}+x_2B_{32}+x_1B_{13}+x_2B_{23}+2x_3B_{33}
\end{pmatrix}
\end{align}
Writing B back in terms of the entries of A,
\begin{align}
\text{(above)}&=\begin{pmatrix}
-2(a_1A_{11}+a_2A_{21}+a_3A_{31})
+2x_1(A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31})
+x_2(A_{12}A_{11}+A_{22}A_{21}+A_{32}A_{31})
+x_3(A_{13}A_{11}+A_{23}A_{21}+A_{33}A_{31})
+x_2(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32})
+x_3(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33})\\ \\
-2(a_1A_{12}+a_2A_{22}+a_3A_{32})
+x_1(A_{12}A_{11}+A_{22}A_{21}+A_{32}A_{31})
+x_1(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32})
+2x_2(A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32})
+x_3(A_{13}A_{12}+A_{23}A_{22}+A_{33}A_{32})
+x_3(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33})\\ \\
-2(a_1A_{13}+a_2A_{23}+a_3A_{33})
+x_1(A_{13}A_{11}+A_{23}A_{21}+A_{33}A_{31})
+x_2(A_{13}A_{12}+A_{23}A_{22}+A_{33}A_{32})
+x_1(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33})
+x_2(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33})
+2x_3(A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33})
\end{pmatrix}\\
&=\begin{pmatrix}
-2(a_1A_{11}+a_2A_{21}+a_3A_{31})
+2x_1(A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31})
+2x_2(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32})
+2x_3(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33})\\\\
-2(a_1A_{12}+a_2A_{22}+a_3A_{32})
+2x_1(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32})
+2x_2(A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32})
+2x_3(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33})\\\\
-2(a_1A_{13}+a_2A_{23}+a_3A_{33})
+2x_1(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33})
+2x_2(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33})
+2x_3(A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33})
\end{pmatrix}\\
&=-2\begin{pmatrix}
a_1A_{11}+a_2A_{21}+a_3A_{31}\\
a_1A_{12}+a_2A_{22}+a_3A_{32}\\
a_1A_{13}+a_2A_{23}+a_3A_{33}
\end{pmatrix}
+2\begin{pmatrix}
A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31} & A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32} & A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33}\\
A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32} & A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32} & A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33}\\
A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33} & A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33} & A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33}
\end{pmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix}\\
&=-2\left\{
\begin{pmatrix}
A_{11} & A_{21} & A_{31} \\
A_{12} & A_{22} & A_{32} \\
A_{13} & A_{23} & A_{33}
\end{pmatrix}
\begin{pmatrix}
a_1 \\ a_2 \\ a_3
\end{pmatrix}
-\begin{pmatrix}
A_{11} & A_{21} & A_{31} \\
A_{12} & A_{22} & A_{32} \\
A_{13} & A_{23} & A_{33}
\end{pmatrix}
\begin{pmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{pmatrix}
\begin{pmatrix}
x_1 \\ x_2 \\ x_3
\end{pmatrix}
\right\}\\
&=-2(A^T\mathbf{a}-A^TA\mathbf{x})\\
&=-2A^T(\mathbf{a}-A\mathbf{x})
\end{align}
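The bookkeeping in this derivation can be replayed with integers. The sketch below (hypothetical test data, plain Python) builds B entry by entry, confirms B = A^TA and that B is symmetric, which is what allows pairs such as x_2B_{21}+x_2B_{12} to be merged into 2x_2B_{12}, and checks that the three rows obtained after differentiation, with the second row ending in x_3B_{23}, equal -2A^T(\mathbf{a}-A\mathbf{x}).

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

# Hypothetical integer test data, chosen only for the check
A = [[1, 2, 3],
     [0, -1, 4],
     [2, 1, 1]]
a = [3, -2, 5]
x = [1, 4, -2]

# B_{ij} = A_{1i}A_{1j} + A_{2i}A_{2j} + A_{3i}A_{3j}, i.e. B = A^T A (0-based indices)
B = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
assert B == matmul(transpose(A), A)
# Symmetry B_{ij} = B_{ji} is what allows merging e.g. x_2B_{21} + x_2B_{12} into 2x_2B_{12}
assert all(B[i][j] == B[j][i] for i in range(3) for j in range(3))

# c_k = a_1A_{1k} + a_2A_{2k} + a_3A_{3k}, the entries of A^T a
c = matvec(transpose(A), a)
x1, x2, x3 = x

# The three rows obtained after applying d/dx (second row ends in x_3B_{23})
rows = [
    -2*c[0] + 2*x1*B[0][0] + x2*B[1][0] + x3*B[2][0] + x2*B[0][1] + x3*B[0][2],
    -2*c[1] + x1*B[1][0] + x1*B[0][1] + 2*x2*B[1][1] + x3*B[2][1] + x3*B[1][2],
    -2*c[2] + x1*B[2][0] + x2*B[2][1] + x1*B[0][2] + x2*B[1][2] + 2*x3*B[2][2],
]

# Closed form (Eq. 4): -2 A^T (a - A x)
r = [a_i - y_i for a_i, y_i in zip(a, matvec(A, x))]
closed_form = [-2 * v for v in matvec(transpose(A), r)]
print(rows == closed_form)  # True
```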
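5. On deriving (Eq. 5) onward
We omit these derivations, but they can be carried out in the same way using the matrix derivative, which acts entrywise as follows (shown for a 3×3 matrix X):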
\frac{\partial}{\partial X}=\begin{pmatrix}
\frac{\partial}{\partial X_{11}} & \frac{\partial}{\partial X_{12}} & \frac{\partial}{\partial X_{13}}\\
\frac{\partial}{\partial X_{21}} & \frac{\partial}{\partial X_{22}} & \frac{\partial}{\partial X_{23}}\\
\frac{\partial}{\partial X_{31}} & \frac{\partial}{\partial X_{32}} & \frac{\partial}{\partial X_{33}}
\end{pmatrix}
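With this entrywise definition, the matrix formulas can be checked by perturbing one entry X_{ij} at a time. The sketch below (hypothetical helper names, random 3×3 test data, standard library only) does this with central differences for (Eq. 5), (Eq. 8), (Eq. 9), (Eq. 10), and the formula \partial tr(XAX^T)/\partial X = X(A + A^T).

```python
import random

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matrix_grad(f, X, h=1e-6):
    """Apply d/dX entry by entry: G[i][j] approximates df/dX_ij (central difference)."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G

def close(P, Q, tol=1e-5):
    return all(abs(p - q) < tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def rand_mat(r, c):
    return [[random.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

random.seed(2)
X, A = rand_mat(3, 3), rand_mat(3, 3)
x, y = rand_mat(3, 1), rand_mat(3, 1)   # column vectors as 3 x 1 matrices
I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
A_plus_AT = [[A[i][j] + A[j][i] for j in range(3)] for i in range(3)]

checks = {
    "Eq5": close(matrix_grad(lambda Z: matmul(matmul(transpose(x), Z), y)[0][0], X),
                 matmul(x, transpose(y))),
    "Eq8": close(matrix_grad(trace, X), I),
    "Eq9": close(matrix_grad(lambda Z: trace(matmul(Z, A)), X), transpose(A)),
    "Eq10": close(matrix_grad(lambda Z: trace(matmul(transpose(Z), A)), X), A),
    "tr(XAX^T)": close(matrix_grad(lambda Z: trace(matmul(matmul(Z, A), transpose(Z))), X),
                       matmul(X, A_plus_AT)),
}
print(all(checks.values()))  # True
```

All of these functions are linear or quadratic in the entries of X, so the central differences agree with the closed forms up to floating-point rounding.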