Quantum Computing Basics - Linear Algebra Formulas: Derivatives of Scalars with Respect to Vectors and Matrices

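§ Purpose of this article

Linear-algebra calculations and manipulations appear frequently not only in quantum computing but also in the theory of machine learning and deep learning. Among these, this article reviews the formulas for differentiation involving vectors and matrices, together with how to derive them.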

§ Summary of derivative formulas

We start with a list of the formulas.

1. Vector derivative formulas

Here, $\mathbf{x}$ and $\mathbf{a}$ are column vectors and $A$ is a matrix (square in Eq. 2). Neither $\mathbf{a}$ nor $A$ depends on $\mathbf{x}$.

$$
\begin{align}
\frac{\partial \mathbf{x}^T\mathbf{a}}{\partial\mathbf{x}} = \frac{\partial \mathbf{a}^T\mathbf{x}}{\partial\mathbf{x}} &= \mathbf{a} \quad\quad \text{(Eq. 1)} \\
\frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}} &= (A + A^T)\mathbf{x} \quad\quad \text{(Eq. 2)} \\
\frac{\partial\, \mathrm{tr}(\mathbf{x}\mathbf{a}^T)}{\partial \mathbf{x}} = \frac{\partial\, \mathrm{tr}(\mathbf{a}\mathbf{x}^T)}{\partial \mathbf{x}} &= \mathbf{a} \quad\quad \text{(Eq. 3)} \\
\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}-A\mathbf{x})^T(\mathbf{a}-A\mathbf{x}) &= -2A^T(\mathbf{a}-A\mathbf{x}) \quad\quad \text{(Eq. 4)}
\end{align}
$$
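Before working through the derivations, the four vector formulas can be sanity-checked numerically. The following is a minimal sketch rather than a proof: the helper `numerical_gradient`, the step size `eps`, the random test values and the tolerance are all assumptions introduced here for illustration. It compares a central-difference estimate of each partial derivative with the closed-form right-hand side.

```python
import numpy as np


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the column of partial derivatives df/dx_i."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


# Arbitrary test values (hypothetical; any real vectors and matrix would do).
rng = np.random.default_rng(0)
n = 3
x = rng.normal(size=n)
a = rng.normal(size=n)
A = rng.normal(size=(n, n))

# Eq. 1: d(x^T a)/dx = a
eq1_ok = np.allclose(numerical_gradient(lambda v: v @ a, x), a, atol=1e-5)

# Eq. 2: d(x^T A x)/dx = (A + A^T) x
eq2_ok = np.allclose(
    numerical_gradient(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5
)

# Eq. 3: d tr(x a^T)/dx = a
eq3_ok = np.allclose(
    numerical_gradient(lambda v: np.trace(np.outer(v, a)), x), a, atol=1e-5
)


# Eq. 4: d/dx (a - A x)^T (a - A x) = -2 A^T (a - A x)
def squared_residual(v):
    r = a - A @ v
    return r @ r


eq4_ok = np.allclose(
    numerical_gradient(squared_residual, x), -2 * A.T @ (a - A @ x), atol=1e-5
)

print(eq1_ok, eq2_ok, eq3_ok, eq4_ok)
```

Central differences are used because they are exact, up to rounding, for the linear and quadratic functions that appear in these formulas, so a loose tolerance is enough.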

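2. Matrix derivative formulas

Here, $\mathbf{x}$ and $\mathbf{y}$ are column vectors, and $X$ and $A$ are matrices.
$|X|$ denotes the determinant of the matrix $X$. Wherever $X^{-1}$ or $\log|X|$ appears, $X$ is assumed to be invertible with $|X|>0$. In Eq. 12, $x$ is a scalar parameter on which $X$ depends.

$$
\begin{align}
\frac{\partial \mathbf{x}^TX\mathbf{y}}{\partial X} &= \mathbf{x}\mathbf{y}^T \quad\quad \text{(Eq. 5)} \\
\frac{\partial \mathbf{x}^TX^{-1}\mathbf{y}}{\partial X} &= -(X^{-1})^T\mathbf{x}\mathbf{y}^T(X^{-1})^T \quad\quad \text{(Eq. 6)} \\
\frac{\partial \log |X|}{\partial X} &= (X^{-1})^T \quad\quad \text{(Eq. 7)} \\
\frac{\partial\, \mathrm{tr}(X)}{\partial X} &= I \quad\quad \text{(Eq. 8)} \\
\frac{\partial\, \mathrm{tr}(XA)}{\partial X} &= A^T \quad\quad \text{(Eq. 9)} \\
\frac{\partial\, \mathrm{tr}(X^TA)}{\partial X} &= A \quad\quad \text{(Eq. 10)} \\
\frac{\partial\, \mathrm{tr}(XAX^T)}{\partial X} &= X(A + A^T) \quad\quad \text{(Eq. 11)} \\
\frac{\partial}{\partial x}\log|X| &= \mathrm{tr}\bigg(X^{-1}\frac{\partial X}{\partial x}\bigg) \quad\quad \text{(Eq. 12)}
\end{align}
$$

When a scalar $y$ is a function of the matrix $X$, $y=f(X)$, the derivative of a function $g(y)$ with respect to $X$ follows from the chain rule:

$$ \frac{\partial g(y)}{\partial X} = \frac{\partial g(y)}{\partial y}\frac{\partial f(X)}{\partial X}\quad\quad \text{(Eq. 13)} $$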
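Several of the matrix formulas, and the chain rule just stated, can be checked numerically in the same spirit. This is a hedged sketch, not part of the original derivations: the helper `numerical_matrix_gradient`, the test matrices and the choice $g(y)=y^2$ with $y=\mathrm{tr}(XA)$ for the chain-rule check are assumptions made for illustration. The test matrix `X` is deliberately non-symmetric with a positive determinant, so that the transposes in the formulas are actually exercised and $\log|X|$ is defined.

```python
import numpy as np


def numerical_matrix_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the matrix whose (i, j) entry is df/dX_ij."""
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            step = np.zeros_like(X)
            step[i, j] = eps
            grad[i, j] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad


# Arbitrary test values (hypothetical).
rng = np.random.default_rng(0)
n = 3
x = rng.normal(size=n)
y = rng.normal(size=n)
A = rng.normal(size=(n, n))
# Small non-symmetric perturbation of 3*I: invertible, with det(X) > 0.
X = 3.0 * np.eye(n) + 0.1 * rng.normal(size=(n, n))

checks = {
    # Eq. 5: d(x^T X y)/dX = x y^T
    "Eq. 5": np.allclose(
        numerical_matrix_gradient(lambda Z: x @ Z @ y, X), np.outer(x, y), atol=1e-5
    ),
    # Eq. 7: d log|X| / dX = (X^{-1})^T
    "Eq. 7": np.allclose(
        numerical_matrix_gradient(lambda Z: np.log(np.linalg.det(Z)), X),
        np.linalg.inv(X).T,
        atol=1e-5,
    ),
    # Eq. 8: d tr(X)/dX = I
    "Eq. 8": np.allclose(numerical_matrix_gradient(np.trace, X), np.eye(n), atol=1e-5),
    # Eq. 9: d tr(X A)/dX = A^T
    "Eq. 9": np.allclose(
        numerical_matrix_gradient(lambda Z: np.trace(Z @ A), X), A.T, atol=1e-5
    ),
    # Eq. 10: d tr(X^T A)/dX = A
    "Eq. 10": np.allclose(
        numerical_matrix_gradient(lambda Z: np.trace(Z.T @ A), X), A, atol=1e-5
    ),
    # Chain rule with g(y) = y^2 and y = tr(X A): dg/dX = 2 tr(X A) A^T
    "chain rule": np.allclose(
        numerical_matrix_gradient(lambda Z: np.trace(Z @ A) ** 2, X),
        2 * np.trace(X @ A) * A.T,
        atol=1e-5,
    ),
}

for name, ok in checks.items():
    print(name, ok)
```

The dictionary keeps each check next to the formula it refers to, which makes it easy to add further identities from the list.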

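§ Derivation of the formulas

This section walks through derivations of Eq. 1 to Eq. 4.
Please note that these are illustrative examples, not rigorous proofs.
Derivations for Eq. 5 onward are omitted for reasons of space.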

1. Derivation of Eq. 1

We use three dimensions as an example, but the result is the same in any dimension.
Let the column vectors $\mathbf{x}$ and $\mathbf{a}$ be

$$ \mathbf{x}=\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}, \quad \mathbf{a}=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} $$

The vector derivative $\frac{\partial }{\partial \mathbf{x}}$ is taken to act as the following column of partial derivatives.

$$ \frac{\partial}{\partial \mathbf{x}} = \begin{pmatrix}\frac{\partial }{\partial x_1} \\ \frac{\partial }{\partial x_2} \\ \frac{\partial }{\partial x_3}\end{pmatrix} $$

An example derivation follows.

$$
\begin{align}
\frac{\partial \mathbf{x}^T\mathbf{a}}{\partial \mathbf{x}}
&= \frac{\partial}{\partial \mathbf{x}} \left\{(x_1\;x_2\;x_3)\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} \right\} \\
&= \frac{\partial}{\partial \mathbf{x}}(a_1x_1+a_2x_2+a_3x_3) \\
&= \begin{pmatrix}\frac{\partial }{\partial x_1} \\ \frac{\partial }{\partial x_2} \\ \frac{\partial }{\partial x_3}\end{pmatrix}(a_1x_1+a_2x_2+a_3x_3) \\
&= \begin{pmatrix} \frac{\partial}{\partial x_1}(a_1x_1+a_2x_2+a_3x_3) \\ \frac{\partial}{\partial x_2}(a_1x_1+a_2x_2+a_3x_3) \\ \frac{\partial}{\partial x_3}(a_1x_1+a_2x_2+a_3x_3) \end{pmatrix} \\
&= \begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} \\
&= \mathbf{a}
\end{align}
$$

Similarly,

$$
\begin{align}
\frac{\partial \mathbf{a}^T\mathbf{x}}{\partial \mathbf{x}}
&= \frac{\partial}{\partial \mathbf{x}} \left\{(a_1\;a_2\;a_3)\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix} \right\} \\
&= \frac{\partial}{\partial \mathbf{x}}(a_1x_1+a_2x_2+a_3x_3) \\
&= \begin{pmatrix}\frac{\partial }{\partial x_1} \\ \frac{\partial }{\partial x_2} \\ \frac{\partial }{\partial x_3}\end{pmatrix}(a_1x_1+a_2x_2+a_3x_3) \\
&= \begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} \\
&= \mathbf{a}
\end{align}
$$
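The three-dimensional calculation carries over to any dimension. As a sketch, assume $\mathbf{x},\mathbf{a}\in\mathbb{R}^n$ and write $\delta_{ik}$ for the Kronecker delta (a notation introduced here, not used elsewhere in this article). The $i$-th component of the derivative is then

$$ \left(\frac{\partial \mathbf{a}^T\mathbf{x}}{\partial \mathbf{x}}\right)_i = \frac{\partial}{\partial x_i}\sum_{k=1}^{n} a_k x_k = \sum_{k=1}^{n} a_k \delta_{ik} = a_i $$

Stacking the components for $i=1,\dots,n$ gives $\mathbf{a}$, and since $\mathbf{x}^T\mathbf{a}=\mathbf{a}^T\mathbf{x}$ the same argument covers both forms.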

2. Derivation of Eq. 2

Let the matrix $A$ be

$$ A=\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix} $$

An example derivation follows.

$$
\begin{align}
\frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}}
&=\frac{\partial}{\partial \mathbf{x}} \left\{(x_1\;x_2\;x_3)\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\} \\
&=\frac{\partial}{\partial \mathbf{x}} \left\{(A_{11}x_1+A_{21}x_2+A_{31}x_3\quad A_{12}x_1+A_{22}x_2+A_{32}x_3\quad A_{13}x_1+A_{23}x_2+A_{33}x_3)\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\} \\
&=\frac{\partial}{\partial \mathbf{x}}(A_{11}x_1x_1+A_{21}x_1x_2+A_{31}x_1x_3+A_{12}x_1x_2+A_{22}x_2x_2+A_{32}x_2x_3+A_{13}x_1x_3+A_{23}x_2x_3+A_{33}x_3x_3) \\
&=\begin{pmatrix}2A_{11}x_1+A_{21}x_2+A_{31}x_3+A_{12}x_2+A_{13}x_3 \\ A_{21}x_1+A_{12}x_1+2A_{22}x_2+A_{32}x_3+A_{23}x_3 \\ A_{31}x_1+A_{32}x_2+A_{13}x_1+A_{23}x_2+2A_{33}x_3 \end{pmatrix} \\
&=\begin{pmatrix}A_{11}x_1+A_{21}x_2+A_{31}x_3 \\ A_{12}x_1+A_{22}x_2+A_{32}x_3 \\ A_{13}x_1+A_{23}x_2+A_{33}x_3\end{pmatrix}+ \begin{pmatrix}A_{11}x_1+A_{12}x_2+A_{13}x_3 \\ A_{21}x_1+A_{22}x_2+A_{23}x_3 \\ A_{31}x_1+A_{32}x_2+A_{33}x_3\end{pmatrix} \\
&=\begin{pmatrix}A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}+ \begin{pmatrix}A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix} \\
&=(A^T + A)\mathbf{x} \\
&=(A + A^T)\mathbf{x}
\end{align}
$$
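The same result can be written compactly for an $n\times n$ matrix using index notation. This is a sketch under the assumption that $A$ does not depend on $\mathbf{x}$, with $\delta_{ik}$ denoting the Kronecker delta (a notation introduced here for illustration):

$$
\begin{align}
\left(\frac{\partial \mathbf{x}^TA\mathbf{x}}{\partial \mathbf{x}}\right)_k
&= \frac{\partial}{\partial x_k}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}x_ix_j \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}\left(\delta_{ik}x_j + x_i\delta_{jk}\right) \\
&= \sum_{j=1}^{n} A_{kj}x_j + \sum_{i=1}^{n} A_{ik}x_i \\
&= (A\mathbf{x})_k + (A^T\mathbf{x})_k
\end{align}
$$

In particular, when $A$ is symmetric the result reduces to $2A\mathbf{x}$.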

3. Derivation of Eq. 3

We want to evaluate

$$ \frac{\partial\, \mathrm{tr}(\mathbf{x}\mathbf{a}^T)}{\partial \mathbf{x}} $$

Here,

$$
\begin{align}
\mathbf{x}\mathbf{a}^T&=\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}(a_1\;a_2\;a_3) \\
&=\begin{pmatrix} a_1x_1 & a_2x_1 & a_3x_1 \\ a_1x_2 & a_2x_2 & a_3x_2 \\ a_1x_3 & a_2x_3 & a_3x_3 \end{pmatrix}
\end{align}
$$

Taking the trace gives

$$ \mathrm{tr}(\mathbf{x}\mathbf{a}^T)=a_1x_1 + a_2x_2 + a_3x_3 $$

Therefore,

$$
\begin{align}
\frac{\partial\, \mathrm{tr}(\mathbf{x}\mathbf{a}^T)}{\partial \mathbf{x}}
&= \frac{\partial}{\partial \mathbf{x}}(a_1x_1 + a_2x_2 + a_3x_3) \\
&=\begin{pmatrix} \frac{\partial}{\partial x_1}(a_1x_1 + a_2x_2 + a_3x_3) \\ \frac{\partial}{\partial x_2}(a_1x_1 + a_2x_2 + a_3x_3) \\ \frac{\partial}{\partial x_3}(a_1x_1 + a_2x_2 + a_3x_3) \end{pmatrix} \\
&=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix} \\
&=\mathbf{a}
\end{align}
$$

Likewise, for

$$ \frac{\partial\, \mathrm{tr}(\mathbf{a}\mathbf{x}^T)}{\partial \mathbf{x}} $$

we have

$$
\begin{align}
\mathbf{a}\mathbf{x}^T&=\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix}(x_1\;x_2\;x_3) \\
&=\begin{pmatrix} a_1x_1 & a_1x_2 & a_1x_3 \\ a_2x_1 & a_2x_2 & a_2x_3 \\ a_3x_1 & a_3x_2 & a_3x_3 \end{pmatrix}
\end{align}
$$

Taking the trace gives

$$ \mathrm{tr}(\mathbf{a}\mathbf{x}^T)=a_1x_1 + a_2x_2 + a_3x_3 $$

The remaining steps are identical to the derivation above and are omitted.
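Both trace computations can be summarized in one line for general $n$. As a sketch: the diagonal entries of $\mathbf{x}\mathbf{a}^T$ and of $\mathbf{a}\mathbf{x}^T$ are both $x_ia_i$, so

$$ \mathrm{tr}(\mathbf{x}\mathbf{a}^T) = \sum_{i=1}^{n} x_i a_i = \mathbf{a}^T\mathbf{x} = \mathrm{tr}(\mathbf{a}\mathbf{x}^T) $$

Eq. 3 is therefore Eq. 1 applied to the same scalar $\mathbf{a}^T\mathbf{x}$, written as a trace.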

4. Derivation of Eq. 4

$$
\begin{align}
\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}-A\mathbf{x})^T(\mathbf{a}-A\mathbf{x})
&=\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}^T-\mathbf{x}^TA^T)(\mathbf{a}-A\mathbf{x}) \\
&=\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}^T\mathbf{a}-\mathbf{a}^TA\mathbf{x}-\mathbf{x}^TA^T\mathbf{a}+\mathbf{x}^TA^TA\mathbf{x})
\end{align}
$$

The first term $\mathbf{a}^T\mathbf{a}$ does not depend on $\mathbf{x}$, so it vanishes under differentiation and is dropped.

$$
\begin{align}
\text{(above)}
&=\frac{\partial}{\partial \mathbf{x}}(-\mathbf{a}^TA\mathbf{x}-\mathbf{x}^TA^T\mathbf{a}+\mathbf{x}^TA^TA\mathbf{x}) \\
&=\frac{\partial}{\partial \mathbf{x}}\left\{
-(a_1\;a_2\;a_3)\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}
-(x_1\;x_2\;x_3)\begin{pmatrix}A_{11} & A_{21} & A_{31}\\A_{12} & A_{22} & A_{32}\\A_{13} & A_{23} & A_{33}\end{pmatrix}\begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}
+(x_1\;x_2\;x_3)\begin{pmatrix}A_{11} & A_{21} & A_{31}\\A_{12} & A_{22} & A_{32}\\A_{13} & A_{23} & A_{33}\end{pmatrix}\begin{pmatrix}A_{11} & A_{12} & A_{13}\\A_{21} & A_{22} & A_{23}\\A_{31} & A_{32} & A_{33}\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\right\} \\
&=\frac{\partial}{\partial \mathbf{x}}\left\{
-(a_1A_{11}+a_2A_{21}+a_3A_{31}\quad a_1A_{12}+a_2A_{22}+a_3A_{32}\quad a_1A_{13}+a_2A_{23}+a_3A_{33})\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}
-(x_1A_{11}+x_2A_{12}+x_3A_{13}\quad x_1A_{21}+x_2A_{22}+x_3A_{23}\quad x_1A_{31}+x_2A_{32}+x_3A_{33})\begin{pmatrix}a_1 \\ a_2 \\ a_3\end{pmatrix}
+(x_1\;x_2\;x_3)\begin{pmatrix}B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33}\end{pmatrix}\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\}
\end{align}
$$

Here the matrix $B$ has been introduced as shorthand for the product $A^TA$:

$$
\begin{align}
B&=\begin{pmatrix}B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33}\end{pmatrix} \\
&=\begin{pmatrix}
A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31} & A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32} & A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33} \\
A_{12}A_{11}+A_{22}A_{21}+A_{32}A_{31} & A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32} & A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33} \\
A_{13}A_{11}+A_{23}A_{21}+A_{33}A_{31} & A_{13}A_{12}+A_{23}A_{22}+A_{33}A_{32} & A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33}
\end{pmatrix}
\end{align}
$$

From this,

$$
\begin{align}
\text{(above)}
&=\frac{\partial}{\partial \mathbf{x}}\left\{
-\bigg(x_1(a_1A_{11}+a_2A_{21}+a_3A_{31})+x_2(a_1A_{12}+a_2A_{22}+a_3A_{32})+x_3(a_1A_{13}+a_2A_{23}+a_3A_{33})\bigg)
-\bigg(a_1(x_1A_{11}+x_2A_{12}+x_3A_{13})+a_2(x_1A_{21}+x_2A_{22}+x_3A_{23})+a_3(x_1A_{31}+x_2A_{32}+x_3A_{33})\bigg)
+(x_1B_{11}+x_2B_{21}+x_3B_{31}\quad x_1B_{12}+x_2B_{22}+x_3B_{32}\quad x_1B_{13}+x_2B_{23}+x_3B_{33})\begin{pmatrix}x_1 \\ x_2 \\ x_3\end{pmatrix}\right\} \\
&=\frac{\partial}{\partial \mathbf{x}}\left\{
-x_1(a_1A_{11}+a_2A_{21}+a_3A_{31})-x_2(a_1A_{12}+a_2A_{22}+a_3A_{32})-x_3(a_1A_{13}+a_2A_{23}+a_3A_{33})
-x_1(a_1A_{11}+a_2A_{21}+a_3A_{31})-x_2(a_1A_{12}+a_2A_{22}+a_3A_{32})-x_3(a_1A_{13}+a_2A_{23}+a_3A_{33})
+x_1^2B_{11}+x_1x_2B_{21}+x_1x_3B_{31}+x_1x_2B_{12}+x_2^2B_{22}+x_2x_3B_{32}+x_1x_3B_{13}+x_2x_3B_{23}+x_3^2B_{33}\right\}
\end{align}
$$

Applying the vector derivative $\frac{\partial}{\partial \mathbf{x}}$ gives

$$
\begin{align}
\text{(above)}&=
\begin{pmatrix}
-(a_1A_{11} + a_2A_{21}+a_3A_{31})-(a_1A_{11} + a_2A_{21}+a_3A_{31})+2x_1B_{11}+x_2B_{21}+x_3B_{31}+x_2B_{12}+x_3B_{13} \\
-(a_1A_{12} + a_2A_{22}+a_3A_{32})-(a_1A_{12} + a_2A_{22}+a_3A_{32})+x_1B_{21}+x_1B_{12}+2x_2B_{22}+x_3B_{32}+x_3B_{23} \\
-(a_1A_{13} + a_2A_{23}+a_3A_{33})-(a_1A_{13} + a_2A_{23}+a_3A_{33})+x_1B_{31}+x_2B_{32}+x_1B_{13}+x_2B_{23}+2x_3B_{33}
\end{pmatrix}
\end{align}
$$

Substituting the entries of $A$ back for $B$,

$$
\begin{align}
\text{(above)}&=\begin{pmatrix}
-2(a_1A_{11}+a_2A_{21}+a_3A_{31}) +2x_1(A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31}) +x_2(A_{12}A_{11}+A_{22}A_{21}+A_{32}A_{31}) +x_3(A_{13}A_{11}+A_{23}A_{21}+A_{33}A_{31}) +x_2(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32}) +x_3(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33}) \\
-2(a_1A_{12}+a_2A_{22}+a_3A_{32}) +x_1(A_{12}A_{11}+A_{22}A_{21}+A_{32}A_{31}) +x_1(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32}) +2x_2(A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32}) +x_3(A_{13}A_{12}+A_{23}A_{22}+A_{33}A_{32}) +x_3(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33}) \\
-2(a_1A_{13}+a_2A_{23}+a_3A_{33}) +x_1(A_{13}A_{11}+A_{23}A_{21}+A_{33}A_{31}) +x_2(A_{13}A_{12}+A_{23}A_{22}+A_{33}A_{32}) +x_1(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33}) +x_2(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33}) +2x_3(A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33})
\end{pmatrix} \\
&=\begin{pmatrix}
-2(a_1A_{11}+a_2A_{21}+a_3A_{31}) +2x_1(A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31}) +2x_2(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32}) +2x_3(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33}) \\
-2(a_1A_{12}+a_2A_{22}+a_3A_{32}) +2x_1(A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32}) +2x_2(A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32}) +2x_3(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33}) \\
-2(a_1A_{13}+a_2A_{23}+a_3A_{33}) +2x_1(A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33}) +2x_2(A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33}) +2x_3(A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33})
\end{pmatrix} \\
&=-2\begin{pmatrix}
a_1A_{11}+a_2A_{21}+a_3A_{31} \\
a_1A_{12}+a_2A_{22}+a_3A_{32} \\
a_1A_{13}+a_2A_{23}+a_3A_{33}
\end{pmatrix}
+2\begin{pmatrix}
A_{11}A_{11}+A_{21}A_{21}+A_{31}A_{31} & A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32} & A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33} \\
A_{11}A_{12}+A_{21}A_{22}+A_{31}A_{32} & A_{12}A_{12}+A_{22}A_{22}+A_{32}A_{32} & A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33} \\
A_{11}A_{13}+A_{21}A_{23}+A_{31}A_{33} & A_{12}A_{13}+A_{22}A_{23}+A_{32}A_{33} & A_{13}A_{13}+A_{23}A_{23}+A_{33}A_{33}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \\
&=-2\left\{
\begin{pmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}
-\begin{pmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{pmatrix}
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\right\} \\
&=-2(A^T\mathbf{a}-A^TA\mathbf{x}) \\
&=-2A^T(\mathbf{a}-A\mathbf{x})
\end{align}
$$
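The component-wise expansion can be cross-checked against a shorter route that reuses Eq. 1 and Eq. 2. This is a sketch, treating $A^T\mathbf{a}$ as the constant vector in Eq. 1 and $A^TA$ as the matrix in Eq. 2:

$$
\begin{align}
\frac{\partial}{\partial \mathbf{x}}\left(-\mathbf{a}^TA\mathbf{x}\right) &= -\frac{\partial\, (A^T\mathbf{a})^T\mathbf{x}}{\partial \mathbf{x}} = -A^T\mathbf{a} \\
\frac{\partial}{\partial \mathbf{x}}\left(-\mathbf{x}^TA^T\mathbf{a}\right) &= -\frac{\partial\, \mathbf{x}^T(A^T\mathbf{a})}{\partial \mathbf{x}} = -A^T\mathbf{a} \\
\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^TA^TA\mathbf{x}\right) &= \left(A^TA + (A^TA)^T\right)\mathbf{x} = 2A^TA\mathbf{x}
\end{align}
$$

Adding the three contributions gives $-2A^T\mathbf{a}+2A^TA\mathbf{x}=-2A^T(\mathbf{a}-A\mathbf{x})$, in agreement with the result obtained component by component.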

5. On the derivation of Eq. 5 onward

The derivations of Eq. 5 onward are omitted, but they can be carried out by using the matrix derivative defined as follows (shown here for a $3\times 3$ matrix $X$).

$$ \frac{\partial}{\partial X}=\begin{pmatrix} \frac{\partial}{\partial X_{11}} & \frac{\partial}{\partial X_{12}} & \frac{\partial}{\partial X_{13}} \\ \frac{\partial}{\partial X_{21}} & \frac{\partial}{\partial X_{22}} & \frac{\partial}{\partial X_{23}} \\ \frac{\partial}{\partial X_{31}} & \frac{\partial}{\partial X_{32}} & \frac{\partial}{\partial X_{33}} \end{pmatrix} $$
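As an illustration of how this operator is used, here is a sketch of the derivation of Eq. 5, assuming $\mathbf{x}$ and $\mathbf{y}$ do not depend on $X$. The $(i,j)$ entry of the result is the partial derivative with respect to $X_{ij}$, and only the term with $k=i$, $l=j$ in the double sum survives:

$$ \left(\frac{\partial \mathbf{x}^TX\mathbf{y}}{\partial X}\right)_{ij} = \frac{\partial}{\partial X_{ij}}\sum_{k}\sum_{l} x_k X_{kl} y_l = x_i y_j = (\mathbf{x}\mathbf{y}^T)_{ij} $$

Collecting the entries for all $i$ and $j$ gives the matrix $\mathbf{x}\mathbf{y}^T$.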

Tetsuro Tabata
