Category Archives: Linear Algebra

Square Roots of Operators II

The determinant (and also the trace) of a nilpotent matrix is always zero, so a nilpotent matrix cannot be invertible. However, if $N$ is nilpotent of index $m$ (i.e. $N^m=0$ but $N^{m-1}\ne 0$), then $I+N$ and $I-N$ are invertible, and their inverses are given by
\begin{align*} (I+N)^{-1}&=\sum_{k=0}^{m-1}(-N)^k=I-N+N^2-N^3+\cdots+(-N)^{m-1}\\ (I-N)^{-1}&=\sum_{k=0}^{m-1}N^k=I+N+N^2+N^3+\cdots+N^{m-1} \end{align*}
We have shown here that invertible operators have square roots. By the same token, the inverses $(I+N)^{-1}$ and $(I-N)^{-1}$, being invertible themselves, also have square roots. But how do we find them? Let us denote them by $(I+N)^{-\frac{1}{2}}$ and $(I-N)^{-\frac{1}{2}}$, respectively. Then, in the same manner in which we proved the existence of $\sqrt{I+N}$ here, we obtain
\begin{align*} (I+N)^{-\frac{1}{2}}&=I-\frac{1}{2}N+\frac{3}{8}N^2-\frac{5}{16}N^3+\cdots\\ (I-N)^{-\frac{1}{2}}&=I+\frac{1}{2}N+\frac{3}{8}N^2+\frac{5}{16}N^3+\cdots \end{align*}
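
As a quick sanity check, here is a minimal Python sketch (assuming NumPy is available; the particular nilpotent matrix and truncation are my own choices for illustration) that verifies the terminating series for $(I-N)^{-1}$ and $(I-N)^{-\frac{1}{2}}$ on a small example.

```python
import numpy as np

# A 3x3 nilpotent matrix of index 3: N^3 = 0
N = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
I = np.eye(3)

# (I - N)^{-1} = I + N + N^2  (the series terminates because N^3 = 0)
inv_I_minus_N = I + N + N @ N
assert np.allclose(inv_I_minus_N @ (I - N), I)

# (I - N)^{-1/2} = I + (1/2)N + (3/8)N^2  (higher-order terms vanish)
R = I + N / 2 + 3 * (N @ N) / 8
assert np.allclose(R @ R, inv_I_minus_N)
print("series checks pass")
```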

Square Roots of Operators

Mathematics is full of weird stuff. One such thing is the square root of an operator. So far, I have seen only two books that discuss square roots of operators. They are listed in the references below.

Definition. An operator $R$ is called a square root of an operator $T$ if $R^2=T$.

Example. In quantum computing, the $\sqrt{\mathrm{NOT}}$ gate is given by
$$\sqrt{\mathrm{NOT}}=\begin{bmatrix}
\frac{1+i}{2} & \frac{1-i}{2}\\
\frac{1-i}{2} & \frac{1+i}{2}
\end{bmatrix}$$
and
$$\sqrt{\mathrm{NOT}}\cdot\sqrt{\mathrm{NOT}}=\mathrm{NOT}=\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}$$
As required by quantum mechanics, quantum gates are unitary matrices. There is no counterpart of the $\sqrt{\mathrm{NOT}}$ gate in classical computing, so the $\sqrt{\mathrm{NOT}}$ gate is a truly quantum gate. As far as I know, no one has come up with a physical implementation of the $\sqrt{\mathrm{NOT}}$ gate yet.
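
A quick numerical check (a minimal NumPy sketch; the variable names are mine) confirms that the matrix above squares to the $\mathrm{NOT}$ gate and is unitary.

```python
import numpy as np

sqrt_NOT = np.array([[(1 + 1j) / 2, (1 - 1j) / 2],
                     [(1 - 1j) / 2, (1 + 1j) / 2]])
NOT = np.array([[0., 1.],
                [1., 0.]])

assert np.allclose(sqrt_NOT @ sqrt_NOT, NOT)                 # it is a square root of NOT
assert np.allclose(sqrt_NOT @ sqrt_NOT.conj().T, np.eye(2))  # and it is unitary
```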

Example. An operator does not necessarily have a square root. For example, define $T:\mathbb{C}^3\longrightarrow\mathbb{C}^3$ by
$$T(z_1,z_2,z_3)=(z_2,z_3,0)$$
Then one can easily show that $T$ is linear. Suppose that $T$ has a square root $R$. Let $\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}$ be the matrix associated with $R$. Since the matrix associated with $T$ is $\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 1\\
0 & 0 & 0
\end{bmatrix}$, we have the equation
$$\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}\cdot\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}=\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 1\\
0 & 0 & 0
\end{bmatrix}$$
This amounts to a system of 9 scalar equations in the 9 unknown entries of $R$. One can attempt to solve this system using a CAS (Computer Algebra System), for example Maple, and see that it has no solution, i.e. $R$ does not exist. (Alternatively, note that $T^3=0$ while $T^2\ne 0$. If $R^2=T$, then $R^6=T^3=0$, so $R$ is nilpotent and hence $R^3=0$ because $R$ is a $3\times 3$ matrix; but then $T^2=R^4=R^3R=0$, a contradiction.)
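
For those who prefer a free CAS, here is a sketch using SymPy (assuming SymPy is installed; the variable names are mine) that sets up the 9 scalar equations and asks for a solution; it should report an empty solution set.

```python
import sympy as sp

r, s, t, u, v, w, x, y, z = sp.symbols('r s t u v w x y z')
R = sp.Matrix([[r, s, t],
               [u, v, w],
               [x, y, z]])
T = sp.Matrix([[0, 1, 0],
               [0, 0, 1],
               [0, 0, 0]])

# The 9 scalar equations coming from R**2 = T
eqs = [sp.Eq((R * R)[i, j], T[i, j]) for i in range(3) for j in range(3)]
print(sp.solve(eqs, [r, s, t, u, v, w, x, y, z], dict=True))  # expected: []
```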

So, what kind of operators have square roots?

Theorem. If $N$ is a nilpotent operator, then $I+N$ has a square root.

Proof. Let $N$ be nilpotent. Then $N^m=0$ for some positive integer $m$. Consider the Taylor series for $\sqrt{1+x}$:
$$\sqrt{1+x}=1+a_1x+a_2x^2+a_3x^3+\cdots$$
Using the nilpotency of $N$, we may guess that $\sqrt{I+N}$ takes the form
$$I+a_1N+a_2N^2+\cdots+a_{m-1}N^{m-1}$$
Now,
\begin{align*} (I+a_1N+a_2N^2+\cdots+a_{m-1}N^{m-1})^2=I&+2a_1N+(2a_2+a_1^2)N^2\\&+(2a_3+2a_1a_2)N^3+\cdots\\ &+(2a_{m-1}+\cdots)N^{m-1}=I+N \end{align*}
and by comparing the coefficients we obtain $2a_1=1$, so $a_1=\frac{1}{2}$; $2a_2+a_1^2=0$, so $a_2=-\frac{1}{8}$; $2a_3+2a_1a_2=0$, so $a_3=\frac{1}{16}$; and so on. Since $N^m=0$, only finitely many coefficients are needed, and the resulting operator squares to $I+N$.
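
The following minimal sketch (assuming NumPy; the nilpotent matrix is an arbitrary example of mine) builds the truncated series $I+\frac{1}{2}N-\frac{1}{8}N^2+\frac{1}{16}N^3$ and checks that it squares to $I+N$.

```python
import numpy as np

# Nilpotent N of index 4 (ones on the superdiagonal of a 4x4 matrix), so N^4 = 0
N = np.diag(np.ones(3), k=1)
I = np.eye(4)
N2 = N @ N
N3 = N2 @ N

# Truncated series square root of I + N
R = I + N / 2 - N2 / 8 + N3 / 16
assert np.allclose(R @ R, I + N)
print("R is a square root of I + N")
```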

Theorem. Suppose that $T: V\longrightarrow V$ is an invertible operator on a finite-dimensional complex vector space $V$. Then $T$ has a square root.

Proof. Let $\lambda_1,\lambda_2,\cdots,\lambda_m$ be the distinct eigenvalues of $T$. Then for each $j$, there exists a nilpotent operator $N_j: G(\lambda_j,T)\longrightarrow G(\lambda_j,T)$ such that $T|_{G(\lambda_j,T)}=\lambda_jI+N_j$. (See [1], p. 252, Theorem 8.21 for a proof.) Here, $G(\lambda_j,T)$ is the generalized eigenspace of $T$ corresponding to the eigenvalue $\lambda_j$. Since $T$ is invertible, no $\lambda_j$ is equal to 0, so we can write $$T|_{G(\lambda_j,T)}=\lambda_j\left(I+\frac{N_j}{\lambda_j}\right)$$
Since $\frac{N_j}{\lambda_j}$ is nilpotent, $I+\frac{N_j}{\lambda_j}$ has a square root. Let $R_j$ be $\sqrt{\lambda_j}$ times a square root of $I+\frac{N_j}{\lambda_j}$. Any vector $v\in V$ can be written uniquely in the form
$$v=u_1+u_2+\cdots+u_m,$$
where each $u_j$ is in $G(\lambda_j,T)$. Using this decomposition, define an operator $R:V\longrightarrow V$ by
$$Rv=R_1u_1+R_2 u_2+\cdots+R_m u_m$$
Then $R^2u_j=R_j^2 u_j=T|_{G(\lambda_j,T)}u_j$ and so \begin{align*} R^2v&=R_1^2u_1+R_2^2u_2+\cdots+R_m^2u_m\\ &= T|_{G(\lambda_1,T)}u_1+T|_{G(\lambda_2,T)}u_2+\cdots+T|_{G(\lambda_m,T)}u_m=Tv
\end{align*}
Therefore, $R$ is a square root of $T$.
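
In practice one can also obtain a square root numerically; for instance, SciPy's scipy.linalg.sqrtm computes a principal matrix square root. A minimal sketch (the matrix is my own example):

```python
import numpy as np
from scipy.linalg import sqrtm

# An invertible matrix chosen for illustration (eigenvalues 2 and 3)
T = np.array([[2., 1.],
              [0., 3.]])
R = sqrtm(T)                 # principal square root
assert np.allclose(R @ R, T)
print(R)
```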

Example. The $\mathrm{NOT}$ gate is invertible; in fact, it is its own inverse, so the theorem above guarantees that it has a square root.

The proof of the above theorem suggests that an operator $T$ has a square root if there is a spectral decomposition of $T$ with respect to the standard inner product of $\mathbb{C}^n$. A normal operator is such an operator. Recall that an operator $T: V\longrightarrow V$ is normal if $T^\ast T=TT^\ast$. The following theorem guarantees the existence of such a spectral decomposition of a normal operator.

Theorem. A matrix $T$ is normal if and only if there exists a diagonal matrix $\Lambda$ and a unitary matrix $U$ such that $T=U\Lambda U^\ast$.

Here, the diagonal entries of $\Lambda$ are the eigenvalues of $T$ and the columns of $U$ are corresponding orthonormal eigenvectors of $T$, listed in the same order as the eigenvalues appear on the diagonal of $\Lambda$.

Let $T: V\longrightarrow V$ be a normal operator and write its spectral decomposition, using Dirac's bra-ket notation, as
$$T=\sum_j\lambda_j|v_j\rangle\langle v_j|$$
Define
$$f(T)=\sum_j f(\lambda_j)|v_j\rangle\langle v_j|$$
Using this definition, one can define, for example, the exponential, square root, and logarithmic operator of $T$.
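
As a sketch of how this definition can be used numerically (assuming NumPy; the helper name and example are mine, and eigh restricts the sketch to Hermitian matrices, a special case of normal ones), one applies $f$ to the eigenvalues and reassembles with the eigenvectors.

```python
import numpy as np

def matrix_function(f, T):
    """Apply a scalar function f to a Hermitian matrix T via its spectral decomposition."""
    eigvals, U = np.linalg.eigh(T)   # columns of U are orthonormal eigenvectors
    return U @ np.diag(f(eigvals.astype(complex))) @ U.conj().T

NOT = np.array([[0., 1.],
                [1., 0.]])
sqrt_NOT = matrix_function(np.sqrt, NOT)
assert np.allclose(sqrt_NOT @ sqrt_NOT, NOT)
```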

Example. Let $T=\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}$. Find the square root of $T$.

Solution. $T$ is Hermitian (real symmetric), so it is normal. The spectral decomposition of $T$ is given by
$$T=|v_1\rangle\langle v_1|-|v_2\rangle\langle v_2|,$$
where
$$|v_1\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
1
\end{bmatrix}\ \mathrm{and}\ |v_2\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
-1
\end{bmatrix}$$
Now,
\begin{align*} \sqrt{T}&=|v_1\rangle\langle v_1|+i|v_2\rangle\langle v_2|\\ &=\frac{1}{2}\begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix}+\frac{i}{2}\begin{bmatrix} 1 & -1\\ -1 & 1 \end{bmatrix}\\ &=\begin{bmatrix} \frac{1+i}{2} & \frac{1-i}{2}\\ \frac{1-i}{2} & \frac{1+i}{2} \end{bmatrix} \end{align*}

References:

  1. Sheldon Axler, Linear Algebra Done Right, Third Edition, Springer, 2015
  2. Michael A. Nielsen and Isaac L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000

The Eigenvectors of a Hermitian Operator Are Mutually Orthogonal

In quantum mechanics, operators representing observables are required to be Hermitian. A reason for this is that the eigenvalues of a quantum mechanical operator are interpreted as physically measurable quantities such as positions, momenta, energies, etc., and therefore they are required to be real. As is well known, Hermitian operators have only real eigenvalues. Hermitian operators also have another nice property: the eigenvectors of a Hermitian operator corresponding to distinct eigenvalues are mutually orthogonal. Here, we prove this only for the case of matrix operators. Let $A$ be a Hermitian operator and let $|\lambda_i\rangle$ be eigenvectors of $A$ with distinct eigenvalues $\lambda_i$. Then we have $A|\lambda_i\rangle=\lambda_i|\lambda_i\rangle$, so $\langle\lambda_j|A|\lambda_i\rangle=\lambda_i\langle\lambda_j|\lambda_i\rangle$. On the other hand,
\begin{align*} \langle\lambda_j|A&=(A^\dagger|\lambda_j\rangle)^\dagger\\ &=(A|\lambda_j\rangle)^\dagger\ (A^\dagger=A)\\ &=(\lambda_j|\lambda_j\rangle)^\dagger\\ &=\bar\lambda_j\langle\lambda_j|\\ &=\lambda_j\langle\lambda_j|\ (\mbox{the $\lambda_j$ are real}) \end{align*}
From this we also obtain $\langle\lambda_j|A|\lambda_i\rangle=\lambda_j\langle\lambda_j|\lambda_i\rangle$. This means that $\lambda_i\langle\lambda_j|\lambda_i\rangle=\lambda_j\langle\lambda_j|\lambda_i\rangle$, i.e. we have
$$(\lambda_i-\lambda_j)\langle\lambda_j|\lambda_i\rangle=0$$
If $i\ne j$ then $\lambda_i\ne\lambda_j$ and so $\langle\lambda_j|\lambda_i\rangle=0$.
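
A quick numerical illustration (a NumPy sketch; the Hermitian matrix is an arbitrary example of mine):

```python
import numpy as np

# An arbitrary Hermitian matrix with distinct eigenvalues
A = np.array([[2., 1. - 1j],
              [1. + 1j, -1.]])
assert np.allclose(A, A.conj().T)      # A is Hermitian

eigvals, V = np.linalg.eig(A)          # generic eigensolver, no symmetry assumed
print(eigvals)                          # the eigenvalues are (numerically) real
print(np.vdot(V[:, 0], V[:, 1]))        # ~0: eigenvectors of distinct eigenvalues are orthogonal
```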

Example. Let us consider the matrix $A=\begin{pmatrix}
3 & 1+i\\
-1+i & -3
\end{pmatrix}$. The adjoint (i.e. the conjugate transpose of this matrix) of $A$ is $A^\dagger=\begin{pmatrix}
3 & -1-i\\
1-i & -3
\end{pmatrix}$. Since $A\ne A^\dagger$, $A$ is not Hermitian. Although $A$ is not Hermitian, it has the real eigenvalues $\pm\sqrt{7}$, and the eigenvectors corresponding to the eigenvalues $\sqrt{7}$ and $-\sqrt{7}$ are, respectively,
$$v_1=\begin{pmatrix}
-\frac{1+i}{3-\sqrt{7}}\\
1
\end{pmatrix},\ v_2=\begin{pmatrix}
-\frac{1+i}{3+\sqrt{7}}\\
1
\end{pmatrix}$$
$\langle v_1|v_2\rangle=2$, so they are not orthogonal. Interestingly, $v_1$ and $v_2$ are orthogonal with respect to the (indefinite) inner product
\begin{equation}
\label{eq:jproduct}
\langle v_1|J|v_2\rangle
\end{equation}
where
$$J=\begin{pmatrix}
1 & 0\\
0 & -1
\end{pmatrix}$$
The matrices of the form $$H=\begin{pmatrix}
a & b\\
-\bar b & d
\end{pmatrix},$$ where $a$ and $d$ are real and $(a-d)^2-4|b|^2>0$ (this condition ensures that such a matrix has two distinct real eigenvalues), are self-adjoint with respect to the inner product \eqref{eq:jproduct}, i.e. $H$ satisfies \begin{equation}\label{eq:jself-adjoint}\langle Hv_1|J|v_2\rangle=\langle v_1|J|Hv_2\rangle\end{equation} for all vectors $v_1$ and $v_2$, which is equivalent to $$H^\dagger J=JH$$

Exercise. Prove that the eigenvectors corresponding to distinct eigenvalues of a matrix satisfying \eqref{eq:jself-adjoint} are mutually orthogonal with respect to the inner product \eqref{eq:jproduct}.
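
A numerical sanity check for the example above (a NumPy sketch; no proof, just the computation):

```python
import numpy as np

A = np.array([[3., 1. + 1j],
              [-1. + 1j, -3.]])
J = np.diag([1., -1.])

# A is self-adjoint with respect to the J-inner product: A^dagger J = J A
assert np.allclose(A.conj().T @ J, J @ A)

eigvals, V = np.linalg.eig(A)
v1, v2 = V[:, 0], V[:, 1]
print(eigvals)               # approximately +sqrt(7) and -sqrt(7)
print(np.vdot(v1, v2))       # nonzero: not orthogonal in the usual sense
print(v1.conj() @ J @ v2)    # ~0: orthogonal with respect to the J-inner product
```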

The Tensor Product

Let $V$ and $W$ be two vector spaces of dimensions $m$ and $n$, respectively. The tensor product of $V$ and $W$ is a vector space $V\otimes W$ of dimension $mn$ together with a bilinear map $$\varphi: V\times W\longrightarrow V\otimes W;\ \varphi(v,w)=v\otimes w$$ which satisfies the following universal property: for any vector space $X$ and any bilinear map $\psi: V\times W\longrightarrow X$, there exists a unique linear map $\gamma : V\otimes W\longrightarrow X$ such that $\psi(v,w)=\gamma(v\otimes w)$ for all $v\in V$ and $w\in W$.

$$\begin{array}[c]{ccc}V\otimes W & & \\\uparrow\scriptstyle{\varphi} & \scriptstyle{\gamma}\searrow& \\V\times W & \stackrel{\psi}\rightarrow & X\end{array}\ \ \ \gamma\circ\varphi=\psi$$

Often, we use a more down-to-earth definition of the tensor product. Let $\{e_1,\cdots,e_m\}$ and $\{f_1,\cdots,f_n\}$ be bases of $V$ and $W$, respectively. The tensor product $V\otimes W$ is the vector space of dimension $mn$ spanned by the basis $\{e_i\otimes f_j: i=1,\cdots,m,\ j=1,\cdots,n\}$. Let $v\in V$ and $w\in W$. Then $$v=\sum_i v_ie_i\ \mbox{and}\ w=\sum_j w_jf_j$$ The tensor product of $v$ and $w$ is then given by $$v\otimes w=\sum_{i,j}v_iw_je_i\otimes f_j$$ It can easily be shown that this definition of the tensor product in terms of prescribed bases satisfies the universal property. Although this definition uses a choice of bases of $V$ and $W$, the tensor product $V\otimes W$ must not depend on the particular choice of bases, i.e. regardless of the choice of bases the resulting tensor products are naturally isomorphic. This too can be shown easily using some basic facts from linear algebra. I will leave these as exercises for the reader.

The tensor product can be used to describe the state of a quantum memory register. A quantum memory register consists of many 2-state systems (Hilbert spaces of qubits). Let $|\psi^{(1)}\rangle$ and $|\psi^{(2)}\rangle$ be qubits associated with two different 2-state systems. In terms of the standard orthogonal basis $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}0\\1\end{pmatrix}$ for each 2-state system, we have \begin{align*}|\psi^{(1)}\rangle&=\begin{pmatrix}\omega_0^{(1)}\\\omega_1^{(1)}\end{pmatrix}=\omega_0^{(1)}\begin{pmatrix}1\\0\end{pmatrix}+\omega_1^{(1)}\begin{pmatrix}0\\1\end{pmatrix}\\|\psi^{(2)}\rangle&=\begin{pmatrix}\omega_0^{(2)}\\\omega_1^{(2)}\end{pmatrix}=\omega_0^{(2)}\begin{pmatrix}1\\0\end{pmatrix}+\omega_1^{(2)}\begin{pmatrix}0\\1\end{pmatrix}\end{align*} Define $\otimes$ on the basis members as follows: \begin{align*}|00\rangle&=\begin{pmatrix}1\\0\end{pmatrix}\otimes\begin{pmatrix}1\\0\end{pmatrix}=\begin{pmatrix}1\\0\\0\\0\end{pmatrix},\ |01\rangle=\begin{pmatrix}1\\0\end{pmatrix}\otimes\begin{pmatrix}0\\1\end{pmatrix}=\begin{pmatrix}0\\1\\0\\0\end{pmatrix}\\|10\rangle&=\begin{pmatrix}0\\1\end{pmatrix}\otimes\begin{pmatrix}1\\0\end{pmatrix}=\begin{pmatrix}0\\0\\1\\0\end{pmatrix},\ |11\rangle=\begin{pmatrix}0\\1\end{pmatrix}\otimes\begin{pmatrix}0\\1\end{pmatrix}=\begin{pmatrix}0\\0\\0\\1\end{pmatrix}\end{align*} These four vectors form a basis for a 4-dimensional Hilbert space (a 2-qubit memory register). It follows that \begin{align*}|\psi^{(1)}\rangle\otimes|\psi^{(2)}\rangle&=\omega_0^{(1)}\omega_0^{(2)}|00\rangle+\omega_0^{(1)}\omega_1^{(2)}|01\rangle+\omega_1^{(1)}\omega_0^{(2)}|10\rangle+\omega_1^{(1)}\omega_1^{(2)}|11\rangle\\&=\begin{pmatrix}\omega_0^{(1)}\omega_0^{(2)}\\\omega_0^{(1)}\omega_1^{(2)}\\\omega_1^{(1)}\omega_0^{(2)}\\\omega_1^{(1)}\omega_1^{(2)}\end{pmatrix}\end{align*}Similarly, to describe the state of a 3-qubit memory register, one performs the tensor product $|\psi^{(1)}\rangle\otimes|\psi^{(2)}\rangle\otimes|\psi^{(3)}\rangle$.
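
Here is a minimal NumPy sketch (the amplitudes are my own normalized choices) computing the state of a 2-qubit register as the Kronecker product of two single-qubit states, exactly as in the computation above.

```python
import numpy as np

# Two single-qubit states (amplitudes chosen arbitrarily and normalized)
psi1 = np.array([3., 4.]) / 5.            # w0 = 3/5, w1 = 4/5
psi2 = np.array([1., 1.]) / np.sqrt(2.)

# State of the 2-qubit register: [w0*w0', w0*w1', w1*w0', w1*w1']
register = np.kron(psi1, psi2)
print(register)
assert np.isclose(np.linalg.norm(register), 1.0)
```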

Quantum memory registers can store an exponential amount of classical information in only a polynomial number of qubits, thanks to the quantum-mechanical Principle of Superposition. For example, consider two classical memory registers storing complementary sequences of bits $$\begin{array}{|c|c|c|c|c|c|c|}\hline1 & 0 & 1 & 1 & 0 & 0 &1\\\hline 0 & 1 & 0 & 0 & 1 & 1 & 0\\\hline\end{array}$$ A single quantum memory register can store both sequences simultaneously in an equally weighted superposition of the two states representing each 7-bit input $$\frac{1}{\sqrt{2}}(|1011001\rangle+|0100110\rangle)$$

A matrix can be considered as a vector. For example, a $2\times 2$ matrix $\begin{pmatrix}a & b\\c & d\end{pmatrix}$ can be identified with the vector $(a, b, c, d) \in \mathbb{R}^4$. Hence one can define the tensor product of two matrices, called the Kronecker product, in a similar manner to that of two vectors. For example, $$\begin{pmatrix}a_{11} & a_{12}\\a_{21} & a_{22}\end{pmatrix}\otimes\begin{pmatrix}b_{11} & b_{12}\\b_{21} & b_{22}\end{pmatrix}:=\begin{pmatrix}a_{11}\begin{pmatrix}b_{11} & b_{12}\\b_{21} & b_{22}\end{pmatrix} & a_{12}\begin{pmatrix}b_{11} & b_{12}\\b_{21} & b_{22}\end{pmatrix}\\a_{21}\begin{pmatrix}b_{11} & b_{12}\\b_{21} & b_{22}\end{pmatrix} & a_{22}\begin{pmatrix}b_{11} & b_{12}\\b_{21} & b_{22}\end{pmatrix}\end{pmatrix}\\=\begin{pmatrix}a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12}\\a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22}\\a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12}\\a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22}\end{pmatrix}$$
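
NumPy's np.kron computes exactly this block construction; a short sketch with matrices of my own choosing:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

# Kronecker product: the block matrix [[a11*B, a12*B], [a21*B, a22*B]]
print(np.kron(A, B))
```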


A Linear Algebra Problem on Twitter

Problem: Let $A$ and $B$ be $n\times n$ matrices such that their sum $A+B$ is invertible. Then show that $$A(A+B)^{-1}B=B(A+B)^{-1}A$$ (Hat tip: Sam Walters)

Solution. \begin{equation}\begin{aligned}I&=(A+B)(A+B)^{-1}\\&=A(A+B)^{-1}+B(A+B)^{-1}\end{aligned}\label{eq:matrix}\end{equation} Multiply \eqref{eq:matrix} by $B$ from the right: \begin{equation}\label{eq:matrix2}B=A(A+B)^{-1}B+B(A+B)^{-1}B\end{equation} Also multiply \eqref{eq:matrix} by $A$ from the right: \begin{equation}\label{eq:matrix3}A=A(A+B)^{-1}A+B(A+B)^{-1}A\end{equation} Subtract \eqref{eq:matrix3} from \eqref{eq:matrix2}: \begin{equation}\label{eq:matrix4}B-A=A(A+B)^{-1}B-B(A+B)^{-1}A+B(A+B)^{-1}B-A(A+B)^{-1}A\end{equation} In a similar manner, starting from $I=(A+B)^{-1}(A+B)$ and multiplying by $A$ and $B$ from the left, we obtain \begin{equation}\label{eq:matrix5}A-B=A(A+B)^{-1}B-B(A+B)^{-1}A+A(A+B)^{-1}A-B(A+B)^{-1}B\end{equation} Adding \eqref{eq:matrix4} and \eqref{eq:matrix5} yields $$0=2A(A+B)^{-1}B-2B(A+B)^{-1}A,$$ that is, $$A(A+B)^{-1}B=B(A+B)^{-1}A$$

A mathematician whose Twitter username is Manifoldless beat me to it by a few minutes :). But not just that, his solution is shorter (and so better) than mine: \begin{align*}A(A+B)^{-1}B&=(A+B-B)(A+B)^{-1}(A+B-A)\\&=[I-B(A+B)^{-1}](A+B-A)\\&=A+B-A-B+B(A+B)^{-1}A\\&=B(A+B)^{-1}A\end{align*}
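
A quick numerical sanity check of the identity (a NumPy sketch with random matrices of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
assert abs(np.linalg.det(A + B)) > 1e-12   # A + B should be invertible

S_inv = np.linalg.inv(A + B)
assert np.allclose(A @ S_inv @ B, B @ S_inv @ A)
print("identity verified numerically")
```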