The Binomial Distribution

For a fair coin, if we toss it a very large number of times, we will get heads about half the time and tails about half the time. So we say that the probability of getting heads equals the probability of getting tails, namely $\frac{1}{2}$.

Now consider the simultaneous tossing of $N$ coins. For $N=2$, there are four possible outcomes: $HH$, $HT$, $TH$, and $TT$. The probabilities of getting 2 heads, 1 head, and no heads are, respectively, $\frac{1}{4}$, $\frac{2}{4}=\frac{1}{2}$, and $\frac{1}{4}$. The number of ways we can get $n$ heads (and consequently $N-n$ tails) is
$$\frac{N!}{n!(N-n)!}$$
The probability $p$ of getting $n$ heads is then
\begin{equation}
\label{eq:binomial}
p=\frac{N!}{n!(N-n)!2^N}
\end{equation}
Let us consider
$$\ln p=\ln\frac{N!}{n!(N-n)!2^N}=\ln N!-\ln n!-\ln(N-n)!-N\ln 2$$
Using Stirling's formula
$$\ln N!\approx N\ln N-N$$
we obtain
$$\ln p\approx -N\left[\frac{n}{N}\ln\frac{n}{N}+\left(1-\frac{n}{N}\right)\ln\left(1-\frac{n}{N}\right)+\ln 2\right]$$
For very large $N$, $x=\frac{n}{N}$ can be considered to be continuous.
$$\ln p\approx -N\left[x\ln x+(1-x)\ln(1-x)+\ln 2\right]$$
$\ln p$ has only one critical point, at $x=\frac{1}{2}$, and the second derivative of $\ln p$ at $x=\frac{1}{2}$ is negative. So $\ln p$ (and consequently $p$) attains a maximum at $x=\frac{1}{2}$. Expanding $\ln p$ to second order about $x=\frac{1}{2}$, we have
$$\ln p\approx -2N\left(x-\frac{1}{2}\right)^2$$
or
$$p\approx\exp\left[-2N\left(x-\frac{1}{2}\right)^2\right]$$
As seen clearly, the standard deviation from $x=\frac{1}{2}$ is $\frac{1}{2\sqrt{N}}$.
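The Gaussian approximation can be checked numerically. Below is a quick Python sketch (the helper names are my own, purely for illustration). Since Stirling's formula drops an overall prefactor, the comparison is made on the ratios $p(n)/p(N/2)$, which the Gaussian form should capture directly.

```python
# Compare the exact coin-toss probabilities p(n) = C(N, n) / 2^N with
# the Gaussian approximation exp[-2N (x - 1/2)^2], x = n/N.
import math

N = 100

def p_exact(n):
    return math.comb(N, n) / 2**N

def p_gauss(n):
    x = n / N
    return math.exp(-2 * N * (x - 0.5)**2)

# The distribution peaks at n = N/2 ...
assert max(range(N + 1), key=p_exact) == N // 2

# ... and the Gaussian matches the exact ratios near the peak.
for n in range(45, 56):
    ratio = p_exact(n) / p_exact(N // 2)
    assert abs(ratio - p_gauss(n)) < 0.01
```

The agreement within a standard deviation ($\frac{1}{2\sqrt{N}}=0.05$ here, i.e. $|n-50|\le 5$) is already better than one percent at $N=100$.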

[Figure: plot of $p\approx\exp\left[-2N\left(x-\frac{1}{2}\right)^2\right]$ with $N=11$.]

Thus far, we have considered equal probabilities. Suppose now that we have a coin with probability $q$ ($0<q<1$) of getting heads. Then the probability of getting $n$ heads when $N$ coins are tossed is
\begin{equation}
\label{eq:binomial2}
\frac{N!}{n!(N-n)!}q^n(1-q)^{N-n}
\end{equation}
Setting $q=\frac{1}{2}$ reduces \eqref{eq:binomial2} to \eqref{eq:binomial}. Note that the probability in \eqref{eq:binomial2} peaks at $x=q$ and the standard deviation about $x=q$ is $\sqrt{\frac{q(1-q)}{N}}$, which reduces to $\frac{1}{2\sqrt{N}}$ when $q=\frac{1}{2}$.
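Here is a quick numerical sketch of these claims for a biased coin (the code is illustrative, not part of the derivation):

```python
# With heads probability q, the distribution of x = n/N peaks at x = q
# and has standard deviation sqrt(q(1-q)/N).
import math

N, q = 200, 0.3
probs = [math.comb(N, n) * q**n * (1 - q)**(N - n) for n in range(N + 1)]

assert abs(sum(probs) - 1) < 1e-9            # probabilities sum to 1
mean = sum(n * p for n, p in enumerate(probs)) / N
var  = sum((n / N - mean)**2 * p for n, p in enumerate(probs))
assert abs(mean - q) < 1e-9                  # mean of x is q
assert abs(var - q * (1 - q) / N) < 1e-9     # variance of x is q(1-q)/N
```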

We now consider events with more than two outcomes. For example, rolling a die has six possible outcomes.
Suppose that we want to distribute $N$ particles into $K$ boxes in such a way that $n_i$ particles go into box $i$. How many ways are there to do this? There are $\frac{N!}{n_1!(N-n_1)!}$ ways to choose $n_1$ particles out of $N$ particles. Then there are $\frac{(N-n_1)!}{n_2!(N-n_1-n_2)!}$ ways to choose $n_2$ particles out of the remaining $N-n_1$ particles, and so on and so forth. Hence, the number of ways to distribute $n_1$ particles in box 1, $n_2$ particles in box 2, …, $n_K$ particles in box $K$ is
\begin{align*} \frac{N!}{n_1!(N-n_1)!}\frac{(N-n_1)!}{n_2!(N-n_1-n_2)!}&\frac{(N-n_1-n_2)!}{n_3!(N-n_1-n_2-n_3)!}\cdots\\ &\frac{(N-n_1-n_2-\cdots-n_{K-1})!}{n_K!}\\ &=\frac{N!}{n_1!n_2!\cdots n_K!}\\ &=N!\prod_{i=1}^K\frac{1}{n_i!}, \end{align*}
where $\sum_{i=1}^Kn_i=N$.
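The telescoping product of binomial coefficients can be verified directly. The following Python sketch (function names are mine, for illustration) computes the count both ways:

```python
# The stepwise choice C(N, n1) C(N - n1, n2) ... telescopes to the
# multinomial coefficient N! / (n1! n2! ... nK!).
import math

def multinomial_stepwise(ns):
    remaining, ways = sum(ns), 1
    for n in ns:
        ways *= math.comb(remaining, n)   # choose n out of the remaining
        remaining -= n
    return ways

def multinomial_closed(ns):
    result = math.factorial(sum(ns))
    for n in ns:
        result //= math.factorial(n)      # exact integer division
    return result

ns = [3, 5, 2, 4]   # N = 14 particles into K = 4 boxes
assert multinomial_stepwise(ns) == multinomial_closed(ns) == 2522520
```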

No $\sqrt{\mathrm{NOT}}$ in PT-Symmetric Quantum Mechanics

In PT-symmetric quantum mechanics, the inner product is given by
\begin{equation}
\label{eq:pt-product}
\langle v_1|P|v_2\rangle
\end{equation}
where $P=\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}$.
A PT-symmetric operator $H$ is a $2\times 2$ symmetric matrix that satisfies
$$\langle Hv_1|P|v_2\rangle=\langle v_1|P|Hv_2\rangle$$
which, since $H$ is symmetric (so that $H^\dagger=\bar H$), is equivalent to
$$\bar HP=PH$$
So, PT-symmetric operators are self-adjoint operators with respect to the inner product \eqref{eq:pt-product}. PT-symmetric operators take the form
$$\begin{bmatrix}
a & b\\
b & \bar a
\end{bmatrix}$$
where $a$ is complex and $b$ is real.
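A small numerical check of this form (a Python sketch with a hand-rolled $2\times 2$ matrix product; the values of $a$ and $b$ are arbitrary):

```python
# Symmetric matrices H = [[a, b], [b, conj(a)]] with real b satisfy
# conj(H) P = P H, the PT-symmetry condition.
P = [[0, 1], [1, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conj(A):
    return [[A[i][j].conjugate() for j in range(2)] for i in range(2)]

a, b = 2 + 3j, 1.5          # b must be real
H = [[a, b], [b, a.conjugate()]]
assert matmul(conj(H), P) == matmul(P, H)
```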
In PT-symmetric quantum mechanics, the $\mathrm{NOT}$ gate is given by
$$\mathrm{NOT}=\begin{bmatrix}
1 & 0\\
0 &-1
\end{bmatrix}$$
The reason for this is that in PT-symmetric quantum mechanics the qubits are
$$|0\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
1
\end{bmatrix},\ |1\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
-1
\end{bmatrix}$$
The eigenvectors of $\mathrm{NOT}$ corresponding to the eigenvalues $\pm 1$ are, respectively, $\begin{bmatrix}
1\\
0
\end{bmatrix}$ and $\begin{bmatrix}
0\\
1
\end{bmatrix}$ and their norms are zero with respect to the inner product \eqref{eq:pt-product}. This means that the spectral decomposition of $\mathrm{NOT}$ in terms of its eigenvectors does not exist in PT-symmetric quantum mechanics. This also hints that $\sqrt{\mathrm{NOT}}$ likely does not exist. In PT-symmetric quantum mechanics, an operator $U$ is called $P$-unitary if $U^\dagger PU=P$ and $P$-antiunitary if $U^\dagger PU=-P$.

Just as gates in standard quantum mechanics must be unitary, gates in PT-symmetric quantum mechanics are required, for the same physical reason, to be $P$-unitary or $P$-antiunitary. $\mathrm{NOT}$ is $P$-antiunitary. From this it can easily be seen that $\sqrt{\mathrm{NOT}}$ does not exist. If $\sqrt{\mathrm{NOT}}$ existed, it would be either $P$-unitary or $P$-antiunitary. If $\sqrt{\mathrm{NOT}}$ is $P$-unitary, then
\begin{align*} \mathrm{NOT}^\dagger P\mathrm{NOT}&=\sqrt{\mathrm{NOT}}^\dagger(\sqrt{\mathrm{NOT}}^\dagger P\sqrt{\mathrm{NOT}})\sqrt{\mathrm{NOT}}\\ &=\sqrt{\mathrm{NOT}}^\dagger P\sqrt{\mathrm{NOT}}\\ &=P \end{align*}
This contradicts the fact that $\mathrm{NOT}$ is $P$-antiunitary. If $\sqrt{\mathrm{NOT}}$ is $P$-antiunitary, then
\begin{align*} \mathrm{NOT}^\dagger P\mathrm{NOT}&=\sqrt{\mathrm{NOT}}^\dagger(\sqrt{\mathrm{NOT}}^\dagger P\sqrt{\mathrm{NOT}})\sqrt{\mathrm{NOT}}\\ &=-\sqrt{\mathrm{NOT}}^\dagger P\sqrt{\mathrm{NOT}}\\ &=P \end{align*}
Again, this is a contradiction. Therefore, $\sqrt{\mathrm{NOT}}$ does not exist. As seen in Square Roots of Operators below, standard Hermitian quantum mechanics does admit a $\sqrt{\mathrm{NOT}}$ gate. This implies that standard Hermitian quantum mechanics and PT-symmetric quantum mechanics cannot be consistent with each other. While no one has yet come up with a physical implementation of the $\sqrt{\mathrm{NOT}}$ gate, it is interesting to see that the physical viability of a quantum theory hangs on the physical realizability of a quantum gate.
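The two facts driving the argument, the zero-norm eigenvectors and the $P$-antiunitarity of $\mathrm{NOT}$, can be checked directly. Here is a Python sketch (the helper functions are illustrative):

```python
# The eigenvectors of NOT have zero norm under <v|P|v>, and NOT is
# P-antiunitary: NOT^dagger P NOT = -P.
NOT = [[1, 0], [0, -1]]
P   = [[0, 1], [1, 0]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pt_norm_sq(v):          # <v|P|v> for a real 2-vector
    return v[0] * (P[0][0] * v[0] + P[0][1] * v[1]) + \
           v[1] * (P[1][0] * v[0] + P[1][1] * v[1])

# eigenvectors of NOT for eigenvalues +1 and -1 have zero P-norm
assert pt_norm_sq([1, 0]) == 0 and pt_norm_sq([0, 1]) == 0

# NOT is real and diagonal, so NOT^dagger = NOT
lhs = matmul(matmul(NOT, P), NOT)
assert lhs == [[-entry for entry in row] for row in P]   # equals -P
```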

Square Roots of Operators II

The determinant (and also trace) of a nilpotent matrix is always zero, so a nilpotent matrix cannot be invertible. However, if $N$ is nilpotent of index $m$, then $I+N$ and $I-N$ are invertible and the inverses are given by
\begin{align*} (I+N)^{-1}&=\sum_{k=0}^{m-1}(-N)^k=I-N+N^2-N^3+\cdots+(-N)^{m-1}\\ (I-N)^{-1}&=\sum_{k=0}^{m-1}N^k=I+N+N^2+N^3+\cdots+N^{m-1} \end{align*}
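The geometric-series inverse can be verified on a concrete example. A Python sketch for a nilpotent $N$ of index 3 (matrix entries chosen arbitrarily; helpers are illustrative):

```python
# For a nilpotent N of index 3, (I + N)^(-1) = I - N + N^2.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
N = [[0, 2, 5], [0, 0, 3], [0, 0, 0]]   # strictly upper triangular => nilpotent
N2 = matmul(N, N)
assert matmul(N2, N) == [[0] * 3 for _ in range(3)]   # index 3: N^3 = 0

inv = add(add(I, scale(-1, N)), N2)     # I - N + N^2
assert matmul(add(I, N), inv) == I      # (I + N)(I - N + N^2) = I
```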
We have shown in Square Roots of Operators below that invertible operators have square roots. By the same token, we see that the square roots of $(I+N)^{-1}$ and $(I-N)^{-1}$ exist. But how do we find them? Let us denote them by $(I+N)^{-\frac{1}{2}}$ and $(I-N)^{-\frac{1}{2}}$, respectively. Then in the same manner as we proved the existence of $\sqrt{I+N}$, we obtain
\begin{align*} (I+N)^{-\frac{1}{2}}&=I-\frac{1}{2}N+\frac{3}{8}N^2-\frac{5}{16}N^3+\cdots\\ (I-N)^{-\frac{1}{2}}&=I+\frac{1}{2}N+\frac{3}{8}N^2+\frac{5}{16}N^3+\cdots \end{align*}
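These are the binomial-series coefficients of $(1\pm x)^{-\frac{1}{2}}$, and they can be checked with exact rational arithmetic on a nilpotent of index 3 (terms beyond $N^2$ vanish). A Python sketch, with illustrative helper names:

```python
# Verify (I + N)^(-1/2) = I - N/2 + 3N^2/8 and
#        (I - N)^(-1/2) = I + N/2 + 3N^2/8 for an index-3 nilpotent N,
# by checking R^2 (I + N) = I and S^2 (I - N) = I exactly.
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(terms):          # sum of (coefficient, matrix) pairs
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)]
            for i in range(n)]

I = [[F(1), F(0), F(0)], [F(0), F(1), F(0)], [F(0), F(0), F(1)]]
N = [[F(0), F(1), F(4)], [F(0), F(0), F(2)], [F(0), F(0), F(0)]]
N2 = matmul(N, N)

R = lincomb([(F(1), I), (F(-1, 2), N), (F(3, 8), N2)])
assert matmul(matmul(R, R), lincomb([(F(1), I), (F(1), N)])) == I

S = lincomb([(F(1), I), (F(1, 2), N), (F(3, 8), N2)])
assert matmul(matmul(S, S), lincomb([(F(1), I), (F(-1), N)])) == I
```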

Square Roots of Operators

Mathematics is full of weird stuff. One example is the square root of an operator. So far, I have seen only two books that discuss the square root of an operator. They are listed in the references below.

Definition. An operator $R$ is called a square root of an operator $T$ if $R^2=T$.

Example. In quantum computing, $\sqrt{\mathrm{NOT}}$ gate is given by
$$\sqrt{\mathrm{NOT}}=\begin{bmatrix}
\frac{1+i}{2} & \frac{1-i}{2}\\
\frac{1-i}{2} & \frac{1+i}{2}
\end{bmatrix}$$
and
$$\sqrt{\mathrm{NOT}}\cdot\sqrt{\mathrm{NOT}}=\mathrm{NOT}=\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}$$
As required by quantum mechanics, quantum gates are unitary matrices, and $\sqrt{\mathrm{NOT}}$ is indeed unitary. There is no counterpart of the $\sqrt{\mathrm{NOT}}$ gate in classical computing, so $\sqrt{\mathrm{NOT}}$ is a truly quantum gate. As far as I know, no one has come up with a physical implementation of the $\sqrt{\mathrm{NOT}}$ gate yet.
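A quick Python check that the matrix above squares to $\mathrm{NOT}$ and is unitary (the helper is illustrative):

```python
# Verify sqrt(NOT)^2 = NOT and sqrt(NOT)^dagger sqrt(NOT) = I.
sqrt_not = [[(1 + 1j) / 2, (1 - 1j) / 2],
            [(1 - 1j) / 2, (1 + 1j) / 2]]
NOT = [[0, 1], [1, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

product = matmul(sqrt_not, sqrt_not)
assert all(abs(product[i][j] - NOT[i][j]) < 1e-12
           for i in range(2) for j in range(2))

dagger = [[sqrt_not[j][i].conjugate() for j in range(2)] for i in range(2)]
identity = matmul(dagger, sqrt_not)
assert all(abs(identity[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```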

Example. An operator does not necessarily have a square root. For example, define $T:\mathbb{C}^3\longrightarrow\mathbb{C}^3$ by
$$T(z_1,z_2,z_3)=(z_2,z_3,0)$$
Then one can easily show that $T$ is linear. Suppose that $T$ has a square root $R$. Let $\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}$ be the matrix associated with $R$. Since the matrix associated with $T$ is $\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 1\\
0 & 0 & 0
\end{bmatrix}$, we have the equation
$$\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}\cdot\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}=\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 1\\
0 & 0 & 0
\end{bmatrix}$$
This is a system of 9 scalar equations in 9 unknowns. One can attempt to solve it using a CAS (computer algebra system), for example Maple, and see that the system has no solution, i.e. $R$ does not exist.
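The CAS computation can also be bypassed by a short nilpotency argument: if $R^2=T$, then $R^6=T^3=0$, so $R$ is nilpotent; a nilpotent operator on $\mathbb{C}^3$ satisfies $R^3=0$, hence $T^2=R^4=R^3R=0$. But $T^2\ne 0$, a contradiction. The following Python sketch verifies the two matrix facts this argument uses:

```python
# T is the shift operator T(z1, z2, z3) = (z2, z3, 0).
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

T = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
T2 = matmul(T, T)
ZERO = [[0] * 3 for _ in range(3)]

assert matmul(T2, T) == ZERO   # T^3 = 0, so any square root is nilpotent
assert T2 != ZERO              # but T^2 != 0: T^2(z1,z2,z3) = (z3,0,0)
```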

So, what kind of operators have square roots?

Theorem. Identity + Nilpotent has a square root.

Proof. Let $N$ be nilpotent. Then $N^m=0$ for some positive integer $m$. Consider the Taylor series for $\sqrt{1+x}$:
$$\sqrt{1+x}=1+a_1x+a_2x^2+a_3x^3+\cdots$$
Using the nilpotency of $N$, we may guess that $\sqrt{I+N}$ takes the form
$$I+a_1N+a_2N^2+\cdots+a_{m-1}N^{m-1}$$
Now,
\begin{align*} (I+a_1N+a_2N^2+\cdots+a_{m-1}N^{m-1})^2=I&+2a_1N+(2a_2+a_1^2)N^2\\&+(2a_3+2a_1a_2)N^3+\cdots\\ &+(2a_{m-1}+\cdots)N^{m-1}=I+N \end{align*}
and by comparing the coefficients
$a_1=\frac{1}{2}$; $2a_2+a_1^2=0$, so $a_2=-\frac{1}{8}$; $2a_3+2a_1a_2=0$, so $a_3=\frac{1}{16}$; and so on and so forth.
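The coefficients $a_1=\frac{1}{2}$, $a_2=-\frac{1}{8}$, $a_3=\frac{1}{16}$ can be checked with exact rational arithmetic on a nilpotent of index 4 (a Python sketch; helper names are mine):

```python
# Verify (I + N/2 - N^2/8 + N^3/16)^2 = I + N for a nilpotent N with N^4 = 0.
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

dim = 4
I = [[F(i == j) for j in range(dim)] for i in range(dim)]
N = [[F(1) if j == i + 1 else F(0) for j in range(dim)] for i in range(dim)]
N2 = matmul(N, N)
N3 = matmul(N2, N)
assert matmul(N3, N) == [[F(0)] * dim for _ in range(dim)]   # N^4 = 0

coeffs = [F(1), F(1, 2), F(-1, 8), F(1, 16)]                  # 1, a1, a2, a3
R = [[sum(c * M[i][j] for c, M in zip(coeffs, [I, N, N2, N3]))
      for j in range(dim)] for i in range(dim)]
IplusN = [[I[i][j] + N[i][j] for j in range(dim)] for i in range(dim)]
assert matmul(R, R) == IplusN
```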

Theorem. Suppose that $T: V\longrightarrow V$ is invertible. Then $T$ has a square root.

Proof. Let $\lambda_1,\lambda_2,\cdots,\lambda_m$ be the distinct eigenvalues of $T$. Then for each $j$, there exists a nilpotent operator $N_j: G(\lambda_j,T)\longrightarrow G(\lambda_j,T)$ such that $T|_{G(\lambda_j,T)}=\lambda_jI+N_j$. (See [1], p. 252, Theorem 8.21 for a proof.) Here, $G(\lambda_j,T)$ is the generalized eigenspace of $T$ corresponding to the eigenvalue $\lambda_j$. Since $T$ is invertible, no $\lambda_j$ is equal to 0, so we can write $$T|_{G(\lambda_j,T)}=\lambda_j\left(I+\frac{N_j}{\lambda_j}\right)$$
Since $\frac{N_j}{\lambda_j}$ is nilpotent, $I+\frac{N_j}{\lambda_j}$ has a square root. Let $R_j$ be $\sqrt{\lambda_j}$ times the square root of $I+\frac{N_j}{\lambda_j}$. Any vector $v\in V$ can be written uniquely in the form
$$v=u_1+u_2+\cdots+u_m,$$
where each $u_j$ is in $G(\lambda_j,T)$. Using this decomposition, define an operator $R:V\longrightarrow V$ by
$$Rv=R_1u_1+R_2 u_2+\cdots+R_m u_m$$
Then $R^2u_j=R_j^2 u_j=T|_{G(\lambda_j,T)}u_j$ and so \begin{align*} R^2v&=R_1^2u_1+R_2^2u_2+\cdots+R_m^2u_m\\ &= T|_{G(\lambda_1,T)}u_1+T|_{G(\lambda_2,T)}u_2+\cdots+T|_{G(\lambda_m,T)}u_m=Tv
\end{align*}
Therefore, $R$ is a square root of $T$.

Example. The $\mathrm{NOT}$ gate is invertible; it is its own inverse.

The proof of the above theorem suggests that an operator $T$ has a square root if there is a spectral decomposition of $T$ with respect to the standard inner product of $\mathbb{C}^n$. A normal operator is such an operator. Recall that an operator $T: V\longrightarrow V$ is normal if $T^\ast T=TT^\ast$. The following theorem guarantees the existence of such a spectral decomposition of a normal operator.

Theorem. A matrix $T$ is normal if and only if there exist a diagonal matrix $\Lambda$ and a unitary matrix $U$ such that $T=U\Lambda U^\ast$.

Here, the diagonal entries of $\Lambda$ are the eigenvalues of $T$ and the columns of $U$ are the eigenvectors of $T$. The matching eigenvalues in $\Lambda$ come in the same order as the eigenvectors are ordered as columns of $U$.

Let $T: V\longrightarrow V$ be a normal operator and write its spectral decomposition, using Dirac's bra-ket notation, as
$$T=\sum_j\lambda_j|v_j\rangle\langle v_j|$$
Define
$$f(T)=\sum_j f(\lambda_j)|v_j\rangle\langle v_j|$$
Using this definition, one can define, for example, the exponential, square root, and logarithmic operator of $T$.

Example. Let $T=\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}$. Find the square root of $T$.

Solution. $T$ is hermitian (symmetric), so it is normal. The spectral decomposition of $T$ is given by
$$T=|v_1\rangle\langle v_1|-|v_2\rangle\langle v_2|,$$
where
$$|v_1\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
1
\end{bmatrix}\ \mathrm{and}\ |v_2\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
-1
\end{bmatrix}$$
Now,
\begin{align*} \sqrt{T}&=|v_1\rangle\langle v_1|+i|v_2\rangle\langle v_2|\\ &=\frac{1}{2}\begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix}+\frac{i}{2}\begin{bmatrix} 1 & -1\\ -1 & 1 \end{bmatrix}\\ &=\begin{bmatrix} \frac{1+i}{2} & \frac{1-i}{2}\\ \frac{1-i}{2} & \frac{1+i}{2} \end{bmatrix} \end{align*}

References:

  1. Sheldon Axler, Linear Algebra Done Right, Third Edition, Springer, 2015
  2. Michael A. Nielsen and Isaac L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000

The Eigenvectors of a Hermitian Operator Are Mutually Orthogonal

In quantum mechanics, operators are required to be Hermitian. A reason for this is that the eigenvalues of a quantum mechanical operator are interpreted as physically measurable quantities such as positions, momenta, energies, etc. and therefore they are required to be real. As is well-known, Hermitian operators have all real eigenvalues. Hermitian operators also have another nice property. The eigenvectors of a Hermitian operator are mutually orthogonal. Here, we prove this only for the case of matrix operators. Let $A$ be a Hermitian operator and let $|\lambda_i\rangle$ be the eigenvectors of $A$ with distinct eigenvalues $\lambda_i$. Then we have $A|\lambda_i\rangle=\lambda_i|\lambda_i\rangle$. So, $\langle\lambda_j|A|\lambda_i\rangle=\lambda_i\langle\lambda_j|\lambda_i\rangle$. On the other hand,
\begin{align*} \langle\lambda_j|A&=(A^\dagger|\lambda_j\rangle)^\dagger\\ &=(A|\lambda_j\rangle)^\dagger\ (A^\dagger=A)\\ &=(\lambda_j|\lambda_j\rangle)^\dagger\\ &=\bar\lambda_j\langle\lambda_j|\\ &=\lambda_j\langle\lambda_j|\ (\mbox{the $\lambda_i$ are real}) \end{align*}
From this we also obtain $\langle\lambda_j|A|\lambda_i\rangle=\lambda_j\langle\lambda_j|\lambda_i\rangle$. This means that $\lambda_i\langle\lambda_j|\lambda_i\rangle=\lambda_j\langle\lambda_j|\lambda_i\rangle$, i.e. we have
$$(\lambda_i-\lambda_j)\langle\lambda_j|\lambda_i\rangle=0$$
If $i\ne j$ then $\lambda_i\ne\lambda_j$ and so $\langle\lambda_j|\lambda_i\rangle=0$.

Example. Let us consider the matrix $A=\begin{pmatrix}
3 & 1+i\\
-1+i & -3
\end{pmatrix}$. The adjoint (i.e. the conjugate transpose of this matrix) of $A$ is $A^\dagger=\begin{pmatrix}
3 & -1-i\\
1-i & -3
\end{pmatrix}$. Since $A\ne A^\dagger$, $A$ is not Hermitian. Although $A$ is not Hermitian, it has real eigenvalues $\pm\sqrt{7}$, and the eigenvectors corresponding to the eigenvalues $\sqrt{7}$ and $-\sqrt{7}$ are, respectively,
$$v_1=\begin{pmatrix}
-\frac{1+i}{3-\sqrt{7}}\\
1
\end{pmatrix},\ v_2=\begin{pmatrix}
-\frac{1+i}{3+\sqrt{7}}\\
1
\end{pmatrix}$$
$\langle v_1|v_2\rangle=2$, so they are not orthogonal. Interestingly, $v_1$ and $v_2$ are orthogonal with respect to the inner product
\begin{equation}
\label{eq:jproduct}
\langle v_1|J|v_2\rangle
\end{equation}
where
$$J=\begin{pmatrix}
1 & 0\\
0 & -1
\end{pmatrix}$$
The matrices of the form $$H=\begin{pmatrix}
a & b\\
-\bar b & d
\end{pmatrix},$$ where $a$ and $d$ are real and $(a-d)^2-4|b|^2>0$ (this condition ensures that $H$ has two distinct real eigenvalues), are self-adjoint with respect to the inner product \eqref{eq:jproduct}, i.e. $H$ satisfies \begin{equation}\label{eq:jself-adjoint}\langle Hv_1|J|v_2\rangle=\langle v_1|J|Hv_2\rangle\end{equation} which is equivalent to $$H^\dagger J=JH$$
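Here is a numerical sketch of the example's claims: $A$ is $J$-self-adjoint, its eigenvectors are not orthogonal in the usual sense, but they are orthogonal with respect to $\langle v|J|w\rangle$. (The overall sign of each eigenvector is chosen so that $Av=\lambda v$ holds; the orthogonality claims are unaffected by sign. Helper names are illustrative.)

```python
# Check A^dagger J = J A, the eigenvector property, <v1|v2> = 2, and
# <v1|J|v2> = 0 for the example matrix A.
import math

s7 = math.sqrt(7)
A = [[3, 1 + 1j], [-1 + 1j, -3]]
J = [[1, 0], [0, -1]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

dagger = [[A[j][i].conjugate() for j in range(2)] for i in range(2)]
lhs, rhs = matmul(dagger, J), matmul(J, A)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))      # A^dagger J = J A

v1 = [-(1 + 1j) / (3 - s7), 1]    # eigenvector for +sqrt(7)
v2 = [-(1 + 1j) / (3 + s7), 1]    # eigenvector for -sqrt(7)

Av1 = [sum(A[i][k] * v1[k] for k in range(2)) for i in range(2)]
assert all(abs(Av1[i] - s7 * v1[i]) < 1e-9 for i in range(2))   # A v1 = sqrt(7) v1

inner  = sum(v1[i].conjugate() * v2[i] for i in range(2))            # <v1|v2>
jinner = sum(v1[i].conjugate() * J[i][i] * v2[i] for i in range(2))  # <v1|J|v2>, J diagonal
assert abs(inner - 2) < 1e-9 and abs(jinner) < 1e-9
```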

Exercise. Prove that the eigenvectors of a matrix satisfying \eqref{eq:jself-adjoint} are mutually orthogonal with respect to the inner product \eqref{eq:jproduct}.