Born Rule

Let $A$ be a Hermitian (self-adjoint) operator on a Hilbert space $V$. According to Max Born, who proposed the statistical interpretation of quantum mechanics, the probability of measuring an eigenvalue $\lambda_i$ of $A$ in a normalized state $|\psi\rangle$ is $\langle\psi|P_i|\psi\rangle$, where $P_i$ is the orthogonal projection onto the eigenspace $V_{\lambda_i}$ of $A$ corresponding to $\lambda_i$, i.e. $P_i$ is a self-adjoint linear map $P_i: V\longrightarrow V_{\lambda_i}$ such that $P_i^2=P_i$. If we assume no degeneracy of $\lambda_i$, then the eigenspace of $A$ corresponding to $\lambda_i$ is one-dimensional. In this case, $P_i=|\lambda_i\rangle\langle\lambda_i|$, where the $|\lambda_i\rangle$ are orthonormal. Now,
\begin{align*} \langle\psi|P_i|\psi\rangle&=\langle\psi|\lambda_i\rangle\langle\lambda_i|\psi\rangle\\ &=\overline{\langle\lambda_i|\psi\rangle}\langle\lambda_i|\psi\rangle\\ &=|\langle\lambda_i|\psi\rangle|^2 \end{align*}
The complex number $\langle\lambda_i|\psi\rangle$ is called the probability amplitude, so the probability is the squared magnitude of the amplitude. This is called the Born rule.
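
As a quick numerical illustration, here is a minimal Python sketch of the Born rule. The observable $\sigma_x$, its orthonormal eigenvectors, and the state $|\psi\rangle=(0.6, 0.8)$ are assumptions chosen only for this example; the helper `inner` is likewise hypothetical.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
# Orthonormal eigenvectors of sigma_x for the eigenvalues +1 and -1
# (an assumed example observable).
eigenvectors = {+1: (s, s), -1: (s, -s)}

psi = (0.6, 0.8)  # a normalized state, assumed for illustration

# Born rule: probability = squared magnitude of the amplitude <lambda_i|psi>.
probabilities = {lam: abs(inner(vec, psi)) ** 2 for lam, vec in eigenvectors.items()}

for lam, p in probabilities.items():
    print(f"P(lambda = {lam:+d}) = {p:.4f}")
```

The two probabilities come out close to $0.98$ and $0.02$, and they add up to $1$ because the state is normalized.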

One may consider $A$ as a random variable with its eigenvalues as the values of the random variable. (See Remark below.) Since for each $j$, $A|\lambda_j\rangle=\lambda_j|\lambda_j\rangle$, orthonormality gives $\langle\lambda_i|A|\lambda_j\rangle=\lambda_j\delta_{ij}$.
The expected value $\langle A\rangle$ of the self-adjoint operator $A$ in the state $|\psi\rangle$ is naturally defined as a weighted average
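\begin{align*} \langle A\rangle&=\sum_i\lambda_i|\langle\lambda_i|\psi\rangle|^2\\ &=\sum_i\langle\lambda_i|A|\lambda_i\rangle\langle\lambda_i|\psi\rangle\langle\psi|\lambda_i\rangle\\ &=\sum_i\langle\psi|\lambda_i\rangle\langle\lambda_i|A|\lambda_i\rangle\langle\lambda_i|\psi\rangle\\ &=\sum_{i,j}\langle\psi|\lambda_i\rangle\langle\lambda_i|A|\lambda_j\rangle\langle\lambda_j|\psi\rangle\\ &=\langle\psi|\left(\sum_i|\lambda_i\rangle\langle\lambda_i|\right)A\left(\sum_j|\lambda_j\rangle\langle\lambda_j|\right)|\psi\rangle\\ &=\langle\psi|IAI|\psi\rangle\\ &=\langle\psi|A|\psi\rangle \end{align*}
Here the fourth equality holds because $\langle\lambda_i|A|\lambda_j\rangle=0$ for $i\ne j$, and the sixth uses the completeness relation $\sum_i|\lambda_i\rangle\langle\lambda_i|=I$. Hence, we have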
$$\langle A\rangle=\langle\psi|A|\psi\rangle$$
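
The identity can be checked numerically. The following Python sketch compares the weighted average $\sum_i\lambda_i|\langle\lambda_i|\psi\rangle|^2$ with $\langle\psi|A|\psi\rangle$ for one assumed example ($A=\sigma_x$ and $|\psi\rangle=(0.6,0.8)$); the helper names are hypothetical.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

sigma_x = [[0, 1], [1, 0]]              # assumed example observable
s = 1 / math.sqrt(2)
spectrum = [(+1, [s, s]), (-1, [s, -s])]  # (eigenvalue, orthonormal eigenvector)
psi = [0.6, 0.8]                        # assumed normalized state

# Left-hand side: eigenvalues weighted by their Born probabilities.
weighted_average = sum(lam * abs(inner(vec, psi)) ** 2 for lam, vec in spectrum)

# Right-hand side: <psi|A|psi>.
sandwich = inner(psi, matvec(sigma_x, psi)).real

print(weighted_average, sandwich)
```

Both numbers agree (about $0.96$) up to floating-point rounding.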

Remark. Let $\Omega=\bigcup_iV_{\lambda_i}$, i.e. $\Omega$ is the set of all eigenvectors of $A$. Let $\mathcal{U}$ consist of $\emptyset$, $\Omega$ and unions of subfamilies of $\{V_{\lambda_i}\}$. Then $\mathcal{U}$ is a $\sigma$-algebra of subsets of $\Omega$. Define $X:\Omega\longrightarrow\mathbb{R}$ by
$$X(|\phi\rangle)=\lambda_i\ \mbox{for each eigenvector}\ |\phi\rangle\in V_{\lambda_i}$$
Then $X$ is a random variable.

Quantum Degeneracy

In quantum mechanics, an energy level is said to be degenerate if the energy corresponds to two or more states of a quantum system. Likewise, two or more states of a quantum system are said to be degenerate if they give the same energy upon measurement. The set of all degenerate states of a quantum system that correspond to a particular energy $E$, together with the zero vector, forms a (Hilbert) subspace, called the eigenspace of $E$. To see this, let $H$ be a Hamiltonian and $|\psi_1\rangle$, $|\psi_2\rangle$ two linearly independent eigenstates corresponding to the same eigenvalue (energy) $E$. Then
\begin{align*} H|\psi_1\rangle&=E|\psi_1\rangle\\ H|\psi_2\rangle&=E|\psi_2\rangle \end{align*}
Let $|\psi\rangle=c_1|\psi_1\rangle+c_2|\psi_2\rangle$, a linear combination (superposition) of $|\psi_1\rangle$ and $|\psi_2\rangle$. Then
\begin{align*} H|\psi\rangle&=H(c_1|\psi_1\rangle+c_2|\psi_2\rangle)\\ &=c_1H|\psi_1\rangle+c_2H|\psi_2\rangle\\&=c_1E|\psi_1\rangle+c_2E|\psi_2\rangle\\ &=E(c_1|\psi_1\rangle+c_2|\psi_2\rangle)\\ &=E|\psi\rangle \end{align*}
The dimension of the eigenspace corresponding to an eigenvalue (energy) $E$ is called the degree of degeneracy of $E$.
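
A small Python sketch can confirm the argument on a concrete matrix. The $3\times 3$ Hamiltonian below, its two eigenvectors with $E=2$, and the coefficients $c_1$, $c_2$ are all assumptions made up for this illustration (the vectors are left unnormalized so the arithmetic stays exact).

```python
def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

# Assumed example Hamiltonian with eigenvalues 0, 2, 2 (E = 2 is doubly degenerate).
H = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 2]]
E = 2
psi1 = [1, 1, 0]   # H psi1 = 2 psi1
psi2 = [0, 0, 1]   # H psi2 = 2 psi2

c1, c2 = 2, 1j     # arbitrary complex coefficients (assumed)
psi = [c1 * a + c2 * b for a, b in zip(psi1, psi2)]

# The superposition is again an eigenvector with the same energy E.
print(matvec(H, psi) == [E * c for c in psi])
```

In this example the eigenspace of $E=2$ is spanned by `psi1` and `psi2`, so the degree of degeneracy is $2$.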

Brachistochrone Curve

Brachistochrone Problem: Find the shape of the curve down which a bead, sliding from rest and accelerated by gravity, will slip from one point to another in the least time. Here, we do not consider friction.

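Due to conservation of energy, we have
\begin{equation}\label{eq:energy}\frac{1}{2}mv^2=mgy\end{equation}
where $y$ is the vertical distance the bead has fallen. Solving \eqref{eq:energy} for the speed $v$, we obtain
$$v=\sqrt{2gy}$$
The time $t_{PQ}$ for the bead to travel from a point $P$ to $Q$ is
\begin{align*}t_{PQ}&=\int_P^Q\frac{ds}{v}\\&=\int_P^Q\frac{\sqrt{dx^2+dy^2}}{\sqrt{2gy}}\\&=\int_P^Q\sqrt{\frac{1+y_x^2}{2gy}}\,dx \end{align*}
where $y_x=\frac{dy}{dx}$.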
Let $f(y_x,y)=\sqrt{\frac{1+y_x^2}{2gy}}$. The path from $P$ to $Q$ which minimizes $t_{PQ}$ can be found by solving the Euler-Lagrange equation ([1])
\begin{equation}\label{eq:E-L}\frac{\partial f}{\partial y}-\frac{d}{dx}\frac{\partial f}{\partial y_x}=0\end{equation}
It can be easily shown that \eqref{eq:E-L} is equivalent to
\begin{equation}\label{eq:E-L2}\frac{\partial f}{\partial x}-\frac{d}{dx}\left(f-y_x\frac{\partial f}{\partial y_x}\right)=0\end{equation}
Since $f$ does not explicitly depend on $x$, $\frac{\partial f}{\partial x}=0$, so from \eqref{eq:E-L2}, this leads to
$$f-y_x\frac{\partial f}{\partial y_x}=C$$
for some constant $C$. On the other hand, we have
$$f-y_x\frac{\partial f}{\partial y_x}=\frac{1}{\sqrt{2gy(1+y_x^2)}}$$
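Hence, we arrive at the differential equation
$$y(1+y_x^2)=k^2$$
where $k^2=\frac{1}{2gC^2}$. The differential equation is separable and it can be written as
\begin{equation}\label{eq:de}dx=\sqrt{\frac{y}{k^2-y}}\,dy\end{equation}
Let $\sqrt{\frac{y}{k^2-y}}=\tan\frac{\theta}{2}$. Then
\begin{align*}y&=k^2\sin^2\frac{\theta}{2}\\ dy&=k^2\sin\frac{\theta}{2}\cos\frac{\theta}{2}\,d\theta \end{align*}
The equation \eqref{eq:de} becomes
$$dx=k^2\sin^2\frac{\theta}{2}\,d\theta$$
Integrating both sides with the half-angle formula $\sin^2\frac{\theta}{2}=\frac{1-\cos\theta}{2}$, we obtain
$$x=\frac{k^2}{2}(\theta-\sin\theta)+C_1$$
for some constant $C_1$. With the condition $P(0,0)$, i.e. $x=y=0$ when $\theta=0$, we have $C_1=0$ and so,
$$x=\frac{k^2}{2}(\theta-\sin\theta)$$
Therefore, the curve on which the bead is sliding down in the shortest time is a cycloid given by the parametric equations
\begin{align*}x&=\frac{k^2}{2}(\theta-\sin\theta)\\ y&=\frac{k^2}{2}(1-\cos\theta) \end{align*}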
In geometry, a cycloid is the curve traced by a point on a circle as it rolls along a straight line without slipping.
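
To get a feel for the result, here is a rough Python sketch that compares the descent time along a cycloid with the descent time along a straight ramp between the same two points. It assumes the standard parametrization $x=a(\theta-\sin\theta)$, $y=a(1-\cos\theta)$ with $y$ measured downward, and the values $g=9.81$, $a=1$ and end parameter $\theta=\pi$ are arbitrary choices for illustration.

```python
import math

g = 9.81             # gravitational acceleration in m/s^2 (assumed value)
a = 1.0              # radius of the rolling circle (assumed value)
theta_end = math.pi  # parameter value at the end point Q (assumed)

def cycloid(theta):
    """Point on the cycloid, with y measured downward from the start P(0, 0)."""
    return a * (theta - math.sin(theta)), a * (1 - math.cos(theta))

def descent_time_cycloid(n=100_000):
    """Midpoint-rule approximation of the integral of ds / v along the cycloid."""
    h = theta_end / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * h
        dx = a * (1 - math.cos(th))                # dx/dtheta
        dy = a * math.sin(th)                      # dy/dtheta
        speed = math.sqrt(2 * g * cycloid(th)[1])  # v = sqrt(2 g y)
        total += math.hypot(dx, dy) / speed * h
    return total

def descent_time_line():
    """Time to slide from rest down a straight ramp from P to Q."""
    x1, y1 = cycloid(theta_end)
    length = math.hypot(x1, y1)
    return length * math.sqrt(2 / (g * y1))

t_cycloid = descent_time_cycloid()
t_line = descent_time_line()
print(f"cycloid: {t_cycloid:.4f} s, straight line: {t_line:.4f} s")
```

With these choices the cycloid time agrees with the closed form $\theta\sqrt{a/g}$ and is shorter than the straight-line time, even though the cycloid is the longer path.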

Figure 1. Cycloid with $k=1$ and $0\leq\theta\leq 2\pi$.


  1. George Arfken, Mathematical Methods for Physicists, Third Edition, Academic Press, 1985

Bell’s Theorem

Suppose that Boris and Natasha are in different locations far away from each other. Their mutual friend Victor prepares a pair of particles and sends one each to Boris and Natasha. Boris chooses to perform one of two possible measurements, say $A_0$ and $A_1$, associated with physical properties $P_{A_0}$ and $P_{A_1}$ of the particle he received. Each of $A_0$ and $A_1$ has $+1$ or $-1$ as its measurement outcome. When Natasha receives her particle, she likewise chooses to perform one of two possible measurements $B_0$, $B_1$, each of which has outcome $+1$ or $-1$. Let us consider the following quantity of the measurements $A_0$, $A_1$, $B_0$, $B_1$:
$$A_0B_0+A_0B_1+A_1B_0-A_1B_1=(A_0+A_1)B_0+(A_0-A_1)B_1$$
Since $A_0=\pm 1$ and $A_1=\pm 1$, one of $A_0+A_1$ and $A_0-A_1$ is zero and the other is $\pm 2$, so the quantity equals $\pm 2$. If the experiment is repeated over many trials with Victor preparing new pairs of particles, the expected value of all the outcomes satisfies the inequality
\begin{equation}\label{eq:bell}\langle A_0B_0+A_0B_1+A_1B_0-A_1B_1\rangle\leq 2\end{equation}
Proof of \eqref{eq:bell}:
\begin{align*}\langle A_0B_0+A_0B_1+A_1B_0-A_1B_1\rangle&=\sum_{A_0,A_1,B_0,B_1}p(A_0,A_1,B_0,B_1)(A_0B_0+A_0B_1+A_1B_0-A_1B_1)\\&\leq \sum_{A_0,A_1,B_0,B_1}2p(A_0,A_1,B_0,B_1)\\&=2 \end{align*}
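
The key step of the proof, that $A_0B_0+A_0B_1+A_1B_0-A_1B_1$ never exceeds $2$ when all four outcomes are definite values $\pm 1$, can be checked by brute force. The short Python sketch below simply enumerates all 16 assignments; the variable names are chosen for this illustration.

```python
from itertools import product

# Collect every value the CHSH quantity takes over all +/-1 assignments.
values = set()
for a0, a1, b0, b1 in product((+1, -1), repeat=4):
    values.add(a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1)

print(sorted(values))  # [-2, 2]
```

Since every assignment gives $\pm 2$, any probability-weighted average of these values lies between $-2$ and $2$.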
The inequality \eqref{eq:bell} is a variant of the Bell inequality called the CHSH inequality. CHSH stands for John Clauser, Michael Horne, Abner Shimony and Richard Holt. The derivation of the CHSH inequality \eqref{eq:bell} depends on two assumptions:

  1. The physical properties $P_{A_0}$, $P_{A_1}$, $P_{B_0}$, $P_{B_1}$ have definite values $A_0$, $A_1$, $B_0$, $B_1$ which exist independently of observation or measurement. This is called the assumption of realism.
  2. Boris performing his measurement does not influence the result of Natasha’s measurement. This is called the assumption of locality.

These two assumptions together are known as the assumptions of local realism. Surprisingly, this intuitively innocuous inequality can be violated in quantum mechanics. Here is an example. Let $|0\rangle=\begin{pmatrix}
1\\
0
\end{pmatrix}$ and $|1\rangle=\begin{pmatrix}
0\\
1
\end{pmatrix}$. Then $|0\rangle$ and $|1\rangle$ are the eigenstates of
$$\sigma_z=\begin{pmatrix}
1 & 0\\
0 & -1
\end{pmatrix}$$
Victor prepares a quantum system of two qubits in the state
\begin{align*}|\psi\rangle&=\frac{|0\rangle\otimes |1\rangle-|1\rangle\otimes |0\rangle}{\sqrt{2}}\\ &=\frac{|01\rangle - |10\rangle}{\sqrt{2}} \end{align*}
He passes the first qubit to Boris, and the second qubit to Natasha. Boris measures either of the observables
$$A_0=\sigma_z,\ A_1=\sigma_x=\begin{pmatrix}
0 & 1\\
1 & 0
\end{pmatrix}$$
and Natasha measures either of the observables
$$B_0=-\frac{\sigma_x+\sigma_z}{\sqrt{2}},\ B_1=\frac{\sigma_x-\sigma_z}{\sqrt{2}}$$
Since the system is in the state $|\psi\rangle$, the average value of $A_0\otimes B_0$ is
\begin{align*}\langle A_0\otimes B_0\rangle&=\langle\psi|A_0\otimes B_0|\psi\rangle\\ &=\frac{1}{\sqrt{2}} \end{align*}
Similarly, the average values of the other observables are given by
\begin{align*}\langle A_0\otimes B_1\rangle&=\frac{1}{\sqrt{2}},\ \langle A_1\otimes B_0\rangle=\frac{1}{\sqrt{2}},\ \mbox{and}\\ \langle A_1\otimes B_1\rangle&=-\frac{1}{\sqrt{2}} \end{align*}
Since the expected value is linear, we have
\begin{equation}\begin{aligned}\langle A_0B_0+A_0B_1+A_1B_0-A_1B_1\rangle&=\langle A_0B_0\rangle+\langle A_0B_1\rangle+\langle A_1B_0\rangle-\langle A_1B_1\rangle\\&=2\sqrt{2}\end{aligned}\label{eq:bell2}\end{equation}
This means that the Bell inequality \eqref{eq:bell} is violated. Physicists have confirmed the prediction in \eqref{eq:bell2} by experiments using photons. It turns out that Mother Nature does not obey the Bell inequality. What this means is that one or both of the two assumptions for the derivation of the Bell inequality \eqref{eq:bell} must be incorrect. There is no consensus among physicists on which of the two assumptions needs to be dropped. An important lesson we learn from the Bell inequality is that Mother Nature (Quantum Mechanics) defies our intuitive common sense. This also raises a troubling question. If we cannot rely on our intuition to understand how the universe works, what else can we rely on? One thing is certain. The world is not locally realistic.
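
The four quantum averages can be reproduced with a few lines of linear algebra. The following Python sketch uses plain nested lists; the helpers `kron` and `expectation` are hypothetical names introduced for this illustration, and the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ is an assumption of the sketch.

```python
import math

def kron(a, b):
    """Kronecker (tensor) product of two 2x2 matrices as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def expectation(op, psi):
    """<psi|op|psi> for a real state vector psi."""
    return sum(psi[i] * op[i][j] * psi[j] for i in range(4) for j in range(4))

sx = [[0, 1], [1, 0]]    # sigma_x
sz = [[1, 0], [0, -1]]   # sigma_z
r = 1 / math.sqrt(2)

A0, A1 = sz, sx
B0 = [[-r * (sx[i][j] + sz[i][j]) for j in range(2)] for i in range(2)]
B1 = [[r * (sx[i][j] - sz[i][j]) for j in range(2)] for i in range(2)]

# Singlet state (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
psi = [0, r, -r, 0]

averages = {
    "A0B0": expectation(kron(A0, B0), psi),
    "A0B1": expectation(kron(A0, B1), psi),
    "A1B0": expectation(kron(A1, B0), psi),
    "A1B1": expectation(kron(A1, B1), psi),
}
chsh = averages["A0B0"] + averages["A0B1"] + averages["A1B0"] - averages["A1B1"]
print(averages, chsh)
```

The sum comes out as $2\sqrt{2}\approx 2.83$, above the bound of $2$ that local realism allows.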


  1. Michael A. Nielsen and Isaac L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2004

Newton’s Method

Quadratic equations can be easily solved using the quadratic formula. For cubic and quartic equations there are also formulas for solutions, but they are pretty complicated. For general polynomial equations of degree 5 or higher there is no such formula for the roots in terms of radicals. Newton’s Method allows us to find an approximate solution to such equations. I will use a simple example to explain how it works and then formulate Newton’s method in general. Let us consider the function $f(x)=x^4-2$. Newton’s method begins by guessing the first solution. In order for Newton’s method to work, one needs to come up with a first guess close enough to the actual solution, otherwise Newton’s method may return an undesirable result. (I will show you an example of such a case later on.) We can come up with a reasonable first guess, say $x_0$, using the graph of the function.

Figure 1. The graph of $f(x)=x^4-2$

From the graph, we choose $x_0=2$. Of course, one can choose an even closer point, for example $x_0=1.5$. The tangent line to the graph of $f(x)$ at $x_0=2$ is
$$y=32(x-2)+14$$
since $f(2)=14$ and $f'(2)=32$. Setting $y=0$, we find the $x$-intercept $x_1$
$$x_1=2-\frac{14}{32}=1.5625$$

Figure 2. The first iteration of Newton’s method with $x_0=2$.

In Figure 2, we see that $x_1$ is closer to the actual solution than $x_0$. This time we find the $x$-intercept $x_2$ of the tangent line to the graph of $f(x)$ at $x_1=1.562500000$.

Figure 3. The second iteration of Newton’s method with $x_1=1.562500000$.

In Figure 3, we see that $x_2=1.302947000$ is closer to the actual solution than $x_1$. Similarly, we can find the next approximate solution $x_3=1.203252569$ which is closer to the actual solution than $x_2$ as shown in Figure 4.

Figure 4. The third iteration of Newton’s method with $x_2=1.302947000$.

Continuing this process, the 6th approximate solution is given by $x_6=1.189207115$ which is correct to 9 decimal places. The exact solution is $\sqrt[4]{2}=1.189207115002721\cdots$.

In general, Newton’s Method is given by
\begin{align*}x_0&=\mbox{initial approximate}\\x_{n+1}&=x_n-\frac{f(x_n)}{f'(x_n)}\end{align*}
for $n=0,1,2,\cdots$. Here, the assumption is that $f'(x_n)\ne 0$ for $n=0,1,2,\cdots$.
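
The iteration is short enough to write out directly. Here is a minimal Python sketch (the function name `newton` and its arguments are my own choices for illustration, not part of any library) applied to $f(x)=x^4-2$ with $x_0=2$.

```python
def newton(f, df, x0, steps):
    """Return the list [x_0, x_1, ..., x_steps] of Newton iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        slope = df(x)
        if slope == 0:
            # The tangent line is horizontal, so it has no x-intercept.
            raise ZeroDivisionError("f'(x_n) = 0; Newton's method cannot continue")
        xs.append(x - f(x) / slope)
    return xs

iterates = newton(lambda x: x**4 - 2, lambda x: 4 * x**3, 2.0, 6)
for n, x in enumerate(iterates):
    print(f"x_{n} = {x:.9f}")
```

The first iterate is $x_1=1.5625$, and after six steps the value agrees with $\sqrt[4]{2}$ to at least 9 decimal places.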

Earlier, I mentioned that if we don’t choose the initial approximate $x_0$ close enough to the actual solution, Newton’s method may return an undesirable result. Let me show you an example. Let us consider the function $f(x)=x^3-2x-5$. Figure 5 shows its graph.

Figure 5. The graph of $f(x)=x^3-2x-5$.

If we choose $x_0=-4$ and run Newton’s method, we obtain the following approximates.

                  X[1] = -2.673913043
                  X[2] = -1.708838801
                  X[3] = -0.7366532045
                  X[4] = -11.29086856
                  X[5] = -7.553673519
                  X[6] = -5.065760748
                  X[7] = -3.400569565
                  X[8] = -2.252794796
                  X[9] = -1.350919123
                 X[10] = 0.01991182580
                 X[11] = -2.501495587
                 X[12] = -1.568413258
                 X[13] = -0.5049189040

As we can see, the numbers do not appear to be converging to any value, which indicates that Newton’s method is not working well for this case. In certain cases when we choose $x_0$ too far from the actual solution, we may end up getting $f'(x_n)=0$ for some $n$, in which case Newton’s method fails. For $x_0=4$, we obtain

                   X[1] = 2.891304348
                   X[2] = 2.311222795
                   X[3] = 2.117035157
                   X[4] = 2.094830999
                   X[5] = 2.094551526

The fifth approximate $x_5=2.094551526$ is correct to 6 decimal places.
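
The two runs above can be reproduced with a short Python sketch. The helper `newton_iterates` is a hypothetical name for this illustration; it just repeats $x_{n+1}=x_n-f(x_n)/f'(x_n)$ a fixed number of times.

```python
def f(x):
    return x**3 - 2 * x - 5

def df(x):
    return 3 * x**2 - 2

def newton_iterates(x0, steps):
    """Return [x_0, x_1, ..., x_steps] for f(x) = x^3 - 2x - 5."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

bad = newton_iterates(-4.0, 13)   # poor initial guess: iterates jump around
good = newton_iterates(4.0, 5)    # good initial guess: fast convergence

for label, xs in (("x0 = -4", bad), ("x0 = 4", good)):
    print(label)
    for n, x in enumerate(xs[1:], start=1):
        print(f"  X[{n}] = {x:.9f}")
```

The jumps in the first run happen whenever an iterate lands near $x=\pm\sqrt{2/3}$, where $f'(x)=0$ and the tangent line is almost horizontal.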

Newton’s method is not suitable to be carried out by hand. The open source computer algebra system Maxima has a built-in package mnewton for Newton’s method. If you want to install Maxima on your computer, you can find instructions here. Let us redo the above example using mnewton with initial approximate $x_0=4$.

(%i1) load("mnewton")$
(%i2) mnewton([x^3-2*x-5], [x], [4]);
(%o2) [[x = 2.094551481542326]]

What I find interesting about mnewton is that even if you use an initial approximate that didn’t work out for the standard Newton’s method such as $x_0=-4$ in the above example, it instantly returns the answer. (Try it yourself.)

Newton’s method can be used to calculate internal rate of return (IRR) in finance. It is the discount rate at which net present value (NPV) is equal to zero. NPV is the sum of the present values of all cash flows, or alternatively, NPV can be defined as the difference between the present value of the benefits (cash inflows) and the present value of the costs (cash outflows). Here is an example.

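Example. If we invest \$100 today and receive \$110 in one year, then NPV can be expressed as
$$\mathrm{NPV}=-100+\frac{110}{1+\mathrm{IRR}}$$
Setting $\mathrm{NPV}=0$, we have
$$\mathrm{IRR}=\frac{110}{100}-1=0.1=10\%$$
If we instead have multiple future cash inflows \$90, \$50, and \$30 at the end of each year for the next three years, NPV is given by
$$\mathrm{NPV}=-100+\frac{90}{1+\mathrm{IRR}}+\frac{50}{(1+\mathrm{IRR})^2}+\frac{30}{(1+\mathrm{IRR})^3}$$
Setting $\mathrm{NPV}=0$, we obtain a cubic equation
$$100x^3-90x^2-50x-30=0$$
where $x=1+\mathrm{IRR}$. Using Newton’s method, we find $x\approx 1.41$, so $\mathrm{IRR}\approx 0.41=41\%$.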

(%i1) load("mnewton")$
(%i2) mnewton([100*x^3-90*x^2-50*x-30], [x], [1]);
(%o2) [[x = 1.406937359155343]]
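
For readers without Maxima, the same computation can be sketched in Python by applying Newton’s method directly to the NPV as a function of the rate. The function names `npv` and `irr`, the starting guess of $0$, and the tolerance are assumptions made for this illustration.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[t] is the cash flow at the end of year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, guess=0.0, tol=1e-12, max_iter=100):
    """Find the rate with NPV = 0 by Newton's method."""
    rate = guess
    for _ in range(max_iter):
        value = npv(rate, cash_flows)
        # Derivative of NPV with respect to the rate.
        slope = sum(-t * cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cash_flows))
        step = value / slope
        rate -= step
        if abs(step) < tol:
            return rate
    raise RuntimeError("Newton's method did not converge")

print(irr([-100, 110]))          # one-year example
print(irr([-100, 90, 50, 30]))   # three-year example
```

The first call gives a rate of about $0.10$ and the second about $0.4069$, matching $x=1+\mathrm{IRR}\approx 1.4069$ from mnewton.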

Update: I wrote a simple Maple script that runs Newton’s method. If you have Maplesoft, you are more than welcome to download the Maple worksheet here and use it.