Square Roots of Operators II

The determinant (and also trace) of a nilpotent matrix is always zero, so a nilpotent matrix cannot be invertible. However, if $N$ is nilpotent of index $m$, then $I+N$ and $I-N$ are invertible and the inverses are given by
\begin{align*} (I+N)^{-1}&=\sum_{k=0}^{m-1}(-N)^k=I-N+N^2-N^3+\cdots+(-N)^{m-1}\\ (I-N)^{-1}&=\sum_{k=0}^{m-1}N^k=I+N+N^2+N^3+\cdots+N^{m-1} \end{align*}
We showed here that invertible operators have square roots. By the same token, the square roots of $(I+N)^{-1}$ and $(I-N)^{-1}$ exist. But how do we find them? Let us denote them by $(I+N)^{-\frac{1}{2}}$ and $(I-N)^{-\frac{1}{2}}$, respectively. In the same manner in which we proved the existence of $\sqrt{I+N}$ here, we obtain
\begin{align*} (I+N)^{-\frac{1}{2}}&=I-\frac{1}{2}N+\frac{3}{8}N^2-\frac{5}{16}N^3+\cdots\\ (I-N)^{-\frac{1}{2}}&=I+\frac{1}{2}N+\frac{3}{8}N^2+\frac{5}{16}N^3+\cdots \end{align*}
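As a sanity check, the truncated series can be verified numerically. Below is a minimal sketch (assuming NumPy is available) using the $3\times 3$ shift matrix, a standard nilpotent matrix of index 3, so the series terminates at $N^2$:

```python
# Check the series for (I + N)^(-1/2) on a concrete nilpotent matrix.
import numpy as np

N = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])   # shift matrix: N^3 = 0

I = np.eye(3)
R = I - N/2 + 3*(N @ N)/8      # truncated series for (I + N)^(-1/2)

# R^2 should equal (I + N)^(-1), i.e. R^2 (I + N) should be the identity.
print(np.allclose(R @ R @ (I + N), I))  # True
```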

Square Roots of Operators

Mathematics is full of weird stuff. One such example is the square root of an operator. So far, I have seen only two books that discuss square roots of operators. They are listed in the references below.

Definition. An operator $R$ is called a square root of an operator $T$ if $R^2=T$.

Example. In quantum computing, the $\sqrt{\mathrm{NOT}}$ gate is given by
$$\sqrt{\mathrm{NOT}}=\begin{bmatrix}
\frac{1+i}{2} & \frac{1-i}{2}\\
\frac{1-i}{2} & \frac{1+i}{2}
\end{bmatrix}$$
and
$$\sqrt{\mathrm{NOT}}\cdot\sqrt{\mathrm{NOT}}=\mathrm{NOT}=\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}$$
As required by quantum mechanics, quantum gates are unitary matrices. There is no counterpart of the $\sqrt{\mathrm{NOT}}$ gate in classical computing, so the $\sqrt{\mathrm{NOT}}$ gate is a truly quantum gate. As far as I know, no one has come up with a physical implementation of the $\sqrt{\mathrm{NOT}}$ gate yet.
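One can check the claimed properties numerically. Here is a quick sketch in NumPy (the variable names are mine):

```python
# Verify that sqrt(NOT) squares to NOT and is unitary.
import numpy as np

sqrt_not = np.array([[(1 + 1j)/2, (1 - 1j)/2],
                     [(1 - 1j)/2, (1 + 1j)/2]])
NOT = np.array([[0, 1],
                [1, 0]], dtype=complex)

print(np.allclose(sqrt_not @ sqrt_not, NOT))                 # True
print(np.allclose(sqrt_not @ sqrt_not.conj().T, np.eye(2)))  # True: unitary
```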

Example. An operator does not necessarily have a square root. For example, define $T:\mathbb{C}^3\longrightarrow\mathbb{C}^3$ by
$$T(z_1,z_2,z_3)=(z_2,z_3,0)$$
Then one can easily show that $T$ is linear. Suppose that $T$ has a square root $R$. Let $\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}$ be the matrix associated with $R$. Since the matrix associated with $T$ is $\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 1\\
0 & 0 & 0
\end{bmatrix}$, we have the equation
$$\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}\cdot\begin{bmatrix}
r & s & t\\
u & v & w\\
x & y & z
\end{bmatrix}=\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 1\\
0 & 0 & 0
\end{bmatrix}$$
This is a system of 9 scalar equations in 9 unknowns. One can attempt to solve it using a CAS (Computer Algebra System), for example Maple, and see that the system has no solution, i.e. $R$ does not exist. Alternatively, note that $R^6=T^3=0$, so $R$ is nilpotent and hence $R^3=0$ (a nilpotent operator on a 3-dimensional space has index at most 3); but then $T^2=R^4=R^3R=0$, contradicting $T^2(z_1,z_2,z_3)=(z_3,0,0)\ne 0$.
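For instance, the computation might be carried out in SymPy instead of Maple along the following lines (a sketch; solving the nonlinear system may take a moment):

```python
# Set up R^2 = T entrywise and ask SymPy for solutions.
import sympy as sp

r, s, t, u, v, w, x, y, z = sp.symbols('r s t u v w x y z')
R = sp.Matrix([[r, s, t], [u, v, w], [x, y, z]])
T = sp.Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])

eqs = list(R**2 - T)   # the nine scalar equations
solutions = sp.solve(eqs, [r, s, t, u, v, w, x, y, z], dict=True)
print(solutions)       # expected: [] -- no solution, so R does not exist
```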

So, what kind of operators have square roots?

Theorem. If $N$ is a nilpotent operator, then $I+N$ has a square root.

Proof. Let $N$ be nilpotent. Then $N^m=0$ for some positive integer $m$. Consider the Taylor series for $\sqrt{1+x}$:
$$\sqrt{1+x}=1+a_1x+a_2x^2+a_3x^3+\cdots$$
Using the nilpotency of $N$, we may guess that $\sqrt{I+N}$ takes the form
$$I+a_1N+a_2N^2+\cdots+a_{m-1}N^{m-1}$$
Now,
\begin{align*} (I+a_1N+a_2N^2+\cdots+a_{m-1}N^{m-1})^2=I&+2a_1N+(2a_2+a_1^2)N^2\\&+(2a_3+2a_1a_2)N^3+\cdots\\ &+(2a_{m-1}+\cdots)N^{m-1}=I+N \end{align*}
and by comparing the coefficients
Comparing the coefficients of like powers of $N$: $2a_1=1$, so $a_1=\frac{1}{2}$; $2a_2+a_1^2=0$, so $a_2=-\frac{1}{8}$; $2a_3+2a_1a_2=0$, so $a_3=\frac{1}{16}$; and so on.
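The $a_k$ are just the Taylor coefficients of $\sqrt{1+x}$, so they can be generated and the construction checked mechanically. A short SymPy sketch:

```python
# Read off the coefficients of sqrt(1 + x) and verify the square root of I + N
# for a nilpotent N of index 3 (so only a_1 and a_2 are needed).
import sympy as sp

x = sp.symbols('x')
series = sp.sqrt(1 + x).series(x, 0, 4).removeO()
print(sp.Poly(series, x).all_coeffs()[::-1])  # [1, 1/2, -1/8, 1/16]

N = sp.Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # N**3 = 0
R = sp.eye(3) + sp.Rational(1, 2)*N - sp.Rational(1, 8)*N**2
print(R**2 == sp.eye(3) + N)  # True
```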

Theorem. Suppose that $T: V\longrightarrow V$ is invertible. Then $T$ has a square root.

Proof. Let $\lambda_1,\lambda_2,\cdots,\lambda_m$ be the distinct eigenvalues of $T$. Then for each $j$, there exists a nilpotent operator $N_j: G(\lambda_j,T)\longrightarrow G(\lambda_j,T)$ such that $T|_{G(\lambda_j,T)}=\lambda_jI+N_j$. (See [1], p. 252, Theorem 8.21 for a proof.) Here, $G(\lambda_j,T)$ is the generalized eigenspace of $T$ corresponding to the eigenvalue $\lambda_j$. Since $T$ is invertible, no $\lambda_j$ is equal to 0, so we can write $$T|_{G(\lambda_j,T)}=\lambda_j\left(I+\frac{N_j}{\lambda_j}\right)$$
Since $\frac{N_j}{\lambda_j}$ is nilpotent, $I+\frac{N_j}{\lambda_j}$ has a square root. Let $R_j$ be $\sqrt{\lambda_j}$ times a square root of $I+\frac{N_j}{\lambda_j}$. Any vector $v\in V$ can be written uniquely in the form
$$v=u_1+u_2+\cdots+u_m,$$
where each $u_j$ is in $G(\lambda_j,T)$. Using this decomposition, define an operator $R:V\longrightarrow V$ by
$$Rv=R_1u_1+R_2 u_2+\cdots+R_m u_m$$
Then $R^2u_j=R_j^2 u_j=T|_{G(\lambda_j,T)}u_j$ and so \begin{align*} R^2v&=R_1^2u_1+R_2^2u_2+\cdots+R_m^2u_m\\ &= T|_{G(\lambda_1,T)}u_1+T|_{G(\lambda_2,T)}u_2+\cdots+T|_{G(\lambda_m,T)}u_m=Tv
\end{align*}
Therefore, $R$ is a square root of $T$.

Example. The $\mathrm{NOT}$ gate is invertible; it is its own inverse.

The proof of the above theorem suggests that an operator $T$ has a square root if there is a spectral decomposition of $T$ with respect to the standard inner product of $\mathbb{C}^n$. A normal operator is such an operator. Recall that an operator $T: V\longrightarrow V$ is normal if $T^\ast T=TT^\ast$. The following theorem guarantees the existence of such a spectral decomposition of a normal operator.

Theorem. A matrix $T$ is normal if and only if there exist a diagonal matrix $\Lambda$ and a unitary matrix $U$ such that $T=U\Lambda U^\ast$.

Here, the diagonal entries of $\Lambda$ are the eigenvalues of $T$ and the columns of $U$ are corresponding eigenvectors of $T$: the $j$th diagonal entry of $\Lambda$ is the eigenvalue belonging to the eigenvector in the $j$th column of $U$.

Let $T: V\longrightarrow V$ be a normal operator and write its spectral decomposition, using Dirac’s bra-ket notation, as
$$T=\sum_j\lambda_j|v_j\rangle\langle v_j|$$
Define
$$f(T)=\sum_j f(\lambda_j)|v_j\rangle\langle v_j|$$
Using this definition, one can define, for example, the exponential, the square root, and the logarithm of $T$.

Example. Let $T=\begin{bmatrix}
0 & 1\\
1 & 0
\end{bmatrix}$. Find the square root of $T$.

Solution. $T$ is Hermitian (symmetric), so it is normal. The spectral decomposition of $T$ is given by
$$T=|v_1\rangle\langle v_1|-|v_2\rangle\langle v_2|,$$
where
$$|v_1\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
1
\end{bmatrix}\ \mathrm{and}\ |v_2\rangle=\frac{1}{\sqrt{2}}\begin{bmatrix}
1\\
-1
\end{bmatrix}$$
Now,
\begin{align*} \sqrt{T}&=|v_1\rangle\langle v_1|+i|v_2\rangle\langle v_2|\\ &=\frac{1}{2}\begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix}+\frac{i}{2}\begin{bmatrix} 1 & -1\\ -1 & 1 \end{bmatrix}\\ &=\begin{bmatrix} \frac{1+i}{2} & \frac{1-i}{2}\\ \frac{1-i}{2} & \frac{1+i}{2} \end{bmatrix} \end{align*}
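The computation can be reproduced numerically. Below is a sketch of the functional calculus $f(T)=\sum_j f(\lambda_j)|v_j\rangle\langle v_j|$ built on `numpy.linalg.eigh`; the helper name `apply_function` is mine, not a standard API:

```python
# Functional calculus for a Hermitian matrix via its spectral decomposition.
import numpy as np

def apply_function(T, f):
    """Return f(T) = sum_j f(lambda_j) |v_j><v_j| for Hermitian T."""
    eigvals, eigvecs = np.linalg.eigh(T)   # T = U diag(eigvals) U*
    return eigvecs @ np.diag(f(eigvals.astype(complex))) @ eigvecs.conj().T

T = np.array([[0., 1.], [1., 0.]])
R = apply_function(T, np.sqrt)
print(np.round(R, 10))        # matches the matrix computed above
print(np.allclose(R @ R, T))  # True
```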

References:

  1. Sheldon Axler, Linear Algebra Done Right, Third Edition, Springer, 2015
  2. Michael A. Nielsen and Isaac L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000

The Eigenvectors of a Hermitian Operator Are Mutually Orthogonal

In quantum mechanics, operators are required to be Hermitian. A reason for this is that the eigenvalues of a quantum mechanical operator are interpreted as physically measurable quantities such as positions, momenta, energies, etc., and therefore they are required to be real. As is well known, Hermitian operators have only real eigenvalues. Hermitian operators also have another nice property: eigenvectors corresponding to distinct eigenvalues are mutually orthogonal. Here, we prove this only for the case of matrix operators. Let $A$ be a Hermitian operator and let $|\lambda_i\rangle$ be the eigenvectors of $A$ with distinct eigenvalues $\lambda_i$. Then we have $A|\lambda_i\rangle=\lambda_i|\lambda_i\rangle$. So, $\langle\lambda_j|A|\lambda_i\rangle=\lambda_i\langle\lambda_j|\lambda_i\rangle$. On the other hand,
\begin{align*} \langle\lambda_j|A&=(A^\dagger|\lambda_j\rangle)^\dagger\\ &=(A|\lambda_j\rangle)^\dagger\ (A^\dagger=A)\\ &=(\lambda_j|\lambda_j\rangle)^\dagger\\ &=\bar\lambda_j\langle\lambda_j|\\ &=\lambda_j\langle\lambda_j|\ (\mbox{the $\lambda_j$ are real}) \end{align*}
From this we also obtain $\langle\lambda_j|A|\lambda_i\rangle=\lambda_j\langle\lambda_j|\lambda_i\rangle$. This means that $\lambda_i\langle\lambda_j|\lambda_i\rangle=\lambda_j\langle\lambda_j|\lambda_i\rangle$, i.e. we have
$$(\lambda_i-\lambda_j)\langle\lambda_j|\lambda_i\rangle=0$$
If $i\ne j$ then $\lambda_i\ne\lambda_j$ and so $\langle\lambda_j|\lambda_i\rangle=0$.
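A quick numerical illustration in NumPy (the construction $A=B+B^\dagger$ is just one convenient way to produce a Hermitian matrix):

```python
# Eigenvalues of a Hermitian matrix are real and its eigenvectors orthogonal.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T   # Hermitian by construction

eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)       # real numbers
# The Gram matrix of the eigenvectors is the identity: mutual orthogonality.
print(np.allclose(eigvecs.conj().T @ eigvecs, np.eye(4)))  # True
```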

Example. Let us consider the matrix $A=\begin{pmatrix}
3 & 1+i\\
-1+i & -3
\end{pmatrix}$. The adjoint (i.e. the conjugate transpose of this matrix) of $A$ is $A^\dagger=\begin{pmatrix}
3 & -1-i\\
1-i & -3
\end{pmatrix}$. Since $A\ne A^\dagger$, $A$ is not Hermitian. Although $A$ is not Hermitian, it has real eigenvalues $\pm\sqrt{7}$, and the eigenvectors corresponding to the eigenvalues $\sqrt{7}$ and $-\sqrt{7}$ are, respectively,
$$v_1=\begin{pmatrix}
-\frac{1+i}{3-\sqrt{7}}\\
1
\end{pmatrix},\ v_2=\begin{pmatrix}
-\frac{1+i}{3+\sqrt{7}}\\
1
\end{pmatrix}$$
$\langle v_1|v_2\rangle=2$, so they are not orthogonal. Interestingly, $v_1$ and $v_2$ are orthogonal with respect to the inner product
\begin{equation}
\label{eq:jproduct}
\langle v_1|J|v_2\rangle
\end{equation}
where
$$J=\begin{pmatrix}
1 & 0\\
0 & -1
\end{pmatrix}$$
The matrices of the form
$$H=\begin{pmatrix}
a & b\\
-\bar b & d
\end{pmatrix},$$
where $a$ and $d$ are real and $(a-d)^2-4|b|^2>0$ (this condition ensures that $H$ has two distinct real eigenvalues), are self-adjoint with respect to the inner product \eqref{eq:jproduct}, i.e. $H$ satisfies
\begin{equation}\label{eq:jself-adjoint}\langle Hv_1|J|v_2\rangle=\langle v_1|J|Hv_2\rangle\end{equation}
which is equivalent to
$$H^\dagger J=JH$$
The matrix $A$ above is of this form with $a=3$, $b=1+i$, $d=-3$.
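All of the claims in this example can be verified numerically. A NumPy sketch:

```python
# Check: eigenvectors of A are not orthogonal in the standard inner product,
# but are orthogonal with respect to <v|J|w>, and A satisfies A^dagger J = J A.
import numpy as np

A = np.array([[3, 1 + 1j], [-1 + 1j, -3]])
J = np.diag([1., -1.])
s7 = np.sqrt(7)

v1 = np.array([-(1 + 1j)/(3 - s7), 1])  # eigenvector for +sqrt(7)
v2 = np.array([-(1 + 1j)/(3 + s7), 1])  # eigenvector for -sqrt(7)

print(np.allclose(A @ v1, s7*v1), np.allclose(A @ v2, -s7*v2))  # True True
print(v1.conj() @ v2)                      # approximately 2: not orthogonal
print(v1.conj() @ J @ v2)                  # approximately 0: J-orthogonal
print(np.allclose(A.conj().T @ J, J @ A))  # True
```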

Exercise. Prove that the eigenvectors, corresponding to distinct eigenvalues, of a matrix which satisfies \eqref{eq:jself-adjoint} are mutually orthogonal with respect to the inner product \eqref{eq:jproduct}.

Optimization Problems II: Business and Economic Optimization Problems

Here, we discussed several examples of optimization problems, mostly geometric ones. In this note, we study business and economic optimization problems. Let us begin with the following example.

Example. Suppose that the price and demand for a particular luxury automobile are related by the demand equation $p+10x=200,000$, where $p$ is the price per car in dollars and $x$ is the number of cars that will be purchased at that price. What price should be charged per car if the total revenue is to be maximized?

Solution. Recall that the total revenue $R$ is given by $R=xp$, where $p$ is the price per item and $x$ is the number of items sold. In terms of $x$, the revenue is written as
$$R(x)=200,000x-10x^2$$
Differentiating, $R'(x)=200,000-20x$; setting this equal to 0, we find the critical point $x=10,000$. Since $R^{\prime\prime}(x)=-20<0$, the revenue is maximized there: $R(10,000)=1,000,000,000$, i.e. a billion dollars. We don't actually need calculus to see this. $R(x)$ is a quadratic function with negative leading coefficient, so it assumes its maximum at the $x$-coordinate of its vertex, $x=-\frac{b}{2a}=-\frac{200,000}{2\cdot(-10)}=10,000$. To answer the question, the price at which the total revenue $R$ is maximized is
$$p=-10(10,000)+200,000=100,000\ \mathrm{dollars}$$
As you might have noticed by now, it could have been shorter if we wrote $R$ in terms of the price $p$ since the question is about the price at which the revenue is maximized. In terms of $p$, $R$ is written as
$$R(p)=-\frac{p^2}{10}+20,000p$$
Either by solving $R'(p)=-\frac{p}{5}+20,000=0$ or by $p=-\frac{b}{2a}=-\frac{20,000}{2\cdot\left(-\frac{1}{10}\right)}$, we find $p=\$ 100,000$.
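Both computations can be confirmed with a computer algebra system. A short SymPy sketch:

```python
# Maximize the revenue in terms of x and in terms of p.
import sympy as sp

p, x = sp.symbols('p x', positive=True)

R_x = 200_000*x - 10*x**2
x_star = sp.solve(sp.diff(R_x, x), x)[0]
print(x_star, R_x.subs(x, x_star))       # 10000 1000000000

R_p = -p**2/10 + 20_000*p
print(sp.solve(sp.diff(R_p, p), p)[0])   # 100000
```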

Example. Suppose that the cost, in dollars, of producing $x$ hundred bicycles is given by $C(x)=x^2-2x+4900$. What is the minimum cost?

Solution. Either by solving $C'(x)=2x-2=0$ or by $x=-\frac{b}{2a}=-\frac{-2}{2\cdot 1}$, we find that $C(x)$ assumes the minimum at $x=1$, i.e. the cost of production is the minimum when 100 bicycles are produced. The minimum cost is $C(1)=4899$ dollars.

Example. In the preceding example, find the minimum average cost.

Solution. Recall that the average cost $\bar C(x)$ is given by $\bar C(x)=\frac{C(x)}{x}$, so we have
$$\bar C(x)=x-2+\frac{4900}{x}$$
Setting $\bar C'(x)=1-\frac{4900}{x^2}$ equal to 0, we find $x=70$. Since $\bar C^{\prime\prime}(x)=\frac{9800}{x^3}>0$ when $x=70$, $\bar C(70)=138$ (in dollars) is the minimum average cost.

Here is an interesting theorem from economics.

Theorem. The average cost is minimized at a level of production at which marginal cost equals average cost, i.e. when
$$C'(x)=\bar C(x)$$

Proof. Since $\bar C(x)=\frac{C(x)}{x}$, we obtain by the quotient rule
$$\bar C'(x)=\frac{xC'(x)-C(x)}{x^2}$$
Setting $\bar C'(x)=0$, we have
$$xC'(x)-C(x)=0,$$
that is
$$C'(x)=\frac{C(x)}{x}=\bar C(x)$$

The preceding example can be quickly answered using this Theorem. Setting $C'(x)=\bar C(x)$, we have
$$2x-2=x-2+\frac{4900}{x}$$
Simplifying this we obtain
$$x^2=4900$$
Hence, $x=70$ as we found earlier.
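Here is a SymPy sketch checking both the direct minimization and the marginal-cost criterion for the bicycle example:

```python
# The average cost is minimized where C'(x) = Cbar(x).
import sympy as sp

x = sp.symbols('x', positive=True)
C = x**2 - 2*x + 4900
Cbar = C / x

x_star = sp.solve(sp.diff(Cbar, x), x)[0]
print(x_star, Cbar.subs(x, x_star))             # 70 138
print(sp.solve(sp.Eq(sp.diff(C, x), Cbar), x))  # [70]
```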

Example. The cost in dollars of producing $x$ stereos is given by $C(x)=70x+800$. The demand equation is $20p+x=18000$. (a) What level of production maximizes profit? (b) What is the price per stereo when profit is maximized? (c) What is the maximum profit?

Solution. From the demand equation, we obtain $p=-0.05x+900$. Recall that the profit function $P(x)$ is given by
\begin{align*} P(x)&=R(x)-C(x)\\ &=xp-C(x)\\ &=-0.05x^2+900x-(70x+800)\\ &=-0.05x^2+830x-800 \end{align*}
Setting $P'(x)=-0.1 x+830$ equal to 0, we find the critical point $x=8300$. $P^{\prime\prime}(x)=-0.1<0$, so the profit has the maximum at $x=8300$.

(a) The level of production that maximizes profit is $x=8300$.

(b) The price per stereo at which profit is maximized is
$$p=-0.05(8300)+900=485\ \mathrm{dollars}$$

(c) The maximum profit is
$$P(8300)=-0.05(8300)^2+830(8300)-800=3,443,700\ \mathrm{dollars}$$

Here is another interesting theorem from economics.

Theorem. The profit is maximized when the marginal revenue equals the marginal cost, that is, when $R'(x)=C'(x)$.

Proof. Differentiating the profit function $P(x)=R(x)-C(x)$, we have
$$P'(x)=R'(x)-C'(x)$$
The critical point is obtained from $P'(x)=0$, i.e. when $R'(x)-C'(x)=0$ or $R'(x)=C'(x)$. That is, when the marginal revenue equals the marginal cost. This completes the proof.

The preceding example can be answered quickly using this theorem. Setting $R'(x)=C'(x)$, we have
$$-0.1x+900=70$$
or
$$x=8300$$
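Both routes can again be checked mechanically. A SymPy sketch of the stereo example:

```python
# Maximize profit directly, and solve R'(x) = C'(x); both give x = 8300.
import sympy as sp

x = sp.symbols('x', positive=True)
p = 900 - sp.Rational(1, 20)*x   # demand: 20p + x = 18000
R = x*p
C = 70*x + 800
P = R - C

x_star = sp.solve(sp.diff(P, x), x)[0]
print(x_star, p.subs(x, x_star), P.subs(x, x_star))      # 8300 485 3443700
print(sp.solve(sp.Eq(sp.diff(R, x), sp.diff(C, x)), x))  # [8300]
```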

Example. A theater has 204 seats. The manager finds that he can fill all the seats if he charges \$4.00 per ticket. For each ten cents that he raises the ticket price he will sell three fewer seats. What ticket price should he charge to maximize the ticket revenue?

Solution. Suppose the manager raises the ticket price by ten cents $n$ times, i.e. by $0.1n$ dollars. Then the number $x(n)$ of tickets sold is $x(n)=204-3n$ and the price $p(n)$ per ticket is $p(n)=4+0.1n$. The total revenue is then
\begin{align*} R(n)&=x(n)p(n)\\ &=(204-3n)(4+0.1n)\\ &=-0.3n^2+8.4n+816 \end{align*}
Setting $R'(n)=-0.6n+8.4$ equal to 0, we find $n=14$. Since $R^{\prime\prime}(n)=-0.6<0$, the ticket revenue is maximized when $n=14$, i.e. when the ticket price is \$4.00+\$1.40=\$5.40.

Alternatively, one can easily find the demand equation, which is linear in this case. The equation of the line through the two points $(204,4)$ and $(201,4.1)$ is given by
$$p=-\frac{1}{30}x+\frac{54}{5}$$
and so we obtain the revenue function
$$R(x)=-\frac{1}{30}x^2+\frac{54}{5}x$$
Setting $R'(x)=-\frac{x}{15}+\frac{54}{5}$ equal to 0, we find $x=162$ and plugging this into the demand equation gives $p=5.40$.
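A SymPy sketch confirming the theater example both ways:

```python
# Revenue in terms of the number n of ten-cent increases, and in terms of x.
import sympy as sp

n, x = sp.symbols('n x', positive=True)

R_n = (204 - 3*n)*(4 + sp.Rational(1, 10)*n)
n_star = sp.solve(sp.diff(R_n, n), n)[0]
print(n_star, 4 + n_star/10)                    # 14 27/5, i.e. $5.40

R_x = x*(-sp.Rational(1, 30)*x + sp.Rational(54, 5))
x_star = sp.solve(sp.diff(R_x, x), x)[0]
print(x_star, -x_star/30 + sp.Rational(54, 5))  # 162 27/5
```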

Let us consider a demand equation given as $x=D(p)$. If one were to consider $\frac{dx}{dp}$, the rate of change of demand with respect to price, often it would be convenient to have it as a dimensionless quantity, i.e. one that does not depend on particular units. For that we define a new quantity by dividing $\frac{dx}{dp}$ by $\frac{x}{p}$. The resulting ratio is called the elasticity of demand and is denoted by $\epsilon_D$.

Definition. The elasticity of demand $\epsilon_D$ is defined by
$$\epsilon_D=\frac{\frac{dx}{dp}}{\frac{x}{p}}=\frac{p}{x}\frac{dx}{dp}$$

In economics, demand decreases as price increases, so the demand function is a decreasing function, i.e. $\frac{dx}{dp}<0$. Since both $x$ and $p$ are positive, the elasticity $\epsilon_D$ is always negative. The demand is said to be elastic if $|\epsilon_D|>1$, inelastic if $|\epsilon_D|<1$, and unitary if $|\epsilon_D|=1$.

Definition. The relative change of a function $p=f(q)$ as $q$ changes from $q_1$ to $q_2$ is
$$\frac{f(q_2)-f(q_1)}{f(q_1)}$$
The percentage change is defined as
$$100\times\frac{f(q_2)-f(q_1)}{f(q_1)}$$

Example. (a) If the demand equation is $x=100-3p$ , find the elasticity of demand when $p=1$.

(b) Show that the elasticity equals the ratio of the relative change in demand to the relative change in price when $p$ changes from 1 to 2.

Solution. (a) $x=100-3p$, $\frac{dx}{dp}=-3$, and when $p=1$, $x=97$, so
$$\epsilon_D=\frac{p}{x}\frac{dx}{dp}=-\frac{3}{97}$$
Since $|\epsilon_D|<1$, the demand is inelastic.

(b) As $p$ changes from 1 to 2, the relative change in price is $\frac{2-1}{1}=1$. When $p$ changes from 1 to 2, $x$ changes from 97 to 94, so the relative change in demand is $-\frac{3}{97}$. Hence, the ratio of the relative change in demand to the relative change in price is
$$\frac{-\frac{3}{97}}{1}=-\frac{3}{97}=\epsilon_D$$

Example. Given the demand equation $x=\sqrt{100-2p}$, find the elasticity of demand when $p=18$. Is the demand elastic or inelastic at $p=18$?

Solution. The elasticity $\epsilon_D$ at $p=18$ is
\begin{align*} \epsilon_D&=\frac{p}{x}\frac{dx}{dp}\\ &=\frac{18}{\sqrt{100-2(18)}}\left(-\frac{1}{\sqrt{100-2p}}\right)_{p=18}\\ &=\frac{18}{8}\left(-\frac{1}{8}\right)\\ &=-\frac{9}{32} \end{align*}
Since $|\epsilon_D|=\frac{9}{32}<1$, the demand at $p=18$ is inelastic.
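The elasticity computation is easy to package into a small helper; here is a SymPy sketch (the function name `elasticity` is mine), checked against both examples above:

```python
# Elasticity of demand: (p/x) dx/dp, evaluated at a given price.
import sympy as sp

p = sp.symbols('p', positive=True)

def elasticity(x_of_p, at_p):
    return ((p / x_of_p) * sp.diff(x_of_p, p)).subs(p, at_p)

print(elasticity(100 - 3*p, 1))            # -3/97
print(elasticity(sp.sqrt(100 - 2*p), 18))  # -9/32
```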

Example. Show that when the revenue is maximized, $|\epsilon_D|=1$.

Solution. The total revenue is $R=xp$, so
\begin{align*} \frac{dR}{dp}&=\frac{dx}{dp}p+x\\ &=x\left(1+\frac{p}{x}\frac{dx}{dp}\right)\\ &=x(1+\epsilon_D) \end{align*}
Since $x>0$, $\frac{dR}{dp}=0$ if and only if $\epsilon_D=-1$. If $\epsilon_D<-1$, then $|\epsilon_D|>1$ i.e. the demand is elastic, and $\frac{dR}{dp}<0$ (the revenue is decreasing). If $-1<\epsilon_D<0$, then $|\epsilon_D|<1$ i.e. the demand is inelastic, and $\frac{dR}{dp}>0$ (the revenue is increasing). So, the revenue is maximized when $|\epsilon_D|=1$.

When price is given as a function of demand, i.e. $p=p(x)$, the elasticity can be written as
$$\epsilon_D=\frac{\frac{p}{x}}{\frac{dp}{dx}}$$

Born Rule

Let $A$ be a Hermitian (self-adjoint) operator. Due to Max Born, who proposed the statistical interpretation of quantum mechanics, the probability of measuring an eigenvalue $\lambda_i$ of $A$ in a state $\psi$ is $\langle\psi|P_i|\psi\rangle$, where $P_i$ is the projection onto the eigenspace $V_{\lambda_i}$ of $A$ corresponding to $\lambda_i$, i.e. $P_i$ is a linear map $P_i: V\longrightarrow V$ such that $P_i^2=P_i$ and the range of $P_i$ is $V_{\lambda_i}$. If we assume no degeneracy of $\lambda_i$, then the eigenspace of $A$ corresponding to $\lambda_i$ is one-dimensional. In this case, $P_i=|\lambda_i\rangle\langle\lambda_i|$, where the $|\lambda_i\rangle$ are orthonormal. Now,
\begin{align*} \langle\psi|P_i|\psi\rangle&=\langle\psi|\lambda_i\rangle\langle\lambda_i|\psi\rangle\\ &=\overline{\langle\lambda_i|\psi\rangle}\langle\lambda_i|\psi\rangle\\ &=|\langle\lambda_i|\psi\rangle|^2 \end{align*}
The complex number $\langle\lambda_i|\psi\rangle$ is called the probability amplitude, so the probability is the squared magnitude of the amplitude. This is called the Born rule, named after Max Born. Suppose $|\psi\rangle$ is a unit vector. Since the measurement probabilities must sum to 1, we require $\sum_i|\langle\lambda_i|\psi\rangle|^2=1$ for every unit vector $|\psi\rangle$, which holds exactly when \begin{equation}\label{eq:complete}\sum_iP_i=\sum_i|\lambda_i\rangle\langle\lambda_i|=I\end{equation} \eqref{eq:complete} is called the completeness of the $|\lambda_i\rangle$.

One may consider $A$ as a random variable with its eigenvalues as the values of the random variable. (See Remark below.) Since for each $j$, $A|\lambda_j\rangle=\lambda_j|\lambda_j\rangle$,
$$\langle\lambda_i|A|\lambda_j\rangle=\lambda_j\langle\lambda_i|\lambda_j\rangle=\lambda_j\delta_{ij}$$
The expected value $\langle A\rangle$ of the self-adjoint operator $A$ in the state $|\psi\rangle$ is naturally defined as a weighted average
\begin{align*} \langle A\rangle&=\sum_i\lambda_i|\langle\lambda_i|\psi\rangle|^2\\ &=\sum_i\lambda_i\langle\psi|\lambda_i\rangle\langle\lambda_i|\psi\rangle\\
&=\sum_i\langle\psi|A|\lambda_i\rangle\langle\lambda_i|\psi\rangle\ (A|\lambda_i\rangle=\lambda_i|\lambda_i\rangle)\\
&=\langle\psi|A\left(\sum_i|\lambda_i\rangle\langle\lambda_i|\right)|\psi\rangle\\&=\langle\psi|AI|\psi\rangle\ (\mbox{by the completeness of the}\ |\lambda_i\rangle)\\&=\langle\psi|A|\psi\rangle \end{align*}
Hence, we have
$$\langle A\rangle=\langle\psi|A|\psi\rangle$$
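A numerical illustration of the Born rule and this expectation formula, sketched in NumPy with a randomly generated Hermitian matrix and state:

```python
# The weighted average sum_i lambda_i |<lambda_i|psi>|^2 equals <psi|A|psi>.
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = B + B.conj().T                  # Hermitian observable

psi = rng.standard_normal(3) + 1j * rng.standard_normal(3)
psi /= np.linalg.norm(psi)          # unit state vector

eigvals, eigvecs = np.linalg.eigh(A)
probs = np.abs(eigvecs.conj().T @ psi)**2      # Born-rule probabilities
print(np.isclose(probs.sum(), 1.0))                              # True
print(np.isclose(probs @ eigvals, (psi.conj() @ A @ psi).real))  # True
```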

Remark. Let $\Omega=\bigcup_iV_{\lambda_i}$, i.e. $\Omega$ is the set of all eigenvectors of $A$. Let $\mathcal{U}$ consist of $\emptyset$, $\Omega$, and unions of subfamilies of $\{V_{\lambda_i}\}$. Then $\mathcal{U}$ is a $\sigma$-algebra of subsets of $\Omega$. Define $X:\Omega\longrightarrow\mathbb{R}$ by
$$X|\lambda_i\rangle=\langle\lambda_i|A|\lambda_i\rangle=\lambda_i$$
Then $X$ is a random variable.