The Cross Product

Definition. Let ${\bf u}=(u_1,u_2,u_3)$ and ${\bf v}=(v_1,v_2,v_3)$. Then the cross product ${\bf u}\times {\bf v}$ is defined by \begin{equation}\label{eq:crossprod}{\bf u}\times{\bf v}=(u_2v_3-u_3v_2,u_3v_1-u_1v_3,u_1v_2-u_2v_1)\end{equation} The cross product can also be written as the determinant \begin{equation}\label{eq:crossprod2}{\bf u}\times{\bf v}=\begin{vmatrix}{\bf i} & {\bf j} & {\bf k}\\u_1 & u_2 & u_3\\v_1 & v_2 & v_3\end{vmatrix}\end{equation} One can calculate the determinant as shown in Figure 1: multiply the three entries along each indicated arrow, and for each red arrow also multiply the product by $-1$. The determinant is the sum of the resulting six terms. This is called the Rule of Sarrus, named after the French mathematician Pierre Frédéric Sarrus.

Figure 1. The Cross Product

Unlike the dot product, the outcome of the cross product is a vector. Also unlike the dot product, the cross product is anticommutative, i.e. $${\bf u}\times{\bf v}=-{\bf v}\times{\bf u}$$ Furthermore, ${\bf u}\times{\bf v}$ is orthogonal to both ${\bf u}$ and ${\bf v}$. This can be seen by showing that $$({\bf u}\times{\bf v})\cdot{\bf u}=({\bf u}\times{\bf v})\cdot{\bf v}=0$$ The cross product tells us about the orientation of the plane containing two vectors ${\bf u}$ and ${\bf v}$ as shown in Figure 2.

Figure 2. The orientations

Theorem. If $\theta$ is the angle between ${\bf u}$ and ${\bf v}$ ($0\leq\theta\leq\pi$), then \begin{equation}\label{eq:crossprod3}|{\bf u}\times{\bf v}|=|{\bf u}||{\bf v}|\sin\theta\end{equation}

Proof. It would require some work with algebra but one can show that $$|{\bf u}\times{\bf v}|^2=|{\bf u}|^2|{\bf v}|^2-({\bf u}\cdot{\bf v})^2$$ This, along with ${\bf u}\cdot{\bf v}=|{\bf u}||{\bf v}|\cos\theta$, will lead to \eqref{eq:crossprod3}.

From \eqref{eq:crossprod3}, we can easily see that two nonzero vectors ${\bf u}$ and ${\bf v}$ are parallel if and only if ${\bf u}\times{\bf v}={\bf 0}$.

The standard basis vectors ${\bf i}$, ${\bf j}$, ${\bf k}$ satisfy the following cross products: $${\bf i}\times{\bf j}={\bf k},\ {\bf j}\times{\bf k}={\bf i},\ {\bf k}\times{\bf i}={\bf j}$$

The following theorem summarizes the properties of the cross product.

Theorem. Let ${\bf u}$, ${\bf v}$, and ${\bf w}$ be vectors and $c$ a scalar. Then

  1. ${\bf u}\times{\bf v}=-{\bf v}\times{\bf u}$
  2. $(c{\bf u})\times{\bf v}=c({\bf u}\times{\bf v})={\bf u}\times(c{\bf v})$
  3. ${\bf u}\times({\bf v}+{\bf w})={\bf u}\times{\bf v}+{\bf u}\times{\bf w}$
  4. $({\bf u}+{\bf v})\times{\bf w}={\bf u}\times{\bf w}+{\bf v}\times{\bf w}$
  5. ${\bf u}\cdot({\bf v}\times{\bf w})=({\bf u}\times{\bf v})\cdot{\bf w}$
  6. ${\bf u}\times({\bf v}\times{\bf w})=({\bf u}\cdot{\bf w}){\bf v}-({\bf u}\cdot{\bf v}){\bf w}$

The products in 5 and 6 are called, respectively, a scalar triple product and a vector triple product.
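The identities in the theorem can be spot-checked numerically. The following Python sketch is only an illustration: the helper names `dot`, `cross`, `scale`, and `sub` and the sample vectors are choices made here, not part of the note, and checking a few vectors is of course not a proof.

```python
def dot(u, v):
    """Dot product of two 3-vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product, component by component as in the definition."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def scale(c, u):
    return tuple(c * a for a in u)

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

# Sample integer vectors (arbitrary, chosen for illustration only).
u, v, w = (1, 2, 3), (4, 5, 6), (-2, 0, 7)

uv = cross(u, v)
anticommutative = uv == scale(-1, cross(v, u))                      # property 1
orthogonal = dot(uv, u) == 0 and dot(uv, v) == 0
scalar_triple = dot(u, cross(v, w)) == dot(cross(u, v), w)          # property 5
vector_triple = cross(u, cross(v, w)) == sub(scale(dot(u, w), v),
                                             scale(dot(u, v), w))   # property 6
print(uv, anticommutative, orthogonal, scalar_triple, vector_triple)
```

Integer inputs keep every comparison exact; with floating-point data the equalities would have to be replaced by a tolerance check.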

From Figure 3, we see that \begin{equation}\label{eq:areaparallelogram}|{\bf u}\times{\bf v}|\end{equation} is equal to the area of the parallelogram determined by ${\bf u}$ and ${\bf v}$.

Figure 3. The area of a parallelogram

Example. Find a vector perpendicular to the plane that passes through the points $P(1,4,6)$, $Q(-2,5,-1)$, and $R(1,-1,1)$.

Solution. The vectors $\overrightarrow{PQ}=(-3,1,-7)$ and $\overrightarrow{PR}=(0,-5,-5)$ lie in the plane through $P,Q,R$. So the cross product $$\overrightarrow{PQ}\times\overrightarrow{PR}=(-40,-15,15)$$ is perpendicular to the plane.

Example. Find the area of the triangle with vertices $P(1,4,6)$, $Q(-2,5,-1)$, and $R(1,-1,1)$.

Solution. In the previous example, we found $\overrightarrow{PQ}\times\overrightarrow{PR}=(-40,-15,15)$ and by \eqref{eq:areaparallelogram} we know that $|\overrightarrow{PQ}\times\overrightarrow{PR}|=\sqrt{(-40)^2+(-15)^2+{15}^2}=5\sqrt{82}$ is the area of the parallelogram determined by the two vectors $\overrightarrow{PQ}$ and $\overrightarrow{PR}$. The area of the triangle with vertices $P$, $Q$, and $R$ is half the area of the parallelogram, i.e. $\frac{5}{2}\sqrt{82}$.
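The two examples above can be reproduced with a few lines of Python. This is a minimal sketch, assuming points are stored as plain tuples; the helper names `sub` and `cross` are introduced here for illustration.

```python
import math

def sub(a, b):
    """Vector from point b to point a."""
    return tuple(x - y for x, y in zip(a, b))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

P, Q, R = (1, 4, 6), (-2, 5, -1), (1, -1, 1)

normal = cross(sub(Q, P), sub(R, P))                 # PQ x PR, normal to the plane
parallelogram_area = math.sqrt(sum(c * c for c in normal))
triangle_area = parallelogram_area / 2

print(normal)          # (-40, -15, 15)
print(triangle_area)   # numerically equal to (5/2) * sqrt(82)
```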

From Figure 4, the volume of the parallelepiped determined by ${\bf u}$, ${\bf v}$, and ${\bf w}$ is $$V=|{\bf v}\times{\bf w}||{\bf u}|\cos\theta={\bf u}\cdot({\bf v}\times{\bf w})$$ In Figure 4, the vectors ${\bf u}$, ${\bf v}$, and ${\bf w}$ are positioned so that the scalar triple product ${\bf u}\cdot({\bf v}\times{\bf w})$ is positive, but depending on how they are positioned it could be negative. Since a volume cannot be negative, it is given by \begin{equation}\label{eq:volumeparallelepiped}V=|{\bf u}\cdot({\bf v}\times{\bf w})|\end{equation}

Figure 4. The volume of a parallelepiped

The scalar triple product ${\bf u}\cdot({\bf v}\times{\bf w})$ can be written nicely as the determinant \begin{equation}\label{eq:scalartripleprod}{\bf u}\cdot({\bf v}\times{\bf w})=\begin{vmatrix}u_1 & u_2 & u_3\\v_1 & v_2 & v_3\\w_1 & w_2 & w_3\end{vmatrix}\end{equation} The calculation of the determinant can be done by the rule of Sarrus shown in Figure 1.

Example. Determine if ${\bf u}=(1,4,-7)$, ${\bf v}=(2,-1,4)$, and ${\bf w}=(0,-9,18)$ are coplanar.

Solution. From Figure 4 above, one can easily see that the three vectors ${\bf u}$, ${\bf v}$ and ${\bf w}$ are coplanar (i.e. they are in the same plane) if and only if $\theta=\frac{\pi}{2}$ if and only if ${\bf u}\cdot ({\bf v}\times{\bf w})=0$. \begin{align*}{\bf u}\cdot ({\bf v}\times{\bf w})&=\begin{vmatrix}1 & 4 & -7\\2 & -1 & 4\\0 & -9 & 18\end{vmatrix}\\&=0\end{align*} Therefore, ${\bf u}$, ${\bf v}$ and ${\bf w}$ are coplanar.
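The determinant in this example can be evaluated mechanically. The sketch below expands the $3\times 3$ determinant along its first row; the function names `scalar_triple` and `coplanar` are hypothetical helpers, and the exact comparison with zero is only appropriate for integer (or otherwise exact) input.

```python
def scalar_triple(u, v, w):
    """u . (v x w), expanded as the 3x3 determinant with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def coplanar(u, v, w):
    # Exact test for integer input; floating-point data would need a tolerance.
    return scalar_triple(u, v, w) == 0

u, v, w = (1, 4, -7), (2, -1, 4), (0, -9, 18)
print(scalar_triple(u, v, w), coplanar(u, v, w))   # 0 True

# Volume of the parallelepiped spanned by i, j, k (the unit cube).
unit_cube_volume = abs(scalar_triple((1, 0, 0), (0, 1, 0), (0, 0, 1)))
```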

The notion of the cross product can be used to describe physical effects involving rotations such as the circulation of electric/magnetic fields or fluids. Here we discuss the torque as a physical application of the cross product. Look at Figure 5.

Figure 5. Torque

Assume that a force ${\bf F}$ is acting on a rigid body at a point given by a position vector ${\bf r}$. The resulting turning effect ${\bf\tau}$, called the torque, can be measured by \begin{equation}\label{eq:torque}{\bf\tau}={\bf r}\times{\bf F}\end{equation}

Example. A bolt is tightened by applying a 40 N force to a 0.25 m wrench as shown in Figure 6. Find the magnitude of the torque about the center of the bolt.

Figure 6. Torque

Solution. The magnitude of the torque is \begin{align*}|{\bf\tau}|&=|{\bf r}\times{\bf F}|=|{\bf r}||{\bf F}|\sin 75^\circ=(0.25)(40)\sin 75^\circ\\&=10\sin 75^\circ\approx 9.66\ \mathrm{Nm}\end{align*}
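As a cross-check, the same number can be obtained from the vector formula ${\bf\tau}={\bf r}\times{\bf F}$. The sketch below assumes a coordinate system chosen here for convenience (the wrench along the $x$-axis and the force in the $xy$-plane at $75^\circ$ to the wrench); that placement is an assumption and is not read off Figure 6.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

angle = math.radians(75)
r = (0.25, 0.0, 0.0)                                         # wrench, in metres
F = (40 * math.cos(angle), 40 * math.sin(angle), 0.0)        # force, in newtons

tau = cross(r, F)                                            # torque vector
tau_magnitude = math.sqrt(sum(c * c for c in tau))
print(round(tau_magnitude, 2))                               # 9.66
```

With this placement the torque vector points along the $z$-axis, perpendicular to both the wrench and the force.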

Examples in this note have been taken from [1].

References.

[1] Calculus, Early Transcendentals, James Stewart, 6th Edition, Thomson Brooks/Cole

The Dot Product

Let us begin with the following definition.

Definition. Let ${\bf u}=(u_1,u_2,u_3)$ and ${\bf v}=(v_1,v_2,v_3)$. Then the dot product ${\bf u}\cdot{\bf v}$ is defined by $${\bf u}\cdot{\bf v}=u_1v_1+u_2v_2+u_3v_3$$

The name “product” is somewhat misleading: the dot product is not an operation on vectors in the strict sense, because its outcome is a scalar, not a vector. So what is the big deal about the dot product? The dot product defines the length of a vector. Let ${\bf u}=(u_1,u_2,u_3)$. Then $$|{\bf u}|=\sqrt{{\bf u}\cdot {\bf u}}=\sqrt{u_1^2+u_2^2+u_3^2}$$ Furthermore, it can also define the distance between two points in space as shown in Figure 1:

Figure 1. Distance between two points P and Q

Let two position vectors ${\bf v}=(v_1,v_2,v_3)$ and ${\bf w}=(w_1,w_2,w_3)$ respectively represent points $P$ and $Q$ in space. Then the distance $\overline{PQ}$ between the two points $P$ and $Q$ is the length of the vector ${\bf v}-{\bf w}$: $$\overline{PQ}=|{\bf v}-{\bf w}|=\sqrt{({\bf v}-{\bf w})\cdot({\bf v}-{\bf w})}=\sqrt{(v_1-w_1)^2+(v_2-w_2)^2+(v_3-w_3)^2}$$

Example. \begin{align*}(2,4)\cdot(3,-1)&=2(3)+4(-1)=2\\(-1,7,4)\cdot\left(6,2,-\frac{1}{2}\right)&=-1(6)+7(2)+4\left(-\frac{1}{2}\right)=6\\({\bf i}+2{\bf j}-3{\bf k})\cdot(2{\bf j}-{\bf k})&=1(0)+2(2)+(-3)(-1)=7\end{align*}

The dot product satisfies the following properties. These properties can be easily verified from its definition.

Theorem. Let ${\bf u}$, ${\bf v}$ and ${\bf w}$ be vectors in space and $c$ a scalar. Then

  1. ${\bf u}\cdot{\bf v}={\bf v}\cdot{\bf u}$
  2. ${\bf u}\cdot ({\bf v}+{\bf w})={\bf u}\cdot{\bf v}+{\bf u}\cdot{\bf w}$
  3. $(c{\bf u})\cdot{\bf v}=c({\bf u}\cdot{\bf v})={\bf u}\cdot(c{\bf v})$
  4. ${\bf 0}\cdot {\bf u}=0$

Although this is beyond the scope of our discussion here, I would like to mention that the notion of the dot product can be generalized so that it can give rise to a different kind of length. Such a generalization is called a scalar product or an inner product. You can read more about it here in case you are interested. A scalar product would satisfy the properties 1-4 in the above theorem. The dot product is associated with the length we are most familiar with, called the Euclidean length, but that is not the only kind of length out there. For example, in the vector space of continuous functions on the closed interval $[0,1]$ (which was mentioned here), the scalar product of two functions $f$ and $g$, denoted by $\langle f,g\rangle$, is defined by $$\langle f,g\rangle=\int_0^1 f(x)g(x)dx$$ and the length of $f$ is defined by $$|f|=\sqrt{\langle f,f\rangle}=\sqrt{\int_0^1|f(x)|^2dx}$$ This type of scalar product plays a very important role in quantum mechanics. It is used to measure the probability that a particle (such as an electron) is in a particular quantum mechanical state.

There is an alternative description of the dot product.

Theorem. If $\theta$ is the angle between the vectors ${\bf u}$ and ${\bf v}$, where $0\leq\theta\leq\pi$, then \begin{equation}\label{eq:dotproduct}{\bf u}\cdot{\bf v}=|{\bf u}||{\bf v}|\cos\theta\end{equation}

Proof. By applying the Law of Cosines to triangle $\triangle OPQ$ in Figure 1, we obtain \begin{equation}\label{eq:lawcosine}|{\bf u}-{\bf v}|^2=|{\bf u}|^2+|{\bf v}|^2-2|{\bf u}||{\bf v}|\cos\theta\end{equation} On the other hand, expanding the left-hand side with the properties of the dot product gives \begin{align*}|{\bf u}-{\bf v}|^2&=({\bf u}-{\bf v})\cdot({\bf u}-{\bf v})\\&={\bf u}\cdot{\bf u}-{\bf u}\cdot{\bf v}-{\bf v}\cdot{\bf u}+{\bf v}\cdot{\bf v}\\&=|{\bf u}|^2-2{\bf u}\cdot{\bf v}+|{\bf v}|^2\end{align*}Replacing $|{\bf u}-{\bf v}|^2$ in \eqref{eq:lawcosine} by this last expression results in $$|{\bf u}|^2-2{\bf u}\cdot{\bf v}+|{\bf v}|^2=|{\bf u}|^2+|{\bf v}|^2-2|{\bf u}||{\bf v}|\cos\theta$$ and this simplifies to $${\bf u}\cdot{\bf v}=|{\bf u}||{\bf v}|\cos\theta$$

Example. Find the angle between the vectors ${\bf u}=(2,2,-1)$ and ${\bf v}=(5,-3,2)$.

Solution. From \eqref{eq:dotproduct}, \begin{align*}\cos\theta&=\frac{{\bf u}\cdot{\bf v}}{|{\bf u}||{\bf v}|}\\&=\frac{2(5)+2(-3)+(-1)(2)}{\sqrt{2^2+2^2+(-1)^2}\sqrt{5^2+(-3)^2+2^2}}\\&=\frac{2}{3\sqrt{38}}\end{align*} Hence, $$\theta=\cos^{-1}\left(\frac{2}{3\sqrt{38}}\right)\approx 1.46\ \mathrm{rad}\ (84^\circ)$$
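The same computation can be written as a small Python function. This is a sketch under the assumption that both vectors are nonzero; the names `dot`, `norm`, and `angle_between` are introduced here for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def angle_between(u, v):
    """Angle in radians between nonzero vectors u and v, in [0, pi]."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

theta = angle_between((2, 2, -1), (5, -3, 2))
print(round(theta, 2), round(math.degrees(theta)))   # 1.46 84
```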

The alternative description of the dot product in \eqref{eq:dotproduct} is usually introduced as the definition of the dot product in high school or freshman physics courses.

From \eqref{eq:dotproduct}, we see that two vectors ${\bf u}$ and ${\bf v}$ are perpendicular or orthogonal (i.e. the angle $\theta$ between ${\bf u}$ and ${\bf v}$ is $\frac{\pi}{2}$) if and only if ${\bf u}\cdot{\bf v}=0$.

Example. $2{\bf i}+2{\bf j}-{\bf k}$ is perpendicular to $5{\bf i}-4{\bf j}+2{\bf k}$ because $$(2{\bf i}+2{\bf j}-{\bf k})\cdot(5{\bf i}-4{\bf j}+2{\bf k})=2(5)+2(-4)+(-1)(2)=0$$

This is beyond the scope of our discussion here, but the notion of orthogonality of two vectors can be extended to higher dimensional spaces or more abstract vector spaces by the following definition: two vectors ${\bf u}$ and ${\bf v}$ are said to be orthogonal if $\langle{\bf u},{\bf v}\rangle=0$, where $\langle\ ,\ \rangle$ denotes a scalar product. In our case, $\langle{\bf u},{\bf v}\rangle={\bf u}\cdot{\bf v}$. For example, let $V$ be the set of all continuous functions on the closed interval $[-1,1]$. Then $V$ is a vector space with addition and scalar multiplication defined in the usual way that I discussed here. Also $\langle\ ,\ \rangle$ defined by $$\langle f,g\rangle=\int_{-1}^1f(x)g(x)dx$$ for $f,g\in V$ is a scalar product. For any integer $n$, the two functions $\sin(2n\pi x)$ and $\cos(2n\pi x)$ are continuous on $[-1,1]$ so they belong to $V$, i.e. they are vectors. They are also orthogonal because the integrand below is an odd function: $$\langle\sin(2n\pi x),\cos(2n\pi x)\rangle=\int_{-1}^1\sin(2n\pi x)\cos(2n\pi x)dx=0$$

Let us take a look at Figure 2.

Figure 2. Scalar projection

Imagine that light rays are coming down on the vector ${\bf u}$ in the direction perpendicular to the vector ${\bf v}$. Then the shadow of ${\bf u}$ will be cast on ${\bf v}$ (the red line segment in Figure 2). Mathematically, this shadow is called the orthographic projection of ${\bf u}$ onto ${\bf v}$. In fact, the red line segment is the orthographic projection of the length of the vector ${\bf u}$ onto ${\bf v}$. We denote it by $\mathrm{comp}_{\bf v}{\bf u}$ and call it the scalar projection of ${\bf u}$ onto ${\bf v}$. Using basic trigonometry, we can easily find that $$\mathrm{comp}_{\bf v}{\bf u}=|{\bf u}|\cos\theta$$ However, we prefer to express the scalar projection free of the angle $\theta$, i.e. in terms of only ${\bf u}$ and ${\bf v}$. This can be done using \eqref{eq:dotproduct}: \begin{align*}\mathrm{comp}_{\bf v}{\bf u}&=|{\bf u}|\cos\theta\\&=|{\bf u}|\frac{{\bf u}\cdot{\bf v}}{|{\bf u}||{\bf v}|}\\&=\frac{{\bf u}\cdot{\bf v}}{|{\bf v}|}\end{align*} Hence we obtain our preferred form of the scalar projection \begin{equation}\label{eq:scalarprojection}\mathrm{comp}_{\bf v}{\bf u}=\frac{{\bf u}\cdot{\bf v}}{|{\bf v}|}\end{equation} One can also consider the vector projection of ${\bf u}$ onto ${\bf v}$. All you have to do is multiply the scalar projection \eqref{eq:scalarprojection} by the direction of ${\bf v}$: \begin{equation}\label{eq:vectorprojection}\mathrm{proj}_{\bf v}{\bf u}=\mathrm{comp}_{\bf v}{\bf u}\frac{{\bf v}}{|{\bf v}|}=\frac{{\bf u}\cdot{\bf v}}{|{\bf v}|^2}{\bf v}\end{equation}

Example. Find the scalar projection and the vector projection of ${\bf u}=(1,1,2)$ onto ${\bf v}=(-2,3,1)$.

Solution. The scalar projection is $$\mathrm{comp}_{\bf v}{\bf u}=\frac{{\bf u}\cdot{\bf v}}{|{\bf v}|}=\frac{1(-2)+1(3)+2(1)}{\sqrt{(-2)^2+3^2+1^2}}=\frac{3}{\sqrt{14}}$$ The direction of ${\bf v}$ is $\frac{1}{\sqrt{14}}(-2,3,1)$. Hence, the vector projection is $$\mathrm{proj}_{\bf v}{\bf u}=\mathrm{comp}_{\bf v}{\bf u}\frac{1}{\sqrt{14}}(-2,3,1)=\frac{3}{14}(-2,3,1)=\left(-\frac{3}{7},\frac{9}{14},\frac{3}{14}\right)$$
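Both projections translate directly into code. The sketch below uses Python's `fractions.Fraction` so that the vector projection comes out exactly for integer input; the function names are hypothetical, and ${\bf v}$ is assumed to be nonzero.

```python
import math
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scalar_projection(u, v):
    """comp_v u = (u . v) / |v|, returned as a float."""
    return dot(u, v) / math.sqrt(dot(v, v))

def vector_projection(u, v):
    """proj_v u = ((u . v) / |v|^2) v, exact for integer components."""
    factor = Fraction(dot(u, v), dot(v, v))
    return tuple(factor * c for c in v)

u, v = (1, 1, 2), (-2, 3, 1)
comp = scalar_projection(u, v)      # 3 / sqrt(14)
proj = vector_projection(u, v)      # (-3/7, 9/14, 3/14) as Fractions
print(comp, proj)
```

Writing the vector projection with $|{\bf v}|^2={\bf v}\cdot{\bf v}$ avoids the square root altogether, which is why it can be kept exact.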

Consider a vector ${\bf v}=(v_1,v_2,v_3)$ in space. The angle $\alpha$ between ${\bf v}$ and ${\bf i}$, the angle $\beta$ between ${\bf v}$ and ${\bf j}$, and the angle $\gamma$ between ${\bf v}$ and ${\bf k}$ are called the direction angles of ${\bf v}$. (See Figure 3.)

Figure 3. Direction angles

Now, \begin{equation}\begin{aligned}\cos\alpha&=\frac{{\bf v}\cdot{\bf i}}{|{\bf v}||{\bf i}|}=\frac{v_1}{|{\bf v}|}\\\cos\beta&=\frac{{\bf v}\cdot{\bf j}}{|{\bf v}||{\bf j}|}=\frac{v_2}{|{\bf v}|}\\\cos\gamma&=\frac{{\bf v}\cdot{\bf k}}{|{\bf v}||{\bf k}|}=\frac{v_3}{|{\bf v}|}\end{aligned}\label{eq:directioncosine}\end{equation} $\cos\alpha$, $\cos\beta$ and $\cos\gamma$ are called the direction cosines of the vector ${\bf v}$. It follows from \eqref{eq:directioncosine} that $(\cos\alpha,\cos\beta,\cos\gamma)={\bf v}/|{\bf v}|$ is the unit vector in the direction of ${\bf v}$, hence the name direction cosines.

Example. Find the direction angles of the vector ${\bf v}=(1,2,3)$.

Solution. $|{\bf v}|=\sqrt{1^2+2^2+3^2}=\sqrt{14}$. Using \eqref{eq:directioncosine}, we have \begin{align*}\alpha&=\cos^{-1}\left(\frac{1}{\sqrt{14}}\right)\approx 74^\circ\\\beta&=\cos^{-1}\left(\frac{2}{\sqrt{14}}\right)\approx 58^\circ\\\gamma&=\cos^{-1}\left(\frac{3}{\sqrt{14}}\right)\approx 37^\circ\end{align*}
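The direction angles can be computed in the same way for any nonzero vector. The following is a minimal sketch; `direction_angles` is a name chosen here, and the angles are returned in degrees to match the example.

```python
import math

def direction_angles(v):
    """Angles (in degrees) between a nonzero vector v and the axes i, j, k."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(math.degrees(math.acos(c / length)) for c in v)

alpha, beta, gamma = direction_angles((1, 2, 3))
print(round(alpha), round(beta), round(gamma))   # 74 58 37

# The direction cosines form a unit vector, so their squares sum to 1.
cos_sq_sum = sum(math.cos(math.radians(a)) ** 2 for a in (alpha, beta, gamma))
```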

Work

Consider a linear motion i.e. a motion of an object along a straight line. See Figure 4.

Figure 4. Work

Suppose that an object is moved by a force ${\bf F}$. If the displacement is ${\bf D}$, then the work $W$ done by this force ${\bf F}$ is defined by the scalar projection of ${\bf F}$ onto ${\bf D}$, $|{\bf F}|\cos\theta$ (this is the component of ${\bf F}$ that actually moved the object) times the distance moved $|{\bf D}|$: \begin{equation}\label{eq:work}W={\bf F}\cdot{\bf D}\end{equation}

Example. A wagon is pulled a distance of 100 m along a horizontal path by a constant force of 70 N. The handle of the wagon is held at an angle of $35^\circ$ above the horizontal. Find the work done by the force.

Solution. The force ${\bf F}$ and the displacement ${\bf D}$ are as depicted in Figure 5.

Figure 5. Work

Thus the work $W$ is \begin{align*}W&={\bf F}\cdot{\bf D}=|{\bf F}||{\bf D}|\cos 35^\circ\\&=70(100)\cos 35^\circ\approx 5734\ \mathrm{J}\end{align*} where J (joule) is the unit of work, equal to one newton times one meter.

Example. A force given by the vector ${\bf F}=3{\bf i}+4{\bf j}+5{\bf k}$ moves a particle from the point $P(2,1,0)$ to the point $Q(4,6,2)$. Find the work done by the force.

Solution. Here, there is no mention of a particular path the particle is taking. We assume that the motion is again linear. The displacement is ${\bf D}=\overrightarrow{PQ}=(4-2,6-1,2-0)=(2,5,2)$. Hence the work is \begin{align*}W&={\bf F}\cdot{\bf D}=(3,4,5)\cdot(2,5,2)\\&=6+20+10=36\end{align*}
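Both work examples can be checked with a short script. This is a sketch only: `work_from_angle` is a hypothetical helper for the magnitude-and-angle form, while the second example uses the component form ${\bf F}\cdot{\bf D}$ directly.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def work_from_angle(force, distance, angle_deg):
    """W = |F| |D| cos(theta), with theta given in degrees."""
    return force * distance * math.cos(math.radians(angle_deg))

# Wagon: 70 N at 35 degrees above the horizontal, moved 100 m.
wagon_work = work_from_angle(70, 100, 35)

# Particle: F = 3i + 4j + 5k moves it from P(2,1,0) to Q(4,6,2).
P, Q = (2, 1, 0), (4, 6, 2)
D = tuple(q - p for p, q in zip(P, Q))
particle_work = dot((3, 4, 5), D)

print(round(wagon_work), D, particle_work)   # 5734 (2, 5, 2) 36
```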

Examples in this note have been taken from [1].

References.

[1] Calculus, Early Transcendentals, James Stewart, 6th Edition, Thomson Brooks/Cole

Vectors

What is a vector?

A vector is a quantity that has both direction and magnitude. Examples of vectors include displacement, velocity, force, weight, momentum, etc. A quantity that has only magnitude is called a scalar. Scalars are really just numbers. Examples of scalars include distance, speed, mass, temperature, etc. Since a vector has both direction and magnitude, it can be visually represented by a directed arrow. For instance, if a particle moves from a point $A$ to another point $B$, its displacement (i.e. the straight-line change in position from $A$ to $B$) is denoted by $\overrightarrow{AB}$ and it is visually represented by a directed arrow as in Figure 1.

Figure 1. A vector

The direction at which the arrow is pointing is the direction of the vector $\overrightarrow{AB}$ and the length of the arrow is the magnitude of the vector $\overrightarrow{AB}$. When it is not necessary to specify the initial point and the terminal point, a vector is denoted by a lower case alphabet letter with arrow on top like $\vec v$ or in boldface like ${\bf v}$.

Two vectors are said to be the same or equivalent if they have the same direction and the same magnitude, regardless of where they are located. If vectors $\overrightarrow{AB}$ and $\overrightarrow{CD}$ are the same, we write $\overrightarrow{AB}=\overrightarrow{CD}$. Figure 2 shows two equivalent vectors $\overrightarrow{AB}$ and $\overrightarrow{CD}$.

Figure 2. Equivalent vectors

The equivalence of two vectors implies that a vector can be moved around while maintaining its characteristics (direction and magnitude), so it stays the same vector although its location has changed (meaning its initial and terminal points have changed). Moving a vector without changing its direction and magnitude is called a parallel translation.

There are two types of operations on vectors. One is vector addition and the other is scalar multiplication. Vector addition is defined pictorially by using a parallelogram or a triangle. Figure 3 shows vector addition $$\overrightarrow{AB}+\overrightarrow{AC}=\overrightarrow{AD}$$ by using a parallelogram.

Figure 3. Vector addition by a parallelogram

Figure 4 shows vector addition $$\overrightarrow{AB}+\overrightarrow{BC}=\overrightarrow{AC}$$ by using a triangle. The two ways of adding two vectors are indeed equivalent. The only difference is how you locate two vectors to add them together.

Figure 4. Vector addition by a triangle

Scalar multiplication is a product between a scalar and a vector. While many people, even some (less careful) mathematicians, consider it an operation, it is not an operation but an action. I am not going to talk about what an action is here. In case you are curious, you can visit the Wikipedia page on Group Action here. At the calculus level, the distinction between an operation and an action is not really important. Let $c$ be a scalar and ${\bf v}$ a vector. Depending on the value of $c$, the scalar multiplication $c{\bf v}$ stretches ${\bf v}$ (when $|c|>1$), shrinks it (when $0<|c|<1$), and reverses its direction (when $c<0$; for example, $c=-1$ reverses the direction without changing the length), as illustrated in Figure 5.

Figure 5. Scalar multiplication

Using vector addition and scalar multiplication, one can define subtraction of a vector ${\bf v}$ from another vector ${\bf u}$: $${\bf u}-{\bf v}:={\bf u}+(-{\bf v})$$ See Figure 6.

Figure 6. Vector subtraction

Earlier I mentioned that a vector can be moved around while preserving its direction and magnitude, and a parallel translation of a vector is still considered to be the same as the previous vector, although it is now at a different location. Among all those same vectors, we are particularly interested in the ones that start at the origin $O$. Figure 7 shows an example of such a vector.

Figure 7. A position vector

A vector whose initial point is the origin $O$ is called a position vector or a located vector. A position vector is determined only by its terminal point, so it can be identified with a point in space, and conversely a point in space can be identified with a position vector. For example, if the terminal point of a position vector ${\bf v}$ is $(a_1,a_2,a_3)$, then we regard them as the same, i.e. ${\bf v}=(a_1,a_2,a_3)$. Why is this such a big deal? The directed arrow representation of a vector has a lot of limitations. The most severe limitation is that it is only useful when we can see the arrows, i.e. its usage is limited to 3 dimensions, as our perception does not allow us to go beyond 3 dimensions. However, Einstein’s theory of relativity (which has also been confirmed by numerous experiments and observations) tells us that our universe is actually 4-dimensional. But that’s the universe we observe right now. String theory tells us that the universe can have up to 26 dimensions. We do not have to go that far, though. There are many other places here down on earth, including computer science, economics, etc., where the notions of vectors in higher dimensions are being used. Considering position vectors resolves this limitation. Furthermore, now that we identify vectors with points in space and points are represented by ordered $n$-tuples (ordered pairs, triples, quadruples, depending on the dimension of the space), which are algebraic objects, we can use the power of algebra to describe the properties of vectors.

Given the points $A(a_1,a_2,a_3)$ and $B(b_1,b_2,b_3)$, the vector ${\bf v}$ which is represented by the directed arrow $\overrightarrow{AB}$ is $${\bf v}=(b_1-a_1,b_2-a_2,b_3-a_3)$$

Example. Find the vector represented by the directed arrow with initial point $A(2,-3,4)$ and terminal point $B(-2,1,1)$.

Solution. ${\bf v}=(-2-2,1-(-3),1-4)=(-4,4,-3)$.

The length or magnitude of a vector ${\bf v}$ is denoted by $|{\bf v}|$. For a vector ${\bf v}=(v_1,v_2)$ in the plane, $|{\bf v}|$ is given by $$|{\bf v}|=\sqrt{v_1^2+v_2^2}$$ (It’s easy to see this from Figure 7 using the Pythagorean theorem.) Similarly, for a vector ${\bf v}=(v_1,v_2,v_3)$ in space, $$|{\bf v}|=\sqrt{v_1^2+v_2^2+v_3^2}$$ It follows from the definition that \begin{equation}\label{eq:length}|c{\bf v}|=|c||{\bf v}|\end{equation} where $c$ is a scalar.

Vector addition and scalar multiplication can be nicely defined algebraically without using parallelograms or triangles. Furthermore, these algebraic definitions apply to vectors in arbitrary $n$-dimensional space. For vectors ${\bf u}=(u_1,u_2,u_3)$, ${\bf v}=(v_1,v_2,v_3)$ and a scalar $c$, \begin{align*}{\bf u}+{\bf v}&:=(u_1+v_1,u_2+v_2,u_3+v_3)\\c{\bf u}&:=(cu_1,cu_2,cu_3)\end{align*}

Example. If ${\bf u}=(4,0,3)$ and ${\bf v}=(-2,1,5)$, find $|{\bf u}|$, ${\bf u}+{\bf v}$, ${\bf u}-{\bf v}$, $3{\bf v}$, $2{\bf u}+5{\bf v}$.

Solution. \begin{align*}|{\bf u}|&=\sqrt{4^2+0^2+3^2}=\sqrt{25}=5\\{\bf u}+{\bf v}&=(4+(-2),0+1,3+5)=(2,1,8)\\{\bf u}-{\bf v}&=(4-(-2),0-1,3-5)=(6,-1,-2)\\3{\bf v}&=(3(-2),3(1),3(5))=(-6,3,15)\\2{\bf u}+5{\bf v}&=(2(4),2(0),2(3))+(5(-2),5(1),5(5))=(8,0,6)+(-10,5,25)=(-2,5,31)\end{align*}
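The componentwise definitions carry over to code almost verbatim, and they work for tuples of any length $n$. The helper names below (`add`, `sub`, `scale`, `norm`) are illustrative choices, not notation from the note.

```python
import math

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    return tuple(c * a for a in u)

def sub(u, v):
    # u - v is defined as u + (-1)v.
    return add(u, scale(-1, v))

def norm(u):
    return math.sqrt(sum(a * a for a in u))

u, v = (4, 0, 3), (-2, 1, 5)
print(norm(u))                           # 5.0
print(add(u, v), sub(u, v))              # (2, 1, 8) (6, -1, -2)
print(scale(3, v))                       # (-6, 3, 15)
print(add(scale(2, u), scale(5, v)))     # (-2, 5, 31)
```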

Theorem. Let ${\bf u}$, ${\bf v}$, and ${\bf w}$ be vectors in $n$-dimensional space and let $c$ and $d$ be scalars. Then

  1. ${\bf u}+{\bf v}={\bf v}+{\bf u}$
  2. ${\bf u}+({\bf v}+{\bf w})=({\bf u}+{\bf v})+{\bf w}$
  3. ${\bf u}+{\bf 0}={\bf u}$, where ${\bf 0}=(0,0,\cdots,0)$
  4. ${\bf u}+(-{\bf u})={\bf 0}$
  5. $c({\bf u}+{\bf v})=c{\bf u}+c{\bf v}$
  6. $(c+d){\bf u}=c{\bf u}+d{\bf u}$
  7. $(cd){\bf u}=c(d{\bf u})$
  8. $1{\bf u}={\bf u}$

It turns out that the original definition of vectors as quantities that have both direction and magnitude is quite obsolete, and that even the definition of vectors by ordered $n$-tuples is not adequate to address the much-needed broader notion of vectors arising in modern physics and engineering. For this reason, in the modern treatment of vectors we no longer define what an individual vector is but instead we define a vector space. Simply speaking, a set $V$ with addition $+$ and scalar multiplication $\cdot$ satisfying the properties 1-8 is called a vector space, and the elements of $V$ are called vectors. Under this broader notion of vectors, things that were previously inconceivable as vectors are now considered vectors. For example, let $V$ be the set of all continuous real-valued functions on the closed interval $[0,1]$, with addition $+$ and scalar multiplication $\cdot$ defined by \begin{align*}(f+g)(x)&:=f(x)+g(x)\\(cf)(x)&:=cf(x)\end{align*} for $f,g\in V$ and a scalar $c$. Then it is straightforward to show that the properties 1-8 are satisfied and therefore $(V,+,\cdot)$ is a vector space, and we regard continuous real-valued functions on $[0,1]$ as vectors. In fact, in quantum mechanics wave functions are state vectors. Another example is signal processing, where functions are regarded as vectors. We are not going to delve into vector spaces further here. It is a main topic of linear algebra. For those who are curious, more examples of vector spaces can be found here.

There are infinitely many vectors, so it is humanly impossible to check one vector at a time whether a certain property holds for all vectors. However, there is a particular finite set of vectors, called a basis, from which all vectors can be built. A vector ${\bf u}=(u_1,u_2,u_3)$ can be written as \begin{equation}\label{eq:lincomb}{\bf u}=u_1(1,0,0)+u_2(0,1,0)+u_3(0,0,1)\end{equation} So we see that any vector can be represented by the three vectors $${\bf i}=(1,0,0),\ {\bf j}=(0,1,0),\ {\bf k}=(0,0,1)$$ by applying vector addition and scalar multiplication finitely many times as in \eqref{eq:lincomb}. The expression on the right hand side of the identity in \eqref{eq:lincomb} is called a linear combination or a superposition of ${\bf i}$, ${\bf j}$, ${\bf k}$. The three vectors ${\bf i}$, ${\bf j}$, ${\bf k}$ are called the canonical or standard basis vectors.

Figure 8. The standard basis

The number of standard basis vectors determines the dimension of the space. The dimension of a space is not necessarily finite though we are considering only finite dimensional spaces here (actually only 2- or 3-dimensional spaces). The set $V$ of all continuous functions on $[0,1]$ is infinite dimensional. The set of all state vectors in a quantum mechanics system is, in general, an infinite dimensional space called a Hilbert space.

Example. If ${\bf u}={\bf i}+2{\bf j}-3{\bf k}$ and ${\bf v}=4{\bf i}+7{\bf k}$, express $2{\bf u}+3{\bf v}$ in terms of ${\bf i}$, ${\bf j}$, ${\bf k}$.

Solution. \begin{align*}2{\bf u}+3{\bf v}&=2({\bf i}+2{\bf j}-3{\bf k})+3(4{\bf i}+7{\bf k})\\&=2{\bf i}+4{\bf j}-6{\bf k}+12{\bf i}+21{\bf k}\\&=14{\bf i}+4{\bf j}+15{\bf k}\end{align*}

Often in geometry and physics, we are only interested in the direction of a vector. A unit vector is a vector with length 1. Any non-zero vector can be re-scaled to a unit vector with the same direction. All that’s required is dividing the given vector by its magnitude. If ${\bf u}\ne {\bf 0}$, then $$\hat{\bf u}:=\frac{{\bf u}}{|{\bf u}|}$$ is a unit vector which has the same direction as ${\bf u}$: Using \eqref{eq:length}, $$|\hat{\bf u}|=\left|\frac{{\bf u}}{|{\bf u}|}\right|=\frac{1}{|{\bf u}|}|{\bf u}|=1$$

Example. Find the unit vector in the direction of the vector $2{\bf i}-{\bf j}-2{\bf k}$.

Solution. The length of the vector is $\sqrt{2^2+(-1)^2+(-2)^2}=\sqrt{9}=3$. Hence the unit vector with the same direction is $$\frac{2}{3}{\bf i}-\frac{1}{3}{\bf j}-\frac{2}{3}{\bf k}$$

In physics, when several forces are acting on an object, the resultant force or the net force experienced by the object is the vector sum of these forces.

Example. A 100-lb weight hangs from two wires as shown in Figure 9. Find the tension forces ${\bf T}_1$ and ${\bf T}_2$ in both wires and their magnitudes.

Figure 9. The resultant force

Solution. We first express the tensions ${\bf T}_1$ and ${\bf T}_2$ in terms of their horizontal and vertical components (the vectors in green in Figure 10).

Figure 10. The resultant force

\begin{align}\label{eq:tension1}{\bf T}_1&=-|{\bf T}_1|\cos 50^\circ{\bf i}+|{\bf T}_1|\sin 50^\circ{\bf j}\\\label{eq:tension2}{\bf T}_2&=|{\bf T}_2|\cos 32^\circ{\bf i}+|{\bf T}_2|\sin 32^\circ{\bf j}\end{align} The net force ${\bf T}_1+{\bf T}_2$ of the tensions must counterbalance the weight ${\bf w}$ so that the mass stays hung as in the figure, i.e. $${\bf T}_1+{\bf T}_2=-{\bf w}=100{\bf j}$$ From equations \eqref{eq:tension1} and \eqref{eq:tension2}, we have $$(-|{\bf T}_1|\cos 50^\circ+|{\bf T}_2|\cos 32^\circ){\bf i}+(|{\bf T}_1|\sin 50^\circ+|{\bf T}_2|\sin 32^\circ){\bf j}=100{\bf j}$$ By comparing the components, we obtain the following equations: \begin{align*}-|{\bf T}_1|\cos 50^\circ+|{\bf T}_2|\cos 32^\circ&=0\\|{\bf T}_1|\sin 50^\circ+|{\bf T}_2|\sin 32^\circ&=100\end{align*} Solving these equations simultaneously we find \begin{align*}|{\bf T}_1|&=\frac{100}{\sin 50^\circ+\tan 32^\circ\cos 50^\circ}\approx 85.64\ \mathrm{lb}\\|{\bf T}_2|&=\frac{|{\bf T}_1|\cos 50^\circ}{\cos 32^\circ}\approx 64.91\ \mathrm{lb}\end{align*} Therefore, $${\bf T}_1\approx -55.05{\bf i}+65.60{\bf j},\ {\bf T}_2\approx 55.05{\bf i}+34.40{\bf j}$$
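The pair of component equations is a $2\times 2$ linear system in the unknowns $|{\bf T}_1|$ and $|{\bf T}_2|$, so it can also be solved numerically. The sketch below uses Cramer's rule, which is a substitute for the hand elimination above rather than the method of the note; `solve_2x2` is a hypothetical helper.

```python
import math

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] x = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system is singular")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det

a1, a2 = math.radians(50), math.radians(32)

# Unknowns: t1 = |T1|, t2 = |T2|.
t1, t2 = solve_2x2(-math.cos(a1), math.cos(a2),    # horizontal components sum to 0
                   math.sin(a1), math.sin(a2),     # vertical components sum to 100
                   0.0, 100.0)

T1 = (-t1 * math.cos(a1), t1 * math.sin(a1))
T2 = (t2 * math.cos(a2), t2 * math.sin(a2))
print(round(t1, 2), round(t2, 2))                  # 85.64 64.91
```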

Examples in this note have been taken from [1].

References.

[1] Calculus, Early Transcendentals, James Stewart, 6th Edition, Thomson Brooks/Cole

Introductory Probability: Random Variables and Expectation

Let us begin with an example.

Example. Consider an experiment of tossing 3 fair coins. Let $X$ denote the number of heads appearing. Then $X$ takes on one of the values 0, 1, 2, 3 with respective probabilities: \begin{align*}P_r\{X=0\}&=P_r\{(T,T,T)\}=\frac{1}{8}\\P_r\{X=1\}&=P_r\{(T,T,H),(T,H,T),(H,T,T)\}=\frac{3}{8}\\P_r\{X=2\}&=P_r\{(T,H,H),(H,T,H),(H,H,T)\}=\frac{3}{8}\\P_r\{X=3\}&=P_r\{(H,H,H)\}=\frac{1}{8}\end{align*} This $X$ here is an example of what is called a random variable. Random variables are real-valued functions defined on the sample space $S$ i.e. $X: S\longrightarrow\mathbb{R}$. In this example, $X$ is the function $X: S\longrightarrow\{0,1,2,3\}$ with $S=\{(T,T,T),(T,T,H),(T,H,T),(H,T,T),(T,H,H),(H,T,H),(H,H,T),(H,H,H)\}$ defined by the number of heads appearing for each outcome. In probability, often we are more interested in the values of a random variable rather than the outcomes of an experiment.

Remark. In more traditional mathematical notation, $P_r\{X=i\}$ is denoted by $P_r\{X^{-1}(i)\}$ which means $$P_r\{X^{-1}(i)\}=P_r\{s\in S: X(s)=i\}$$ But in probability, $P_r\{X=i\}$ is commonly used notation.

Here is another example.

Example. Three marbles are randomly selected from a jar containing 3 white, 3 red, and 5 black marbles. Suppose that we win \$1 for each white marble selected and lose \$1 for each red marble selected. Let $X$ denote the total winnings from the experiment. Then $X$ is a random variable taking on the values $0,\pm 1,\pm 2,\pm3$ with respective probabilities: \begin{align*}P_r\{X=0\}&=\frac{\begin{pmatrix}5\\3\end{pmatrix}+\begin{pmatrix}3\\1\end{pmatrix}\begin{pmatrix}3\\1\end{pmatrix}\begin{pmatrix}5\\1\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{55}{165}\\P_r\{X=1\}=P_r\{X=-1\}&=\frac{\begin{pmatrix}3\\1\end{pmatrix}\begin{pmatrix}5\\2\end{pmatrix}+\begin{pmatrix}3\\2\end{pmatrix}\begin{pmatrix}3\\1\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{39}{165}\\P_r\{X=2\}=P_r\{X=-2\}&=\frac{\begin{pmatrix}3\\2\end{pmatrix}\begin{pmatrix}5\\1\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{15}{165}\\P_r\{X=3\}=P_r\{X=-3\}&=\frac{\begin{pmatrix}3\\3\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{1}{165}\end{align*} The probability that we win money is $$\sum_{i=1}^3P_r\{X=i\}=\frac{55}{165}=\frac{1}{3}$$
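Since there are only $\binom{11}{3}=165$ possible draws, the probability mass function of $X$ can also be obtained by brute-force enumeration. The following sketch encodes each marble by its payoff ($+1$ white, $-1$ red, $0$ black); that encoding and the variable names are choices made here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# +1 for each white, -1 for each red, 0 for each black marble.
jar = [1] * 3 + [-1] * 3 + [0] * 5

# combinations() treats the 11 marbles as distinct positions,
# so every unordered draw of 3 marbles is listed exactly once.
draws = list(combinations(jar, 3))
counts = Counter(sum(d) for d in draws)
pmf = {x: Fraction(n, len(draws)) for x, n in counts.items()}

p_win = sum(p for x, p in pmf.items() if x > 0)
print(len(draws), pmf[0], p_win)
```

`Fraction` reduces automatically, so $\frac{55}{165}$ is displayed as `1/3`.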

Definition. Let $\mathrm{PMF}_X(i):=P_r\{X=i\}$. $\mathrm{PMF}_X(i)$ is called the probability mass function for random variable $X$.

Definition. Let $\mathrm{CDF}_X(i):=P_r\{X\leq i\}$. $\mathrm{CDF}_X(i)$ is called the cumulative distribution function. It gives the probability that the value of the random variable is at most a specified number.

$\mathrm{CDF}_X(i)$ can be expressed as $$\mathrm{CDF}_X(i)=\sum_{j\leq i}\mathrm{PMF}_X(j)$$ i.e. it is the accumulation of distribution (probability) described by the probability mass function for values up to $i$, hence the name the cumulative distribution function.

Example. How likely is it that it will take no more than 3 flips for a coin to land on heads?

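Solution. Let $X$ denote the number of flips needed for a fair coin to land on heads for the first time. Then $P_r\{X=j\}=\left(\frac{1}{2}\right)^j$, since the first $j-1$ flips must be tails and the $j$-th flip must be heads. What the question is asking is $P_r\{X\leq 3\}$ i.e. $\mathrm{CDF}_X(3)$. \begin{align*}\mathrm{CDF}_X(3)&=\sum_{j=1}^3\mathrm{PMF}_X(j)\\&=\sum_{j=1}^3 P_r\{X=j\}\\&=\frac{1}{2}+\frac{1}{4}+\frac{1}{8}=\frac{7}{8}\end{align*}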
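
The relation between the two functions can be sketched in a few lines of Python. The sketch reads $X$ as the flip on which the first head appears and assumes a fair coin, so that $\mathrm{PMF}_X(j)=\left(\frac{1}{2}\right)^j$, consistent with the terms $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$ in the sum above. The function names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

def pmf(j):
    """P{X = j}: the first head appears on flip j (j - 1 tails, then a head)."""
    return Fraction(1, 2) ** j

def cdf(i):
    """P{X <= i}, accumulated from the PMF over the values 1, ..., i."""
    return sum(pmf(j) for j in range(1, i + 1))

print(cdf(3))  # 7/8
```

The same number is obtained from the complement: the only way to need more than 3 flips is to get three tails in a row, which has probability $\frac{1}{8}$.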

Definition. The expected value of a random variable $X$, denoted by $E(X)$, is the weighted average of its possible values weighted according to their probabilities. More specifically, $$E(X)=\sum_i\mathrm{PMF}_X(i)\cdot i=\sum_i P_r\{X=i\}\cdot i$$ The expected value is also called the expectation or the mean.

Example. What is the expected value of a roll of a die?

Solution. The random variable $X$ takes on the values 1, 2, 3, 4, 5, 6, and each of these values has the same probability $\frac{1}{6}$. Hence, the expected value of $X$ is $$E(X)=\sum_{i=1}^6\frac{1}{6}i=\frac{1}{6}\frac{6\cdot 7}{2}=3.5$$ Here I used the formula $$1+2+3+\cdots+n=\frac{n(n+1)}{2}$$ to calculate $1+2+3+4+5+6$.

Example. If a die is rolled three times, how many distinct values are expected to appear?

Solution. Let $X$ denote the number of distinct values. There are $6^3$ equally likely outcomes of this experiment. Calculating $P_r\{X=1\}$ and $P_r\{X=3\}$ is straightforward. For $X=1$, all three rolls show the same value, which can be chosen in $\begin{pmatrix}6\\1\end{pmatrix}=6$ ways. For $X=3$, the three rolls are all different, and because the rolls are ordered there are $6\cdot 5\cdot 4=120$ such outcomes. \begin{align*}P_r\{X=1\}&=\frac{\begin{pmatrix}6\\1\end{pmatrix}}{6^3}=\frac{1}{36}\\P_r\{X=3\}&=\frac{6\cdot 5\cdot 4}{6^3}=\frac{5}{9}\end{align*} $P_r\{X=2\}$ can be found by $$P_r\{X=2\}=1-P_r\{X=1\}-P_r\{X=3\}=1-\frac{1}{36}-\frac{5}{9}=\frac{5}{12}$$ The expected value is then \begin{align*}E(X)&=P_r\{X=1\}\cdot 1+P_r\{X=2\}\cdot 2+P_r\{X=3\}\cdot 3\\&=\frac{1\cdot 1}{36}+\frac{5\cdot 2}{12}+\frac{5\cdot 3}{9}=\frac{91}{36}\approx 2.53\end{align*}
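
Because the sample space has only $6^3=216$ outcomes, the distribution of $X$ can also be checked by brute force. The following Python sketch (illustrative names, fair die assumed) enumerates every ordered triple of rolls and counts the distinct values in each.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6^3 ordered outcomes of rolling a die three times.
rolls = list(product(range(1, 7), repeat=3))

# X(outcome) = number of distinct values among the three rolls.
counts = Counter(len(set(r)) for r in rolls)
pmf = {k: Fraction(counts[k], len(rolls)) for k in sorted(counts)}

expected = sum(k * p for k, p in pmf.items())
print(pmf[1], pmf[2], pmf[3])  # 1/36 5/12 5/9
print(expected)                # 91/36
```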

Example. Let $X$ denote the number of heads appearing in a sequence of 10 flips of a coin. What is $E(X)$?

Solution. For any $i=0,1,\cdots,10$, there are $\begin{pmatrix}10\\i\end{pmatrix}$ sequences of 10 flips, each of which contains $i$ heads. Assuming the coin is fair, each such sequence happens with probability $\left(\frac{1}{2}\right)^{10}$. Hence, \begin{align*}E(X)&=\sum_{i=0}^{10}P_r\{X=i\}\cdot i\\&=\sum_{i=1}^{10}P_r\{X=i\}\cdot i\\&=\sum_{i=1}^{10}\begin{pmatrix}10\\i\end{pmatrix}\left(\frac{1}{2}\right)^{10}\cdot i\\&=\left(\frac{1}{2}\right)^{10}\cdot 10\cdot 2^9\\&=5\end{align*} In the second-to-last line, I used the identity \begin{equation}\label{eq:binom2}\sum_{k=1}^n\begin{pmatrix}n\\k\end{pmatrix}k=n2^{n-1}\end{equation} The identity \eqref{eq:binom2} can be seen as follows. \begin{align*}\sum_{k=1}^n\begin{pmatrix}n\\k\end{pmatrix}k&=\sum_{k=1}^n\frac{n!}{(k-1)!(n-k)!}\\&=n\sum_{k=1}^n\frac{(n-1)!}{(k-1)!(n-1-(k-1))!}\\&=n\sum_{k=1}^n\begin{pmatrix}n-1\\k-1\end{pmatrix}\\&=n2^{n-1}\end{align*} The last equality follows from the identity $\sum_{j=0}^{m}\begin{pmatrix}m\\j\end{pmatrix}=2^m$ with $m=n-1$.
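
Both the expected value and the identity \eqref{eq:binom2} are easy to check numerically. This is a short Python sketch assuming a fair coin; the range of $n$ tested for the identity is an arbitrary choice.

```python
from fractions import Fraction
from math import comb

n = 10

# E(X) = sum over i of P{X = i} * i, with P{X = i} = C(n, i) / 2^n.
expected = sum(Fraction(comb(n, i), 2**n) * i for i in range(n + 1))
print(expected)  # 5

# Check sum_{k=1}^{m} C(m, k) * k = m * 2^(m-1) for a few small m.
identity_holds = all(
    sum(comb(m, k) * k for k in range(1, m + 1)) == m * 2 ** (m - 1)
    for m in range(1, 13)
)
print(identity_holds)  # True
```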

The expected value $E(X)$ provides useful information as the weighted average of the possible values of $X$, but it does not provide information on the spread of these values. The variance provides such information.

Definition. If $X$ is a random variable with expected value $E(X)$, then the variance of $X$, denoted by $\mathrm{Var}(X)$, is defined by $$\mathrm{Var}(X)=E[(X-E(X))^2]$$

To simplify our calculation, let us denote $E(X)$ by $\mu$. Then \begin{align*}\mathrm{Var}(X)&=E[(X-\mu)^2]\\&=\sum_iP_r\{X=i\}(i-\mu)^2\end{align*} This last expression shows that $\mathrm{Var}(X)$ measures, on average, the squared distance of $X$ from its expected value. Let us continue further from the last expression above. \begin{align*}\sum_iP_r\{X=i\}(i-\mu)^2&=\sum_iP_r\{X=i\}(i^2-2\mu i+\mu^2)\\&=\sum_iP_r\{X=i\}i^2-2\mu\sum_iP_r\{X=i\}i+\mu^2\sum_iP_r\{X=i\}\\&=E(X^2)-2\mu^2+\mu^2\\&=E(X^2)-\mu^2\\&=E(X^2)-[E(X)]^2\end{align*} The third line uses $\sum_iP_r\{X=i\}i=\mu$ and $\sum_iP_r\{X=i\}=1$. So we have an alternative formula for the variance \begin{equation}\label{eq:variance}\mathrm{Var}(X)=E(X^2)-[E(X)]^2\end{equation}

Example. Calculate $\mathrm{Var}(X)$ if $X$ represents the outcome when a die is rolled.

Solution. First \begin{align*}E(X)&=1\cdot\frac{1}{6}+2\cdot\frac{1}{6}+3\cdot\frac{1}{6}+4\cdot\frac{1}{6}+5\cdot\frac{1}{6}+6\cdot\frac{1}{6}\\&=\frac{1}{6}\frac{6\cdot 7}{2}=\frac{7}{2}\end{align*} Next, \begin{align*}E(X^2)&=1^2\cdot\frac{1}{6}+2^2\cdot\frac{1}{6}+3^2\cdot\frac{1}{6}+4^2\cdot\frac{1}{6}+5^2\cdot\frac{1}{6}+6^2\cdot\frac{1}{6}\\&=\frac{1}{6}\frac{6\cdot 7\cdot 13}{6}=\frac{91}{6}\end{align*} The value in the last line is obtained by the formula $$1^2+2^2+3^2+\cdots+n^2=\frac{n(n+1)(2n+1)}{6}$$ Therefore, by \eqref{eq:variance}, $$\mathrm{Var}(X)=\frac{91}{6}-\left(\frac{7}{2}\right)^2=\frac{35}{12}$$
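
The definition of the variance and the alternative formula \eqref{eq:variance} should agree, and a direct computation confirms this for the die. The snippet below is a minimal Python sketch with illustrative variable names.

```python
from fractions import Fraction

values = range(1, 7)   # faces of a fair die
p = Fraction(1, 6)     # each face has probability 1/6

mu = sum(p * i for i in values)                      # E(X)
var_def = sum(p * (i - mu) ** 2 for i in values)     # E[(X - mu)^2]
var_alt = sum(p * i**2 for i in values) - mu**2      # E(X^2) - [E(X)]^2

print(mu, var_def, var_alt)  # 7/2 35/12 35/12
```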

Remarks.

  1. In mechanics, the center of gravity of a system of particles is the analogue of the expected value: it is the average of the position coordinates of the particles, weighted by mass instead of probability. Likewise, the moment of inertia of a body about its center of gravity is the analogue of the variance of the position coordinates of the particles that constitute the body.
  2. $\mathrm{SD}(X)=\sqrt{\mathrm{Var}(X)}$ is called the standard deviation of $X$.

I will complete this note with the following useful identities.

Theorem. Let $a$ and $b$ be constants. Then

  1. $E(aX+b)=aE(X)+b$. As a special case, if $b=0$, we obtain $E(aX)=aE(X)$.
  2. $\mathrm{Var}(aX+b)=a^2\mathrm{Var}(X)$.

Proof.

  1. \begin{align*}E(aX+b)&=\sum_iP_r\{X=i\}(ai+b)\\&=a\sum_iP_r\{X=i\}i+b\sum_iP_r\{X=i\}\\&=aE(X)+b\end{align*}
  2. For simplicity, let $\mu=E(X)$. Then by part 1 above, $E(aX+b)=a\mu+b$. Now, \begin{align*}\mathrm{Var}(aX+b)&=E[(aX+b-a\mu-b)^2]\\&=E[a^2(X-\mu)^2]\\&=a^2E[(X-\mu)^2]\\&=a^2\mathrm{Var}(X)\end{align*}
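
Both identities can be spot-checked on a concrete distribution. The Python sketch below uses the three-coin distribution from the first example and the arbitrarily chosen constants $a=3$, $b=-2$; the helper names `expectation` and `variance` are illustrative. A numerical check of this kind is not a proof, only a way to catch algebra mistakes.

```python
from fractions import Fraction

# PMF of the number of heads in three fair coin tosses.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over i of P{X = i} * g(i)."""
    return sum(p * g(x) for x, p in pmf.items())

def variance(pmf, g=lambda x: x):
    """Var(g(X)) = E[(g(X) - E[g(X)])^2]."""
    mu = expectation(pmf, g)
    return expectation(pmf, lambda x: (g(x) - mu) ** 2)

a, b = 3, -2  # arbitrary constants for the check

def affine(x):
    return a * x + b

print(expectation(pmf, affine), a * expectation(pmf) + b)  # 5/2 5/2
print(variance(pmf, affine), a**2 * variance(pmf))         # 27/4 27/4
```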


Introductory Probability: Bayes’ Theorem

Let $S$ be a sample space and $E, F$ events. The event $E$ can be written as \begin{align*}E&=E\cap S\\&=E\cap(F\dot\cup F^c)\\&=(E\cap F)\dot\cup(E\cap F^c)\end{align*} where $\dot\cup$ denotes a disjoint union. By axiom 3 of finite probability, we have \begin{equation}\begin{aligned}P_r(E)&=P_r(E\cap F)+P_r(E\cap F^c)\\&=P_r(E|F)P_r(F)+P_r(E|F^c)P_r(F^c)\\&=P_r(E|F)P_r(F)+P_r(E|F^c)(1-P_r(F))\end{aligned}\label{eq:baye}\end{equation} The second line uses the definition of conditional probability and assumes $0<P_r(F)<1$. This equation states that the probability of the event $E$ is a weighted average of the conditional probability of $E$ given that $F$ has happened and the conditional probability of $E$ given that $F$ has not occurred. The equation \eqref{eq:baye} is useful because it is often difficult to calculate the probability of the event $E$ directly, while knowing whether the other event $F$ has happened helps us determine the probability of $E$.

Example. An insurance company divides people into two categories: those who are accident prone and those who are not. Statistics show that an accident-prone person will have an accident at some time within a fixed 1-year period with probability 0.4. This probability decreases to 0.2 for a non-accident-prone person. If 30% of the population is accident prone, what is the probability that a new policyholder will have an accident within a year of purchasing a policy?

Solution. Let $E$ denote the event that the policyholder will have an accident within a year of purchase. Let $F$ denote the event that the policyholder is accident prone. Using the equation \eqref{eq:baye}, \begin{align*}P_r(E)&=P_r(E|F)P_r(F)+P_r(E|F^c)P_r(F^c)\\&=0.4\times 0.3+0.2\times 0.7\\&=0.26\end{align*}
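
The equation \eqref{eq:baye} translates directly into a one-line function. The following Python sketch uses a hypothetical helper named `total_probability` and exact fractions for the inputs 0.4, 0.2, and 0.3.

```python
from fractions import Fraction

def total_probability(p_e_given_f, p_e_given_not_f, p_f):
    """P(E) = P(E|F) P(F) + P(E|F^c) (1 - P(F))."""
    return p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)

# Insurance example: P(E|F) = 0.4, P(E|F^c) = 0.2, P(F) = 0.3.
p_accident = total_probability(Fraction(2, 5), Fraction(1, 5), Fraction(3, 10))
print(p_accident, float(p_accident))  # 13/50 0.26
```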

Suppose that $P_r(E)$ and $P_r(F)$ are both nonzero. Then it follows from the definitions of the conditional probabilities $$P_r(E|F)=\frac{P_r(E\cap F)}{P_r(F)},\ P_r(F|E)=\frac{P_r(F\cap E)}{P_r(E)}$$ that \begin{equation}\label{eq:baye2}P_r(E|F)=\frac{P_r(F|E)P_r(E)}{P_r(F)}\end{equation} The equation \eqref{eq:baye2} is usually called Bayes’ Theorem, named after the English statistician and philosopher Reverend Thomas Bayes (pronounced ‘beiz’). If we regard the event $E$ as a hypothesis and $F$ as evidence, the probabilities $P_r(E)$ and $P_r(E|F)$ can be interpreted, respectively, as the initial degree of belief in $E$ and the degree of belief in $E$ after accounting for the evidence $F$. The factor $\frac{P_r(F|E)}{P_r(F)}$ can then be interpreted as the support $F$ provides for $E$.

Example. This is the second part of the previous example. Suppose that a new policyholder has an accident within a year of purchasing a policy. What is the probability that he or she is accident prone?

Solution. What the question is asking is $P_r(F|E)$. By Bayes’ theorem \eqref{eq:baye2}, \begin{align*}P_r(F|E)&=\frac{P_r(E|F)P_r(F)}{P_r(E)}\\&=\frac{0.4\times 0.3}{0.26}=\frac{6}{13}\end{align*} i.e. on average 6 out of every 13 policyholders who have an accident within a year of purchasing a policy are accident prone.

Example. A lab blood test is 95% effective in detecting a certain disease when it is present. The test also yields a false positive result for 1% of the healthy people tested. If 0.5% of the population actually has the disease, what is the probability that a person has the disease given that the test result is positive?

Solution. Let $D$ be the event that the tested person has the disease and $E$ the event that the test result is positive. What is asked is to find $P_r(D|E)$. The available information is $P_r(E|D)=0.95$, $P_r(D)=0.005$, and $P_r(E|D^c)=0.01$. Using Bayes’ theorem \eqref{eq:baye2} along with \eqref{eq:baye}, \begin{align*}P_r(D|E)&=\frac{P_r(E|D)P_r(D)}{P_r(E|D)P_r(D)+P_r(E|D^c)P_r(D^c)}\\&=\frac{0.95\times 0.005}{0.95\times 0.005+0.01\times 0.995}\\&=\frac{95}{294}\approx 0.323\end{align*} i.e. only about 32% of those who test positive actually have the disease.

Example. During a criminal investigation, the detective in charge is 60% convinced that a suspect is guilty. Now a new piece of evidence comes to light, and it shows that the criminal has a certain characteristic (such as left-handedness, baldness, or brown hair). Suppose that 20% of the population possesses this characteristic. If it turns out that the suspect does have this characteristic, how certain is the detective now that the suspect is guilty of the crime?

Solution. Let $G$ be the event that the suspect is guilty and $C$ the event that he possesses the characteristic of the criminal. What is asked is to find $P_r(G|C)$. The available information is then $P_r(G)=0.6$, $P_r(C|G^c)=0.2$, and $P_r(C|G)=1$ (the real criminal does have the characteristic). Using Bayes’ theorem \eqref{eq:baye2} along with \eqref{eq:baye}, \begin{align*}P_r(G|C)&=\frac{P_r(C|G)P_r(G)}{P_r(C|G)P_r(G)+P_r(C|G^c)P_r(G^c)}\\&=\frac{1\times 0.6}{1\times 0.6+0.2\times 0.4}\\&=\frac{15}{17}\approx 0.882\end{align*}
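
The three worked examples share the same pattern: a prior probability for a hypothesis, the probability of the evidence under the hypothesis, and the probability of the evidence under its complement. The Python sketch below packages that pattern in a hypothetical helper `posterior`, which combines \eqref{eq:baye2} with \eqref{eq:baye} in the denominator, and replays the three examples with exact fractions.

```python
from fractions import Fraction

def posterior(p_ev_given_h, p_ev_given_not_h, p_h):
    """P(H | evidence) via Bayes' theorem.

    The denominator P(evidence) is expanded with the total probability
    formula: P(ev|H) P(H) + P(ev|H^c) (1 - P(H)).
    """
    numerator = p_ev_given_h * p_h
    evidence = numerator + p_ev_given_not_h * (1 - p_h)
    return numerator / evidence

# Insurance example: accident prone given an accident within a year.
insurance = posterior(Fraction(2, 5), Fraction(1, 5), Fraction(3, 10))
# Blood test example: disease given a positive result.
blood_test = posterior(Fraction(95, 100), Fraction(1, 100), Fraction(5, 1000))
# Detective example: guilty given that the suspect has the characteristic.
detective = posterior(Fraction(1), Fraction(1, 5), Fraction(3, 5))

print(insurance, blood_test, detective)  # 6/13 95/294 15/17
```

In the blood test example the posterior stays low because the prior $P_r(D)=0.005$ is small compared with the false positive rate, so most positive results come from healthy people.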
