# Introductory Probability: Baye’s Theorem

Let $S$ be sample space and $E, F$ events. The event $E$ can be written as \begin{align*}E&=E\cap S\\&=E\cap(F\dot\cup F^c)\\&=(E\cap F)\dot\cup(E\cap F^c)\end{align*} By axiom 3 of finite probability, we have \begin{aligned}P_r(E)&=P_r(E\cap F)+P_r(E\cap F^c)\\&=P_r(E|F)P(F)+P_r(E|F^c)P(F^c)\\&=P_r(E|F)P_r(F)+P_r(E|F^c)(1-P_r(F))\end{aligned}\label{eq:baye} This equation states that the probability of the event $E$ is a weighted average of the conditional probability of $E$ given that $F$ has happened and the conditional probability of $E$ given that $F$ has not occurred. The equation \eqref{eq:baye} is useful because often it is difficult to calculate the probability of the even $E$ directly but knowing the information on whether the other event $F$ has happened helps us to determine the probability of $E$.

Example. An insurance company divides people into two categories: those who are accident prone and those who are not. A statistics shows that an accident-prone person will have an accident at some time within a fixed 1-year period with probability 0.4. This probability decreases to 0.2 for a non-accident-prone person. If 30% of the population is accident prone, what is the probability that a new policyholder will have an accident within a year of purchasing a policy?

Solution. Let $E$ denote the event that the policyholder will have an accident within a year of purchase. Let $F$ denote the event that the policyholder is accident prone. Using the equation \eqref{eq:baye}, \begin{align*}P_r(E)&=P_r(E|F)P_r(F)+P_r(E|F^c)P_r(F^c)\\&=0.4\times 0.3+0.2\times 0.7\\&=0.26\end{align*}

Suppose that $P_r(E)$ and $P_r(F)$ are both nonzero. Then it follow from the conditional probabilities $$P_r(E|F)=\frac{P_r(E\cap F)}{P_r(F)},\ P_r(F|E)=\frac{P_r(F\cap E)}{P_r(E)}$$ that $$\label{eq:baye2}P_r(E|F)=\frac{P_r(F|E)P_r(E)}{P_r(F)}$$ The equation \eqref{eq:baye2} is usually called Baye’s Theorem, named after an English statistician and a philosopher Reverend Thomas Bayes (pronounced ‘beiz’). If we regard the event $E$ as a hypothesis and $F$ as an evidence, the probabilities $P_r(E)$ and $P_r(E|F)$ can be interpreted, respectively, as the initial degree of belief in $E$ and the degree of belief in $E$ after having accounted the evidence $F$. The factor $\frac{P_r(F|E)}{P_r(F)}$ can then be interpreted as the support $F$ provides for $E$.

Example. This is the second part of the previous example. Suppose that a new policyholder has an accident within a year of purchasing a policy. What is the probability that he or she is accident prone?

Solution. What the question is asking is $P_r(F|E)$. By Baye’s theorem \eqref{eq:baye2}, \begin{align*}P_r(F|E)&=\frac{P_r(E|F)P_r(F)}{P_r(E)}\\&=\frac{0.4\times 0.3}{0.26}=\frac{6}{13}\end{align*} i.e. 6 out of 13 who have an accident within a year of purchasing a policy are accident-prone people.

Example. A lab blood test is 95% effective in detecting a certain disease when it is present. The test also yields a false positive result for 1% of the healthy people tested. If 0.5% of the population actually has the disease, what is the probability a person has the disease given that the test result is positive.

Solution. Let $D$ be the event that the tested person has the disease and $E$ the event that the test result is positive. What is asked is to find $P_r(D|E)$. The available information is $P_r(E|D)=0.95$, $P_r(D)=0.005$, and $P_r(E|D^c)=0.01$. Using Baye’s theorem \eqref{eq:baye2} along with \eqref{eq:baye}, \begin{align*}P_r(D|E)&=\frac{P_r(E|D)P_r(D)}{P_r(E|D)P_r(D)+P_r(E|D^c)P_r(D^c)}\\&=\frac{0.95\times 0.005}{0.95\times 0.005+0.01\times 0.995}\\&=\frac{95}{294}\approx 0.323\end{align*} i.e. only 32% of those who tested positive actually have the disease.

Example. During a criminal investigation, the detective in charge is 60% convinced that a suspect is guilty. Now a new piece of evidence comes into light and it shows that the criminal has a certain characteristic (such as left-handedness, baldness, or brown hair). Suppose that 20% of the population possesses this characteristic. It turns out that the suspect does have this characteristic, how certain is the detective now that the suspect is guilty of the crime?

Solution. Let $G$ be the event that the suspect is guilty and $C$ the event that he possesses the characteristic of the criminal. What is asked is to find $P_r(G|C)$. The available information is then $P_r(G)=0.6$, $P_r(C|G^c)=0.2$, and $P_r(C|G)=1$ (The real criminal does have the characteristic.) Using Baye’s theorem \eqref{eq:baye2} along with \eqref{eq:baye}, \begin{align*}P_r(G|C)&=\frac{P_r(C|G)P_r(G)}{P_r(C|G)P_r(G)+P_r(C|G^c)P(G^c)}\\&=\frac{1\times 0.6}{1\times 0.6+0.2\times 0.4}\\&\approx 0.882\end{align*}

References.

[1] Essential Discrete Mathematics for Computer Science, Harry Lewis and Rachel Zax, Princeton University Press, 2019

[2] A First Course in Probability, Sheldon Ross, 5th Edition, Prentice-Hall, 1998