Category Archives: Discrete Mathematics

The Division Algorithm

Theorem 1. Suppose that $a,b\in\mathbb{Z}$ with $0<a<b$. Then there exist unique $q,r\in\mathbb{Z}$, $0\leq r<a$, such that
$$b=aq+r$$

Theorem 2. If $a$ and $b$ are positive integers, then
$$(a,b)[a,b]=ab$$

Suppose that $0<a<b$. Then by the division algorithm, there exist unique $q,r\in\mathbb{Z}$ such that $b=aq+r$ where $0\leq r<a$. If $d=(a,b)$, then $d|r$ since $r=b-aq$ and $d$ divides both $a$ and $b$. This means $d\leq(a,r)$ since $d$ is a common divisor of $a$ and $r$. Since $(a,r)|a$ and $(a,r)|b$, $(a,r)$ is a common divisor of $a$ and $b$. This implies $(a,r)\leq(a,b)=d$. Thus, $d=(a,r)=(a,b)$. More generally, the following lemma holds.

Lemma 3. For any integers $a>0$, $b$, $c$ and $k$, if $a=bk+c$ then $(a,b)=(b,c)$.

Example. Let $a=123$ and $b=504$. Then
$$504=123\cdot 4+12$$
By Lemma 3, we have
$$(123,504)=(12,123)$$
$$123=12\cdot 10+3$$
By Lemma 3 again, we have
$$(12,123)=(3,12)=3$$
Hence, we obtain $(123,504)=3$.

Theorem 4. (The Euclidean Algorithm) Let $a$ and $b$ be integers, $0<a<b$. Apply the division algorithm repeatedly as follows. \begin{align*}b&=aq_1+r_1,\ 0<r_1<a\\a&=r_1q_2+r_2,\ 0<r_2<r_1\\r_1&=r_2q_3+r_3,\ 0<r_3<r_2\\&\vdots\\r_{n-2}&=r_{n-1}q_n+r_n,\ 0<r_n<r_{n-1}\\r_{n-1}&=r_nq_{n+1}\end{align*}Let $r_n$ be the last nonzero remainder. Then $(a,b)=r_n$.

Example. Compute $(158,188)$ using the Euclidean algorithm.
\begin{align*}188&=158\cdot 1+30\\158&=30\cdot 5+8\\30&=8\cdot 3+6\\8&=6\cdot 1+2\\6&=2\cdot 3+0 \end{align*} Hence, $(158,188)=2$.
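In code, the Euclidean algorithm is just a short loop. Below is a minimal Python sketch (the function name `gcd` is my choice; Python's standard library provides an equivalent `math.gcd`):

```python
def gcd(a, b):
    """Compute (a, b) by the Euclidean algorithm."""
    while a != 0:
        a, b = b % a, a  # replace (a, b) by (remainder, a)
    return b

print(gcd(158, 188))  # 2
print(gcd(123, 504))  # 3
```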

It is possible to use the Euclidean algorithm to write $(a,b)$ in the form $ax+by$. In the above example, \begin{align*}2&=8-6\cdot 1\\&=8-(30-8\cdot 3)\cdot 1\\&=-30+8\cdot 4\\&=-30+(158-30\cdot 5)\cdot 4\\&=158\cdot 4-30\cdot 21\\&=158\cdot 4-(188-158\cdot 1)\cdot 21\\&=158\cdot 25+188\cdot(-21) \end{align*}
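The back-substitution above can be mechanized as the extended Euclidean algorithm. Here is a small Python sketch (the recursion is a standard formulation; the function name `extended_gcd` is mine):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = (a, b) and g = a*x + b*y."""
    if a == 0:
        return b, 0, 1
    g, x, y = extended_gcd(b % a, a)
    # b % a = b - (b // a) * a, so back-substitute the coefficients:
    return g, y - (b // a) * x, x

g, x, y = extended_gcd(158, 188)
print(g, x, y)  # 2 25 -21, i.e. 2 = 158*25 + 188*(-21)
```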
In general, the following property holds.

Theorem 5. (Bézout’s Lemma) If $a$ and $b$ are integers such that $(a,b)$ is defined, then there exist $x,y\in\mathbb{Z}$ such that
\begin{equation}
\label{eq:bezout}
(a,b)=ax+by
\end{equation}

Equation \eqref{eq:bezout} is called Bézout's identity.

Corollary 6. If $d$ is any common divisor of $a$ and $b$, not both of which are $0$, then $d|(a,b)$.

Definition. We say $k$ is a linear combination of $a$ and $b$ if there exist $x,y\in\mathbb{Z}$ such that $k=ax+by$.

Theorem 7. Given integers $a\ne 0$ and $b\ne 0$, and $m$, if $a|m$ and $b|m$ then $[a,b]|m$.

Proof. Suppose $[a,b]\nmid m$. Then by the division algorithm, there exist $q,r\in\mathbb{Z}$ such that $m=[a,b]q+r$, $0<r<[a,b]$. The remainder can be written as $r=m-q[a,b]$. By assumption, $a|m$, and also $a|[a,b]$. So, $a|r$. By the same argument, we also obtain $b|r$. This means that $r$ is a common multiple of $a$ and $b$, which contradicts the fact that $[a,b]$ is the least positive common multiple of $a$ and $b$, since $0<r<[a,b]$. Hence, $r=0$, i.e. $[a,b]|m$.
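Theorem 2 turns this into a practical recipe: once $(a,b)$ is known, $[a,b]=\frac{ab}{(a,b)}$. A small Python check of this and of Theorem 7 (the helper name `lcm` is mine; recent Python versions also ship `math.lcm`):

```python
from math import gcd

def lcm(a, b):
    """[a, b] computed via Theorem 2: (a, b)[a, b] = ab."""
    return a * b // gcd(a, b)

a, b = 4, 6
print(lcm(a, b))  # 12
# Theorem 7: every common multiple m of a and b is divisible by [a, b].
common_multiples = [m for m in range(1, 100) if m % a == 0 and m % b == 0]
print(all(m % lcm(a, b) == 0 for m in common_multiples))  # True
```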

Remark. This theorem and the preceding corollary are dual to each other.

Divisibility

Definition. We say $a$ divides $b$ and write $a|b$ if there exists $d\in\mathbb{Z}$ such that $b=ad$. We say that $a$ is a divisor of $b$ and that $b$ is a multiple of $a$. If $a|b$ is false, we write $a\nmid b$.

Definition. We say $d$ is the greatest common divisor (gcd in short) of $a$ and $b$ if $d$ is the largest of all integers dividing both $a$ and $b$. We write $d=(a,b)$.

Example. Let $a=4$ and $b=6$. The divisors of $4$ are $1$, $-1$, $2$, $-2$, $4$, $-4$. The divisors of $6$ are $1$, $-1$, $2$, $-2$, $3$, $-3$, $6$, $-6$. So the common divisors of $4$ and $6$ are $1$, $-1$, $2$, $-2$, and hence $(4,6)=2$.

Definition. We say $m$ is the least common multiple (lcm in short) of $a$ and $b$ if $m$ is the smallest of all the positive integers that are multiples of both $a$ and $b$. We write $m=[a,b]$.

Example. Let $a=4$ and $b=6$. The positive multiples of $4$ are $4$, $8$, $12$, $16$, $20$, $24$, $28$, $\cdots$ and the positive multiples of $6$ are $6$, $12$, $18$, $24$, $30$, $\cdots$. Common positive multiples of $4$ and $6$ are $12$, $24$, $\cdots$ and $[4,6]=12$.
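For numbers this small, both definitions can be checked by brute force; a throwaway Python sketch:

```python
a, b = 4, 6
# gcd: the largest integer dividing both a and b.
gcd_ab = max(d for d in range(1, min(a, b) + 1) if a % d == 0 and b % d == 0)
# lcm: the smallest positive integer that is a multiple of both a and b.
lcm_ab = min(m for m in range(1, a * b + 1) if m % a == 0 and m % b == 0)
print(gcd_ab, lcm_ab)  # 2 12
```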

Theorem. Given integers $a$, $b$, and $c$,

  1. if $a|b$ then $a|bc$.
  2. if $a|b$ and $b|c$ then $a|c$.
  3. if $a|b$ and $a|c$ then $a|bx+cy$ for any $x,y\in\mathbb{Z}$.

Proof.

  1. If $a|b$ then there exists $d\in\mathbb{Z}$ such that $b=ad$. Now $bc=(ad)c=a(dc)$ and $dc\in\mathbb{Z}$ and hence $a|bc$.
  2. Let $a|b$ and $b|c$. Then there exist $d_1,d_2\in\mathbb{Z}$ such that $b=ad_1$ and $c=bd_2$. Now we have
    $$c=bd_2=(ad_1)d_2=a(d_1d_2)$$
    and $d_1d_2\in\mathbb{Z}$. Hence, $a|c$.
  3. Let $a|b$ and $a|c$. Then there exist $d_1,d_2\in\mathbb{Z}$ such that $b=ad_1$ and $c=ad_2$. For any $x,y\in\mathbb{Z}$ \begin{align*} bx+cy&=(ad_1)x+(ad_2)y\\&=a(d_1x+d_2y) \end{align*}
    and $d_1x+d_2y\in\mathbb{Z}$. Hence, $a|bx+cy$.

Introductory Probability: Random Variables and Expectation

Let us begin with an example.

Example. Consider an experiment of tossing 3 fair coins. Let $X$ denote the number of heads appearing. Then $X$ takes on one of the values 0, 1, 2, 3 with respective probabilities: \begin{align*}P_r\{X=0\}&=P_r\{(T,T,T)\}=\frac{1}{8}\\P_r\{X=1\}&=P_r\{(T,T,H),(T,H,T),(H,T,T)\}=\frac{3}{8}\\P_r\{X=2\}&=P_r\{(T,H,H),(H,T,H),(H,H,T)\}=\frac{3}{8}\\P_r\{X=3\}&=P_r\{(H,H,H)\}=\frac{1}{8}\end{align*} This $X$ here is an example of what is called a random variable. Random variables are real-valued functions defined on the sample space $S$ i.e. $X: S\longrightarrow\mathbb{R}$. In this example, $X$ is the function $X: S\longrightarrow\{0,1,2,3\}$ with $S=\{(T,T,T),(T,T,H),(T,H,T),(H,T,T),(T,H,H),(H,T,H),(H,H,T),(H,H,H)\}$ defined by the number of heads appearing for each outcome. In probability, often we are more interested in the values of a random variable rather than the outcomes of an experiment.
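These probabilities can be verified by enumerating the sample space; a quick Python sketch using only the standard library:

```python
from itertools import product
from collections import Counter

# Sample space: all 8 outcomes of tossing 3 fair coins.
S = list(product("HT", repeat=3))
counts = Counter(outcome.count("H") for outcome in S)  # X = number of heads
for i in sorted(counts):
    print(f"Pr{{X={i}}} = {counts[i]}/{len(S)}")
# Pr{X=0} = 1/8, Pr{X=1} = 3/8, Pr{X=2} = 3/8, Pr{X=3} = 1/8
```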

Remark. In more traditional mathematical notation, $P_r\{X=i\}$ is denoted by $P_r\{X^{-1}(i)\}$ which means $$P_r\{X^{-1}(i)\}=P_r\{s\in S: X(s)=i\}$$ But in probability, $P_r\{X=i\}$ is commonly used notation.

Here is another example.

Example. Three marbles are randomly selected from a jar containing 3 white, 3 red, and 5 black marbles. Suppose that we win \$1 for each white marble selected and lose \$1 for each red marble selected. Let $X$ denote the total winnings from the experiment. Then $X$ is a random variable taking on the values $0,\pm 1,\pm 2,\pm3$ with respective probabilities: \begin{align*}P_r\{X=0\}&=\frac{\begin{pmatrix}5\\3\end{pmatrix}+\begin{pmatrix}3\\1\end{pmatrix}\begin{pmatrix}3\\1\end{pmatrix}\begin{pmatrix}5\\1\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{55}{165}\\P_r\{X=1\}=P_r\{X=-1\}&=\frac{\begin{pmatrix}3\\1\end{pmatrix}\begin{pmatrix}5\\2\end{pmatrix}+\begin{pmatrix}3\\2\end{pmatrix}\begin{pmatrix}3\\1\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{39}{165}\\P_r\{X=2\}=P_r\{X=-2\}&=\frac{\begin{pmatrix}3\\2\end{pmatrix}\begin{pmatrix}5\\1\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{15}{165}\\P_r\{X=3\}=P_r\{X=-3\}&=\frac{\begin{pmatrix}3\\3\end{pmatrix}}{\begin{pmatrix}11\\3\end{pmatrix}}=\frac{1}{165}\end{align*} The probability that we win money is $$\sum_{i=1}^3P_r\{X=i\}=\frac{55}{165}=\frac{1}{3}$$
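These fractions can be confirmed by enumerating all $\begin{pmatrix}11\\3\end{pmatrix}=165$ possible draws; a short Python check (encoding each marble by its dollar value is my own device):

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

# Jar: 3 white (+$1 each), 3 red (-$1 each), 5 black ($0 each).
jar = [+1] * 3 + [-1] * 3 + [0] * 5
wins = Counter(sum(draw) for draw in combinations(jar, 3))
total = sum(wins.values())  # C(11, 3) = 165
for x in sorted(wins):
    print(x, Fraction(wins[x], total))
# Probability of winning money:
print(Fraction(sum(wins[x] for x in wins if x > 0), total))  # 1/3
```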

Definition. Let $\mathrm{PMF}_X(i):=P_r\{X=i\}$. $\mathrm{PMF}_X(i)$ is called the probability mass function for random variable $X$.

Definition. Let $\mathrm{CDF}_X(i):=P_r\{X\leq i\}$. $\mathrm{CDF}_X(i)$ is called the cumulative distribution function and it describes the probability that the value of a random variable is below a specified number.

$\mathrm{CDF}_X(i)$ can be expressed as $$\mathrm{CDF}_X(i)=\sum_{j\leq i}\mathrm{PMF}_X(j)$$ i.e. it is the accumulation of distribution (probability) described by the probability mass function for values up to $i$, hence the name the cumulative distribution function.

Example. How likely is it that it will take no more than 3 flips for a coin to land on heads?

Solution. Let $X$ denote the number of flips it takes for the coin to land on heads for the first time. Then what the question is asking is $P_r\{X\leq 3\}$, i.e. $\mathrm{CDF}_X(3)$. Since the first head appears on flip $j$ with probability $\left(\frac{1}{2}\right)^j$, \begin{align*}\mathrm{CDF}_X(3)&=\sum_{j=1}^3\mathrm{PMF}_X(j)\\&=\sum_{j=1}^3 P_r\{X=j\}\\&=\frac{1}{2}+\frac{1}{4}+\frac{1}{8}=\frac{7}{8}\end{align*}
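As a sanity check, the same sum can be computed with exact fractions; a minimal Python sketch:

```python
from fractions import Fraction

# Pr{X = j} = (1/2)^j when the first head appears on flip j.
cdf_3 = sum(Fraction(1, 2) ** j for j in range(1, 4))
print(cdf_3)  # 7/8
```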

Definition. The expected value of a random variable $X$, denoted by $E(X)$, is the weighted average of its possible values weighted according to their probabilities. More specifically, $$E(X)=\sum_i\mathrm{PMF}_X(i)\cdot i=\sum_i P_r\{X=i\}\cdot i$$ The expected value is also called the expectation or the mean.

Example. What is the expected value of a roll of a die?

Solution. The random variable $X$ takes on values 1, 2, 3, 4, 5, 6 and each of these values has equal probability of $\frac{1}{6}$. Hence, the expected value of $X$ is $$E(X)=\sum_{i=1}^6\frac{1}{6}i=\frac{1}{6}\frac{6\cdot 7}{2}=3.5$$ Here I used the formula $$1+2+3+\cdots+n=\frac{n(n+1)}{2}$$ to calculate $1+2+3+4+5+6$.

Example. If a die is rolled three times, how many distinct values are expected to appear?

Solution. Let $X$ denote the number of distinct values. There are $6^3$ outcomes of this experiment. Calculating $P_r\{X=1\}$ and $P_r\{X=3\}$ is straightforward: all three rolls show the same value in $6$ outcomes, and all three rolls show distinct values in $6\cdot 5\cdot 4=120$ outcomes (an ordered choice of three distinct values). Hence \begin{align*}P_r\{X=1\}&=\frac{6}{6^3}=\frac{1}{36}\\P_r\{X=3\}&=\frac{6\cdot 5\cdot 4}{6^3}=\frac{5}{9}\end{align*} $P_r\{X=2\}$ can be found by $$P_r\{X=2\}=1-P_r\{X=1\}-P_r\{X=3\}=1-\frac{1}{36}-\frac{5}{9}=\frac{5}{12}$$ The expected value is then \begin{align*}E(X)&=P_r\{X=1\}\cdot 1+P_r\{X=2\}\cdot 2+P_r\{X=3\}\cdot 3\\&=\frac{1}{36}+\frac{2\cdot 5}{12}+\frac{3\cdot 5}{9}=\frac{91}{36}\approx 2.53\end{align*}
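Brute-force enumeration of all $6^3$ outcomes confirms these values; a short Python check:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=3))  # all 216 ordered rolls
# X = number of distinct values in a roll; average X over the sample space.
ex = Fraction(sum(len(set(roll)) for roll in outcomes), len(outcomes))
print(ex, float(ex))  # 91/36 ≈ 2.53
```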

Example. Let $X$ denote the number of heads appearing in a sequence of 10 flips of a coin. What is $E(X)$?

Solution. For any $i=0,1,\cdots,10$, there are $\begin{pmatrix}10\\i\end{pmatrix}$ sequences of 10 flips, each of which contains $i$ heads. Each such sequence happens with probability $\left(\frac{1}{2}\right)^{10}$. Hence, \begin{align*}E(X)&=\sum_{i=0}^{10}P_r\{X=i\}\cdot i\\&=\sum_{i=1}^{10}P_r\{X=i\}\cdot i\\&=\sum_{i=1}^{10}\begin{pmatrix}10\\i\end{pmatrix}\left(\frac{1}{2}\right)^{10}\cdot i\\&=\left(\frac{1}{2}\right)^{10}\cdot 10\cdot 2^9\\&=5\end{align*} For the second line to the last, I used the identity \begin{equation}\label{eq:binom2}\sum_{k=1}^n\begin{pmatrix}n\\k\end{pmatrix}k=n2^{n-1}\end{equation} The equation \eqref{eq:binom2} can be easily seen. \begin{align*}\sum_{k=1}^n\begin{pmatrix}n\\k\end{pmatrix}k&=\sum_{k=1}^n\frac{n!}{(k-1)!(n-k)!}\\&=n\sum_{k=1}^n\frac{(n-1)!}{(k-1)!(n-1-(k-1))!}\\&=n\sum_{k=1}^n\begin{pmatrix}n-1\\k-1\end{pmatrix}\\&=n2^{n-1}\end{align*} The last expression is obtained using the identity $\sum_{j=0}^{n-1}\begin{pmatrix}n-1\\j\end{pmatrix}=2^{n-1}$.
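Both $E(X)=5$ and the identity \eqref{eq:binom2} are easy to confirm numerically; a quick Python check:

```python
from math import comb
from fractions import Fraction

n = 10
# E(X) for the number of heads in n fair flips, computed from the PMF.
ex = sum(Fraction(comb(n, i), 2 ** n) * i for i in range(n + 1))
print(ex)  # 5
# The identity sum_k C(n, k) * k = n * 2^(n-1):
print(sum(comb(n, k) * k for k in range(1, n + 1)), n * 2 ** (n - 1))  # 5120 5120
```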

While the expected value $E(X)$ provides useful information as the weighted average of the possible values of $X$, it does not provide information on the spread of these values. The variance provides such information.

Definition. If $X$ is a random variable with expected value $E(X)$, then the variance of $X$, denoted by $\mathrm{Var}(X)$, is defined by $$\mathrm{Var}(X)=E[(X-E(X))^2]$$

To simplify our calculation, let us denote $E(X)$ by $\mu$. Then \begin{align*}\mathrm{Var}(X)&=E[(X-\mu)^2]\\&=\sum_iP_r\{X=i\}(i-\mu)^2\end{align*} This last expression shows that $\mathrm{Var}(X)$ measures how far apart $X$ would be from its expected value on average. Let us continue further from the last expression above. \begin{align*}\sum_iP_r\{X=i\}(i-\mu)^2&=\sum_iP_r\{X=i\}(i^2-2\mu i+\mu^2)\\&=\sum_iP_r\{X=i\}i^2-2\mu\sum_iP_r\{X=i\}i+\mu^2\sum_iP_r\{X=i\}\\&=E(X^2)-2\mu^2+\mu^2\\&=E(X^2)-\mu^2\\&=E(X^2)-[E(X)]^2\end{align*} So we have an alternative formula for the variance \begin{equation}\label{eq:variance}\mathrm{Var}(X)=E(X^2)-[E(X)]^2\end{equation}

Example. Calculate $\mathrm{Var}(X)$ if $X$ represents the outcome when a die is rolled.

Solution. First \begin{align*}E(X)&=1\cdot\frac{1}{6}+2\cdot\frac{1}{6}+3\cdot\frac{1}{6}+4\cdot\frac{1}{6}+5\cdot\frac{1}{6}+6\cdot\frac{1}{6}\\&=\frac{1}{6}\frac{6\cdot 7}{2}=\frac{7}{2}\end{align*} Next, \begin{align*}E(X^2)&=1^2\cdot\frac{1}{6}+2^2\cdot\frac{1}{6}+3^2\cdot\frac{1}{6}+4^2\cdot\frac{1}{6}+5^2\cdot\frac{1}{6}+6^2\cdot\frac{1}{6}\\&=\frac{1}{6}\frac{6\cdot 7\cdot 13}{6}=\frac{91}{6}\end{align*} The value in the second line to the last is obtained by the formula $$1^2+2^2+3^2+\cdots+n^2=\frac{n(n+1)(2n+1)}{6}$$ Therefore, $$\mathrm{Var}(X)=\frac{91}{6}-\left(\frac{7}{2}\right)^2=\frac{35}{12}$$
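The same numbers fall out of a few lines of Python, computing the variance both from the definition and from \eqref{eq:variance}:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                                 # each face is equally likely
mu = sum(p * i for i in faces)                     # E(X) = 7/2
var_def = sum(p * (i - mu) ** 2 for i in faces)    # E[(X - mu)^2]
var_alt = sum(p * i * i for i in faces) - mu ** 2  # E(X^2) - [E(X)]^2
print(mu, var_def, var_alt)  # 7/2 35/12 35/12
```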

Remarks.

  1. In mechanics, the center of gravity of a system of particles is indeed the expected value of the position coordinates of particles. Also the moment of inertia of a body is the variance of the position coordinates of particles that constitute the body.
  2. $\mathrm{SD}(X)=\sqrt{\mathrm{Var}(X)}$ is called the standard deviation of $X$.

I will complete this note with the following useful identities.

Theorem. Let $a$ and $b$ be constants. Then

  1. $E(aX+b)=aE(X)+b$. As a special case, if $b=0$, we obtain $E(aX)=aE(X)$.
  2. $\mathrm{Var}(aX+b)=a^2\mathrm{Var}(X)$.

Proof.

  1. \begin{align*}E(aX+b)&=\sum_iP_r\{X=i\}(ai+b)\\&=a\sum_iP_r\{X=i\}i+b\sum_iP_r\{X=i\}\\&=aE(X)+b\end{align*}
  2. For simplicity, let $\mu=E(X)$. Then by part 1 above, $E(aX+b)=a\mu+b$. Now, \begin{align*}\mathrm{Var}(aX+b)&=E[(aX+b-a\mu-b)^2]\\&=E[a^2(X-\mu)^2]\\&=a^2E[(X-\mu)^2]\\&=a^2\mathrm{Var}(X)\end{align*}
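Both identities are also easy to confirm numerically; a small Python check on the fair-die distribution (the choice $a=2$, $b=3$ is arbitrary):

```python
from fractions import Fraction

pmf = {i: Fraction(1, 6) for i in range(1, 7)}  # a fair die
a, b = 2, 3

def ex(pmf):
    return sum(p * i for i, p in pmf.items())

def var(pmf):
    mu = ex(pmf)
    return sum(p * (i - mu) ** 2 for i, p in pmf.items())

pmf_ab = {a * i + b: p for i, p in pmf.items()}  # distribution of aX + b
print(ex(pmf_ab) == a * ex(pmf) + b)     # True
print(var(pmf_ab) == a ** 2 * var(pmf))  # True
```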

References.

[1] Essential Discrete Mathematics for Computer Science, Harry Lewis and Rachel Zax, Princeton University Press, 2019

[2] A First Course in Probability, Sheldon Ross, 5th Edition, Prentice-Hall, 1998

Introductory Probability: Bayes' Theorem

Let $S$ be the sample space and $E$, $F$ events. The event $E$ can be written as \begin{align*}E&=E\cap S\\&=E\cap(F\dot\cup F^c)\\&=(E\cap F)\dot\cup(E\cap F^c)\end{align*} By axiom 3 of finite probability, we have \begin{equation}\begin{aligned}P_r(E)&=P_r(E\cap F)+P_r(E\cap F^c)\\&=P_r(E|F)P_r(F)+P_r(E|F^c)P_r(F^c)\\&=P_r(E|F)P_r(F)+P_r(E|F^c)(1-P_r(F))\end{aligned}\label{eq:baye}\end{equation} This equation states that the probability of the event $E$ is a weighted average of the conditional probability of $E$ given that $F$ has happened and the conditional probability of $E$ given that $F$ has not occurred. The equation \eqref{eq:baye} is useful because often it is difficult to calculate the probability of the event $E$ directly, but knowing whether the other event $F$ has happened helps us determine the probability of $E$.

Example. An insurance company divides people into two categories: those who are accident prone and those who are not. Statistics show that an accident-prone person will have an accident at some time within a fixed 1-year period with probability 0.4. This probability decreases to 0.2 for a non-accident-prone person. If 30% of the population is accident prone, what is the probability that a new policyholder will have an accident within a year of purchasing a policy?

Solution. Let $E$ denote the event that the policyholder will have an accident within a year of purchase. Let $F$ denote the event that the policyholder is accident prone. Using the equation \eqref{eq:baye}, \begin{align*}P_r(E)&=P_r(E|F)P_r(F)+P_r(E|F^c)P_r(F^c)\\&=0.4\times 0.3+0.2\times 0.7\\&=0.26\end{align*}
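The computation in \eqref{eq:baye} is a one-liner in code; a minimal Python sketch using exact fractions:

```python
from fractions import Fraction

p_F = Fraction(3, 10)           # Pr(accident prone)
p_E_given_F = Fraction(4, 10)   # Pr(accident | accident prone)
p_E_given_Fc = Fraction(2, 10)  # Pr(accident | not accident prone)
p_E = p_E_given_F * p_F + p_E_given_Fc * (1 - p_F)
print(p_E)  # 13/50, i.e. 0.26
```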

Suppose that $P_r(E)$ and $P_r(F)$ are both nonzero. Then it follows from the conditional probabilities $$P_r(E|F)=\frac{P_r(E\cap F)}{P_r(F)},\ P_r(F|E)=\frac{P_r(F\cap E)}{P_r(E)}$$ that \begin{equation}\label{eq:baye2}P_r(E|F)=\frac{P_r(F|E)P_r(E)}{P_r(F)}\end{equation} The equation \eqref{eq:baye2} is usually called Bayes' Theorem, named after the English statistician and philosopher Reverend Thomas Bayes (pronounced 'beiz'). If we regard the event $E$ as a hypothesis and $F$ as evidence, the probabilities $P_r(E)$ and $P_r(E|F)$ can be interpreted, respectively, as the initial degree of belief in $E$ and the degree of belief in $E$ after accounting for the evidence $F$. The factor $\frac{P_r(F|E)}{P_r(F)}$ can then be interpreted as the support $F$ provides for $E$.

Example. This is the second part of the previous example. Suppose that a new policyholder has an accident within a year of purchasing a policy. What is the probability that he or she is accident prone?

Solution. What the question is asking is $P_r(F|E)$. By Bayes' theorem \eqref{eq:baye2}, \begin{align*}P_r(F|E)&=\frac{P_r(E|F)P_r(F)}{P_r(E)}\\&=\frac{0.4\times 0.3}{0.26}=\frac{6}{13}\end{align*} i.e. 6 out of 13 who have an accident within a year of purchasing a policy are accident-prone people.

Example. A lab blood test is 95% effective in detecting a certain disease when it is present. The test also yields a false positive result for 1% of the healthy people tested. If 0.5% of the population actually has the disease, what is the probability that a person has the disease given that the test result is positive?

Solution. Let $D$ be the event that the tested person has the disease and $E$ the event that the test result is positive. What is asked is to find $P_r(D|E)$. The available information is $P_r(E|D)=0.95$, $P_r(D)=0.005$, and $P_r(E|D^c)=0.01$. Using Bayes' theorem \eqref{eq:baye2} along with \eqref{eq:baye}, \begin{align*}P_r(D|E)&=\frac{P_r(E|D)P_r(D)}{P_r(E|D)P_r(D)+P_r(E|D^c)P_r(D^c)}\\&=\frac{0.95\times 0.005}{0.95\times 0.005+0.01\times 0.995}\\&=\frac{95}{294}\approx 0.323\end{align*} i.e. only about 32% of those who tested positive actually have the disease.
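A minimal Python sketch of the same computation (the helper name `posterior` is mine):

```python
from fractions import Fraction

def posterior(p_e_given_d, p_d, p_e_given_dc):
    """Pr(D | E) by Bayes' theorem, with Pr(E) expanded by total probability."""
    p_e = p_e_given_d * p_d + p_e_given_dc * (1 - p_d)
    return p_e_given_d * p_d / p_e

print(posterior(Fraction(95, 100), Fraction(5, 1000), Fraction(1, 100)))
# 95/294, about 0.323
```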

Example. During a criminal investigation, the detective in charge is 60% convinced that a suspect is guilty. Now a new piece of evidence comes to light, showing that the criminal has a certain characteristic (such as left-handedness, baldness, or brown hair). Suppose that 20% of the population possesses this characteristic. If the suspect turns out to have this characteristic, how certain is the detective now that the suspect is guilty of the crime?

Solution. Let $G$ be the event that the suspect is guilty and $C$ the event that he possesses the characteristic of the criminal. What is asked is to find $P_r(G|C)$. The available information is then $P_r(G)=0.6$, $P_r(C|G^c)=0.2$, and $P_r(C|G)=1$ (the real criminal does have the characteristic). Using Bayes' theorem \eqref{eq:baye2} along with \eqref{eq:baye}, \begin{align*}P_r(G|C)&=\frac{P_r(C|G)P_r(G)}{P_r(C|G)P_r(G)+P_r(C|G^c)P_r(G^c)}\\&=\frac{1\times 0.6}{1\times 0.6+0.2\times 0.4}\\&\approx 0.882\end{align*}

References.

[1] Essential Discrete Mathematics for Computer Science, Harry Lewis and Rachel Zax, Princeton University Press, 2019

[2] A First Course in Probability, Sheldon Ross, 5th Edition, Prentice-Hall, 1998

Introductory Probability: Conditional Probability

A conditional probability is the probability that an event will occur given that some other event occurs. For example, the probability of rolling two 1s with two dice is $\frac{1}{36}$. Rolls of two dice are independent events, but if we know the roll of at least one of the dice is an odd number, then the likelihood of two 1s increases.

Definition. Let $A$ and $B$ be events. Suppose $P_r(B)>0$. Then the conditional probability of $A$ given $B$ is \begin{equation}\label{eq:condprob}P_r(A|B)=\frac{P_r(A\cap B)}{P_r(B)}\end{equation} If $P_r(B)=1$, then knowing that the event $B$ occurs gives no information on event $A$ and $P_r(A|B)=P_r(A)$. If $P_r(B)<1$, then conditioning on $B$ scales the probability $P_r(A\cap B)$ up by the factor $\frac{1}{P_r(B)}$.

As a special case, if all outcomes are equally likely, then \begin{align*}P_r(A|B)&=\frac{P_r(A\cap B)}{P_r(B)}\\&=\frac{\frac{|A\cap B|}{|S|}}{\frac{|B|}{|S|}}\\&=\frac{|A\cap B|}{|B|}\end{align*}

Example. (Rolling two 1s with two dice) The sample space is $\{1,2,3,4,5,6\}\times\{1,2,3,4,5,6\}$ and $A=\{(1,1)\}$. Let $B$ be the event that one of the dice has come up with an odd number. Then $$B=\{1,3,5\}\times\{1,2,3,4,5,6\}\cup\{1,2,3,4,5,6\}\times\{1,3,5\}$$ $|B|=27$ and $A\cap B=A$, so we obtain $$P_r(A|B)=\frac{1}{27}>\frac{1}{36}$$

Example. If a die is rolled twice, what is the probability that the sum of rolls is at least 9? If the first roll is a 4, does the probability increase, decrease, or stay the same?

Solution. Let $A$ be the event that the rolls total at least 9. Then $$A=\{(3,6),(4,5),(4,6),(5,4),(5,5),(5,6),(6,3),(6,4),(6,5),(6,6)\}$$ and $P_r(A)=\frac{|A|}{|S|}=\frac{10}{36}=\frac{5}{18}$. Let $B$ be the event that the first roll is 4. Then $$B=\{(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)\}$$ and $A\cap B=\{(4,5),(4,6)\}$. Thus, $$P_r(A|B)=\frac{|A\cap B|}{|B|}=\frac{2}{6}=\frac{1}{3}>\frac{5}{18}=P_r(A)$$
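Enumerating the 36 outcomes confirms both probabilities; a quick Python check:

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls
A = [r for r in S if sum(r) >= 9]         # total at least 9
B = [r for r in S if r[0] == 4]           # first roll is a 4
A_and_B = [r for r in A if r in B]
print(Fraction(len(A), len(S)))        # 5/18
print(Fraction(len(A_and_B), len(B)))  # 1/3
```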

Example. A coin is flipped twice. What is the conditional probability that both flips result in heads, given that the first flip does?

Solution. Let $A$ be the event that both flips land on heads and $B$ the event that the first flip lands on heads. Then $$A=\{(H,H)\},\ B=\{(H,H),(H,T)\}$$ and so $P_r(A|B)=\frac{1}{2}$.

Example. An urn contains 10 white, 5 yellow, and 10 black marbles. A marble is chosen at random from the urn and it is noted it is not one of the black marbles. What is the probability that it is yellow?

Solution. Let $A$ be the event that the marble selected is yellow and $B$ the event that the marble selected is not black. $A\cap B=A$ so $P_r(A\cap B)=\frac{5}{25}$. Since $P_r(B)=\frac{15}{25}$, \begin{align*}P_r(A|B)&=\frac{P_r(A\cap B)}{P_r(B)}\\&=\frac{5}{15}\\&=\frac{1}{3}\end{align*}

Remark. In the above example, since we know that the chosen marble is not black, we can reduce our sample space to include only the white and yellow marbles. Then the probability that a marble chosen at random is yellow is simply $P_r(A)=\frac{5}{15}=\frac{1}{3}$. When all outcomes are assumed to be equally likely, it is often easier to compute a conditional probability by reducing the sample space instead of directly applying the formula \eqref{eq:condprob}.

Remark. (Independent Events and Conditional Probability) Recall that two events $A$ and $B$ are independent if and only if $P_r(A\cap B)=P_r(A)P_r(B)$. One can easily show that two events $A$ and $B$ with nonzero probabilities are independent if and only if $$P_r(A|B)=P_r(A)$$

References.

[1] Essential Discrete Mathematics for Computer Science, Harry Lewis and Rachel Zax, Princeton University Press, 2019

[2] A First Course in Probability, Sheldon Ross, 5th Edition, Prentice-Hall, 1998