Probability answers the following question. *How frequently should we expect one outcome or another to occur if we repeat some procedure many times?* First let’s introduce some terminologies that are frequently used in studying probability:

- Trial: a repeatable procedure such as tossing a coin or drawing a marble from a jar
- Experiment: a sequence of individual trials
- Sample space (of a trial): the set of all possible outcomes
- An event: any set of outcomes, so it is a subset of sample space

*Example*. Rolling a die is a trial. It’s sample space is the set $\{1,2,3,4,5,6\}$. The subset $\{2,4,6\}$ is an event.

A sample space in general can be an infinite set but we consider only finite sample space here. Let $S$ be a sample space. A *probability function* $P_r: \wp(S)\longrightarrow\mathbb{R}$ assigns probabilities to events. Any probability function must satisfy the *axioms of finite probability*:

- $0\leq P_r(A)\leq 1$ for any event $A$.
- $P_r(\emptyset)=0$ and $P_r(S)=1$.
- For any events $A$ and $B$ with $A\cap B=\emptyset$, $$P_r(A\cup B)=P_r(A)+P_r(B)$$

*Remark*. Because of axiom 1, the codomain of a probability function $P_r$ can be restricted to the unit interval $[0,1]$, i.e. $P_r:\wp(S)\longrightarrow[0,1]$.

*Theorem*. The probability of an event $A$ not happening is $$P_r(A^c)=1-P_r(A)$$

*Proof*. It follows from $S=A\dot\cup A^c$ and axiom 3. Here, $\dot\cup$ means disjoint union.

*Theorem*. Let $A$ and $B$ be events such that $A\subseteq B$. Then $P_r(A)\leq P_r(B)$.

*Proof*. The set $B$ cam be written as $B=A\dot\cup(B-A)$, thus by axiom 3 $$P_r(B)=P_r(A)+P_r(B-A)\geq P_r(A)$$

This theorem is extended to

*Theorem*. Let $A_1,A_2,\cdots,A_n$ be events such that $A_i\cap A_j=\emptyset$ for $i\ne j$. Then $$P_r\left(\bigcup A_i\right)=\sum P_r(A_i)$$

Let $A$ and $B$ be two events and assume that $A$ and $B$ are not necessarily disjoint. How do we calculate $P_r(A\cup B)$, i.e. the probability of the event $A$ or $B$ happening? The following theorem answers that.

*Theorem*. (Inclusion-Exclusion Principle) Let $A$ and $B$ be two events (not necessarily disjoint). Then $$P_r(A\cup B)=P_r(A)+P_r(B)-P_r(A\cap B)$$

*Proof*. \begin{align*}A&=(A-B)\dot\cup(A\cap B)\\B&=(B-A)\dot\cup(A\cap B)\end{align*} Hence by axiom 3, we have \begin{align*}P_r(A-B)&=P_r(A)-P_r(A\cap B)\\P_r(B-A)&=P_r(B)-P_r(A\cap B)\end{align*} Also $A-B$, $B-A$ and $A\cap B$ are mutually disjoint, so by the previous theorem we have \begin{align*}P_r(A\cup B)&=P_r(A-B)+P_r(B-A)+P(A\cap B)\\&=P_r(A)+P_r(B)-P_r(A\cap B)\end{align*}

Consider a trial with $n$ possible outcomes. Assume that the outcomes are equally likely. Examples of trials with equally likely outcomes include coin tossing, rolling a die, drawing marbles from a jar, etc. Let $S$ be the sample space of a trial with equally likely outcomes. Define a function $P_r: \wp(S)\longrightarrow[0,1]$ by $$P_r(E)=\frac{|E|}{|S|}$$ for event $E$. Then it is straightforward to show that $P_r$ satiesfies axioms 1-3, i.e. it is a probability function.

*Example*. If a fair coin is flipped 4 times, what is the probability that exactly 2 of the flip’s land heads up?

*Solution*. $|S|=2^4=16$. Let $E$ be the set of outcomes that contain exactly 2 heads. Then $|E|=\begin{pmatrix}4\\2\end{pmatrix}=6$. Hence, $P_r(E)=\frac{6}{16}=\frac{3}{8}$.

*Example*. *The Gambler’s Fallacy*. Let $n$ be a large number. If a coin is flipped $n$ times and comes up heads the first $n-1$ times, what is the probability that the $n$th flip comes up head?

*Solution*. Although your intuition might tell you that the coin is due to come up tail, the coin doesn’t remember the history of previous outcomes. The probability is still $\frac{1}{2}$.

When the knowledge of whether an event $A$ occurred gives no information about whether another event $B$ occurred, $A$ and $B$ are said to be independent. Mathematically, the events $A$ and $B$ are independent if and only if $$P_r(A\cap B)=P_r(A)P_r(B)$$

*Example*. Let us consider $n=4$ case in the previous example. Let $A$ be the event that a coin is flipped 4 times and comes up heads first three times. Let $B$ be the event that a coin is flipped 4 times and comes up head last time. We show that $A$ and $B$ are independent events. Then \begin{align*}A&=\{HHHH,HHHT\}\\B&=\{HHHH,HHTH,HTHH,THHH,THTH,TTHH,TTTH\}\end{align*} Hence, \begin{align*}P_r(A\cap B)&=\frac{|A\cap B|}{|S|}=\frac{1}{16}\\P_r(A)P_r(B)&=\frac{|A|}{|S|}\frac{|B|}{|S|}=\frac{2}{16}\frac{8}{16}=\frac{1}{16}\end{align*} Since $P_r(A\cap B)=P_r(A)P_r(B)$, $A$ and $B$ are independent.

*Example*. Suppose three marbles are drawn from a jar containing six red marbles and four blue marbles. Let us consider the following events.

- $A$: The three marbles are not all the same color.
- $B$: The first marble drawn is red.

Are $A$ and $B$ independent?

*Solution*. The number of ways of drawing 3 marbles of the same color is $$\begin{pmatrix}6\\3\end{pmatrix}+\begin{pmatrix}4\\3\end{pmatrix}=24$$ So the number of ways of choosing 3 marbles that are not all the same color, i.e. $|A|=\begin{pmatrix}10\\3\end{pmatrix}-24=120-24=96$. Hence, $P_r(A)=\frac{96}{120}=\frac{4}{5}$. The probability that the first marble drawn is red is just $P_r(B)=\frac{6}{10}=\frac{3}{5}$. To calculate $P_r(A\cap B)$, since first marble is drawn red, the remaining two marbles drawn must contain at least one blue marble. There are $\begin{pmatrix}5\\2\end{pmatrix}=10$ ways of getting 2 more red marbles and so there are $\begin{pmatrix}9\\2\end{pmatrix}-10=36-10=26$ ways of drawing at least one blue marble. Therefore, $P_r(A\cap B)=\frac{6}{10}\frac{26}{36}=\frac{13}{30}$. On the other hand, $P_r(A)P_r(B)=\frac{4}{5}\frac{3}{5}=\frac{12}{25}$. Since $P_r(A\cap B)\ne P_r(A)P_r(B)$, the events $A$ and $B$ are not independent.

*References*.

[1] Essential Discrete Mathematics for Computer Science, Harry Lewis and Rachel Zax, Princeton University Press, 2019