The Division Algorithm

Theorem 1. Suppose that $a,b\in\mathbb{Z}$ with $0<a<b$. Then there exists uniquely $q,r\in\mathbb{Z}$, $0\leq r<a$, such that

Theorem 2. If $a$ and $b$ are positive integers, then

Suppose that $0<a<b$. Then by the division algorithm, there exist uniquely $q,r\in\mathbb{Z}$ such that $b=aq+r$ where $0\leq r<a$. If $d=(a,b)$ then $d|r$. This means $d\leq(a,r)$ since $d$ is a common divisor of $a$ and $r$. Since $(a,r)|a$ and $(a,r)|b$, $(a,r)$ is a common divisor of $a$ and $b$. This implies $(a,r)\leq(a,b)=d$. Thus, $d=(a,r)=(a,b)$. Therefore, we have the following lemma.

Lemma 3. For any integers $a>0$, $b$, $c$ and $k$, if $a=bk+c$ then $(a,b)=(b,c)$.

Example. Let $a=123$ and $b=504$. Then
$$504=123\cdot 4+12$$
By Lemma, we have
$$123=12\cdot 10+3$$
By Lemma again, we have
Hence, we obtain $(123,504)=3$.

Theorem 4. (The Euclidean Algorithm) Let $a$ and $b$ be integers, $0<a<b$. Apply the division algorithm repeatedly as follows. \begin{align*}b&=aq_1+r_1,\ 0<r_1<a\\a&=r_1q_2+r_2,\ 0<r_2<r_1\\r_1&=r_2q_3+r_3,\ 0<r_3<r_2\\&\vdots\\r_{n-2}&=r_{n-1}q_n+r_n,\ 0<r_n<r_{n-1}\\r_{n-1}&=r_nq_{n+1}\end{align*}Let $r_n$ be the last nonzero remainder. Then $(a,b)=r_n$

Example. Compute $(158,188)$ using the Euclidean algorithm.
\begin{align*}188&=158\cdot 1+30\\158&=30\cdot 5+8\\30&=8\cdot 3+6\\8&=6\cdot 1+2\\6&=2\cdot 3+0 \end{align*} Hence, $(158,188)=2$.

It is possible to use the Euclidean algorithm to write $(a,b)$ in the form $ax+by$. In the above example, \begin{align*}2&=8-6\cdot 1\\&=8-(30-8\cdot 3)\cdot 1\\&=-30+8\cdot 4\\&=-30+(158-30\cdot 5)\cdot 4\\&=158\cdot 4-30\cdot 21\\&=158\cdot 4-(188-158\cdot 1)\cdot 21\\&=158\cdot 25+188\cdot(-21) \end{align*}
In general, the following property holds.

Theorem 5. (Bézout’s Lemma) If $a$ and $b$ are integers such that $(a,b)$ is defined, then there exist $x,y\in\mathbb{Z}$ such that

\eqref{eq:bezout} is called the Bézout’s identity.

Corollary 6. If $d$ is any common divisor of $a$ and $b$, not both of which are $0$, then $d|(a,b)$.

Definition. We say $k$ is a linear combination of $a$ and $b$ if there exist $x,y\in\mathbb{Z}$ such that $k=ax+by$.

Theorem 7. Given integers $a\ne 0$ and $b\ne 0$, and $m$, if $a|m$ and $b|m$ then $[a,b]|m$.

Proof. Assume that $\frac{m}{[a,b]}$ is not an integer. Then there exist $q,r\in\mathbb{Z}$ such that $m=[a,b]q+r$, $0<r<[a,b]$. The remainder $r$ is then written as $r=m-q[a,b]$. By assumption, $a|m$, and also $a|[a,b]$. So, $a|r$. By the same argument, we also obtain $b|r$. This means that $r$ is a common multiple of $a$ and $b$. But it is a contradiction to the fact that $0<r<[a,b]$. Hence, $r=0$.

Remark. This theorem and the preceding corollary are dual to each other.


Definition. We say $a$ divides $b$ and write $a|b$ if there exists $d\in\mathbb{Z}$ such that $b=ad$. We say that $a$ is a divisor of $b$ and that $b$ is a multiple of $a$. If $a|b$ is false, we write $a\not|b$.

Definition. We say $d$ is the greatest common divisor (gcd in short) of $a$ and $b$ if $d$ is the largest of all integers dividing both $a$ and $b$. We write $d=(a,b)$.

Example. Let $a=4$ and $b=6$. The divisors of $4$ are $1$, $-1$, $2$, $-2$, $4$, $-4$. The divisors of $6$ are $1$, $-1$, $2$, $-2$, $3$, $-3$, $6$, $-6$. So the common divisors of $4$ and $6$ are $1$, $-1$, $2$, $-2$ and $2=(4,6)$.

Definition. We say $m$ is the least common multiple (lcm in short) of $a$ and $b$ if $m$ is the smallest of all the positive integers that are multiples of both $a$ and $b$. We write $m=[a,b]$.

Example. Let $a=4$ and $b=6$. The positive multiples of $4$ are $4$, $8$, $12$, $16$, $20$, $24$, $28$, $\cdots$ and the positive multiples of $6$ are $6$, $12$, $18$, $24$, $30$, $\cdots$. Common positive multiples of $4$ and $6$ are $12$, $24$, $\cdots$ and $[4,6]=12$.

Theorem. Given integers $a$, $b$, and $c$,

  1. if $a|b$ then $a|bc$.
  2. if $a|b$ and $b|c$ then $a|c$.
  3. if $a|b$ and $a|c$ then $a|bx+cy$ for any $x,y\in\mathbb{Z}$.


  1. If $a|b$ then there exists $d\in\mathbb{Z}$ such that $b=ad$. Now $bc=(ad)c=a(dc)$ and $dc\in\mathbb{Z}$ and hence $a|bc$.
  2. Let $a|b$ and $b|c$. Then there exist $d_1,d_2\in\mathbb{Z}$ such that $b=ad_1$ and $c=bd_2$. Now we have
    and $d_1d_2\in\mathbb{Z}$. Hence, $a|c$.
  3. Let $a|b$ and $a|c$. Then there exist $d_1,d_2\in\mathbb{Z}$ such that $b=ad_1$ and $c=ad_2$. For any $x,y\in\mathbb{Z}$ \begin{align*} bx+cy&=(ad_1)x+(ad_2)y\\&=a(d_1x+d_2y) \end{align*}
    and $d_1x+d_2y\in\mathbb{Z}$. Hence, $a|bx+cy$.

Simple Epidemics: Stochastic Model 1

Suppose that initially there are $n$ susceptibles and one infective. Let $X(t)$ denote a random variable which represents that number of susceptibles still uninfected at time $t$. The probability that $X(t)$ the value $r$ is $p_r(t)$. At time $t$, there are $X(t)$ susceptibles and $n-X(t)+1$ infectives. Assume that the chance of a new infection in a short time interval is proportional to the product of the number of susceptibles, the number of infectives and the length of the interval. Then the chance of an infection in $\Delta t$ is given by $\beta X(t)(n-X(t)+1)\Delta t$, where $\beta$ is the contact rate. For simplicity, we change the time scale to $\tau=\beta t$. The chance then becomes $X(n-X+1)\Delta\tau$. Denote by $p_r(\tau)$ the probability that there are still $r$ susceptibles remaining uninfected at $\tau$. There are two possibilities that can occur at time $\tau+\Delta\tau$: There are either $r+1$ susceptibles at $\tau$ followed by a new infection with probability $(r+1)(n-r)\Delta\tau$ , or $r$ susceptibles at $\tau$ followed by no infection with probability $1-r(n-r+1)\Delta\tau$. Thus $p_r(\tau+\Delta\tau)$, the probability of $r$ susceptibles remaining at time $\tau+\Delta\tau$ is given by $$p_r(\tau+\Delta\tau)=(r+1)(n-r)\Delta\tau p_{r+1}(\tau)+\{1-r(n-r+1)\Delta\tau\}p_r(\tau)$$ from which we obtain the differential-difference equation \begin{align*}\frac{dp_r(\tau)}{d\tau}&=\lim_{\Delta\tau\to 0}\frac{p_r(\tau+\Delta\tau)-p_r(\tau)}{\Delta\tau}\\&=(r+1)(n-r)p_{r+1}(\tau)-r(n-r+1)p_r(\tau)\end{align*} where $0\leq r\leq n-1$. For $r=n$ at $\tau+\Delta\tau$, there are $n$ susceptibles at $\tau$ followed by no new infection with probability $1-n\Delta\tau$. Thus we have $$p_n(\tau+\Delta\tau)=(1-n\Delta\tau)p_n(\tau)$$ and $$\frac{dp_n}{d\tau}=-np_n(\tau)$$ At $\tau=0$, there is only one infective, so we have the initial condition $p_n(0)=1$. We now have the system of differential-difference equations \begin{equation}\label{eq:stocepidemic}\frac{dp_r(\tau)}{d\tau}=(r+1)(n-r)p_{r+1}(\tau)-r(n-r+1)p_r,\ 0\leq r\leq n\end{equation} with initial condition $p_n(0)=1$.

One may attempt to solve the system \eqref{eq:stocepidemic} of differential-difference equations successively, similarly to that of the Poisson distribution (see here). The first three solutions are given by \begin{align*}p_n(\tau)&=e^{-n\tau}\\p_{n-1}(\tau)&=\frac{n}{n-1}[1-e^{-(n-1)\tau}]e^{-(n-1)\tau}\\p_{n-1}(\tau)&=\frac{2n}{2n-5}e^{-(n-1)\tau}-\frac{2n}{n-4}e^{-2(n-1)\tau}+\frac{2n(n-1)}{(n-4)(2n-5)}e^{-3(n-2)\tau}\end{align*} It is already difficult to see any recognizable pattern that may lead us to a general form of the solution. There are alternative ways to solve \eqref{eq:stocepidemic}, one of which is by using a generating function (see here). Let $P(x,r)=\sum_{r=0}^np_r(\tau)x^r$. Multiplying \eqref{eq:stocepidemic} by $x^r$ and summing the result over $r$ from $r=0$ to $r=n$, we obtain the partial differential equation $$\frac{\partial P}{\partial\tau}=(1-x)\left\{n\frac{\partial P}{\partial x}-x\frac{\partial^2 P}{\partial x^2}\right\}$$ with initial condition $P(x,0)=x^n$.


  1. Norman T. J. Bailey, A Simple Stochastic Epidemic, Biometrika, Vol. 37, No. 3/4 (Dec., 1950), 193-202.
  2. The Mathematical Theory of Infectious Diseases and Its Applications, Norman T. J. Bailey, Second Edition, Charles Griffin & Company LTD, 1975.

The Poisson Process: Use of Generating Functions

In here, we obtained the Poisson distribution by successively solving a differential-difference equation. While getting successive solutions of the Poisson distribution was easy and simple, the same thing can’t be said in general. In this note, we discuss another method of obtaining probability distributions. Let $P(x,t)=\sum_{n=0}^\infty p_n(t)x^n$. $P(x,t)$ is called a probability generating function for the probability distribution $p_n(t)$. Recall the differential-difference equation (2) in here: $$\frac{dp_n(t)}{dt}=\lambda\{p_{n-1}(t)-p_n(t)\},\ n\geq 0$$ with $p_{-1}(t):=0$ and $p_0(0)=1$. Multiplying the equation by $x^n$ and then summing over $n$ from $n=1$ to $\infty$, we obtain the differential equation \begin{equation}\label{eq:genfn}\frac{\partial P(x,t)}{\partial t}=\lambda(x-1)P(x,t)\end{equation} with $P(x,0)=1$. \eqref{eq:genfn} only contains derivative with respect to $t$, so it is a separable equation. Its solution is given by $$P(x,t)=e^{\lambda t(x-1)}$$ From this we easily obtain the Poisson distribution $$p_n(t)=\frac{e^{-\lambda t}(\lambda t)^n}{n!}$$


  1. Norman T. J. Bailey, The Elements of Stochastic Processes with Applications to the Natural Sciences, John Wiley & Sons, Inc., 1964.

Quantum Gates: Single Qubit Gates

A classical computer is built from electrical circuits that consist of logic gates and wires. The logic gates perform intended logical operations of information and wires pass information between logic gates. Similarly, a quantum computer would be built from quantum circuits that consist of quantum gates and wires. However, when it comes down to comparing how operations are performed at elementary gates level, there is no similarity between classical logic gates and quantum gates. Unlike classical logic gates, quantum gates operate not according to logical rules but to physical (quantum mechanical) laws. A single qubit gate acts on not only on qubits $|0\rangle=\begin{bmatrix}1\\0\end{bmatrix}$ and $|1\rangle=\begin{bmatrix}0\\1\end{bmatrix}$ but also on a superposition of $|0\rangle$ and $|1\rangle$, $\alpha|0\rangle+\beta|1\rangle$, where $\alpha$ and $\beta$ are complex numbers. For an obvious reason, single qubit gates are $2\times 2$ matrices. Furthermore, a quantum gate operation is performed through a unitary time evolution, and this means that they are required to be unitary matrices. Recall that a matrix $U$ of complex entries is unitary if $UU^\dagger=I$ where $\dagger$ denotes transpose conjugate i.e. $U^\dagger = \bar U^t$. If we write $U=\begin{bmatrix}a & b\\c & d\end{bmatrix}$, then $U$ is unitary if and only if $|a|^2+|b|^2=|c|^2+|d|^2=1$ and $a\bar c+b\bar d=0$. Any unitary matrix is qualified to be a quantum gate but not all unitary matrices would be interesting or useful gates. For practical purposes, it is important to see which gates are universal if there are any and that which gates are physically implementable. The following are some examples of well-known single cubit gates.

  1. Pauli-X, Pauli-Y, Pauli-Z gates. A matrix $H$ is Hermitian if $H=H^\dagger$. Hermitian matrices play a crucial role in quantum mechanics as Hamiltonians are required to be Hermitian. The eigenvalues of a Hamiltonian are the energies of the corresponding quantum system and of course they all have to be real. Hermitian matrices are guaranteed to have all real eigenvalues. A $2\times 2$ Hermitian matrix can be written in the form $H=\begin{bmatrix}x_0+x_3 & x_1-ix_2\\x_1+ix_2 & x_0-x_3\end{bmatrix}$ where the $x_i$ are all reals. The matrices \begin{equation}\label{eq:pauli}\sigma_0=\begin{bmatrix}1 & 0\\0 & 1\end{bmatrix},\ \sigma_1=\begin{bmatrix}0 & 1\\1 & 0\end{bmatrix},\ \sigma_2=\begin{bmatrix}0 & -i\\i & 0\end{bmatrix},\ \sigma_3=\begin{bmatrix}1 & 0\\0 & -1\end{bmatrix}\end{equation} form a basis for the space of all Hermitian matrices. The $\sigma_i$ in \eqref{eq:pauli} are called the Pauli spin matrices in physics. $\sigma_0$ is just the identity matrix but notice that $\sigma_1,\sigma_2,\sigma_3$ are also unitary matrices. They are called Pauli-X, Pauli-Y, Pauli-Z gates, respectively. Pauli-X gate is also called NOT gate as it acts like the classical NOT gate. \begin{align*}\begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix}&=\begin{bmatrix}0\\1\end{bmatrix}\\\begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix}&=\begin{bmatrix}0\\1\end{bmatrix}\end{align*}
  2. Square root of NOT gate. The gate $$\sqrt{\mathrm{NOT}}=\frac{1}{2}\begin{bmatrix}1+i & 1-i\\1-i & 1+i\end{bmatrix}$$ is called square root of NOT gate because $$\sqrt{\mathrm{NOT}}\cdot\sqrt{\mathrm{NOT}}=\begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$$ $\sqrt{\mathrm{NOT}}$ is one of the gates that are truly quantum because there is no classical counterpart.
  3. Hadamard gate. $$H=\frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\1 & -1\end{bmatrix}$$ is called the Hadamard gate. It turns $|0\rangle$ to $\frac{|0\rangle+|1\rangle}{\sqrt{2}}$ and $|1\rangle$ to $\frac{|0\rangle-|1\rangle}{\sqrt{2}}$.


  1. Michael A. Nielsen and Isaac L. Chung, Quantum Computation and Quantum Information, Cambridge University Press, 2000.
  2. Colin P. Williams and Scott H. Clearwater, Explorations in Quantum Computing, Spinger Telos, 1998.