Shannon Entropy

Let $X$ be a discrete random variable and suppose it takes values $x_1,\cdots,x_n$. Let $p$ be the probability distribution function (pdf in short) of the random variable $X$. For each $i=1,\cdots,n$, denote by $p_i$ $p(x_i)$, i.e.
$$p_i=P\{X=x_i\}$$
Here is a question. Is there a unique way to measure the amount or degree of uncertainty represented by the probability distribution? The answer is affirmative and Claude E. Shannon, in his landmark paper [2], was able to obtain such measurement of uncertainty. It is denoted by $H(p_1,\cdots,p_n)$ and called the entropy. Regardless of the name, there is no reason to expect that this notion of entropy is related to thermodynamic entropy because it is obtained independently from any physical hypotheses. Remarkably however, it coincides with thermodynamic entropy except for the presence of Boltzmann constant. To distinguish it from thermodynamic entropy, it is often called the information entropy or Shannon entropy. Shannon imposed the following three conditions on $H$:

  1. $H$ is a continuous function of the $p_i$.
  2. If all $p_i$ are equal, the quantity $$A(n)=H\left(\frac{1}{n},\cdots,\frac{1}{n}\right)$$ is a monotone increasing function of $n$. An increase in $n$ results in a decrease in the probability, therefore the uncertainty (entropy) increases.
  3. Instead of considering of the events $x_1,\cdots,x_n$ with the respective probabilities $p_1,\cdots,p_n$, we may group the first $k$ of them $x_,\cdots,x_k$ as a single composite event and is given the probability $w_1=p_1+\cdots+p_k$. Then next $m$ events $x_{k+1},\cdots,x_{k+m}$ are gouped together as a single composite event and is given the probability $w_2=p_{k+1}+\cdots+p_{k+m}$, and so on so forth. The entropy regarding these composite events is $H(w_1,\cdots,w_r)$. In addition, since the composite events have occurred, the events $x_1,\cdots,x_k$ should be given the conditional probabilities $\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}$, the events $x_{k+1},\cdots,x_{k+m}$ should be given the conditional probabilities $\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}$, and so on so forth. In order for our information measurement is to be consistent, the ultimate state of knowledge we obtained this way should be the same as one obtained by considering the event $x_1,\cdots,x_n$ with probabilities $p_1,\cdots,p_n$. The argument is summed up as \begin{equation}\begin{aligned}H(p_1,\cdots,p_n)=&H(w_1,\cdots,w_r)+w_1H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)\\&+w_2H\left(\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}\right)+\cdots\end{aligned}\label{eq:composition}\end{equation}The appearance of the weighting factors $w_i$ is of course due to the fact that, for example, we run into the additional uncertainty $H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)$ only with the probability $w_1$. \eqref{eq:composition} is called the composition law.

Example. $H\left(\frac{1}{2},\frac{1}{3},\frac{1}{6}\right)=H\left(\frac{1}{2},\frac{1}{2}\right)+\frac{1}{2}H\left(\frac{2}{3},\frac{1}{3}\right)$ by grouping the events with probabilities $\frac{1}{3}$ and $\frac{1}{6}$ as a composite event.

Example. Let us consider three events $x_1,x_2,x_3$ with respective probabilities $p_1=\frac{3}{9}$, $p_2=\frac{4}{9}$, $p_3=\frac{2}{9}$. We may view these three events as three composite events resulted from nine events with equal probability $\frac{1}{9}$ by grouping the first 3 events, then the next 4 events, and finally the remaining 2 events together. Using \eqref{eq:composition}, we obtain
\begin{equation}
\label{eq:entropy}
H\left(\frac{3}{9},\frac{4}{9},\frac{2}{9}\right)+\frac{3}{9}A(3)+\frac{4}{9}A(4)+\frac{2}{9}A(2)=A(9)
\end{equation}

In exactly the same manner, if we write $p_i=\frac{n_i}{\sum_in_i}$, \eqref{eq:entropy} is readily generalized to
\begin{equation}
\label{eq:entropy2}
H(p_1,\cdots,p_n)+\sum_ip_iA(n_i)=A\left(\sum_in_i\right)
\end{equation}
As a special case, if we choose $n_i=m$, then \eqref{eq:entropy2} becomes
\begin{equation}
\label{eq:entropy3}
A(m)+A(n)=A(mn)
\end{equation}
Obviously,
\begin{equation}
\label{eq:entropy4}
A(n)=K\ln n
\end{equation}
(where $K>0$ due to condition 2) is a solution. But is that a unique solution? To see that, let us consider a continuum extension $A(x)$ of $A(n)$ which is at least twice differentiable. Then \eqref{eq:entropy3} can be written as the functional equation
\begin{equation}
\label{eq:entropy5}
A(x)+A(y)=A(xy)
\end{equation}
Differentiating \eqref{eq:entropy5} with respect to $x$, we obtain
\begin{equation}
\label{eq:entropy6}
A'(x)=yA'(xy)
\end{equation}
Differentiating \eqref{eq:entropy6} with respect to $y$, we obtain
\begin{equation}
\label{eq:entropy7}
0=A'(xy)+xyA^{\prime\prime}(xy)
\end{equation}
By introducing a new variable $z=xy$, \eqref{eq:entropy7} can be written as the second-order linear differential equation
$$zA^{\prime\prime}(z)+A'(z)=0$$
whose solution is
$$A(z)=K\ln(Cz)$$
That is, we have $A(n)=K\ln(Cn)$ where $K>0$ and $C>0$ are constants. Hence, $A(n)$ is unique up to constant multiples. Since entropy is a qualitative measurement of uncertainty rather than a quantitative one, we may choose $C=1$. For the same reason, we may choose $K=1$ as well but for now we will keep the constant multiple $K$. Substituting \eqref{eq:entropy4} into \eqref{eq:entropy2}, we finally obtain Shannon entropy
\begin{equation}
\label{eq:shannon}
\begin{aligned}
H(p_1,\cdots,p_n)&=K\ln\left(\sum_in_i\right)-K\sum_ip_i\ln n_i\\
&=-K\sum_ip_i\ln\left(\frac{n_i}{\sum_in_i}\right)\\
&=-K\sum_ip_i\ln p_i
\end{aligned}
\end{equation}

Definition. There is a generalization of Shannon entropy called the Rényi entropy
\begin{equation}
\label{eq:renyi}
H_r(p_1,\cdots,p_n)=\frac{1}{1-r}\ln\left(\sum_ip_i^r\right)
\end{equation}
for $0<r<\infty$ and $r\ne 1$.

Proposition 1. $$\lim_{r\to 1}H_r(p_1,\cdots,p_n)=-\sum_ip_i\ln p_i$$
i.e., Shannon entropy (with $K=1$) is the limit of the Rényi entropy as $r\to 1$.

Proof. Let $f(r)=\ln\left(\sum_ip_i^r\right)$. Then it is readily seen that the limit on the left is simply $f'(1)$.

Definition. Tsallis entropy is another generalization of Shannon entropy. It is defined by
\begin{equation}
\label{eq:tsallis}
S_r(p_1,\cdots,p_n)=\frac{1}{r-1}\left(1-\sum_ip_i^r\right)
\end{equation}

One can easily show that the limit of Tsallis entropy as $r\to 1$ is Shannon entropy (with $K=1$) similarly to the proof of proposition 1.

References:

  1. E. T. Jaynes, Information Theory and Statistical Mechanics, The Physical Review, Vol. 106, No. 4, 620-630, May 15, 1957
  2. Claude E. Shannon, A mathematical theory of communication, Bell Sys. Tech. Journal, 27: 379-423, 623-656, 1948

Computing for Number Theory 1: GCD, LCD, Linear Diophantine Equations

Here, we study some computing problems that are related to Divisibility and the Division Algorithm.

Maxima

GCD: Maxima has a function to compute the GCD (Greatest Common Divisor) gcd(a,b).

Example. Find the GCD of 158 and 188.

(%i1) gcd(158,188);

(%o1) 2

LCM: Maxima has a function to compute the LCM (Least Common Multiple) lcm(express_1,...express_n). This function needs to be loaded by running the command load("functs").

Example. Find the LCD of 6, 8, 15, 32.

(%i1) load("functs")$

(%i2) lcm(6,8,15,32);

(%o2) 480

Bézout’s Lemma (see here) says that given two integers $a,b$, there is an integer solution $(x,y)$ to the equation $ax+by=(a,b)$. The Maxima command gcdex implements the Euclidean algorithm. gcdex(a,b) returns [x, y, u] where $u=(a,b)$ and $ax+by=u$. So, one can find a solution of the equation $ax+by=(a,b)$ using gcdex.

Example. Solve $2=158x+188y$.

We saw earlier that $(158,188)=2$.

(%i2) gcdex(158,188);

(%o2) [25, - 21, 2]

Hence, $x=25$ and $y=-21$.

One can also use the command igcdex. gcdex and igcdex both implement the Euclidean algorithm. The only difference between them is that the argument of igcdex must be integers while the arguments of gcdex can be also polynomials. For instance,

(%i7) gcdex(x^2 + 1,x^3 + 4);

(%o7)/R/ [$\displaystyle\frac{x^2+4x-1}{17}$, $\displaystyle\frac{x+4}{17}$, 1]

Python

GCD: Algorithm

  1. The name of the Python code is gcd(a,b) and it implements the Euclidean algorithm to find the GCD of two integers $a$ and $b$.
  2. If b == 0 then gcd(a,b) returns 0 because every nonzero integer divides 0.
  3. Otherwise, it returns gcd(b,a % b). % is the modulo or remainder operator. a % b returns the remainder when $a$ is divided by $b$ if $a\geq b$ or $a$ if $a<b$ because $a=b\cdot 0+a$. For example, 12345 % 2 returns 1.
  4. The loop continues until b==0.

Use your favorite editor to write the following python code (it is called a Python module) and save it as, for example, gcd.py. The Python interface IDLE has own editor. I also recommend Emacs and Geany for Python editors if you are running Python in command shell with iPython.

# Computing the GCD of two integers a and b recursively using the Euclidean algorithm

def gcd(a,b):
    if (b == 0):
        return a
    return gcd(b, a%b)

On IDLE, the Python module can be run from its pull-down (or drop-down) menu under Run. Its shortcut key is F5. To run it on iPython, first change your current folder location to the one that contains the Python module gcd.py using cd (change directory) command in command shell and then start iPython by typing ipython (you can first start iPython and then change your current folder location as well using cd within iPython) and pressing the enter key. Once iPython is loaded, do

$ ipython
Python 3.7.9 (default, Feb 4 2021, 01:17:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: run gcd.py
In [2]: gcd(98,56)
Out[2]: 14

The Euclidean Algorithm: The Euclidean algorithm itself is a perfect computer algorithm. A Python code of the Euclidean algorithm is

# This Python module runs the Euclidean Algorithm to find the GCD of two integers a and b
def euclid(a,b):
    r = a % b
    while r > 0:
        print (a, b, r)
        a, b, r = b, r, b % r
    return b

Running this Python module (named euclid.py) on iPython shows

$ ipython
Python 3.7.9 (default, Feb 4 2021, 01:17:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: run euclid.py
In [2]: euclid(98,56)
98 56 42
56 42 14
Out[2]: 14

LCM: The LCM $[a,b]$ can be obtained by $\frac{ab}{(a,b)}$ (see here).

Algorithm: The name of the Python code is lcm(a,b). It computes the LCM of two integers $a$ and $b$.

  1. Find the GCD of $a$ and $b$ by running the Python program gcd(a,b).
  2. Compute the LCM of $a$ and $b$ using the formula a*b/gcd(a,b).

Its Python code is

# Compute the LCM of two integers a and b
Recursive function to return the GCD of a and b

def gcd(a,b):
    if (b == 0):
        return a
    return gcd(b, a%b)

# Function to return the LCM of a and b

def lcm(a,b):
    return a*b/gcd(a,b)

Running this Python code (named lcm.py) on iPython shows

In [3]: run lcm.py
In [4]: lcm(15,20)
Out[4]: 60.0

Linear Diophantine Equations: As seen here, a linear diophantine equation
\begin{equation}
\label{eq:lindiophant}
ax+by=c
\end{equation}
is solvable if and only if $(a,b)|c$. If $b|a$ then $(a,b)=b$ and
$$x=0,\ y=\frac{c}{b}$$
is a solution. If $b\not|a$, then write $a=bq+r$ and substitute into \eqref{eq:lindiophant}:
$$(bq+r)x+by=c$$
This can be rearranged as
$$b(qx+y)+rx=c$$
Setting $u=qx+y$ and $v=x$, we obtain the equation
\begin{equation}
\label{eq:lindiophant2}
bu+rv=c
\end{equation}
with smaller coefficients. If we can solve \eqref{eq:lindiophant2}, we recover a solution of the original equation as
$$x=v,\ y=u-qv$$
The process terminates (according to the Euclidean algorithm) with an equation where the second coefficient divides the first. This scheme is embodied in the following Python code

# Solve a linear diophantine equation using the Euclidean algorithm

# The function divmod returns a pair of numbers, the quotient and remainder, which we store in q, r.

def diosolve(a,b,c):
    q, r = divmod(a,b)
    if r == 0:
        return( [0,c/b] )
    else:
        sol = diosolve( b, r, c )
        u = sol[0]
        v = sol[1]
        return( [ v, u-q*v ] )

Running this Python code (named diosolve.py) on iPython to solve the linear diophantine equation $12345x+54321y=3$:

In [9]: run diosolve.py
In [10]: diosolve(12345, 54321, 3)
Out[10]: [3617.0, -822.0]

Note that $(12345,54321)=3$, so of course the diophantine equation is solvable.

Refernces.

[1] Jim Carlson, A Short Course in Python for Number Theory, Draft of May 21, 2004

[2] Maximal 5.29.0 Manual, 29. Number Theory

Primality Tests

We begin by discussing a method of computing $a^d\mod n$, called the repeated squaring method, which is efficient enough to require only a handheld calculator using the following example.

Example. Let us compute the least residue of $848^{187}\mod 1189$ by using only a handheld calculator. First we convert 187 to the base 2:
\begin{align*} 187&=(10111011)_2\\ &=1\cdot 2^7+0\cdot 2^6+1\cdot 2^5+1\cdot 2^4+1\cdot 2^3+0\cdot 2^2+1\cdot 2+1\\ &=128+32+16+8+2+1 \end{align*}
Let $a=848$ and $n=1189$.
$$\begin{array}{|c|c|}
\hline
k & a^k\mod n\\
\hline
1 & 848\\
\hline
2 & 848^2=719104\equiv 948\mod 1189\\
\hline
4 & 948^2=898704\equiv 1009\mod 1189\\
\hline
8 & 1009^2= 1018081\equiv 297\mod 1189\\
\hline
16 & 297^2=88209\equiv 223\mod 1189\\
\hline
32 & 223^2=49729\equiv 980\mod 1189\\
\hline
64 & 980^2=960400\equiv 877\mod 1189\\
\hline
128 & 877^2=769129\equiv 1035\mod 1189\\
\hline
\end{array}$$
Hence,
\begin{align*} 848^{187}&=848^{128+32+16+8+2+1}\\ &=848^{128}\cdot 848^{32}\cdot 848^{16}\cdot 848^8\cdot 848^2\cdot 848\\ &\equiv 1035\cdot 980\cdot 223\cdot 297\cdot 948\cdot 848\mod 1189\\ &\equiv 190\mod 1189 \end{align*}

Here is our first primality test. Let us recall Fermat Little Theorem: If $n$ is a prime and $n\not|a$, then $a^{n-1}\equiv 1\mod n$. The contrapositive of the above statement can serve as a primality test: if $a^{n-1}\not\equiv 1\mod n$, then $n$ is not a prime or $n|a$.

Example. Let us show that $33$ is not a prime using the above primality test. (Of course it is not prime because $33=3\cdot 11$.) $33\not|2$ and
\begin{align*} 2^{33-1}&=2^{32}\\ &=(2^5)^62^2\\ &=(32)^62^2\\ &\equiv(-1)^62^2\mod 33\\ &\equiv 4\mod 33 \end{align*}
Thus, $33$ is not a prime.

Here is another primality test.

Theorem. If the integer $n>1$ has no prime divisor less than or equal to $\sqrt{n}$, then $n$ is prime.

Proof. Suppose that $n$ is a composite number. Then $n=d_1d_2$ for some $d_1>1$ and $d_2>1$. Suppose that all prime divisors are greater than $\sqrt{n}$. Then $d_1>\sqrt{n}$ and $d_2>\sqrt{n}$. But $n=d_1d_2>(\sqrt{n})^2=n$, which is impossible. This proves the theorem. If $d_1\leq\sqrt{n}$, then $d_1$ is either prime or else has a prime divisor less than or equal to $\sqrt{n}$.

Question: Is the statement “If $2^{n-1}\equiv 1\mod n$ then $n$ is a prime.” true? The ancient Chinese believed so, but there is a counterexample. $2^{340}\equiv 1\mod 341$. But, $341=11\cdot 31$ (this is left as an exercise), so it is not a prime.

Definition. We call $n$ a pseudoprime if $2^{n-1}\equiv 1\mod n$ but $n$ is composite. More generally, a composite number $n$ such that $a^{n-1}\equiv 1\mod n$ is called a pseudoprime to base $a$.

The smallest pseudoprime is $341$, and it was not discovered until 1819. Bases other than $2$ can be used to identify composite numbers. For example,
$$3^{340}\equiv 56\mod 341$$
Thus, $341$ is not a prime number. These pseudoprimes are rarer then primes. Ever rarer are pseudoprimes to multiples bases. It is known, for example, that there are only 1770 integers below $25\cdot 10^9$ that are simultaneously pseudoprimes to the bases $2$, $3$, $5$ and $7$. So, the primality of numbers less than $25\cdot 10^9$ can be determined by testing Fermat’s congruence $a^{n-1}\equiv 1\mod n$ with these four bases, then comparing any number passing all 4 tests with a list of the 1770 exceptions. If the number is not any of those 1770 exceptions, it must be a prime.

Next we discuss Mersenne numbers and Fermat numbers, and primality tests those numbers.

Definition. If $k$ is a positive integer, we call $M_k=2^k-1$ a Mersenne number.

$$\begin{array}{|c|c|c|}
\hline
k & M_k=2^k-1 & \mbox{prime?}\\
\hline
1 & 1 & \mbox{N}\\
\hline
2 & 3 & \mbox{Y}\\
\hline
3 & 7 & \mbox{Y}\\
\hline
4 & 15=3\cdot 5 & \mbox{N}\\
\hline
5 & 31 & \mbox{Y}\\
\hline
6 & 63=9\cdot 7 & \mbox{N}\\
\hline
7 & 127 & \mbox{Y}\\
\hline
8 & 255=3\cdot 5\cdot 7 & \mbox{N}\\
\hline
9 & 511=7\cdot 73 & \mbox{N}\\
\hline
10 & 1023=3\cdot 11\cdot 31 & \mbox{N}\\
\hline
\end{array}$$
From this table, one may expect that $M_k$ is prime when $k$ is prime. However, $M_{11}=2047=23\cdot 89$ is not a prime.

Theorem.
If $d|k$, then $M_d|M_k$, so if $M_k$ is prime, then $k$ is prime.

Proof. Since $d|k$, $k=dd_1$ for some $d_1\in\mathbb{Z}$.
$$M_k=2^k-1=2^{dd_1}-1=(2^d-1)(2^{(d(d_1-1)}+2^{(d(d_1-2)}+\cdots+2^d+1)$$

$$\begin{array}{|c|c|c|}
\hline
k & 2^k+1 & \mbox{prime?}\\
\hline
1 & 3 & \mbox{Y}\\
\hline
2 & 5 & \mbox{Y}\\
\hline
3 & 9=3\cdot 3 & \mbox{N}\\
\hline
4 & 17 & \mbox{Y}\\
\hline
5 & 33=3\cdot 11 & \mbox{N}\\
\hline
6 & 65=5\cdot 13 & \mbox{N}\\
\hline
7 & 129=3\cdot 43 & \mbox{N}\\
\hline
8 & 257 & \mbox{Y}\\
\hline
9 & 513=3\cdot 3\cdot 3\cdot 19 & \mbox{N}\\
\hline
10 & 1025=5\cdot 5\cdot 41 & \mbox{N}\\
\hline
\end{array}$$

Theorem. If $k$, $a$, and $b$ are positive integers such that $k=ab$, where $a$ is odd, then
$$2^b+1|2^k+1$$
In particular, if $2^k+1$ is prime, then $k$ is $0$ or a power of $2$.

Proof. Under the assumption, we want to show that
$$2^k+1\equiv 0\mod 2^b+1$$
or
$$2^k\equiv -1\mod 2^b+1$$
Since, $2^b\equiv -1\mod 2^b+1$,
$$2^k=2^{ab}=(2^b)^a\equiv (-1)^a\mod 2^b+1\equiv -1\mod 2^b+1$$
If $k>0$ is not a power of $2$, then we can take $a>1$ so that
$$1<2^b+1<2^k+1$$
This means that $2^k+1$ has a positive divisor other than 1 and itself.

Definition. $F_r=2^{2^r}+1$ is called a Fermat number.

Example. $F_0=3$, $F_1=5$, $F_2=17$, $F_3=257$, $F_4=65537$ are primes, so Fermat conjectured that Fermat numbers are prime. But $F_5=2^{32}+1$ is not a prime. It is divisible by $641$. $641$ can be written
\begin{align*} 641&=640+1=2^7\cdot 5+1\\ 641&=625+16=5^4+2^4 \end{align*}
Thus, we have $2^7\cdot 5\equiv -1\mod 641$ and $2^4\equiv -5^4\mod 641$.
Now,
\begin{align*} F_5&=2^{32}+1\\ &=(2^7)^4\cdot 2^4+1\\ &\equiv(2^7)^4(-5^4)+1\mod 641\\ &\equiv -(2^7\cdot 5)^4+1\mod 641\\ &\equiv 0\mod 641 \end{align*}
No other Fermat numbers that are prime have been found yet.

One may expect that for any composite number $n$, there exists a base $a$ for which Fermat’s theorem can be used to show that $n$ is composite. However, there are composite numbers called Carmichael numbers, which are pseudo primes to every base. That is, $n$ is composite, but $a^{n-1}\equiv 1\mod n$, whenever $(a,n)=1$. The smallest Carmichael number is $561=3\cdot 11\cdot 17$. It was proved in 1994 that there are infinitely many Carmichael numbers.


There is a primality test just for Mersenne numbers. It is called the Lucas-Lehmer test. Define a sequence $S_1,S_2,\cdots$ by
\begin{align*} S_1&=4,\\ S_n&=S_{n-1}^2-2\ \mbox{for}\ n>1. \end{align*}
For example, $S_2=4^2-2=14$, $S_3=14^2-2=194$. Then the Lucas-Lehmer test is

Theorem. If $p$ is an odd prime then $M_p=2^p-1$ is prime if and only if $S_{p-1}\equiv 0\mod M_p$.

For example, take $p=7$ and show that $M_p=2^7-1=127$ is a prime.
\begin{align*} S_3&=194\equiv 67\mod 127\\ S_4&\equiv 67^2-2\mod 127\equiv 42\mod 127\\ S_5&\equiv 42^2-2\mod 127\equiv 111\mod 127\\ S_6&\equiv 111^2-2\mod 127\equiv 0\mod 127. \end{align*}

There is also a primality test just for Fermat numbers. It is called Pépin’s test.

Theorem. If $n>0$, the Fermat number $F_n=2^{2^n}+1$ is prime if and only if $3^{\frac{F_n-1}{2}}\equiv -1\mod F_n$.

Example. Use Pépin’s test to show that $F_3=257$ is prime.

Solution. $3^{\frac{257-1}{2}}=3^{128}=3^{2^7}$.
\begin{align*} 3^2&=9\\ 3^4&=9^2=81\\ 3^8&=81^2\equiv 136\mod 257\\ 3^{16}&\equiv 136^2\mod 257\equiv 249\mod 257\\ 3^{32}&\equiv 249^2\mod 257\equiv 64\mod 257\\ 3^{64}&\equiv 64^2\mod 257\equiv 241\mod 257\\ 3^{128}&\equiv 241^2\mod 257\equiv 256\mod 257\equiv -1\mod 257 \end{align*}

RSA Protocol, A Public Key Cryptosystem

As we discussed here, the enciphering and deciphering transformations involve secret keys called encryption and decryption keys. These keys are shared only by the transmitter and the receiver, so they are called private keys. An Achilles heel of cryptographic communication systems that use private keys is that secret communications can take place only after a key is transmitted in secret over a totally secure communication channel. This is referred to as the CATCH 22 of cryptography. First there are two most famous people when it comes to cryptography and communications, way even more famous than the great mathematician Claude E. Shannon formerly at Bell Lab, who is dubbed the father of modern communication theory. They are Alice and Bob. Literally any books and papers on cryptography, communications, or even quantum computing would mention their names.

CATCH 22: Before the transmitter and the receiver Alice and Bob can communicate in secret, they must first communicate in secret.

There is more to this CATCH 22.

CATCH 22a: Even if Alice and Bob succeed in tansmitting their key over a secure communication channel, there is no mechanism that guarantees with full certainty that no one is able to eavesdrop.

To remedy this issue with private key sharing, public key cryptosystems were invented. In a public key cryptosystem, Alice makes the encryption key available to the public, so anyone can encrypt messages using the key but only Alice can decrypt the messages, and Bob can do the same. So, there is no need to secretly communicate to share keys prior to communicating in secret. Using such trapdoor function (also called one-way function ) is crucial for public key cryptosystems. The most famous public key cryptosytem is RSA (named after its inventors Ronald Rivest, Adi Shamir and Leonard Adleman) which was (officially) introduced in 1978 (unofficially, it was already invented years before they did by researchers secretly working in the Communications-Electronics Security Group at GCHQ, the British counterpart of NSA). RSA is based on the difficulty of factoring. Here is a basic idea of the textbook RSA cryptosystem.

Suppose that Alice wants to receive secret messages. She selects two large primes $p$ and $q$ and then forms the product $n=pq$. Alice also chooses an integer $E<n$ coprime to $\phi(n)=(p-1)(q-1)$. The integers $n$ and $E$ are made public and constitute the encryption key, so everybody can encrypt messages. We assume that our message is a sequence of positive integers $T<n$. Bob encypts each integer $T$ as $C\equiv T^E\mod n$, and send the ciphertext to Alice. Alice can decrypt the ciphertext as follows: Let $m=\phi(n)=(p-1)(q-1)$. Since $(E,m)=1$, there exists $D\in\mathbb{Z}/m\mathbb{Z}$ such that $DE\equiv 1\mod m$ or equivalently $DE=1$ in $\mathbb{Z}/m\mathbb{Z}$. Thus, $DE=\phi(n)k+1$ for some $k\in\mathbb{Z}$. The prime numbers $p$ and $q$ are supposedly huge, so $T$ being one of them is improbable and we can safely assume that $(T,n)=1$. Then by the Euler-Fermat theorem, we have $T^{\phi(n)}\equiv 1\mod n$. Finally, we see that the paintext $T$ is recovered by computing $C^D$ modulo $n$. \begin{align*} C^D&\equiv (T^E)^D\mod n\\ &\equiv T^{ED}\mod n\\ &\equiv T^{\phi(n)k+1}\mod n\\ &\equiv T\mod n \end{align*}
Now, assume that Celia is is eavesdropping. She knows the public key $(n,E)$. She also knows the ciphertext $C$. However, in order to decrypt $C$ she needs the decryption key $D$, the inverse of $E$ in $\mathbb{Z}/m\mathbb{Z}$ where $m=\phi(n)=(p-1)(q-1)$, and she needs to know the prime factors $p$ and $q$ of $n$ in order to compute $D$.

Example. Let $n=1073$, $p=29$, and $q=37$. The public key is $(n,E)=(1073,25)$. The plaintext $T$ is “MISS PIGGY” and the numerical equivalents are
$$13\ 9\ 19\ 19\ 27\ 16\ 9\ 7\ 7\ 25$$
Enciphering $T$: $C\equiv T^{25}\mod n=1073$
$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|}
\hline
T & 13 & 9 & 19 & 19 & 27 & 16 & 9 & 7 & 7 & 25\\
\hline
C & 671 & 312 & 901 & 901 & 656 & 1011 & 312 & 922 & 922 & 546\\
\hline
\end{array}$$
Deciphering: Compute $D$, the inverse of 25 modulo $\phi(n)=(p-1)(q-1)=1008$. Since $1=25\cdot 121+1008(-3)$, $D=121$ is the decipering key and the paintext $T$ is retrieved by
$$C^{121}\equiv T\mod n=1073$$
For example, $671^{121}\equiv 13\mod 1073$.

Remark. In practical sense, there is a problem with the above example. If we encrypt the message letter for letter, the cryptosystem can be easily broken by frequency analysis. This problem can be remedied if we encrypt the message in blocks of about 100 letters. The chance that any two message blocks of 100 letters coincide is practically none.

Remark. Of course in practice we wouldn’t be using such small key sizes shown in the above example. Here is an example of practically usable keys: \begin{align*}E&=198778434778923476493186960677288219412181391439289859633481192496758260063\\n&=2555942223974189195272679217974555794863882765417656848825203904501780583533\\D&=1774134686349434385697798709945131299127046225553502224286279549365461014283\end{align*} Messages can be encypted more securely if one uses key sizes of about 200 digits in addition to using many-letter message blocks suggested in the above remark.

Remark. As mentioned earlier, the presumed security of RSA is based on the belief that factoring is a difficult problem to crack on a computer. So, how difficulty can it be? Umesh Vazirani, a computer scientist at UC Berkely, summed it up using the example of factoring a 2,000-digit number back in 1994: “It’s not just a case that all the computers in the world today would be unable to factor that number. Even if you imagine that every particle in the universe was a computer and was computing at full speed for the entire life of the universe, that would be insufficient to factor that number.”

Remark. RSA protocol begins with picking prime numbers to make the key. We know there are infinitely many prime numbers although we don’t know if that is still the case for a particular type of prime numbers. For example, it is an open problem in number theory if there are infinitely many Mersenne prime numbers, i.e. primes numbers of the form $2^n-1$. (The conjecture is there are infinitely many Mersenne prime numbers.) The largest prime number known to date is a Mersenne prime $2^{82589933}-1$. Especially from computational perspective, it would be interesting to see how prime numbers are distributed, and prime number theorem gives us a pretty good picture about how many prime numbers are there within a certain bound. Let $\pi(x)$ denotes the number of prime numbers less than or equal to $x$. $\pi(x)$ is called prime-counting function. Then Prime Number Theorem sates that $$\lim_{x\to\infty}\frac{\pi(x)}{\left[\frac{x}{\log(x)}\right]}=1$$ which can be restated as $$\pi(x)\sim\frac{x}{\log(x)}$$ for large $x$. This can be interpreted that for large $x$, the probability that a random integer not greater than $x$ is prime is very closed to $\frac{1}{\log(x)}$. This theorem was proved independently by Jacques Hadamard and Charles Jean de la Vallée Pousin in 1896. For example, if one wants to know how many prime numbers there are between $\frac{x}{2}$ and $x$ for a given lager number $x$, we can get an estimate from prime number theorem: \begin{align*}\pi(x)-\pi\left(\frac{x}{2}\right)&\sim\frac{x}{\log(x)}-\frac{x}{2\log\left(\frac{x}{2}\right)}\\&\sim\frac{x}{\log(x)}-\frac{x}{2\log(x)}\ (\mbox{because $x$ is large})\\&=\frac{x}{2\log(x)}\end{align*} Figure 1 shows the comparison between the asymptotic distribution of prime number between $\frac{x}{2}$ and $x$ (in blue), and its approximation (in red).

Figure 1. Asymptotic distribution of prime numbers between x/2 and x

For $x=100000$, using the approximation $\frac{x}{2\log(x)}$ there are 4343 estimated prime numbers between 50000 and 100000, so there are plentiful of them to choose from.

Enciphering Matrices

Suppose we have an $N$-letter alphabet and want to send digraphs as our message units. As we have seen here, we can let each digraph correspond to an integer considered modulo $N^2$, i.e. to an element of $\mathbb{Z}/N^2\mathbb{Z}$. An alternate possibility is to let each digraph correspond to a vector $\begin{pmatrix}
x\\
y
\end{pmatrix}$ with $x,y\in\mathbb{Z}/N\mathbb{Z}$. For example, if we are using the 26-letter alphabet A- Z with numerical equivalents 0-25, respectively, then the digraph NO corresponds to the vector $\begin{pmatrix}
13\\
14
\end{pmatrix}$. We consider the $xy$-plane as the finite lattice $\mathbb{Z}/N\mathbb{Z}\times\mathbb{Z}/N\mathbb{Z}=(\mathbb{Z}/N\mathbb{Z})^2$.

Let $M_2(\mathbb{Z}/N\mathbb{Z})$ denote the set of all $2\times 2$ matrices with entries in $\mathbb{Z}/N\mathbb{Z}$. Let $A=\begin{pmatrix}
a & b\\
c & d \end{pmatrix}\in M_2(\mathbb{Z}/N\mathbb{Z})$ and suppose that $D=\det(A)=ad-bc\in(\mathbb{Z}/N\mathbb{Z})^*$. Let $D^{-1}$ denote the multiplicative inverse of $D$ in $\mathbb{Z}/N\mathbb{Z}$. Then \begin{align*}
\begin{pmatrix}
D^{-1}d & -D^{-1}b\\
-D^{-1}c & D^{-1}a
\end{pmatrix}\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}&=\begin{pmatrix}
D^{-1}(da-bc) & 0\\
0 & D^{-1}(-cb+ad)
\end{pmatrix}\\
&=\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}
\end{align*}
Similarly,
$$\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}\begin{pmatrix}
D^{-1}d & -D^{-1}b\\
-D^{-1}c & D^{-1}a
\end{pmatrix}=\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}$$
Thus, we see that
\begin{equation}
\label{eq:invmat}
A^{-1}=\begin{pmatrix}
D^{-1}d & -D^{-1}b\\
-D^{-1}c & D^{-1}a
\end{pmatrix}
\end{equation}

Example. Find the inverse of $A=\begin{pmatrix}
2 & 3\\
7 & 8
\end{pmatrix}\in M_2(\mathbb{Z}/26\mathbb{Z})$.

Solution. $D=16-21=-5\equiv 21\mod 26$. Since $(21,26)=1$, there exists $21^{-1}\in\mathbb{Z}/26\mathbb{Z}$. Remember that finding $21^{-1}\in\mathbb{Z}/26\mathbb{Z}$ is equivalent to solving the diophantine equation $21x+26y=1$.
\begin{align*} 26&=21\cdot 1+5\\ 21&=5\cdot 4+1 \end{align*}
and
\begin{align*} 1&=21-5\cdot 4\\ &=21-(26-21)4\\ &=5\cdot 21+26(-4) \end{align*}
Thus, we find $21^{-1}\equiv 5\mod 26$.
\begin{align*} A^{-1}&=\begin{pmatrix} 5\cdot 8 & -5\cdot 3\\ -5\cdot 7 & 5\cdot 2 \end{pmatrix}\\ &=\begin{pmatrix} 40 & -15\\ -35 & 10 \end{pmatrix}\\ &\equiv\begin{pmatrix} 14 & 11\\ 17 & 10 \end{pmatrix}\mod 26 \end{align*}

Let $A=\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}\in M_2(\mathbb{Z}/N\mathbb{Z})$. Then
$$\begin{pmatrix}
x’\\
y’
\end{pmatrix}=A\begin{pmatrix}
x\\
y
\end{pmatrix}=\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}\begin{pmatrix}
x\\
y
\end{pmatrix}=\begin{pmatrix}
ax+by\\
cx+dy
\end{pmatrix}$$
This gives a linear map or a linear transformation from vectors to vectors.

We introduce the following theorem without proof.

Theorem. Let $A=\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}\in M_2(\mathbb{Z}/N\mathbb{Z})$ and set $D=ad-bc$ as before. Then the following are equivalent:

  1. $(D,N)=1$.
  2. $A$ has an inverse matrix.
  3. if $x$ and $y$ are not both $0$ in $\mathbb{Z}/N\mathbb{Z}$, then $A\begin{pmatrix}x\\y \end{pmatrix}\ne\begin{pmatrix}0\\0\end{pmatrix}$.
  4. $A$ gives a one-to-one correspondence of $(\mathbb{Z}/N\mathbb{Z})^2$ with itself.

Example. Solve the following systems of simultaneous congruences.

  1. \begin{align*}2x+3y&\equiv 1\mod 26\\ 7x+8y&\equiv 2\mod 26 \end{align*}
  2. \begin{align*} x+3y&\equiv 1\mod 26\\ 7x+9y&\equiv 2\mod 26 \end{align*}
  3. \begin{align*} x+3y&\equiv 1\mod 26\\ 7x+9y&\equiv 1\mod 26 \end{align*}

Solution.

  1. The system of simultaneous congruences can be written as the matrix congruence $AX\equiv B\mod 26$ or as the matrix equation $AX=B$ in $\mathbb{Z}/26\mathbb{Z}$, where $$A=\begin{pmatrix}2 & 3\\7 & 8\end{pmatrix},\ X=\begin{pmatrix}x\\y\end{pmatrix},\ B=\begin{pmatrix}1\\2\end{pmatrix}$$ $D=16-21=-5$ and $(-5,26)=1$, so by above Theorem, we see that $A^{-1}$ exists. In fact, we have already found $A^{-1}$ in the previous example. The solution $X$ is given by \begin{align*} X&\equiv A^{-1}B\mod 26\\&\equiv\begin{pmatrix} 14 & 11\\ 17 & 10 \end{pmatrix}\begin{pmatrix} 1\\ 2 \end{pmatrix}\mod 26\\ &\equiv\begin{pmatrix} 10\\ 11 \end{pmatrix}\mod 26 \end{align*}
  2. The matrix in parts 2 and 3 does not have an inverse matrix in $\mathbb{Z}/26\mathbb{Z}$ because $D=9-21=-12$ and $(-12,26)=2\ne 1$. However, it does have an inverse matrix in $\mathbb{Z}/13\mathbb{Z}$ and the solution is given by \begin{align*} \begin{pmatrix} x\\ y \end{pmatrix}&\equiv\begin{pmatrix} 9 & 10\\ 6 & 1 \end{pmatrix}\begin{pmatrix} 1\\ 2 \end{pmatrix}\mod 13\\ &\equiv\begin{pmatrix} 3\\ 8 \end{pmatrix}\mod 13 \end{align*} We now examine the possibilities of having solutions in $\mathbb{Z}/26\mathbb{Z}$. Since $x\equiv 3\mod 13$ and $y\equiv 8\mod 13$, $x$ and $y$ can be written as $x=3+13k_1$ and $y=8+13k_2$ for some $k_1,k_2\in\mathbb{Z}$. \begin{align*} x+3y&\equiv 1+13(k_1+k_2)\mod 26\\ &\equiv 1\mod 26 \end{align*} if and only if $k_1+k_2$ is even. Suppose that $k_1+k_2$ is even, so that $x+3y\equiv 1\mod 26$. \begin{align*} 7x+9y&\equiv 15+13(k_1+k_2)\mod 26\\ &\equiv 15\mod 26\\ &\not\equiv 2\mod 26 \end{align*} since $k_1+k_2$ is even. Hence, there are no solutions for part 2.
  3. Similarly to part 2, the solution in $\mathbb{Z}/13\mathbb{Z}$ is given by \begin{align*} \begin{pmatrix} x\\ y \end{pmatrix}&\equiv\begin{pmatrix} 9 & 10\\ 6 & 1 \end{pmatrix}\begin{pmatrix} 1\\ 1 \end{pmatrix}\mod 13\\ &\equiv\begin{pmatrix} 6\\ 7 \end{pmatrix}\mod 13 \end{align*} We test the possibilities of solutions in $\mathbb{Z}/26\mathbb{Z}$. Since $x\equiv 6\mod 13$ and $y\equiv 7\mod 13$, $x$ and $y$ can be written as $x=6+13k_1$ and $y=7+13k_2$ for some $k_1,k_2\in\mathbb{Z}$. \begin{align*} x+3y&\equiv 1+13(k_1+k_2)\mod 26\\ &\equiv 1\mod 26 \end{align*} if and only if $k_1+k_2$ is even. Now we assume that $k_1+k_2$ is even, so that we have $x+3y\equiv 1\mod 26$. \begin{align*} 7x+9y&\equiv 1+13(k_1+k_2)\mod 26\\ &\equiv 1\mod 26 \end{align*} since $k_1+k_2$ is even. Hence, there are solutions for part 3. Let $0\leq x,y\leq 25$. Then $k_1, k_2=0,1$. Since $k_1+k_2$ is even, we have either $k_1=k_2=0$ or $k_1=k_2=1$. Therefore, the solutions are \begin{align*} \begin{pmatrix} x\\ y \end{pmatrix}&\equiv\begin{pmatrix} 6\\ 7 \end{pmatrix}\mod 26\\ &\mbox{or}\\ \begin{pmatrix} x\\ y \end{pmatrix}&\equiv\begin{pmatrix} 19\\ 20 \end{pmatrix}\mod 26 \end{align*}

Let $A=\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}
\in M_2(\mathbb{Z}/N\mathbb{Z})$ with $(D,N)=1$ where $D=ad-bc$ be an enciphering matrix. Then it defines the enciphering transformation $C=AP$ where $P=\begin{pmatrix} x\\ y
\end{pmatrix}$ and $C=\begin{pmatrix}
x’\\
y’
\end{pmatrix}$ are plaintext message unit and ciphertext, respectively. That is,
$$\begin{pmatrix}
x’\\
y’
\end{pmatrix}=\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}\begin{pmatrix} x\\ y
\end{pmatrix}$$
The deciphering transformation is then given by $P=A^{-1}C$, i.e.
$$\begin{pmatrix} x\\ y
\end{pmatrix}=\begin{pmatrix}
D^{-1}d & -D^{-1}b\\
-D^{-1}c & D^{-1}a
\end{pmatrix}\begin{pmatrix} x’\\ y’
\end{pmatrix}$$

Example. Wokring in the 26-letter alphabet, use the matrix $A=\begin{pmatrix} 2 & 3\\ 7 & 8 \end{pmatrix}\in M_2(\mathbb{Z}/26\mathbb{Z})$ to encipher the message unit “NO”. $$AP=\begin{pmatrix} 2 & 3\\ 7 & 8 \end{pmatrix}\begin{pmatrix} 13\\ 14 \end{pmatrix}=\begin{pmatrix} 68\\ 103 \end{pmatrix}\equiv\begin{pmatrix} 16\\ 21 \end{pmatrix}\mod 26 $$ Thus the ciphertext $C=AP$ is “QV”.

Remark. To encipher a plaintext sequence of $k$ digraphs $P=P_1P_1P_3\cdots P_k$, we can write the $k$ vectors as columns of a $2\times k$ matrix and then multiply $A$ by the $2\times k$ matrix $P$ to get a $2\times k$ matrix $C=AP$ of coded digraph vectors.

Example. Encipher “NOANSWER”.

Solution. The numerical equivalence of “NOANSWER” is $$\begin{pmatrix} 13\\ 14 \end{pmatrix}\begin{pmatrix} 0\\ 13 \end{pmatrix}\begin{pmatrix} 18\\ 22 \end{pmatrix}\begin{pmatrix} 4\\ 17 \end{pmatrix}$$ \begin{align*} C&=AP\\ &=\begin{pmatrix} 2 & 3\\ 7 & 8 \end{pmatrix}\begin{pmatrix} 13 & 0 & 18 & 4\\ 14 & 13 & 22 & 17 \end{pmatrix}\\ &=\begin{pmatrix} 68 & 39 & 102 & 59\\ 203 & 104 & 302 & 164 \end{pmatrix}\\ &\equiv\begin{pmatrix} 16 & 13 & 24 & 7\\ 21 & 0 & 16 & 8 \end{pmatrix}\mod 26 \end{align*} i.e. the coded message is “QVNAYQHI”.

Example. In the situation of previous example, decipher the ciphertext “FWMDIQ”.

Solution.
\begin{align*} P&=A^{-1}C\\ &=\begin{pmatrix} 14 & 11\\ 17 & 10 \end{pmatrix}\begin{pmatrix} 5 & 12 & 8\\ 22 & 3 & 16 \end{pmatrix}\\ &=\begin{pmatrix} 312 & 201 & 288\\ 305 & 234 & 296 \end{pmatrix}\\ &\equiv\begin{pmatrix} 0 & 19 & 2\\ 19 & 0 & 10 \end{pmatrix}\mod 26 \end{align*}
Hence, the deciphered plaintext is “ATTACK”.