Let $(M^n, ds^2)$ denote an $n$-dimensional Riemannian manifold. The Riemannian metric $ds^2$, in general, is given, in terms of local coordinates $x’_1,\cdots,x’_n$, by $$ds^2=\sum_{i,j}h_{ij}(x’_1,\cdots,x’_n)dx’_idx’_j$$ Since the metric tensor $(h_{ij})$ is symmetric, by the finite dimensional spectral theorem, there exists an orthogonal matrix $O$ such that $O^t(h_{ij})O=(g_{ii})$. Let $$\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}=O^t\begin{pmatrix}dx’_1\\\vdots\\dx’_n\end{pmatrix}$$ Then \begin{align*}ds^2&=\sum_{i,j}h_{ij}(x’_1,\cdots,x’_n)dx’_idx’_j\\&=\begin{pmatrix}dx’_1\\\vdots\\dx’_n\end{pmatrix}^t(h_{ij})\begin{pmatrix}dx’_1\\\vdots\\dx’_n\end{pmatrix}\\&=\left[O\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}\right]^tO(g_{ii})O^t\left[O\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}\right]\\&=\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}^t(g_{ii})\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}\\&=\sum_{i}g_{ii}dx_i^2\end{align*} Therefore, without loss of generality, we may assume that a Riemannian metric is of the diagonal form $$ds^2=\sum_{i}g_{ii}(x_1,\cdots,x_n)dx_i^2$$

# Author Archives: Sung Lee

# An Easier Way to Solve Quadratic Equations

The Greek mathematician Heron (100? B.C.) is credited by some as the first (historically known) person who studied quadratic equations. He considered questions like: “There is a square. The sum of its area and circumference is 896. What is its length (or width)?” But some simple quadratic equations were already known to Mesopotamians 4,000 years ago. Along with linear equations, quadratic equations are among the oldest subjects of mathematics that have been studied. So, it is amazing and even somewhat amusing that there is still something new to find out about the solution of quadratic equations. I recently came across an article on the arXiv titled “A Simple Proof of the Quadratic Formula” by Po-Shen Loh at Carnegie Mellon. His proof also provides an alternative way to solve quadratic equations without dealing with factoring quadratic polynomials or having to remember the quadratic formula. I believe students will find it much easier than the conventional methods (factoring, completing the square, or the quadratic formula). First, here is the proof. Any quadratic equation can be written as a monic equation $$x^2+bc+c=0$$ Suppose that its solutions are $\alpha$ and $\beta$. Then $$x^2+bx+c=(x-\alpha)(x-\beta)=x^2-(\alpha+\beta)x+\alpha\beta=0$$ By comparing the coefficients, we have $\alpha+\beta=-b$ and $\alpha\beta=c$. Thus, $\alpha$ and $\beta$ can be written as $u-\frac{b}{2}$ and $-u-\frac{b}{2}$, respectively, where $u$ is an unknown quantity that is to be determined. Since $\alpha\beta=c$, we have $u^2-\frac{b^2}{4}=-c$. Solving this equation for $u$, we obtain $u=\pm\frac{\sqrt{b^2-4ac}}{2}$ and hence $\alpha$ and $\beta$ are given by $-\frac{b}{2}\pm\frac{\sqrt{b^2-4ac}}{2}$.

*Example*. The above question by Heron is equivalent to solving the quadratic equation $x^2+4x-896=0$. $\frac{b}{2}=2$ so we have $u^2-4=896$ i.e. $u^2=900$. Therefore, $u=\pm 30$ and the two solutions are $x=-2\pm 30$. Since $x>0$, the answer is $x=28$.

The method still applies to quadratic equations that have complex solutions. For example,

*Example.* Consider $x^2+x+1=0$. $\frac{b}{2}=\frac{1}{2}$, so we have the equation $u^2-\frac{1}{4}=-1$ i.e. $u^2=-\frac{3}{4}$. Therefore, $u=\pm\frac{\sqrt{3}i}{2}$ and the solutions are $x=-\frac{1}{2}\pm\frac{\sqrt{3}i}{2}$.

# Shannon Entropy

Let $X$ be a discrete random variable and suppose it takes values $x_1,\cdots,x_n$. Let $p$ be the probability distribution function (pdf in short) of the random variable $X$. For each $i=1,\cdots,n$, denote by $p_i$ $p(x_i)$, i.e.

$$p_i=P\{X=x_i\}$$

Here is a question. Is there a unique way to measure the amount or degree of uncertainty represented by the probability distribution? The answer is affirmative and Claude E. Shannon, in his landmark paper [2], was able to obtain such measurement of uncertainty. It is denoted by $H(p_1,\cdots,p_n)$ and called the *entropy*. Regardless of the name, there is no reason to expect that this notion of entropy is related to thermodynamic entropy because it is obtained independently from any physical hypotheses. Remarkably however, it coincides with thermodynamic entropy except for the presence of Boltzmann constant. To distinguish it from thermodynamic entropy, it is often called the *information entropy* or *Shannon entropy*. Shannon imposed the following three conditions on $H$:

- $H$ is a continuous function of the $p_i$.
- If all $p_i$ are equal, the quantity $$A(n)=H\left(\frac{1}{n},\cdots,\frac{1}{n}\right)$$ is a monotone increasing function of $n$. An increase in $n$ results in a decrease in the probability, therefore the uncertainty (entropy) increases.
- Instead of considering of the events $x_1,\cdots,x_n$ with the respective probabilities $p_1,\cdots,p_n$, we may group the first $k$ of them $x_,\cdots,x_k$ as a single composite event and is given the probability $w_1=p_1+\cdots+p_k$. Then next $m$ events $x_{k+1},\cdots,x_{k+m}$ are gouped together as a single composite event and is given the probability $w_2=p_{k+1}+\cdots+p_{k+m}$, and so on so forth. The entropy regarding these composite events is $H(w_1,\cdots,w_r)$. In addition, since the composite events have occurred, the events $x_1,\cdots,x_k$ should be given the conditional probabilities $\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}$, the events $x_{k+1},\cdots,x_{k+m}$ should be given the conditional probabilities $\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}$, and so on so forth. In order for our information measurement is to be consistent, the ultimate state of knowledge we obtained this way should be the same as one obtained by considering the event $x_1,\cdots,x_n$ with probabilities $p_1,\cdots,p_n$. The argument is summed up as \begin{equation}\begin{aligned}H(p_1,\cdots,p_n)=&H(w_1,\cdots,w_r)+w_1H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)\\&+w_2H\left(\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}\right)+\cdots\end{aligned}\label{eq:composition}\end{equation}The appearance of the weighting factors $w_i$ is of course due to the fact that, for example, we run into the additional uncertainty $H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)$ only with the probability $w_1$. \eqref{eq:composition} is called the
*composition law*.

*Example*. $H\left(\frac{1}{2},\frac{1}{3},\frac{1}{6}\right)=H\left(\frac{1}{2},\frac{1}{2}\right)+\frac{1}{2}H\left(\frac{2}{3},\frac{1}{3}\right)$ by grouping the events with probabilities $\frac{1}{3}$ and $\frac{1}{6}$ as a composite event.

*Example*. Let us consider three events $x_1,x_2,x_3$ with respective probabilities $p_1=\frac{3}{9}$, $p_2=\frac{4}{9}$, $p_3=\frac{2}{9}$. We may view these three events as three composite events resulted from nine events with equal probability $\frac{1}{9}$ by grouping the first 3 events, then the next 4 events, and finally the remaining 2 events together. Using \eqref{eq:composition}, we obtain

\begin{equation}

\label{eq:entropy}

H\left(\frac{3}{9},\frac{4}{9},\frac{2}{9}\right)+\frac{3}{9}A(3)+\frac{4}{9}A(4)+\frac{2}{9}A(2)=A(9)

\end{equation}

In exactly the same manner, if we write $p_i=\frac{n_i}{\sum_in_i}$, \eqref{eq:entropy} is readily generalized to

\begin{equation}

\label{eq:entropy2}

H(p_1,\cdots,p_n)+\sum_ip_iA(n_i)=A\left(\sum_in_i\right)

\end{equation}

As a special case, if we choose $n_i=m$, then \eqref{eq:entropy2} becomes

\begin{equation}

\label{eq:entropy3}

A(m)+A(n)=A(mn)

\end{equation}

Obviously,

\begin{equation}

\label{eq:entropy4}

A(n)=K\ln n

\end{equation}

(where $K>0$ due to condition 2) is a solution. But is that a unique solution? To see that, let us consider a continuum extension $A(x)$ of $A(n)$ which is at least twice differentiable. Then \eqref{eq:entropy3} can be written as the functional equation

\begin{equation}

\label{eq:entropy5}

A(x)+A(y)=A(xy)

\end{equation}

Differentiating \eqref{eq:entropy5} with respect to $x$, we obtain

\begin{equation}

\label{eq:entropy6}

A'(x)=yA'(xy)

\end{equation}

Differentiating \eqref{eq:entropy6} with respect to $y$, we obtain

\begin{equation}

\label{eq:entropy7}

0=A'(xy)+xyA^{\prime\prime}(xy)

\end{equation}

By introducing a new variable $z=xy$, \eqref{eq:entropy7} can be written as the second-order linear differential equation

$$zA^{\prime\prime}(z)+A'(z)=0$$

whose solution is

$$A(z)=K\ln(Cz)$$

That is, we have $A(n)=K\ln(Cn)$ where $K>0$ and $C>0$ are constants. Hence, $A(n)$ is unique up to constant multiples. Since entropy is a qualitative measurement of uncertainty rather than a quantitative one, we may choose $C=1$. For the same reason, we may choose $K=1$ as well but for now we will keep the constant multiple $K$. Substituting \eqref{eq:entropy4} into \eqref{eq:entropy2}, we finally obtain Shannon entropy

\begin{equation}

\label{eq:shannon}

\begin{aligned}

H(p_1,\cdots,p_n)&=K\ln\left(\sum_in_i\right)-K\sum_ip_i\ln n_i\\

&=-K\sum_ip_i\ln\left(\frac{n_i}{\sum_in_i}\right)\\

&=-K\sum_ip_i\ln p_i

\end{aligned}

\end{equation}

*Definition*. There is a generalization of Shannon entropy called the *Rényi entropy*

\begin{equation}

\label{eq:renyi}

H_r(p_1,\cdots,p_n)=\frac{1}{1-r}\ln\left(\sum_ip_i^r\right)

\end{equation}

for $0<r<\infty$ and $r\ne 1$.

*Proposition* 1. $$\lim_{r\to 1}H_r(p_1,\cdots,p_n)=-\sum_ip_i\ln p_i$$

i.e., Shannon entropy (with $K=1$) is the limit of the Rényi entropy as $r\to 1$.

*Proof*. Let $f(r)=\ln\left(\sum_ip_i^r\right)$. Then it is readily seen that the limit on the left is simply $f'(1)$.

*Definition*. *Tsallis entropy* is another generalization of Shannon entropy. It is defined by

\begin{equation}

\label{eq:tsallis}

S_r(p_1,\cdots,p_n)=\frac{1}{r-1}\left(1-\sum_ip_i^r\right)

\end{equation}

One can easily show that the limit of Tsallis entropy as $r\to 1$ is Shannon entropy (with $K=1$) similarly to the proof of proposition 1.

*References*:

- E. T. Jaynes, Information Theory and Statistical Mechanics, The Physical Review, Vol. 106, No. 4, 620-630, May 15, 1957
- Claude E. Shannon, A mathematical theory of communication, Bell Sys. Tech. Journal, 27: 379-423, 623-656, 1948

# Computing for Number Theory 1: GCD, LCD, Linear Diophantine Equations

Here, we study some computing problems that are related to Divisibility and the Division Algorithm.

#### Maxima

**GCD:** Maxima has a function to compute the GCD (Greatest Common Divisor) `gcd(a,b)`

.

*Example*. Find the GCD of 158 and 188.

`(%i1) gcd(158,188);`

`(%o1) 2`

**LCM:** Maxima has a function to compute the LCM (Least Common Multiple) `lcm(express_1,...express_n)`

. This function needs to be loaded by running the command `load("functs")`

.

*Example*. Find the LCD of 6, 8, 15, 32.

`(%i1) load("functs")$`

`(%i2) lcm(6,8,15,32);`

`(%o2) 480`

Bézout’s Lemma (see here) says that given two integers $a,b$, there is an integer solution $(x,y)$ to the equation $ax+by=(a,b)$. The Maxima command `gcdex`

implements the Euclidean algorithm. `gcdex(a,b)`

returns `[x, y, u]`

where $u=(a,b)$ and $ax+by=u$. So, one can find a solution of the equation $ax+by=(a,b)$ using `gcdex`

.

*Example*. Solve $2=158x+188y$.

We saw earlier that $(158,188)=2$.

`(%i2) gcdex(158,188);`

`(%o2) [25, - 21, 2]`

Hence, $x=25$ and $y=-21$.

One can also use the command `igcdex`

. `gcdex`

and `igcdex`

both implement the Euclidean algorithm. The only difference between them is that the argument of `igcdex`

must be integers while the arguments of `gcdex`

can be also polynomials. For instance,

(%i7) gcdex(x^2 + 1,x^3 + 4);

(%o7)/R/ [$\displaystyle\frac{x^2+4x-1}{17}$, $\displaystyle\frac{x+4}{17}$, 1]

#### Python

**GCD:** `Algorithm`

- The name of the Python code is
`gcd(a,b)`

and it implements the Euclidean algorithm to find the GCD of two integers $a$ and $b$. - If
`b == 0`

then`gcd(a,b)`

returns`0`

because every nonzero integer divides 0. - Otherwise, it returns
`gcd(b,a % b)`

. % is the*modulo*or*remainder operator*.`a % b`

returns the remainder when $a$ is divided by $b$ if $a\geq b$ or $a$ if $a<b$ because $a=b\cdot 0+a$. For example,`12345 % 2`

returns 1. - The loop continues until
`b==0`

.

Use your favorite editor to write the following python code (it is called a Python module) and save it as, for example, gcd.py. The Python interface IDLE has own editor. I also recommend Emacs and Geany for Python editors if you are running Python in command shell with iPython.

# Computing the GCD of two integers a and b recursively using the Euclidean algorithm def gcd(a,b): if (b == 0): return a return gcd(b, a%b)

On IDLE, the Python module can be run from its pull-down (or drop-down) menu under Run. Its shortcut key is F5. To run it on iPython, first change your current folder location to the one that contains the Python module gcd.py using `cd`

(change directory) command in command shell and then start iPython by typing `ipython`

(you can first start iPython and then change your current folder location as well using `cd`

within iPython) and pressing the enter key. Once iPython is loaded, do

$ ipython Python 3.7.9 (default, Feb 4 2021, 01:17:56) Type 'copyright', 'credits' or 'license' for more information IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: run gcd.py In [2]: gcd(98,56) Out[2]: 14

**The Euclidean Algorithm:** The Euclidean algorithm itself is a perfect computer algorithm. A Python code of the Euclidean algorithm is

# This Python module runs the Euclidean Algorithm to find the GCD of two integers a and b def euclid(a,b): r = a % b while r > 0: print (a, b, r) a, b, r = b, r, b % r return b

Running this Python module (named euclid.py) on iPython shows

$ ipython Python 3.7.9 (default, Feb 4 2021, 01:17:56) Type 'copyright', 'credits' or 'license' for more information IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: run euclid.py In [2]: euclid(98,56) 98 56 42 56 42 14 Out[2]: 14

**LCM:** The LCM $[a,b]$ can be obtained by $\frac{ab}{(a,b)}$ (see here).

`Algorithm`

: The name of the Python code is `lcm(a,b)`

. It computes the LCM of two integers $a$ and $b$.

- Find the GCD of $a$ and $b$ by running the Python program
`gcd(a,b)`

. - Compute the LCM of $a$ and $b$ using the formula
`a*b/gcd(a,b)`

.

Its Python code is

# Compute the LCM of two integers a and b Recursive function to return the GCD of a and b def gcd(a,b): if (b == 0): return a return gcd(b, a%b) # Function to return the LCM of a and b def lcm(a,b): return a*b/gcd(a,b)

Running this Python code (named lcm.py) on iPython shows

In [3]: run lcm.py In [4]: lcm(15,20) Out[4]: 60.0

**Linear Diophantine Equations:** As seen here, a linear diophantine equation

\begin{equation}

\label{eq:lindiophant}

ax+by=c

\end{equation}

is solvable if and only if $(a,b)|c$. If $b|a$ then $(a,b)=b$ and

$$x=0,\ y=\frac{c}{b}$$

is a solution. If $b\not|a$, then write $a=bq+r$ and substitute into \eqref{eq:lindiophant}:

$$(bq+r)x+by=c$$

This can be rearranged as

$$b(qx+y)+rx=c$$

Setting $u=qx+y$ and $v=x$, we obtain the equation

\begin{equation}

\label{eq:lindiophant2}

bu+rv=c

\end{equation}

with smaller coefficients. If we can solve \eqref{eq:lindiophant2}, we recover a solution of the original equation as

$$x=v,\ y=u-qv$$

The process terminates (according to the Euclidean algorithm) with an equation where the second coefficient divides the first. This scheme is embodied in the following Python code

# Solve a linear diophantine equation using the Euclidean algorithm # The function divmod returns a pair of numbers, the quotient and remainder, which we store in q, r. def diosolve(a,b,c): q, r = divmod(a,b) if r == 0: return( [0,c/b] ) else: sol = diosolve( b, r, c ) u = sol[0] v = sol[1] return( [ v, u-q*v ] )

Running this Python code (named diosolve.py) on iPython to solve the linear diophantine equation $12345x+54321y=3$:

In [9]: run diosolve.py In [10]: diosolve(12345, 54321, 3) Out[10]: [3617.0, -822.0]

Note that $(12345,54321)=3$, so of course the diophantine equation is solvable.

*Refernces*.

[1] Jim Carlson, A Short Course in Python for Number Theory, Draft of May 21, 2004

# Primality Tests

We begin by discussing a method of computing $a^d\mod n$, called the *repeated squaring method*, which is efficient enough to require only a handheld calculator using the following example.

*Example*. Let us compute the least residue of $848^{187}\mod 1189$ by using only a handheld calculator. First we convert 187 to the base 2:

\begin{align*} 187&=(10111011)_2\\ &=1\cdot 2^7+0\cdot 2^6+1\cdot 2^5+1\cdot 2^4+1\cdot 2^3+0\cdot 2^2+1\cdot 2+1\\ &=128+32+16+8+2+1 \end{align*}

Let $a=848$ and $n=1189$.

$$\begin{array}{|c|c|}

\hline

k & a^k\mod n\\

\hline

1 & 848\\

\hline

2 & 848^2=719104\equiv 948\mod 1189\\

\hline

4 & 948^2=898704\equiv 1009\mod 1189\\

\hline

8 & 1009^2= 1018081\equiv 297\mod 1189\\

\hline

16 & 297^2=88209\equiv 223\mod 1189\\

\hline

32 & 223^2=49729\equiv 980\mod 1189\\

\hline

64 & 980^2=960400\equiv 877\mod 1189\\

\hline

128 & 877^2=769129\equiv 1035\mod 1189\\

\hline

\end{array}$$

Hence,

\begin{align*} 848^{187}&=848^{128+32+16+8+2+1}\\ &=848^{128}\cdot 848^{32}\cdot 848^{16}\cdot 848^8\cdot 848^2\cdot 848\\ &\equiv 1035\cdot 980\cdot 223\cdot 297\cdot 948\cdot 848\mod 1189\mod 1189\\ &\equiv 190\mod 1189 \end{align*}

Here is our first primality test. Let us recall Fermat Little Theorem: If $n$ is a prime and $n\not|a$, then $a^{n-1}\equiv 1\mod n$. The contrapositive of the above statement can serve as a primality test: if $a^{n-1}\not\equiv 1\mod n$, then $n$ is not a prime or $n|a$.

*Example*. Let us show that $33$ is not a prime using the above primality test. (Of course it is not prime because $33=3\cdot 11$.) $33\not|2$ and

\begin{align*} 2^{33-1}&=2^{32}\\ &=(2^5)^62^2\\ &=(32)^62^2\\ &\equiv(-1)^62^2\mod 33\\ &\equiv 4\mod 33 \end{align*}

Thus, $33$ is not a prime.

Here is another primality test.

*Theorem*. If the integer $n>1$ has no prime divisor less than or equal to $\sqrt{n}$, then $n$ is prime.*Proof*. Suppose that $n$ is a composite number. Then $n=d_1d_2$ for some $d_1>1$ and $d_2>1$. Suppose that all prime divisors are greater than $\sqrt{n}$. Then $d_1>\sqrt{n}$ and $d_2>\sqrt{n}$. But $n=d_1d_2>(\sqrt{n})^2=n$, which is impossible. This proves the theorem. If $d_1\leq\sqrt{n}$, then $d_1$ is either prime or else has a prime divisor less than or equal to $\sqrt{n}$.

*Question*: Is the statement “If $2^{n-1}\equiv 1\mod n$ then $n$ is a prime.” true? The ancient Chinese believed so, but there is a counterexample. $2^{340}\equiv 1\mod 341$. But, $341=11\cdot 31$ (this is left as an exercise), so it is not a prime.

*Definition*. We call $n$ a *pseudoprime* if $2^{n-1}\equiv 1\mod n$ but $n$ is composite. More generally, a composite number $n$ such that $a^{n-1}\equiv 1\mod n$ is called a *pseudoprime to base* $a$.

The smallest pseudoprime is $341$, and it was not discovered until 1819. Bases other than $2$ can be used to identify composite numbers. For example,

$$3^{340}\equiv 56\mod 341$$

Thus, $341$ is not a prime number. These pseudoprimes are rarer then primes. Ever rarer are pseudoprimes to multiples bases. It is known, for example, that there are only 1770 integers below $25\cdot 10^9$ that are simultaneously pseudoprimes to the bases $2$, $3$, $5$ and $7$. So, the primality of numbers less than $25\cdot 10^9$ can be determined by testing Fermat’s congruence $a^{n-1}\equiv 1\mod n$ with these four bases, then comparing any number passing all 4 tests with a list of the 1770 exceptions. If the number is not any of those 1770 exceptions, it must be a prime.

Next we discuss Mersenne numbers and Fermat numbers, and primality tests those numbers.

Definition. If $k$ is a positive integer, we call $M_k=2^k-1$ a *Mersenne number*.

$$\begin{array}{|c|c|c|}

\hline

k & M_k=2^k-1 & \mbox{prime?}\\

\hline

1 & 1 & \mbox{N}\\

\hline

2 & 3 & \mbox{Y}\\

\hline

3 & 7 & \mbox{Y}\\

\hline

4 & 15=3\cdot 5 & \mbox{N}\\

\hline

5 & 31 & \mbox{Y}\\

\hline

6 & 63=9\cdot 7 & \mbox{N}\\

\hline

7 & 127 & \mbox{Y}\\

\hline

8 & 255=3\cdot 5\cdot 7 & \mbox{N}\\

\hline

9 & 511=7\cdot 73 & \mbox{N}\\

\hline

10 & 1023=3\cdot 11\cdot 31 & \mbox{N}\\

\hline

\end{array}$$

From this table, one may expect that $M_k$ is prime when $k$ is prime. However, $M_{11}=2047=23\cdot 89$ is not a prime.

*Theorem*.

If $d|k$, then $M_d|M_k$, so if $M_k$ is prime, then $k$ is prime.

*Proof*. Since $d|k$, $k=dd_1$ for some $d_1\in\mathbb{Z}$.

$$M_k=2^k-1=2^{dd_1}-1=(2^d-1)(2^{(d(d_1-1)}+2^{(d(d_1-2)}+\cdots+2^d+1)$$

$$\begin{array}{|c|c|c|}

\hline

k & 2^k+1 & \mbox{prime?}\\

\hline

1 & 3 & \mbox{Y}\\

\hline

2 & 5 & \mbox{Y}\\

\hline

3 & 9=3\cdot 3 & \mbox{N}\\

\hline

4 & 17 & \mbox{Y}\\

\hline

5 & 33=3\cdot 11 & \mbox{N}\\

\hline

6 & 65=5\cdot 13 & \mbox{N}\\

\hline

7 & 129=3\cdot 43 & \mbox{N}\\

\hline

8 & 257 & \mbox{Y}\\

\hline

9 & 513=3\cdot 3\cdot 3\cdot 19 & \mbox{N}\\

\hline

10 & 1025=5\cdot 5\cdot 41 & \mbox{N}\\

\hline

\end{array}$$

*Theorem*. If $k$, $a$, and $b$ are positive integers such that $k=ab$, where $a$ is odd, then

$$2^b+1|2^k+1$$

In particular, if $2^k+1$ is prime, then $k$ is $0$ or a power of $2$.

*Proof*. Under the assumption, we want to show that

$$2^k+1\equiv 0\mod 2^b+1$$

or

$$2^k\equiv -1\mod 2^b+1$$

Since, $2^b\equiv -1\mod 2^b+1$,

$$2^k=2^{ab}=(2^b)^a\equiv (-1)^a\mod 2^b+1\equiv -1\mod 2^b+1$$

If $k>0$ is not a power of $2$, then we can take $a>1$ so that

$$1<2^b+1<2^k+1$$

This means that $2^k+1$ has a positive divisor other than 1 and itself.

*Definition*. $F_r=2^{2^r}+1$ is called a *Fermat number*.

*Example*. $F_0=3$, $F_1=5$, $F_2=17$, $F_3=257$, $F_4=65537$ are primes, so Fermat conjectured that Fermat numbers are prime. But $F_5=2^{32}+1$ is not a prime. It is divisible by $641$. $641$ can be written

\begin{align*} 641&=640+1=2^7\cdot 5+1\\ 641&=625+16=5^4+2^4 \end{align*}

Thus, we have $2^7\cdot 5\equiv -1\mod 641$ and $2^4\equiv -5^4\mod 641$.

Now,

\begin{align*} F_5&=2^{32}+1\\ &=(2^7)^4\cdot 2^4+1\\ &\equiv(2^7)^4(-5^4)+1\mod 641\\ &\equiv -(2^7\cdot 5)^4+1\mod 641\\ &\equiv 0\mod 641 \end{align*}

No other Fermat numbers that are prime have been found yet.

One may expect that for any composite number $n$, there exists a base $a$ for which Fermat’s theorem can be used to show that $n$ is composite. However, there are composite numbers called *Carmichael numbers*, which are pseudo primes to every base. That is, $n$ is composite, but $a^{n-1}\equiv 1\mod n$, whenever $(a,n)=1$. The smallest Carmichael number is $561=3\cdot 11\cdot 17$. It was proved in 1994 that there are infinitely many Carmichael numbers.

There is a primality test just for Mersenne numbers. It is called the *Lucas-Lehmer test*. Define a sequence $S_1,S_2,\cdots$ by

\begin{align*} S_1&=4,\\ S_n&=S_{n-1}-2\ \mbox{for}\ n>1. \end{align*}

For example, $S_2=4^2-2=14$, $S_3=14^2-2=194$. Then the Lucas-Lehmer test is

*Theorem*. If $p$ is an odd prime then $M_p=2^p-1$ is prime if and only if $S_{p-1}\equiv 0\mod M_p$.

For example, take $p=7$ and show that $M_p=2^7-1=127$ is a prime.

\begin{align*} S_3&=194\equiv 67\mod 127\\ S_4&\equiv 67^2-2\mod 127\equiv 42\mod 127\\ S_5&\equiv 42^2-2\mod 127\equiv 111\mod 127\\ S_6&\equiv 111^2-2\mod 127\equiv 0\mod 127. \end{align*}

There is also a primality test just for Fermat numbers. It is called *Péppin’s test*.

*Theorem*. If $n>0$, the Fermat number $F_n=2^{2^n}+1$ is prime if and only if $3^{\frac{F_n-1}{2}}\equiv -1\mod F_n$.

*Example*. Use Pépin’s test to show that $F_3=257$ is prime.

*Solution*. $3^{\frac{257-1}{2}}=3^{128}=3^{2^7}$.

\begin{align*} 3^2&=9\\ 3^4&=9^2=81\\ 3^8&=81^2\equiv 136\mod 257\\ 3^{16}&\equiv 136^2\mod 257\equiv 249\mod 257\\ 3^{32}&\equiv 249^2\mod 257\equiv 64\mod 257\\ 3^{64}&\equiv 64^2\mod 257\equiv 241\mod 257\\ 3^{128}&\equiv 241^2\mod 257\equiv 256\mod 257\equiv -1\mod 257 \end{align*}