# An Easier Way to Solve Quadratic Equations

The Greek mathematician Heron (100? B.C.) is credited by some as the first (historically known) person who studied quadratic equations. He considered questions like: “There is a square. The sum of its area and circumference is 896. What is its length (or width)?” But some simple quadratic equations were already known to Mesopotamians 4,000 years ago. Along with linear equations, quadratic equations are among the oldest subjects of mathematics that have been studied. So, it is amazing and even somewhat amusing that there is still something new to find out about the solution of quadratic equations. I recently came across an article on the arXiv titled “A Simple Proof of the Quadratic Formula” by Po-Shen Loh at Carnegie Mellon. His proof also provides an alternative way to solve quadratic equations without dealing with factoring quadratic polynomials or having to remember the quadratic formula. I believe students will find it much easier than the conventional methods (factoring, completing the square, or the quadratic formula). First, here is the proof. Any quadratic equation can be written as a monic equation $$x^2+bc+c=0$$ Suppose that its solutions are $\alpha$ and $\beta$. Then $$x^2+bx+c=(x-\alpha)(x-\beta)=x^2-(\alpha+\beta)x+\alpha\beta=0$$ By comparing the coefficients, we have $\alpha+\beta=-b$ and $\alpha\beta=c$. Thus, $\alpha$ and $\beta$ can be written as $u-\frac{b}{2}$ and $-u-\frac{b}{2}$, respectively, where $u$ is an unknown quantity that is to be determined. Since $\alpha\beta=c$, we have $u^2-\frac{b^2}{4}=-c$. Solving this equation for $u$, we obtain $u=\pm\frac{\sqrt{b^2-4ac}}{2}$ and hence $\alpha$ and $\beta$ are given by $-\frac{b}{2}\pm\frac{\sqrt{b^2-4ac}}{2}$.

Example. The above question by Heron is equivalent to solving the quadratic equation $x^2+4x-896=0$. $\frac{b}{2}=2$ so we have $u^2-4=896$ i.e. $u^2=900$. Therefore, $u=\pm 30$ and the two solutions are $x=-2\pm 30$. Since $x>0$, the answer is $x=28$.

The method still applies to quadratic equations that have complex solutions. For example,

Example. Consider $x^2+x+1=0$. $\frac{b}{2}=\frac{1}{2}$, so we have the equation $u^2-\frac{1}{4}=-1$ i.e. $u^2=-\frac{3}{4}$. Therefore, $u=\pm\frac{\sqrt{3}i}{2}$ and the solutions are $x=-\frac{1}{2}\pm\frac{\sqrt{3}i}{2}$.

# Shannon Entropy

Let $X$ be a discrete random variable and suppose it takes values $x_1,\cdots,x_n$. Let $p$ be the probability distribution function (pdf in short) of the random variable $X$. For each $i=1,\cdots,n$, denote by $p_i$ $p(x_i)$, i.e.
$$p_i=P\{X=x_i\}$$
Here is a question. Is there a unique way to measure the amount or degree of uncertainty represented by the probability distribution? The answer is affirmative and Claude E. Shannon, in his landmark paper [2], was able to obtain such measurement of uncertainty. It is denoted by $H(p_1,\cdots,p_n)$ and called the entropy. Regardless of the name, there is no reason to expect that this notion of entropy is related to thermodynamic entropy because it is obtained independently from any physical hypotheses. Remarkably however, it coincides with thermodynamic entropy except for the presence of Boltzmann constant. To distinguish it from thermodynamic entropy, it is often called the information entropy or Shannon entropy. Shannon imposed the following three conditions on $H$:

1. $H$ is a continuous function of the $p_i$.
2. If all $p_i$ are equal, the quantity $$A(n)=H\left(\frac{1}{n},\cdots,\frac{1}{n}\right)$$ is a monotone increasing function of $n$. An increase in $n$ results in a decrease in the probability, therefore the uncertainty (entropy) increases.
3. Instead of considering of the events $x_1,\cdots,x_n$ with the respective probabilities $p_1,\cdots,p_n$, we may group the first $k$ of them $x_,\cdots,x_k$ as a single composite event and is given the probability $w_1=p_1+\cdots+p_k$. Then next $m$ events $x_{k+1},\cdots,x_{k+m}$ are gouped together as a single composite event and is given the probability $w_2=p_{k+1}+\cdots+p_{k+m}$, and so on so forth. The entropy regarding these composite events is $H(w_1,\cdots,w_r)$. In addition, since the composite events have occurred, the events $x_1,\cdots,x_k$ should be given the conditional probabilities $\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}$, the events $x_{k+1},\cdots,x_{k+m}$ should be given the conditional probabilities $\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}$, and so on so forth. In order for our information measurement is to be consistent, the ultimate state of knowledge we obtained this way should be the same as one obtained by considering the event $x_1,\cdots,x_n$ with probabilities $p_1,\cdots,p_n$. The argument is summed up as \begin{aligned}H(p_1,\cdots,p_n)=&H(w_1,\cdots,w_r)+w_1H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)\\&+w_2H\left(\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}\right)+\cdots\end{aligned}\label{eq:composition}The appearance of the weighting factors $w_i$ is of course due to the fact that, for example, we run into the additional uncertainty $H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)$ only with the probability $w_1$. \eqref{eq:composition} is called the composition law.

Example. $H\left(\frac{1}{2},\frac{1}{3},\frac{1}{6}\right)=H\left(\frac{1}{2},\frac{1}{2}\right)+\frac{1}{2}H\left(\frac{2}{3},\frac{1}{3}\right)$ by grouping the events with probabilities $\frac{1}{3}$ and $\frac{1}{6}$ as a composite event.

Example. Let us consider three events $x_1,x_2,x_3$ with respective probabilities $p_1=\frac{3}{9}$, $p_2=\frac{4}{9}$, $p_3=\frac{2}{9}$. We may view these three events as three composite events resulted from nine events with equal probability $\frac{1}{9}$ by grouping the first 3 events, then the next 4 events, and finally the remaining 2 events together. Using \eqref{eq:composition}, we obtain

\label{eq:entropy}
H\left(\frac{3}{9},\frac{4}{9},\frac{2}{9}\right)+\frac{3}{9}A(3)+\frac{4}{9}A(4)+\frac{2}{9}A(2)=A(9)

In exactly the same manner, if we write $p_i=\frac{n_i}{\sum_in_i}$, \eqref{eq:entropy} is readily generalized to

\label{eq:entropy2}
H(p_1,\cdots,p_n)+\sum_ip_iA(n_i)=A\left(\sum_in_i\right)

As a special case, if we choose $n_i=m$, then \eqref{eq:entropy2} becomes

\label{eq:entropy3}
A(m)+A(n)=A(mn)

Obviously,

\label{eq:entropy4}
A(n)=K\ln n

(where $K>0$ due to condition 2) is a solution. But is that a unique solution? To see that, let us consider a continuum extension $A(x)$ of $A(n)$ which is at least twice differentiable. Then \eqref{eq:entropy3} can be written as the functional equation

\label{eq:entropy5}
A(x)+A(y)=A(xy)

Differentiating \eqref{eq:entropy5} with respect to $x$, we obtain

\label{eq:entropy6}
A'(x)=yA'(xy)

Differentiating \eqref{eq:entropy6} with respect to $y$, we obtain

\label{eq:entropy7}
0=A'(xy)+xyA^{\prime\prime}(xy)

By introducing a new variable $z=xy$, \eqref{eq:entropy7} can be written as the second-order linear differential equation
$$zA^{\prime\prime}(z)+A'(z)=0$$
whose solution is
$$A(z)=K\ln(Cz)$$
That is, we have $A(n)=K\ln(Cn)$ where $K>0$ and $C>0$ are constants. Hence, $A(n)$ is unique up to constant multiples. Since entropy is a qualitative measurement of uncertainty rather than a quantitative one, we may choose $C=1$. For the same reason, we may choose $K=1$ as well but for now we will keep the constant multiple $K$. Substituting \eqref{eq:entropy4} into \eqref{eq:entropy2}, we finally obtain Shannon entropy

\label{eq:shannon}
\begin{aligned}
H(p_1,\cdots,p_n)&=K\ln\left(\sum_in_i\right)-K\sum_ip_i\ln n_i\\
&=-K\sum_ip_i\ln\left(\frac{n_i}{\sum_in_i}\right)\\
&=-K\sum_ip_i\ln p_i
\end{aligned}

Definition. There is a generalization of Shannon entropy called the Rényi entropy

\label{eq:renyi}
H_r(p_1,\cdots,p_n)=\frac{1}{1-r}\ln\left(\sum_ip_i^r\right)

for $0<r<\infty$ and $r\ne 1$.

Proposition 1. $$\lim_{r\to 1}H_r(p_1,\cdots,p_n)=-\sum_ip_i\ln p_i$$
i.e., Shannon entropy (with $K=1$) is the limit of the Rényi entropy as $r\to 1$.

Proof. Let $f(r)=\ln\left(\sum_ip_i^r\right)$. Then it is readily seen that the limit on the left is simply $f'(1)$.

Definition. Tsallis entropy is another generalization of Shannon entropy. It is defined by

\label{eq:tsallis}
S_r(p_1,\cdots,p_n)=\frac{1}{r-1}\left(1-\sum_ip_i^r\right)

One can easily show that the limit of Tsallis entropy as $r\to 1$ is Shannon entropy (with $K=1$) similarly to the proof of proposition 1.

References:

1. E. T. Jaynes, Information Theory and Statistical Mechanics, The Physical Review, Vol. 106, No. 4, 620-630, May 15, 1957
2. Claude E. Shannon, A mathematical theory of communication, Bell Sys. Tech. Journal, 27: 379-423, 623-656, 1948

# Computing for Number Theory 1: GCD, LCD, Linear Diophantine Equations

Here, we study some computing problems that are related to Divisibility and the Division Algorithm.

#### Maxima

GCD: Maxima has a function to compute the GCD (Greatest Common Divisor) gcd(a,b).

Example. Find the GCD of 158 and 188.

(%i1) gcd(158,188);

(%o1) 2

LCM: Maxima has a function to compute the LCM (Least Common Multiple) lcm(express_1,...express_n). This function needs to be loaded by running the command load("functs").

Example. Find the LCD of 6, 8, 15, 32.

(%i1) load("functs")$(%i2) lcm(6,8,15,32); (%o2) 480 Bézout’s Lemma (see here) says that given two integers$a,b$, there is an integer solution$(x,y)$to the equation$ax+by=(a,b)$. The Maxima command gcdex implements the Euclidean algorithm. gcdex(a,b) returns [x, y, u] where$u=(a,b)$and$ax+by=u$. So, one can find a solution of the equation$ax+by=(a,b)$using gcdex. Example. Solve$2=158x+188y$. We saw earlier that$(158,188)=2$. (%i2) gcdex(158,188); (%o2) [25, - 21, 2] Hence,$x=25$and$y=-21$. One can also use the command igcdex. gcdex and igcdex both implement the Euclidean algorithm. The only difference between them is that the argument of igcdex must be integers while the arguments of gcdex can be also polynomials. For instance, (%i7) gcdex(x^2 + 1,x^3 + 4); (%o7)/R/ [$\displaystyle\frac{x^2+4x-1}{17}$,$\displaystyle\frac{x+4}{17}$, 1] #### Python GCD: Algorithm 1. The name of the Python code is gcd(a,b) and it implements the Euclidean algorithm to find the GCD of two integers$a$and$b$. 2. If b == 0 then gcd(a,b) returns 0 because every nonzero integer divides 0. 3. Otherwise, it returns gcd(b,a % b). % is the modulo or remainder operator. a % b returns the remainder when$a$is divided by$b$if$a\geq b$or$a$if$a<b$because$a=b\cdot 0+a$. For example, 12345 % 2 returns 1. 4. The loop continues until b==0. Use your favorite editor to write the following python code (it is called a Python module) and save it as, for example, gcd.py. The Python interface IDLE has own editor. I also recommend Emacs and Geany for Python editors if you are running Python in command shell with iPython. # Computing the GCD of two integers a and b recursively using the Euclidean algorithm def gcd(a,b): if (b == 0): return a return gcd(b, a%b) On IDLE, the Python module can be run from its pull-down (or drop-down) menu under Run. Its shortcut key is F5. To run it on iPython, first change your current folder location to the one that contains the Python module gcd.py using cd (change directory) command in command shell and then start iPython by typing ipython (you can first start iPython and then change your current folder location as well using cd within iPython) and pressing the enter key. Once iPython is loaded, do$ ipython
Python 3.7.9 (default, Feb 4 2021, 01:17:56)
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: run gcd.py
In [2]: gcd(98,56)
Out[2]: 14

The Euclidean Algorithm: The Euclidean algorithm itself is a perfect computer algorithm. A Python code of the Euclidean algorithm is

# This Python module runs the Euclidean Algorithm to find the GCD of two integers a and b
def euclid(a,b):
r = a % b
while r > 0:
print (a, b, r)
a, b, r = b, r, b % r
return b

Running this Python module (named euclid.py) on iPython shows