Author Archives: Sung Lee

About Sung Lee

I am a mathematician and a physics buff. I am interested in studying mathematics inspired by theoretical physics as well as theoretical physics itself. I am also interested in studying theoretical computer science.

Gabriel’s Horn

Consider the curve $y=\frac{1}{x}$, $1\leq x<\infty$.

The Graph of y=1/x, x=1…50

Gabriel’s horn is the surface that is obtained by rotating the above curve about the x-axis.

Gabriel’s horn

What is interesting about Gabriel’s horn is that its surface area is infinite while the volume of its interior is finite.

Let us first calculate the volume. Using the disk method, the volume is given by \begin{align*}V&=\pi\int_1^\infty\left(\frac{1}{x}\right)^2dx\\&=\pi\lim_{a\to\infty}\int_1^a\frac{1}{x^2}dx\\&=\pi\lim_{a\to\infty}\left[-\frac{1}{x}\right]_1^a\\&=\pi\lim_{a\to\infty}\left[1-\frac{1}{a}\right]\\&=\pi\end{align*}

The surface area is obtained by calculating the integral \begin{align*}A&=2\pi\int_1^\infty\frac{1}{x}\sqrt{1+\left(-\frac{1}{x^2}\right)^2}dx\\&=2\pi\lim_{a\to\infty}\int_1^a\frac{1}{x}\sqrt{1+\frac{1}{x^4}}dx\end{align*} We don’t actually have to evaluate this integral to see that the area is infinite. Since $\sqrt{1+\frac{1}{x^4}}>1$, $$\int_1^a\frac{1}{x}\sqrt{1+\frac{1}{x^4}}dx>\int_1^a\frac{1}{x}dx=\ln a$$ and $\ln a\to\infty$ as $a\to\infty$. Hence, $A=\infty$.

The integral $\int \frac{1}{x}\sqrt{1+\frac{1}{x^4}}dx$ can nevertheless be evaluated exactly. Using first the substitution $u=x^2$ and then the trigonometric substitution $u=\tan\theta$, \begin{align*}\int\frac{1}{x}\sqrt{1+\frac{1}{x^4}}dx&=\frac{1}{2}\int\frac{\sqrt{1+u^2}}{u^2}du\\&=\frac{1}{2}\int\frac{\sec^2\theta\sec\theta}{\tan^2\theta}d\theta\\&=\frac{1}{2}\int\frac{(1+\tan^2\theta)\sec\theta}{\tan^2\theta}d\theta\\&=\frac{1}{2}\left[\int\cot^2\theta\sec\theta d\theta+\int\sec\theta d\theta\right]\\&=\frac{1}{2}\left[\int\frac{\cos\theta}{\sin^2\theta}d\theta+\int\sec\theta d\theta\right]\\&=\frac{1}{2}\left[-\csc\theta+\ln|\sec\theta+\tan\theta|\right]\\&=\frac{1}{2}\left[-\frac{\sqrt{x^4+1}}{x^2}+\ln(\sqrt{x^4+1}+x^2)\right]\end{align*}
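Both computations are easy to confirm numerically. The following Python sketch (assuming SciPy is available; it is not part of the argument above) reproduces $V=\pi$ and shows the partial surface-area integrals growing like $2\pi\ln a$:

    import numpy as np
    from scipy.integrate import quad

    # Volume: pi * integral of 1/x^2 from 1 to infinity should equal pi
    V, _ = quad(lambda x: np.pi / x**2, 1, np.inf)
    print(V, np.pi)

    # Surface area: the partial integrals grow without bound, roughly like 2*pi*ln(a)
    for a in (10, 100, 1000, 10000):
        A, _ = quad(lambda x: 2*np.pi/x * np.sqrt(1 + 1/x**4), 1, a)
        print(a, A, 2*np.pi*np.log(a))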

Another horn-shaped surface can be obtained by rotating the curve $y=e^{-x}$, $0\leq x<\infty$:

The Graph of y=exp(-x), x=0…5
Surface of revolution of y=exp(-x) about the x-axis

The volume of the interior is again finite: \begin{align*}V&=\pi\int_0^\infty e^{-2x}dx\\&=\pi\lim_{a\to\infty}\int_0^a e^{-2x}dx\\&=-\frac{\pi}{2}\lim_{a\to\infty}[e^{-2x}]_0^a\\&=\frac{\pi}{2}\end{align*} Unlike Gabriel’s horn, however, the surface area is also finite. To see this, note that it is given by the improper integral \begin{align*}A&=\int_0^\infty 2\pi e^{-x}\sqrt{1+(-e^{-x})^2}dx\\&=2\pi\lim_{a\to\infty}\int_0^a e^{-x}\sqrt{1+e^{-2x}}dx\end{align*} Using the substitution $u=e^{-x}$ and then the trigonometric substitution $u=\tan\theta$, the integral $\int e^{-x}\sqrt{1+e^{-2x}}dx$ is evaluated to be \begin{align*}\int e^{-x}\sqrt{1+e^{-2x}}dx&=-\int\sqrt{1+u^2}du\\&=-\int\sec^3\theta d\theta\\&=-\frac{1}{2}[\sec\theta\tan\theta+\ln|\sec\theta+\tan\theta|]\\&=-\frac{1}{2}[u\sqrt{1+u^2}+\ln|\sqrt{u^2+1}+u|]\\&=-\frac{1}{2}[e^{-x}\sqrt{e^{-2x}+1}+\ln(\sqrt{e^{-2x}+1}+e^{-x})]\end{align*} (For the details on how to evaluate the integral $\int\sec^3\theta d\theta$, see here.) Hence, $A$ is computed to be $\pi[\sqrt{2}+\ln(\sqrt{2}+1)]$.
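Again, a quick numerical check (a sketch, assuming SciPy) confirms both values:

    import numpy as np
    from scipy.integrate import quad

    V, _ = quad(lambda x: np.pi * np.exp(-2*x), 0, np.inf)
    A, _ = quad(lambda x: 2*np.pi*np.exp(-x)*np.sqrt(1 + np.exp(-2*x)), 0, np.inf)
    print(V, np.pi/2)                                      # ~1.5708 each
    print(A, np.pi*(np.sqrt(2) + np.log(np.sqrt(2) + 1)))  # ~7.2118 each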

There is another particularly interesting horn-shaped surface (actually the shape of two identical horns put together, as shown in the figure below), although it has both a finite volume and a finite surface area. It is called a pseudosphere. A pseudosphere is a surface with constant negative Gaussian curvature. The pseudosphere of radius 1 is obtained by revolving the tractrix $$t\mapsto (t-\tanh t,\mathrm{sech}\ t),\ -\infty<t<\infty$$ about its asymptote (the $x$-axis).

The Tractrix

The resulting surface of revolution, the pseudosphere of radius 1, is seen in the following figure.

The pseudosphere of radius 1

The volume of the interior of the pseudosphere is \begin{align*}V&=\pi\int_{-\infty}^\infty y^2dx\\&=2\pi\int_0^\infty y^2dx\\&=2\pi\int_0^\infty\mathrm{sech}^2\ t (1-\mathrm{sech}^2\ t)dt\\&=2\pi\int_0^\infty \mathrm{sech}^2\ t\tanh^2 tdt\\&=2\pi\int_0^1 u^2du\ (u=\tanh t)\\&=2\pi\left[\frac{u^3}{3}\right]_0^1\\&=\frac{2\pi}{3}\end{align*} The area of the pseudosphere is \begin{align*}A&=\int_{-\infty}^\infty 2\pi y ds\\&=2\int_0^\infty 2\pi y\sqrt{\left(\frac{dx}{dt}\right)^2+\left(\frac{dy}{dt}\right)^2}dt\\&=4\pi\int_0^\infty\mathrm{sech}\ t\sqrt{(1-\mathrm{sech}^2\ t)^2+(-\mathrm{sech}\ t\tanh t)^2}dt\\&=4\pi\int_0^\infty\mathrm{sech}\ t\sqrt{\tanh^4 t+\mathrm{sech}^2 t\tanh^2 t}dt\\&=4\pi\int_0^\infty\mathrm{sech}\ t\tanh t dt\\&=4\pi\int_0^\infty\frac{\sinh t}{\cosh^2 t}dt\\&=4\pi\int_1^\infty\frac{1}{u^2}du\ (u=\cosh t)\\&=4\pi\end{align*}

Notice that its volume is half the volume of the unit sphere and its area is the same as the area of the unit sphere. These volume and area relationships remain true for the pseudosphere of radius $r$, i.e. the volume and the area of the pseudosphere of radius $r$ are, respectively, $\frac{2}{3}\pi r^3$ and $4\pi r^2$ (we do not discuss it here, but the radius of a pseudosphere is defined to be the radius of its equator), as noted by the Dutch physicist Christiaan Huygens.

The Gaussian curvature of a regular surface can be computed by the Gauss formula. The pseudosphere of radius 1 as a parametric surface is represented by the equation $$\varphi(t,s)=(t-\tanh t,\mathrm{sech}\ t\cos s,\mathrm{sech}\ t\sin s)$$ As seen in the figure above, the pseudosphere is not regular along the equator (at $t=0$). It has constant negative Gaussian curvature $K=-1$ everywhere else. A common misunderstanding (usually by non-mathematicians) is that any surface with constant negative Gaussian curvature is a hyperbolic surface. In order for a surface to be hyperbolic, in addition to having constant negative curvature, it is required to be complete and regular, so the pseudosphere is not hyperbolic, although it was introduced by Eugenio Beltrami as a model of hyperbolic geometry. In fact, Hilbert’s theorem states that there exists no complete regular surface of constant negative Gaussian curvature immersed in $\mathbb{R}^3$. This means that one cannot obtain a model of two-dimensional hyperbolic geometry in $\mathbb{R}^3$. However, one can obtain a model of two-dimensional hyperbolic geometry in $\mathbb{R}^{2+1}$, a 3-dimensional Minkowski space-time.
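Before moving on, here is a numerical sanity check of the volume and area computations above (a sketch, assuming SciPy):

    import numpy as np
    from scipy.integrate import quad

    # Integrands from the computations above, in the parameter t
    V, _ = quad(lambda t: 2*np.pi * np.tanh(t)**2 / np.cosh(t)**2, 0, np.inf)
    A, _ = quad(lambda t: 4*np.pi * np.tanh(t) / np.cosh(t), 0, np.inf)
    print(V, 2*np.pi/3)  # ~2.0944 each
    print(A, 4*np.pi)    # ~12.566 each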

There is another interesting surface with constant negative Gaussian curvature, called Dini’s surface. It is described by the parametric equations \begin{align*}x&=a\cos u\sin v\\y&=a\sin u\sin v\\z&=a\left(\cos v+\ln\tan\frac{v}{2}\right)+bu\end{align*}

Dini’s surface with a=1 and b=1/2.

The Gaussian curvature of Dini’s surface is computed to be $K=-\frac{1}{a^2+b^2}$.
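This formula can be verified symbolically from the parametrization above by computing the first and second fundamental forms. Here is a SymPy sketch (the final simplification step may be slow):

    import sympy as sp

    u, v, a, b = sp.symbols('u v a b', positive=True)
    X = sp.Matrix([a*sp.cos(u)*sp.sin(v),
                   a*sp.sin(u)*sp.sin(v),
                   a*(sp.cos(v) + sp.log(sp.tan(v/2))) + b*u])

    Xu, Xv = X.diff(u), X.diff(v)
    E, F, G = Xu.dot(Xu), Xu.dot(Xv), Xv.dot(Xv)   # first fundamental form
    n = Xu.cross(Xv)
    n = n / sp.sqrt(n.dot(n))                      # unit normal
    L, M, N = X.diff(u, 2).dot(n), Xu.diff(v).dot(n), X.diff(v, 2).dot(n)

    K = sp.simplify((L*N - M**2) / (E*G - F**2))   # Gaussian curvature
    print(K)                                       # expect -1/(a**2 + b**2)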

Black Hole 1: Schwarzschild Black Hole

In light of the recent exciting press release about the supermassive black hole at the center of our own Milky Way galaxy, I am going to give a brief introduction to black holes and related concepts in a series of blog articles for those who wish to better understand these amazing monstrous heavenly creatures. For starters, this first article is about the Schwarzschild black hole, the simplest black hole model and also the first exact solution of Einstein’s Field Equations.

The Supermassive Black Hole (Sagittarius A*) at the Center of the Milky Way Galaxy.

First, what is a black hole?

A black hole is a hypothetical (though its existence is no longer in doubt) region of spacetime where gravity is so immense that nothing, not even light, can escape from it.

How did we come to know about a black hole? Was it just an imagination of some mad scientists?

No imagination. We came to know about the theoretical notion of a black hole from Einstein’s general relativity. But long before general relativity came out, the French mathematician Pierre-Simon Laplace had already thought about a celestial object whose gravity is so strong that even light cannot escape. It is discussed in his five-volume book Mécanique Céleste (Celestial Mechanics).

In general relativity, black holes are solutions of Einstein’s Field Equations.

What are Einstein’s Field Equations?

They are the fundamental equations of Einstein’s theory of gravitation. Such equations are expected to satisfy the following requirements:

  • The field equations should be independent of coordinate systems, i.e. they should be tensorial.
  • They should be partial differential equations of second order for the components $g_{ij}$ of the unknown metric tensor. If you want to know what a metric tensor is, go deeper here.
  • They are a relativistic generalization of the Poisson equation of Newtonian gravitational potential $$\nabla^2\Phi=4\pi G\rho$$ where $\rho$ is the mass density.
  • Since the energy-momentum tensor $T_{ij}$ is the special relativistic analogue of the mass density, it should be the source of the gravitational field.
  • If the space is flat, $T_{ij}$ should vanish.

The equations satisfying all these requirements are given by \begin{equation}\label{eq:EFT}G_{ij}=8\pi GT_{ij}\end{equation} where $G_{ij}=R_{ij}-\frac{1}{2}Rg_{ij}$ is called the Einstein tensor. $R_{ij}$ and $R$ are, respectively, the Ricci curvature tensor and the scalar curvature. If you want to know what they are, go deeper here. Interestingly, the field equations \eqref{eq:EFT} were derived independently and almost simultaneously by Albert Einstein and David Hilbert in 1915.

So, how do we solve the Field Equations?

The field equations are nonlinear partial differential equations, and such equations are, in general, difficult to solve as they stand. We impose additional (physical) assumptions and conditions (such as boundary conditions and initial conditions) until the equations become simple enough for us to be able to solve. First, we have no information on the source of gravity, i.e. no information on the energy-momentum tensor $T_{ij}$. Instead, we consider the field equations outside the field-producing masses. Thus, we obtain the vacuum field equations $$R_{ij}-\frac{1}{2}Rg_{ij}=0$$ Without loss of generality, we may assume that the metric tensor $g_{ij}$ is diagonal, i.e. $g_{ij}=0$ if $i\ne j$. This results in $R_{ij}=0$ if $i\ne j$ from the vacuum field equations. For $i=j$, since the scalar curvature $R$ is given by $R=\sum_ig^{ii}R_{ii}$, contracting the vacuum field equations with $g^{ii}$ gives $(n-2)R=0$. For $n=4$, $R=0$ and therefore the vacuum field equations turn into \begin{equation}\label{eq:ricciflat}R_{ij}=0\end{equation} i.e. vanishing Ricci curvature tensor. \eqref{eq:ricciflat} is simpler than the original vacuum field equations, but it is by no means easy to solve.

The simplest solution of \eqref{eq:ricciflat} was obtained in 1915 by the German physicist Karl Schwarzschild while serving in the army during World War I. This is also the first exact solution of Einstein’s Field Equations. Schwarzschild assumed a static, spherically symmetric metric as a solution of the vacuum equations \eqref{eq:ricciflat}. The ansatz for a static isotropic metric is given by \begin{equation}\label{eq:ansatz}ds^2=-A(r)dt^2+B(r)dr^2+r^2(d\theta^2+\sin^2\theta d\phi^2)\end{equation} In addition, we want the solution to become the flat spacetime far away from the masses, i.e. the solution is asymptotically flat, so we require $$\lim_{r\to\infty}A(r)=\lim_{r\to\infty}B(r)=1$$

Now, we are ready to solve the vacuum field equations \eqref{eq:ricciflat}. From the ansatz \eqref{eq:ansatz}, we have $$g_{tt}=-A(r),\ g_{rr}=B(r),\ g_{\theta\theta}=r^2,\ g_{\phi\phi}=r^2\sin^2\theta$$ The nonzero Christoffel symbols are computed to be \begin{align*}\Gamma_{rr}^r&=\frac{B'}{2B},\ \Gamma_{tt}^r=\frac{A'}{2B},\ \Gamma_{\theta\theta}^r=-\frac{r}{B}\\\Gamma_{\phi\phi}^r&=-\frac{r\sin^2\theta}{B},\ \Gamma_{\theta r}^\theta=\Gamma_{\phi r}^\phi=\frac{1}{r},\\\Gamma_{tr}^t&=\frac{A'}{2A},\ \Gamma_{\phi\phi}^\theta=-\sin\theta\cos\theta,\ \Gamma_{\phi\theta}^\phi=\cot\theta\end{align*} Here, the prime $'$ stands for $\frac{d}{dr}$. Using these Christoffel symbols, the Ricci tensors are computed to be \begin{align*}R_{tt}&=\frac{A^{\prime\prime}}{2B}-\frac{A'}{4B}\left(\frac{A'}{A}+\frac{B'}{B}\right)+\frac{A'}{rB}\\R_{rr}&=-\frac{A^{\prime\prime}}{2A}+\frac{A'}{4A}\left(\frac{A'}{A}+\frac{B'}{B}\right)+\frac{B'}{rB}\\R_{\theta\theta}&=1-\frac{1}{B}-\frac{r}{2B}\left(\frac{A'}{A}-\frac{B'}{B}\right)\\R_{\phi\phi}&=\sin^2\theta R_{\theta\theta}\end{align*}

Requiring the first two Ricci components above to vanish, $$BR_{tt}+AR_{rr}=\frac{1}{rB}(A'B+AB')=\frac{1}{rB}(AB)'=0$$ So, we have $A(r)B(r)=\text{constant}$. Asymptotic flatness implies that $A(r)B(r)=1$. $R_{\theta\theta}=0$ along with $B(r)=\frac{1}{A(r)}$ and $A'B+AB'=0$ results in $A+rA'=(rA)'=1$. Hence, $A(r)=1+\frac{C}{r}$, where $C$ is a constant. By the weak field approximation (Newtonian limit), we obtain \begin{align*}A(r)=-g_{00}&=1+\frac{2\Phi}{c^2}\\&=1-\frac{2MG}{c^2r}\end{align*} where $\Phi=-\frac{GM}{r}$ is the Newtonian potential. This determines the constant $C=-\frac{2MG}{c^2}$. Finally, restoring the factor of $c$ in the time coordinate, the Schwarzschild solution is given by \begin{equation}\label{eq:bh}ds^2=-\left(1-\frac{2MG}{c^2r}\right)c^2dt^2+\left(1-\frac{2MG}{c^2r}\right)^{-1}dr^2+r^2(d\theta^2+\sin^2\theta d\phi^2)\end{equation} As I mentioned in the beginning, this is the simplest black hole model and also the first exact solution of Einstein’s Field Equations.
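For those who want to double-check the algebra, the following SymPy sketch (an illustration, not part of the derivation above) substitutes $A(r)=1-\frac{2m}{r}$ and $B=1/A$, abbreviating $m=MG/c^2$, into the Ricci components computed above; all three independent components should simplify to zero:

    import sympy as sp

    r, m = sp.symbols('r m', positive=True)
    A = 1 - 2*m/r            # A(r) with the abbreviation m = MG/c^2
    B = 1/A                  # from A(r)B(r) = 1
    Ap, Bp = A.diff(r), B.diff(r)

    # Ricci components from the text; primes denote d/dr
    R_tt = A.diff(r, 2)/(2*B) - Ap/(4*B)*(Ap/A + Bp/B) + Ap/(r*B)
    R_rr = -A.diff(r, 2)/(2*A) + Ap/(4*A)*(Ap/A + Bp/B) + Bp/(r*B)
    R_thth = 1 - 1/B - r/(2*B)*(Ap/A - Bp/B)

    print([sp.simplify(e) for e in (R_tt, R_rr, R_thth)])  # expect [0, 0, 0]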

It turns out that the Schwarzschild solution is the only spherically symmetric solution of the vacuum field equations \eqref{eq:ricciflat}, due to

Birkhoff’s Theorem

In 1923, George David Birkhoff proved:

Theorem. Any spherically symmetric solution of the vacuum Einstein field equations \eqref{eq:ricciflat} must be static and asymptotically flat.

The Shape of a Black Hole

Is the converse of Birkhoff’s theorem true? Namely, is any static and asymptotically flat solution of the vacuum field equations necessarily spherically symmetric? I don’t know the answer, though I assume the answer is known. While the notion of a black hole originated from general relativity as a solution of the field equations, general relativity does not tell us how a black hole can actually form. A mechanism of black hole formation was provided by nuclear physics, as a consequence of the gravitational collapse of a dead star. From our observations of the shapes of massive objects, one would think the shape of a black hole is a sphere as well. Since we don’t know (I mean I don’t know) for sure, it would be interesting to ponder whether a black hole with a shape different from a sphere can exist, for example a toroidal (doughnut-shaped) black hole. If not for the Schwarzschild black hole, how about for a rotating black hole?

The Event Horizon

I conclude this article by mentioning that the Schwarzschild solution in \eqref{eq:bh} only describes the outside of the black hole. What determines the inside and the outside of a black hole? It is the event horizon. The event horizon is where $r=r_g=\frac{2MG}{c^2}$; $r_g$ is called the Schwarzschild radius. The Schwarzschild solution requires that $r>r_g$. The Schwarzschild solution unfortunately does not tell us anything about whatever happens inside the black hole. We will never know for sure what happens once an object falls into a black hole past its event horizon unless we have information on the source of its gravity, i.e. the stress-energy tensor $T_{ij}$.
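To get a feel for the numbers, here is a quick back-of-the-envelope computation of $r_g$ in Python (the masses below are rough published figures, quoted purely for illustration):

    G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
    c = 2.998e8        # speed of light, m/s
    M_sun = 1.989e30   # solar mass, kg

    def schwarzschild_radius(M):
        """r_g = 2MG/c^2 for a mass M in kilograms."""
        return 2 * M * G / c**2

    print(schwarzschild_radius(M_sun))          # ~2.95e3 m: the Sun
    print(schwarzschild_radius(4.1e6 * M_sun))  # ~1.2e10 m: Sagittarius A* (approximate mass)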

The Diagonalization of a Riemannian Metric

Let $(M^n, ds^2)$ denote an $n$-dimensional Riemannian manifold. The Riemannian metric $ds^2$ is, in general, given in terms of local coordinates $x'_1,\cdots,x'_n$ by $$ds^2=\sum_{i,j}h_{ij}(x'_1,\cdots,x'_n)dx'_idx'_j$$ Since the metric tensor $(h_{ij})$ is symmetric, by the finite-dimensional spectral theorem there exists an orthogonal matrix $O$ such that $O^t(h_{ij})O=(g_{ii})$ is diagonal. Let $$\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}=O^t\begin{pmatrix}dx'_1\\\vdots\\dx'_n\end{pmatrix}$$ Then, since $(h_{ij})=O(g_{ii})O^t$, \begin{align*}ds^2&=\sum_{i,j}h_{ij}(x'_1,\cdots,x'_n)dx'_idx'_j\\&=\begin{pmatrix}dx'_1\\\vdots\\dx'_n\end{pmatrix}^t(h_{ij})\begin{pmatrix}dx'_1\\\vdots\\dx'_n\end{pmatrix}\\&=\left[O\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}\right]^tO(g_{ii})O^t\left[O\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}\right]\\&=\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}^t(g_{ii})\begin{pmatrix}dx_1\\\vdots\\dx_n\end{pmatrix}\\&=\sum_{i}g_{ii}dx_i^2\end{align*} Therefore, without loss of generality, we may assume that a Riemannian metric is of the diagonal form $$ds^2=\sum_{i}g_{ii}(x_1,\cdots,x_n)dx_i^2$$
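As a pointwise numerical illustration (a sketch; the sample matrix is arbitrary), NumPy's eigh routine produces exactly such an orthogonal matrix $O$ for a given symmetric matrix:

    import numpy as np

    # A sample symmetric metric tensor (h_ij) at a point
    h = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

    eigvals, O = np.linalg.eigh(h)   # columns of O are orthonormal eigenvectors
    g = O.T @ h @ O                  # should be diag(eigvals) up to rounding
    print(np.allclose(g, np.diag(eigvals)))  # True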

An Easier Way to Solve Quadratic Equations

The Greek mathematician Heron (100? B.C.) is credited by some as the first (historically known) person who studied quadratic equations. He considered questions like: “There is a square. The sum of its area and circumference is 896. What is its length (or width)?” But some simple quadratic equations were already known to Mesopotamians 4,000 years ago. Along with linear equations, quadratic equations are among the oldest subjects of mathematics that have been studied. So, it is amazing and even somewhat amusing that there is still something new to find out about solving quadratic equations. I recently came across an article on the arXiv titled “A Simple Proof of the Quadratic Formula” by Po-Shen Loh at Carnegie Mellon. His proof also provides an alternative way to solve quadratic equations without factoring quadratic polynomials or having to remember the quadratic formula. I believe students will find it much easier than the conventional methods (factoring, completing the square, or the quadratic formula).

First, here is the proof. Any quadratic equation can be written as a monic equation $$x^2+bx+c=0$$ Suppose that its solutions are $\alpha$ and $\beta$. Then $$x^2+bx+c=(x-\alpha)(x-\beta)=x^2-(\alpha+\beta)x+\alpha\beta=0$$ By comparing the coefficients, we have $\alpha+\beta=-b$ and $\alpha\beta=c$. Thus, $\alpha$ and $\beta$ can be written as $u-\frac{b}{2}$ and $-u-\frac{b}{2}$, respectively, where $u$ is an unknown quantity to be determined. Since $\alpha\beta=c$, we have $u^2-\frac{b^2}{4}=-c$. Solving this equation for $u$, we obtain $u=\pm\frac{\sqrt{b^2-4c}}{2}$ and hence $\alpha$ and $\beta$ are given by $-\frac{b}{2}\pm\frac{\sqrt{b^2-4c}}{2}$. (For a general quadratic $ax^2+bx+c=0$, dividing through by $a$ first recovers the familiar quadratic formula.)

Example. The above question by Heron is equivalent to solving the quadratic equation $x^2+4x-896=0$. $\frac{b}{2}=2$ so we have $u^2-4=896$ i.e. $u^2=900$. Therefore, $u=\pm 30$ and the two solutions are $x=-2\pm 30$. Since $x>0$, the answer is $x=28$.

The method still applies to quadratic equations that have complex solutions. For example,

Example. Consider $x^2+x+1=0$. $\frac{b}{2}=\frac{1}{2}$, so we have the equation $u^2-\frac{1}{4}=-1$ i.e. $u^2=-\frac{3}{4}$. Therefore, $u=\pm\frac{\sqrt{3}i}{2}$ and the solutions are $x=-\frac{1}{2}\pm\frac{\sqrt{3}i}{2}$.
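The method translates directly into code. Below is a minimal Python sketch (the function name is my own choice; cmath.sqrt covers the complex case), applied to both examples above:

    import cmath

    def solve_monic_quadratic(b, c):
        """Solve x^2 + b*x + c = 0 by Po-Shen Loh's method."""
        u = cmath.sqrt(b*b/4 - c)   # from u^2 - b^2/4 = -c
        return -b/2 + u, -b/2 - u

    print(solve_monic_quadratic(4, -896))  # Heron's problem: 28 and -32 (as complex numbers)
    print(solve_monic_quadratic(1, 1))     # (-0.5+0.866...j, -0.5-0.866...j)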

Shannon Entropy

Let $X$ be a discrete random variable and suppose it takes the values $x_1,\cdots,x_n$. Let $p$ be the probability distribution function (pdf for short) of the random variable $X$. For each $i=1,\cdots,n$, write $p_i$ for $p(x_i)$, i.e.
$$p_i=P\{X=x_i\}$$
Here is a question. Is there a unique way to measure the amount or degree of uncertainty represented by the probability distribution? The answer is affirmative: Claude E. Shannon, in his landmark paper [2], obtained such a measure of uncertainty. It is denoted by $H(p_1,\cdots,p_n)$ and called the entropy. Despite the name, there is a priori no reason to expect this notion of entropy to be related to thermodynamic entropy, because it is obtained independently of any physical hypotheses. Remarkably, however, it coincides with thermodynamic entropy except for the presence of the Boltzmann constant. To distinguish it from thermodynamic entropy, it is often called the information entropy or the Shannon entropy. Shannon imposed the following three conditions on $H$:

  1. $H$ is a continuous function of the $p_i$.
  2. If all $p_i$ are equal, the quantity $$A(n)=H\left(\frac{1}{n},\cdots,\frac{1}{n}\right)$$ is a monotone increasing function of $n$. An increase in $n$ results in a decrease in the probabilities, and therefore the uncertainty (entropy) increases.
  3. Instead of considering the events $x_1,\cdots,x_n$ with the respective probabilities $p_1,\cdots,p_n$, we may group the first $k$ of them, $x_1,\cdots,x_k$, as a single composite event with probability $w_1=p_1+\cdots+p_k$. The next $m$ events $x_{k+1},\cdots,x_{k+m}$ are grouped together as a single composite event with probability $w_2=p_{k+1}+\cdots+p_{k+m}$, and so on. The entropy regarding these composite events is $H(w_1,\cdots,w_r)$. In addition, given that a composite event has occurred, the events $x_1,\cdots,x_k$ should be given the conditional probabilities $\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}$, the events $x_{k+1},\cdots,x_{k+m}$ the conditional probabilities $\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}$, and so on. In order for our information measurement to be consistent, the ultimate state of knowledge obtained this way should be the same as the one obtained by considering the events $x_1,\cdots,x_n$ with probabilities $p_1,\cdots,p_n$ directly. The argument is summed up as \begin{equation}\begin{aligned}H(p_1,\cdots,p_n)=&H(w_1,\cdots,w_r)+w_1H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)\\&+w_2H\left(\frac{p_{k+1}}{w_2},\cdots,\frac{p_{k+m}}{w_2}\right)+\cdots\end{aligned}\label{eq:composition}\end{equation}The appearance of the weighting factors $w_i$ is of course due to the fact that, for example, we run into the additional uncertainty $H\left(\frac{p_1}{w_1},\cdots,\frac{p_k}{w_1}\right)$ only with probability $w_1$. \eqref{eq:composition} is called the composition law.

Example. $H\left(\frac{1}{2},\frac{1}{3},\frac{1}{6}\right)=H\left(\frac{1}{2},\frac{1}{2}\right)+\frac{1}{2}H\left(\frac{2}{3},\frac{1}{3}\right)$ by grouping the events with probabilities $\frac{1}{3}$ and $\frac{1}{6}$ as a composite event.
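Using the explicit formula $H=-\sum_ip_i\ln p_i$ that will be derived below, this instance of the composition law is easy to verify numerically (a sketch, assuming NumPy is available):

    import numpy as np

    def H(*p):
        """Shannon entropy (K = 1) of a probability distribution."""
        p = np.array(p)
        return -np.sum(p * np.log(p))

    lhs = H(1/2, 1/3, 1/6)
    rhs = H(1/2, 1/2) + 0.5 * H(2/3, 1/3)
    print(lhs, rhs, np.isclose(lhs, rhs))  # the two sides agree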

Example. Let us consider three events $x_1,x_2,x_3$ with respective probabilities $p_1=\frac{3}{9}$, $p_2=\frac{4}{9}$, $p_3=\frac{2}{9}$. We may view these three events as three composite events resulting from nine events of equal probability $\frac{1}{9}$, by grouping the first 3 events, then the next 4 events, and finally the remaining 2 events together. Using \eqref{eq:composition}, we obtain
\begin{equation}
\label{eq:entropy}
H\left(\frac{3}{9},\frac{4}{9},\frac{2}{9}\right)+\frac{3}{9}A(3)+\frac{4}{9}A(4)+\frac{2}{9}A(2)=A(9)
\end{equation}

In exactly the same manner, if we write $p_i=\frac{n_i}{\sum_in_i}$, \eqref{eq:entropy} is readily generalized to
\begin{equation}
\label{eq:entropy2}
H(p_1,\cdots,p_n)+\sum_ip_iA(n_i)=A\left(\sum_in_i\right)
\end{equation}
As a special case, if we choose $n_i=m$, then \eqref{eq:entropy2} becomes
\begin{equation}
\label{eq:entropy3}
A(m)+A(n)=A(mn)
\end{equation}
Obviously,
\begin{equation}
\label{eq:entropy4}
A(n)=K\ln n
\end{equation}
(where $K>0$ due to condition 2) is a solution. But is it the unique solution? To see this, consider a continuum extension $A(x)$ of $A(n)$ that is at least twice differentiable. Then \eqref{eq:entropy3} can be written as the functional equation
\begin{equation}
\label{eq:entropy5}
A(x)+A(y)=A(xy)
\end{equation}
Differentiating \eqref{eq:entropy5} with respect to $x$, we obtain
\begin{equation}
\label{eq:entropy6}
A'(x)=yA'(xy)
\end{equation}
Differentiating \eqref{eq:entropy6} with respect to $y$, we obtain
\begin{equation}
\label{eq:entropy7}
0=A'(xy)+xyA^{\prime\prime}(xy)
\end{equation}
By introducing a new variable $z=xy$, \eqref{eq:entropy7} can be written as the second-order linear differential equation
$$zA^{\prime\prime}(z)+A'(z)=0$$
whose solution is
$$A(z)=K\ln(Cz)$$
That is, we have $A(n)=K\ln(Cn)$, where $K>0$ and $C>0$ are constants. Substituting $A(z)=K\ln(Cz)$ back into the functional equation \eqref{eq:entropy5} forces $C=1$, so $A(n)=K\ln n$ and the solution \eqref{eq:entropy4} is unique up to the constant multiple $K$. Since the choice of $K$ merely fixes the unit in which uncertainty is measured, we may choose $K=1$ as well, but for now we will keep the constant multiple $K$. Substituting \eqref{eq:entropy4} into \eqref{eq:entropy2}, we finally obtain the Shannon entropy
\begin{equation}
\label{eq:shannon}
\begin{aligned}
H(p_1,\cdots,p_n)&=K\ln\left(\sum_in_i\right)-K\sum_ip_i\ln n_i\\
&=-K\sum_ip_i\ln\left(\frac{n_i}{\sum_in_i}\right)\\
&=-K\sum_ip_i\ln p_i
\end{aligned}
\end{equation}

Definition. There is a generalization of Shannon entropy called the Rényi entropy
\begin{equation}
\label{eq:renyi}
H_r(p_1,\cdots,p_n)=\frac{1}{1-r}\ln\left(\sum_ip_i^r\right)
\end{equation}
for $0<r<\infty$ and $r\ne 1$.

Proposition 1. $$\lim_{r\to 1}H_r(p_1,\cdots,p_n)=-\sum_ip_i\ln p_i$$
i.e., Shannon entropy (with $K=1$) is the limit of the Rényi entropy as $r\to 1$.

Proof. Let $f(r)=\ln\left(\sum_ip_i^r\right)$. Then $f(1)=\ln\left(\sum_ip_i\right)=0$, so the limit on the left is $$-\lim_{r\to 1}\frac{f(r)-f(1)}{r-1}=-f'(1)=-\sum_ip_i\ln p_i$$ since $f'(r)=\frac{\sum_ip_i^r\ln p_i}{\sum_ip_i^r}$.

Definition. Tsallis entropy is another generalization of Shannon entropy. It is defined by
\begin{equation}
\label{eq:tsallis}
S_r(p_1,\cdots,p_n)=\frac{1}{r-1}\left(1-\sum_ip_i^r\right)
\end{equation}

One can easily show, similarly to the proof of Proposition 1, that the limit of the Tsallis entropy as $r\to 1$ is also the Shannon entropy (with $K=1$).
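Both limits are also easy to observe numerically (a sketch; the sample distribution is arbitrary):

    import numpy as np

    p = np.array([0.5, 0.3, 0.2])   # a sample distribution
    shannon = -np.sum(p * np.log(p))

    def renyi(p, r):
        return np.log(np.sum(p**r)) / (1 - r)

    def tsallis(p, r):
        return (1 - np.sum(p**r)) / (r - 1)

    for r in (0.9, 0.99, 1.01, 1.1):
        print(r, renyi(p, r), tsallis(p, r))  # both approach the Shannon value as r -> 1
    print(shannon)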

References:

  1. E. T. Jaynes, Information Theory and Statistical Mechanics, The Physical Review, Vol. 106, No. 4, 620-630, May 15, 1957
  2. Claude E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, 27: 379-423, 623-656, 1948