Solving a Functional Equation $x^y=y^x$

Here is another math problem proposed by Sam Walters, one of my favorite mathematicians on Twitter.

Sam used a trigger phrase for me: “isn’t hard.” I took it to mean easy enough for any undergraduate math student to solve (it turns out it actually is), which meant I should be able to solve it in no time. I spent some hours on the functional equation $x^y=y^x$ and was still stuck, with a wounded ego, until I saw a hint from another mathematician, Rob Corless, in his reply to the above tweet. The answer lies in the Lambert W function! It is a shame, but I didn’t know the Lambert W function though I had seen it. The function $f(x)=xe^x$ is injective (one-to-one) so it is invertible, but one cannot write its inverse function explicitly, so we denote it by $W$, i.e. $x=f^{-1}(xe^x)=W(xe^x)$. This $W$ is called the Lambert W function.

First let us take the natural logarithm of the equation $x^y=y^x$. With some rearrangement, we arrive at \begin{equation}\label{eq:funeq}\frac{\ln y}{y}=\frac{\ln x}{x}\end{equation} Equation \eqref{eq:funeq} is well defined for $x,y>0$; here we restrict attention to $x,y>1$. Clearly $y=x$ is a solution. Now we want to find a less trivial solution. Let us introduce a new variable $u$ which satisfies $y=\frac{1}{u}$. Then \eqref{eq:funeq} is written in terms of $u$ as \begin{equation}\label{eq:funeq2}u\ln u=-\frac{\ln x}{x}\end{equation} Next we introduce another variable $v$ which satisfies $u=e^v$. In terms of $v$, \eqref{eq:funeq2} is written as $f(v)=ve^v=-\frac{\ln x}{x}$ and hence $v=W\left(-\frac{\ln x}{x}\right)$. Since $y=\frac{1}{u}=e^{-v}=\frac{v}{ve^v}$, this gives \begin{equation}\label{eq:funeq3}y=-\frac{x}{\ln x}W\left(-\frac{\ln x}{x}\right)\end{equation}

For the trivial solution $y=x$ we have $v=-\ln x$, so the equation $v=W\left(-\frac{\ln x}{x}\right)$ above becomes a useful identity for the Lambert W function in its own right: \begin{equation}\label{eq:funeq4}W\left(-\frac{\ln x}{x}\right)=-\ln x\end{equation} From \eqref{eq:funeq4} we can get some special values of $W$, for example $W(0)=0$ (take $x=1$) and $W\left(-\frac{1}{e}\right)=-1$ (take $x=e$). The graphs of $y=x$ and $y=-\frac{x}{\ln x}W\left(-\frac{\ln x}{x}\right)$ can be seen in Figure 1.

Figure 1. The graphs of y=x (in red) and y=-xW(-ln(x)/x)/ln(x) (in blue)

It appears that $y=x$ and $y=-\frac{x}{\ln x}W\left(-\frac{\ln x}{x}\right)$ coincide on $(1,e)$. The graph of $y=-\frac{x}{\ln x}W\left(-\frac{\ln x}{x}\right)$ has a kink at $x=e$ and is decreasing on $(e,\infty)$. The first observation follows from the identity \eqref{eq:funeq4}. The function $f(v)=ve^v$ is injective for $v\geq -1$, and $-\ln x\geq -1$ exactly when $x\leq e$, so \eqref{eq:funeq4} holds for $1<x\leq e$ and \eqref{eq:funeq3} reduces to $y=x$ on $(1,e)$, as we have speculated from Figure 1. For the second observation, differentiating \eqref{eq:funeq} with respect to $x$ results in \begin{equation}\label{eq:funeq5}\frac{1-\ln y}{y^2}\frac{dy}{dx}=\frac{1-\ln x}{x^2}\end{equation} The function $t\mapsto\frac{\ln t}{t}$ is increasing on $(1,e)$ and decreasing on $(e,\infty)$, so for $x>e$ the solution $y\ne x$ of \eqref{eq:funeq} satisfies $1<y<e$. Then $1-\ln y>0$ and $1-\ln x<0$, so \eqref{eq:funeq5} gives $\frac{dy}{dx}<0$. Hence $y=-\frac{x}{\ln x}W\left(-\frac{\ln x}{x}\right)$ parts ways with $y=x$ at $x=e$ and becomes decreasing on $(e,\infty)$. As for Sam’s last question, first rewrite \eqref{eq:funeq} as $\ln y=y\frac{\ln x}{x}$. For $x>e$ we have $1<y<e$, so $y$ is bounded, and $\lim_{x\to\infty}\frac{\ln x}{x}=0$; hence $\lim_{x\to\infty}\ln y=0$. Since $y$ is decreasing on $(e,\infty)$ and is bounded below by 1, $\lim_{x\to\infty}y$ exists, and since $\lim_{x\to\infty}\ln y=\ln(\lim_{x\to\infty}y)$ by continuity of the logarithm, $\lim_{x\to\infty}y=1$.
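
The behaviour on $(e,\infty)$ can also be checked numerically without the Lambert W function at all. The following is a minimal sketch, assuming plain bisection on $\frac{\ln t}{t}$ over $(1,e)$; the helper names `g` and `nontrivial_partner` are made up for this illustration.

```python
import math

def g(t):
    """The function ln(t)/t that appears on both sides of the equation."""
    return math.log(t) / t

def nontrivial_partner(x, iterations=200):
    """For x > e, find the y in (1, e) with ln(y)/y = ln(x)/x by bisection.

    g is increasing on (1, e) with g(1) = 0 < g(x) < 1/e = g(e),
    so the root is unique and bracketed by 1 and e.
    """
    target = g(x)
    lo, hi = 1.0, math.e
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (3.0, 4.0, 10.0, 1e3, 1e6):
    y = nontrivial_partner(x)
    # x**y = y**x is equivalent to y*ln(x) = x*ln(y); compare logs to avoid overflow.
    print(x, y, abs(y * math.log(x) - x * math.log(y)))
```

The printed values of $y$ decrease as $x$ grows and approach 1, in line with the limit computed above. This is only a numerical check, not a proof.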

Update:

  1. I realized that I was sloppy in the definition of $W$. In general $W$ is defined as the collection of the branches of the inverse of $f(z)=ze^z$ where $z$ is the complex variable $z=x+iy$. $f(z)=ze^z$ is not injective so $W$ is multivalued. Here I considered the real version, the inverse of $f(x)=xe^x$, and I said it is injective. The reason is that I was restricting its domain to $x\geq 0$ though I didn’t mention it. But $f(x)=xe^x$ is in general not injective, as seen in Figure 2.

    Figure 2. The graph of f(x)=xe^x.

    One can also easily show that it is not injective without a graph. $f'(x)=e^x(x+1)$ so $x=-1$ is a critical point. $f'(x)<0$ on $(-\infty,-1)$ i.e. $f(x)$ is decreasing on $(-\infty,-1)$ and $f'(x)>0$ on $(-1,\infty)$ i.e. $f(x)$ is increasing on $(-1,\infty)$. Every value in $\left(-\frac{1}{e},0\right)$ is therefore attained twice, and hence $W$ is still multivalued. The upper branch $W\geq -1$ is denoted $W_0$ and is defined to be the principal branch of $W$, and the lower branch $W\leq -1$ is denoted by $W_{-1}$. Now one can easily see why there is a kink for $y=-\frac{x}{\ln x}W\left(-\frac{\ln x}{x}\right)$ at $x=e$: there $-\frac{\ln x}{x}=-\frac{1}{e}$, which is where the two branches meet and $W$ fails to be single-valued. On $(1,e)$ the branch $W_0$ returns the trivial solution $y=x$ and $W_{-1}$ returns the nontrivial one, while on $(e,\infty)$ the roles are swapped. In fact, the nontrivial solution curve, given by $y=-\frac{x}{\ln x}W_{-1}\left(-\frac{\ln x}{x}\right)$ on $(1,e)$ and by the same formula with $W_0$ on $(e,\infty)$, has no kink, as seen in the nice graphics made by Greg Egan here. Also see Figure 3 for the graph of $y=-\frac{x}{\ln x}W_{-1}\left(-\frac{\ln x}{x}\right)$.

    Figure 3. The graphs of y=x (in red) and y=-xW_{-1}(-ln(x)/x)/ln(x) (in blue)

    By the way, in case you don’t know, Greg is a well-known science fiction writer and his novels have lots of interesting math and physics stuff. For information on his novels visit his website. There is so much interesting stuff to learn about the Lambert W function. To learn more about it, for starters, see its Wikipedia entry and also a survey paper on the Lambert W function here. Note that one of the authors is Rob Corless. I thought he was just a knowledgeable passerby who threw a hint at others, but it turns out he is an expert on the Lambert W function.

  2.  By substituting $z_0=ze^z$ in $z=W(ze^z)$ we obtain $z_0=W(z_0)e^{W(z_0)}$ for any complex number $z_0$. This can be used as the defining equation for Lambert $W$ function.
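
The two real branches described in this update can be compared numerically. The sketch below is only an illustration under stated assumptions: it solves the defining equation $we^w=z$ by bisection for $-\frac{1}{e}<z<0$ instead of calling a library routine, and the names `lambert_w` and `solution_y` are invented here.

```python
import math

def _bisect(h, lo, hi, iterations=200):
    """Root of h on [lo, hi], assuming h(lo) and h(hi) have opposite signs."""
    sign_lo = h(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (h(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambert_w(z, branch=0):
    """Real W_0 (branch=0) or W_{-1} (branch=-1) for -1/e < z < 0."""
    if not (-1.0 / math.e < z < 0.0):
        raise ValueError("this sketch only handles -1/e < z < 0")
    h = lambda w: w * math.exp(w) - z
    if branch == 0:
        # w*exp(w) is increasing on [-1, 0], so the root there is W_0(z).
        return _bisect(h, -1.0, 0.0)
    if branch == -1:
        # w*exp(w) is decreasing on (-inf, -1]; move left until h changes sign.
        lo = -2.0
        while h(lo) <= 0:
            lo *= 2.0
        return _bisect(h, lo, -1.0)
    raise ValueError("branch must be 0 or -1")

def solution_y(x, branch=0):
    """y = -(x / ln x) * W(-ln x / x) on the chosen branch, for x > 1, x != e."""
    z = -math.log(x) / x
    return lambert_w(z, branch) / z

for x in (2.0, 4.0):
    print(x, solution_y(x, 0), solution_y(x, -1))
```

Since $-\frac{\ln 2}{2}=-\frac{\ln 4}{4}$, both $x=2$ and $x=4$ feed the same argument to $W$: the principal branch returns $y=2$ and the lower branch returns $y=4$ in both cases.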

Update:

  1. After reading Maple information on the command LambertW, I realized that Figure 1 is actually the graph of the principal branch $W_0$.
  2. The substitutions $u$ and $v$ I used to find the solution of $x^y=y^x$ can be combined into one as pointed out by Sam. Let $v$ be a variable satisfying $y=e^{-v}$. Then \eqref{eq:funeq} turns into $ve^v=-\frac{\ln x}{x}$ as before. In order for $ve^v$ to be injective we require that $-1<v<\infty$ or equivalently $0<y<e$.

Update: Sam tweeted that the equation $m^n=n^m$ where $m, n$ are integers with $1<m<n$ has only one solution $m=2,n=4$. This can be easily seen from Figure 1. Can we see that without a graph? Yes we can. The function $\frac{\ln t}{t}$ is strictly increasing on $(1,e)$ and strictly decreasing on $(e,\infty)$, so $m\ne n$ and $\frac{\ln m}{m}=\frac{\ln n}{n}$ force $m<e<n$. The only integer in $(1,e)$ is $m=2$. Since $\frac{\ln 4}{4}=\frac{\ln 2}{2}$, $n=4$ works, and since $\frac{\ln t}{t}$ is strictly decreasing on $(e,\infty)$, it is the only $n>e$ that does. Hence we see that $(m,n)=(2,4)$ is the only solution to $m^n=n^m$.
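
As a sanity check on the integer case, a brute-force search over a finite range is easy with exact integer arithmetic. The bound 60 and the function name `integer_solutions` are arbitrary choices for this sketch; it confirms the claim only up to that bound.

```python
def integer_solutions(limit):
    """All pairs (m, n) with 1 < m < n <= limit and m**n == n**m (exact integers)."""
    return [(m, n)
            for m in range(2, limit + 1)
            for n in range(m + 1, limit + 1)
            if m ** n == n ** m]

print(integer_solutions(60))  # prints [(2, 4)]
```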

Topologizing a Set by Continuous Functions

Let $f: X\longrightarrow Y$ be a function. The continuity of $f$ is then determined by the topologies on $X$ and $Y$. To be precise, $f: X\longrightarrow Y$ is continuous on $X$ if for every open set $U\subset Y$, $f^{-1}(U)$ is open in $X$. But if you have a clear idea of what class of functions, say $\{f_\alpha:X\longrightarrow Y|\alpha\in\mathscr{A}\}$, should be continuous, you can also define a topology on $X$ that makes $f_\alpha :X\longrightarrow Y$ continuous for all $\alpha\in\mathscr{A}$, as Sam Walters mentioned in his tweet. Here is how. Let $$\mathscr{S}=\{f_{\alpha}^{-1}(U)| \mbox{$U$ is open in $Y$}, \alpha\in\mathscr{A}\}$$
Since $\bigcup\{G|G\in\mathscr{S}\}=X$, $\mathscr{S}$ is a subbase for a topology in $X$. (Note: $\mathscr{S}$ is a base when $\mathscr{A}$ is a singleton set, but in general it need not be a base for a topology in $X$.) Denote by $\tau(\mathscr{S})$ the topology generated by the subbase $\mathscr{S}$. Then $\tau(\mathscr{S})$ is the smallest topology on $X$ that makes $f_\alpha :X\longrightarrow Y$ continuous for all $\alpha\in\mathscr{A}$. The reason is that any topology on $X$ that makes $f_\alpha :X\longrightarrow Y$ continuous for all $\alpha\in\mathscr{A}$ would include $\mathscr{S}$. We have in fact seen a similar idea, namely the Tychonoff product topology. Consider the cartesian product
$$\prod_{\alpha\in\mathscr{A}}X_\alpha=\left\{c:\mathscr{A}\longrightarrow\bigcup_{\alpha\in\mathscr{A}}X_\alpha\,\Big|\,c(\alpha)\in X_\alpha\ \mbox{for all}\ \alpha\in\mathscr{A}\right\}$$
of topological spaces $(X_\alpha,\tau_\alpha)$, $\alpha\in\mathscr{A}$. The topology we want is the smallest one that makes the projection maps $\pi_\alpha :\prod_{\alpha\in\mathscr{A}}X_\alpha\longrightarrow X_\alpha$ continuous for all $\alpha\in\mathscr{A}$. Let $$\mathscr{S}=\{\pi_\alpha^{-1}(U_\alpha)|U_\alpha\in\tau_\alpha,\alpha\in\mathscr{A}\}$$
Then $\mathscr{S}$ is a subbase for a topology called the Tychonoff product topology. This is the smallest topology that makes the projection maps continuous. The projection maps are also open.
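
For finite sets the construction of $\tau(\mathscr{S})$ can be carried out by brute force. The following is a toy sketch: the four-point set `X`, the two-point space with open sets $\emptyset$, $\{a\}$, $\{a,b\}$, and the maps `f1`, `f2` are all made up for illustration. It first closes the subbase under finite intersections and then under unions.

```python
def preimage(f, X, U):
    """Preimage of U under the map f, where f is given as a dict on X."""
    return frozenset(x for x in X if f[x] in U)

def generated_topology(X, subbase):
    """Topology on the finite set X generated by the given subbase."""
    X = frozenset(X)
    # Base: all finite intersections of subbase members (X is the empty intersection).
    base = {X}
    changed = True
    while changed:
        changed = False
        for B in list(base):
            for S in subbase:
                if B & S not in base:
                    base.add(B & S)
                    changed = True
    # Topology: all unions of base members (the empty set is the empty union).
    topology = {frozenset()}
    changed = True
    while changed:
        changed = False
        for T in list(topology):
            for B in base:
                if T | B not in topology:
                    topology.add(T | B)
                    changed = True
    return topology

X = {0, 1, 2, 3}
open_sets_Y = [frozenset(), frozenset({"a"}), frozenset({"a", "b"})]
f1 = {0: "a", 1: "a", 2: "b", 3: "b"}
f2 = {0: "a", 1: "b", 2: "a", 3: "b"}

subbase = {preimage(f, X, U) for f in (f1, f2) for U in open_sets_Y}
topology = generated_topology(X, subbase)

for open_set in sorted(topology, key=lambda s: (len(s), sorted(s))):
    print(sorted(open_set))
```

In this example the set $\{0\}=f_1^{-1}(\{a\})\cap f_2^{-1}(\{a\})$ is open in the generated topology but does not belong to $\mathscr{S}$, which shows why $\mathscr{S}$ is only a subbase here.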

Given a surjective function $p: X\longrightarrow Y$ from a topological space $X$ onto a set $Y$, we can also define a topology on $Y$ that makes $p$ continuous: $U\subset Y$ is open if and only if $p^{-1}(U)$ is open in $X$. This topology on $Y$ automatically makes $p$ continuous. Moreover it is the largest topology on $Y$ that makes $p$ continuous. This topology is called the identification topology. The identification topology can be used to get a new topological space (identification space) from an old one. More specifically, it can be defined on a partition of $X$ or equivalently a quotient set of $X$ modulo an equivalence relation and it makes the canonical projection $\pi$ continuous. For example, let $X$ be the unit square $[0,1]\times[0,1]$ in $\mathbb{E}^2$ with the subspace topology. Partition $X$ into the following subsets:

  1. $\{(0,0),(1,0),(0,1),(1,1)\}$ i.e. the set of four corner points.
  2. $\{(x,0),(x,1)\}$ where $0<x<1$
  3. $\{(0,y),(1,y)\}$ where $0<y<1$
  4. $\{(x,y)\}$ where $0<x<1$, $0<y<1$.

The resulting identification space is the torus which is homeomorphic to $S^1\times S^1$ as shown in Figure 1.

Figure 1. Clifford Torus S^1 x S^1

References:

  1. M. A. Armstrong, Basic Topology, Springer-Verlag, 1983
  2. Benjamin T. Sims, Fundamentals of Topology, Collier Macmillan

Kepler’s Law

In his recent tweet, Sam obtained Kepler’s (second) law simply by using polar coordinates, integrals and conservation law of angular momentum. In this note I discuss basic physics about conservation law of angular momentum and Kepler’s second law as its consequence.

What is angular momentum?

Let $r$ be the position vector of a mass $m$, measured from a fixed point (called the pivot).

Then the angular momentum is given by $$L=r\times p$$ where $p=mv$ is the linear momentum of the mass $m$. \begin{align*}\frac{dL}{dt}&=\frac{d}{dt}(r\times mv)\\&=\frac{dr}{dt}\times mv+r\times\frac{d(mv)}{dt}\\&=v\times mv+r\times\frac{dp}{dt}\\&=r\times\frac{dp}{dt}\end{align*} since $v\times mv=0$. That is, by Newton's second law $F=\frac{dp}{dt}$, we have $\frac{dL}{dt}=r\times F$, and the quantity $r\times F$ is called the torque. If the torque $r\times F=0$ then $L$ is constant. This is the law of conservation of angular momentum. $r\times F=0$ if and only if $r$ and $F$ are parallel or antiparallel, except for the trivial cases $r=0$ or $F=0$. A force that acts exclusively parallel or antiparallel to the position vector is called a central force. That is to say, angular momentum is conserved under a central force.

Conservation law of angular momentum implies Kepler’s second law

The area $dA$ spanned by $r$ and $dr$ is $$dA=\frac{1}{2}|r\times dr|$$

Figure 2. The area dA spanned by r and dr.

Since $dr=v\,dt$, \begin{align*}\frac{dA}{dt}&=\frac{1}{2}|r\times v|\\&=\frac{1}{2m}|r\times mv|\\&=\frac{1}{2m}|L|\end{align*} $\frac{dA}{dt}$ is the areal velocity of the radial vector $r$. It measures how much area is swept out per unit time. For planetary motion the gravitational force is a central force, so $L$ is constant, which means $\frac{dA}{dt}$ is constant. Hence conservation of angular momentum implies Kepler's second law: the radial vector $r$ of a planet sweeps out equal areas in equal times.
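
To see the constancy of $\frac{dA}{dt}$ in numbers, one can integrate a planar orbit and monitor $\frac{1}{2}|r\times v|$. The sketch below is an assumption-laden illustration, not part of Sam's derivation: it uses units with $m=1$, an attractive inverse-square force $F=-kr/|r|^3$ with $k=1$, made-up initial conditions, and a velocity Verlet integrator.

```python
import math

def acceleration(x, y, k=1.0, m=1.0):
    """Acceleration due to the attractive central force F = -k r / |r|^3."""
    r3 = math.hypot(x, y) ** 3
    return -k * x / (m * r3), -k * y / (m * r3)

def areal_velocities(x, y, vx, vy, dt=1e-3, steps=20000):
    """Integrate with velocity Verlet and record (1/2)|r x v| at every step."""
    rates = []
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        rates.append(0.5 * abs(x * vy - y * vx))
        # Half kick, drift, then second half kick with the updated acceleration.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return rates

# Start at r = (1, 0) with v = (0, 0.8): a bound, non-circular orbit with |L|/(2m) = 0.4.
rates = areal_velocities(1.0, 0.0, 0.0, 0.8)
print(min(rates), max(rates))
```

Each kick is along $r$ and each drift is along $v$, so neither changes $r\times v$; the recorded areal velocity stays at $\frac{|L|}{2m}=0.4$ up to rounding error even though the speed varies along the orbit.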

References:

Walter Greiner, Classical Mechanics, Point Particles and Relativity, Springer-Verlag, 2004

A Linear Algebra Problem on Twitter

Problem: Let $A$ and $B$ be $n\times n$ matrices such that their sum $A+B$ is invertible. Then show that $$A(A+B)^{-1}B=B(A+B)^{-1}A$$ (Hat tip: Sam Walters)

Solution. \begin{equation}\begin{aligned}I&=(A+B)(A+B)^{-1}\\&=A(A+B)^{-1}+B(A+B)^{-1}\end{aligned}\label{eq:matrix}\end{equation} Multiply \eqref{eq:matrix} by $B$ from the right \begin{equation}\label{eq:matrix2}B=A(A+B)^{-1}B+B(A+B)^{-1}B\end{equation} Also multiply \eqref{eq:matrix} by $A$ from the right \begin{equation}\label{eq:matrix3}A=A(A+B)^{-1}A+B(A+B)^{-1}A\end{equation} Subtract \eqref{eq:matrix3} from \eqref{eq:matrix2}. \begin{equation}\label{eq:matrix4}B-A=A(A+B)^{-1}B-B(A+B)^{-1}A+B(A+B)^{-1}B-A(A+B)^{-1}A\end{equation} In a similar manner, from $I=(A+B)^{-1}(A+B)$, multiplying by $A$ and by $B$ from the left and subtracting, we obtain \begin{equation}\label{eq:matrix5}A-B=A(A+B)^{-1}B-B(A+B)^{-1}A+A(A+B)^{-1}A-B(A+B)^{-1}B\end{equation} Adding \eqref{eq:matrix4} and \eqref{eq:matrix5} gives $0=2\left[A(A+B)^{-1}B-B(A+B)^{-1}A\right]$, that is, $$A(A+B)^{-1}B=B(A+B)^{-1}A$$

A mathematician whose Twitter username is Manifoldless beat me to it by a few minutes :). But not just that. His solution is shorter (so better) than mine: \begin{align*}A(A+B)^{-1}B&=(A+B-B)(A+B)^{-1}(A+B-A)\\&=[I-B(A+B)^{-1}](A+B-A)\\&=A+B-A-B+B(A+B)^{-1}A\\&=B(A+B)^{-1}A\end{align*}
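
The identity is also easy to test on examples. Here is a small sketch restricted to $2\times 2$ matrices with random integer entries, using exact rational arithmetic so that equality can be tested without rounding; the helper names are hypothetical and pairs with singular $A+B$ are simply skipped.

```python
from fractions import Fraction
import random

def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(P, Q):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(row_p, row_q)] for row_p, row_q in zip(P, Q)]

def inverse_2x2(M):
    """Exact inverse of a 2x2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def identity_holds(A, B):
    """Check A (A+B)^{-1} B == B (A+B)^{-1} A exactly."""
    S = inverse_2x2(matadd(A, B))
    return matmul(matmul(A, S), B) == matmul(matmul(B, S), A)

random.seed(0)
results = []
while len(results) < 1000:
    A = [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
    B = [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
    try:
        results.append(identity_holds(A, B))
    except ValueError:
        continue  # A + B is singular, so the hypothesis fails; skip this pair
all_hold = all(results)
print(all_hold)
```

Random tests of course prove nothing, but they are a quick way to catch a sign or ordering mistake in a matrix identity before trying to prove it.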

An Algebra Problem on Twitter

Problem: Given $x,y\geq 0$ satisfying \begin{equation}\label{eq:ellipse}x+y+\sqrt{2x^2+2xy+3y^2}=4\end{equation} prove $x^2y<4$. (Hat tip: Sam Walters)

Solution. First rewrite \eqref{eq:ellipse} as \begin{equation}\label{eq:ellipse2}\sqrt{2x^2+2xy+3y^2}=4-x-y\end{equation} Squaring \eqref{eq:ellipse2} and simplifying, we obtain the equation of an ellipse \begin{equation}\label{eq:ellipse3}(x+4)^2+2(y+2)^2=40\end{equation} (Figure 1)

Figure 1

 

Graphically we see that the inequality holds as shown in Figure 2.

Figure 2. Ellipse (x+4)^2+2(y+2)^2=40, x=0..-4+sqrt(40) (red) and y=4/x^2 (blue)

Suppose $x>0$ (for $x=0$ the inequality $x^2y<4$ is trivial). Since $y+2>0$ and $\frac{4}{x^2}+2>0$, $$x^2y<4\Longleftrightarrow (y+2)^2<\left(\frac{4}{x^2}+2\right)^2$$ Solve \eqref{eq:ellipse3} for $(y+2)^2$. \begin{equation}\label{eq:ellipse4}(y+2)^2=20-\frac{(x+4)^2}{2}\end{equation} Now subtract $\left(\frac{4}{x^2}+2\right)^2$ from the RHS of \eqref{eq:ellipse4}. $$20-\frac{(x+4)^2}{2}-\left(\frac{4}{x^2}+2\right)^2=\frac{-x^6-8x^5+16x^4-32x^2-32}{2x^4}<0$$ since $-x^6-8x^5+16x^4-32x^2-32<0$ for $0<x<-4+\sqrt{40}$ as shown in Figure 3.

Figure 3. The graph of f(x)=-x^6-8x^5+16x^4-32x^2-32, x=0..-4+sqrt(40)
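
The sign of the sextic on $(0,-4+\sqrt{40}]$ can also be checked numerically instead of read off a plot. This is a minimal sketch assuming a uniform grid of 10000 points; the names `p` and `largest` are just for this illustration, and a grid check is evidence rather than a proof.

```python
import math

def p(x):
    """Numerator -x^6 - 8x^5 + 16x^4 - 32x^2 - 32 from the inequality above."""
    return -x**6 - 8 * x**5 + 16 * x**4 - 32 * x**2 - 32

x_max = -4 + math.sqrt(40)
N = 10000
# Largest value of p over the grid points x_max * i / N, i = 1..N.
largest = max(p(x_max * i / N) for i in range(1, N + 1))
print(largest, largest < 0)
```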

Update: Republic of Math graphically came up with a sharper inequality $x^2y<1$. The graphics can be seen here. As you can see in the graphics, there is still room for an even (slightly) sharper inequality. In fact $x^2y<0.9$ as you can see in Figure 4 below.

Figure 4. Ellipse (x+4)^2+2(y+2)^2=40, x=0..-4+sqrt(40) (red) and y=0.9/x^2 (blue)

Update: While I could not analytically find the smallest value of $a>0$ such that $x^2y<a$, I found graphically that $a$ can be as small as $0.789$. Figure 5 and Figure 6 are the graphs of $f(x)=-x^6-8x^5+16x^4-8ax^2-2a^2$ for $0\leq x\leq -4+\sqrt{40}$ for $a=0.789$ and for $a=0.788$, respectively. For $a=0.789$, $f(x)<0$ on $[0,-4+\sqrt{40}]$.

Figure 5. The graph of f(x)=-x^6-8x^5+16x^4-8ax^2-2a^2 (for a=0.789), x=0..-4+sqrt(40)

But with $a=0.788$ $f(x)$ is no longer negative for all $x$ in $[0,-4+\sqrt{40}]$.

Figure 6. The graph of f(x)=-x^6-8x^5+16x^4-8ax^2-2a^2 (for a=0.788), x=0..-4+sqrt(40)
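
The threshold between $a=0.788$ and $a=0.789$ can be cross-checked by maximizing $x^2y$ directly along the ellipse. In the sketch below, $y$ is solved from \eqref{eq:ellipse3} as $y=-2+\sqrt{20-\frac{(x+4)^2}{2}}$, and $x$ runs over $[0,-4+\sqrt{32}]$, which is where $y\geq 0$ (setting $y=0$ in \eqref{eq:ellipse3} gives $(x+4)^2=32$). The grid size and the names `y_on_ellipse`, `best_value`, `best_x` are choices made for this illustration.

```python
import math

def y_on_ellipse(x):
    """Upper solution y of (x+4)^2 + 2(y+2)^2 = 40 for the given x."""
    return -2 + math.sqrt(20 - (x + 4) ** 2 / 2)

x_end = -4 + math.sqrt(32)  # right endpoint of the arc, where y reaches 0
N = 200000
# Evaluate x^2 * y on a uniform grid and keep the largest value with its x.
best_value, best_x = max(
    (x * x * y_on_ellipse(x), x) for x in (x_end * i / N for i in range(N + 1))
)
print(best_x, best_value)
```

The grid maximum lies strictly between $0.788$ and $0.789$, which agrees with what Figures 5 and 6 show.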