The Proof of the Chain Rule

In this note, we introduce two versions of the proof of the Chain Rule. The first one comes from [1]. Let $y=f(u)$ and $u=g(x)$ be differentiable functions. We claim that
$$\frac{dy}{dx}=f'(u)g'(x)$$
The difference quotient $\frac{f(g(x+h))-f(g(x))}{h}$ can be written as $\frac{f(u+k)-f(u)}{h}$, where $k=g(x+h)-g(x)$. Define $\varphi(t)=\frac{f(u+t)-f(u)}{t}-f'(u)$ for $t\ne 0$. Multiplying by $t$ and rearranging terms, we obtain
\begin{equation}
\label{eq:chainpf}
f(u+t)-f(u)=t[\varphi(t)+f'(u)]
\end{equation}
Since $f$ is differentiable at $u$, $\lim_{t\to 0}\varphi(t)=0$, so we define $\varphi(0)=0$. Then \eqref{eq:chainpf} holds for all $t$, including $t=0$. Now replace $t$ in \eqref{eq:chainpf} by $k$ and divide both sides by $h$:
\begin{equation}
\label{eq:chainpf2}
\frac{f(u+k)-f(u)}{h}=\frac{k}{h}[\varphi(k)+f'(u)]
\end{equation}
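Equation \eqref{eq:chainpf2} is valid even if $k=0$, which is exactly the case a naive proof that divides by $k$ cannot handle. As $h\to 0$, $\frac{k}{h}\to g'(x)$; moreover, $k\to 0$ because $g$ is continuous at $x$, so $\varphi(k)\to 0$. Hence the RHS of \eqref{eq:chainpf2} approaches $f'(u)g'(x)$. This completes the proof.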
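As a numerical sanity check of \eqref{eq:chainpf2}, here is a minimal Python sketch. The names `phi` and `chain_quotients` and the sample functions ($f=\sin$, $g(x)=x^2$, and a constant $g$ that forces $k=0$) are illustrative choices, not part of the proof, and floating point only approximates the limits.

```python
import math

def phi(f, fprime, u, t):
    """phi(t) = (f(u+t) - f(u))/t - f'(u) for t != 0, extended by phi(0) = 0."""
    if t == 0:
        return 0.0
    return (f(u + t) - f(u)) / t - fprime(u)

def chain_quotients(f, fprime, g, x, h):
    """Return (LHS, RHS) of eq:chainpf2 for a step h != 0."""
    u = g(x)
    k = g(x + h) - g(x)                      # k may be exactly 0
    lhs = (f(u + k) - f(u)) / h
    rhs = (k / h) * (phi(f, fprime, u, k) + fprime(u))
    return lhs, rhs

# f(u) = sin u, g(x) = x^2 at x = 1: both sides approach 2*cos(1) as h -> 0.
for h in (1e-2, 1e-4, 1e-6):
    print(h, *chain_quotients(math.sin, math.cos, lambda x: x * x, 1.0, h))

# A constant g gives k = 0 for every h; the identity still holds (both sides are 0).
print(chain_quotients(math.sin, math.cos, lambda x: 3.0, 1.0, 1e-3))
```

The point of extending $\varphi$ by $\varphi(0)=0$ shows up in the last line: the right-hand side is still well defined when $k=0$.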

Another version of the proof of the Chain Rule comes from [2], where it is a guided exercise (Exercise 99 on p. 559). Here we suppose that $y=f(u)$ is differentiable at $u_0=g(x_0)$ and $u=g(x)$ is differentiable at $x_0$. Then we claim that $y=f(g(x))$ is differentiable at $x=x_0$ and $$\left[\frac{dy}{dx}\right]_{x=x_0}=f'(u_0)g'(x_0)$$
Write $\Delta x=x-x_0$, $\Delta u=g(x)-g(x_0)$ and $\Delta y=f(u)-f(u_0)$. Since $g'(x_0)$ exists, $\Delta u$ can be written as
$$\Delta u=g'(x_0)\Delta x+\rho(x)$$
where $\lim_{\Delta x\to 0}\frac{\rho(x)}{\Delta x}=0$. Similarly, if $\Delta u\ne 0$ (note that $\Delta u$ can be $0$ even when $\Delta x\ne 0$), then $\Delta y$ can be written as
\begin{equation}
\label{eq:chainpf3}
\Delta y=f'(u_0)\Delta u+\sigma(u)
\end{equation}
where $\lim_{\Delta u\to 0}\frac{\sigma(u)}{\Delta u}=0$. Substituting the expression for $\Delta u$ and $u=g(x)$ into \eqref{eq:chainpf3}, we obtain
\begin{align*}
\Delta y&=f'(u_0)[g'(x_0)\Delta x+\rho(x)]+\sigma(g(x))\\
&=f'(u_0)g'(x_0)\Delta x+f'(u_0)\rho(x)+\sigma(g(x))
\end{align*}
As $\Delta u\to 0$, $\Delta y\to 0$ and accordingly $\sigma(u)\to 0$. So we can define $\sigma(u)=0$ when $\Delta u=0$, that is, $\sigma(u_0)=\sigma(g(x_0))=0$. Then \eqref{eq:chainpf3} remains valid when $\Delta u=0$, since both sides are then $0$. Consequently,
$$\frac{\sigma(g(x))}{\Delta x}=\left\{\begin{array}{ccc}
\frac{\sigma(g(x))}{\Delta u}\cdot\frac{\Delta u}{\Delta x} & \mbox{if} & \Delta u\ne 0\\
0 & \mbox{if} & \Delta u=0\end{array}\right.\to 0$$
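as $\Delta x\to 0$, because $\Delta u\to 0$ ($g$ is continuous at $x_0$) and $\frac{\Delta u}{\Delta x}\to g'(x_0)$. Therefore, dividing $\Delta y$ by $\Delta x$,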
$$\frac{\Delta y}{\Delta x}=f'(u_0)g'(x_0)+f'(u_0)\frac{\rho(x)}{\Delta x}+\frac{\sigma(g(x))}{\Delta x}$$
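approaches $f'(u_0)g'(x_0)$ as $\Delta x\to 0$, since $\frac{\rho(x)}{\Delta x}\to 0$. That is, $y=f(g(x))$ is differentiable at $x_0$ and
$$\left[\frac{dy}{dx}\right]_{x=x_0}=f'(u_0)g'(x_0)$$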
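The decomposition of $\frac{\Delta y}{\Delta x}$ can also be checked numerically. The following Python sketch is illustrative only: the helper name `decompose` and the choice $f=\exp$, $g=\sin$, $x_0=0.5$ are assumptions made for the example. It returns the difference quotient together with the three terms on the right-hand side; the last two terms should shrink as $\Delta x\to 0$.

```python
import math

def decompose(f, fprime, g, gprime, x0, dx):
    """Return (dy/dx, f'(u0)g'(x0), f'(u0)*rho/dx, sigma/dx) for a step dx != 0."""
    u0 = g(x0)
    du = g(x0 + dx) - u0
    dy = f(u0 + du) - f(u0)
    rho = du - gprime(x0) * dx       # Delta u = g'(x0) dx + rho
    sigma = dy - fprime(u0) * du     # Delta y = f'(u0) du + sigma; sigma = 0 when du = 0
    return dy / dx, fprime(u0) * gprime(x0), fprime(u0) * rho / dx, sigma / dx

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    q, main, r, s = decompose(math.exp, math.exp, math.sin, math.cos, 0.5, dx)
    print(f"dx={dx:g}  dy/dx={q:.8f}  main={main:.8f}  rho-term={r:.2e}  sigma-term={s:.2e}")
```

Passing a constant $g$ makes $\Delta u=0$ for every step, and then the sketch returns $\sigma=0$, matching the convention $\sigma(u_0)=0$.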

References:

[1] Tom M. Apostol, Calculus, Volume I: One-Variable Calculus, with an Introduction to Linear Algebra, 2nd Edition, John Wiley & Sons, Inc., 1967

[2] Jerrold Marsden and Alan Weinstein, Calculus II, Springer-Verlag, 1985

One-to-One Functions and Inverse Functions

A function $y=f(x)$ is said to be one-to-one if it satisfies the property
$$f(x_1)=f(x_2) \Longrightarrow x_1=x_2$$
or equivalently
$$x_1\ne x_2 \Longrightarrow f(x_1)\ne f(x_2)$$
for all $x_1,x_2$ in the domain. In plain English, this says that no two different numbers in the domain correspond to the same number in the range. Figure 1 is the graph of $f(x)=x^2$. It is not one-to-one.

Figure 1. The graph of y=x^2

For example, $-1\ne 1$ but $f(-1)=1=f(1)$.

Figure 2. The graph of y=x^3

Figure 2 is the graph of $f(x)=x^3$. It is one-to-one, as seen clearly from the graph. But let us pretend that we don’t know the graph and want to prove that it is one-to-one directly from the definition. Here we go. Suppose that $f(x_1)=f(x_2)$. Then $x_1^3=x_2^3$, or $x_1^3-x_2^3=(x_1-x_2)(x_1^2+x_1x_2+x_2^2)=0$. Since $x_1^2+x_1x_2+x_2^2=\left(x_1+\frac{x_2}{2}\right)^2+\frac{3}{4}x_2^2$, the second factor vanishes only when $x_1=x_2=0$. So in either case $x_1=x_2$, which completes the proof.
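Going the other way, a counterexample like $f(-1)=f(1)$ can be searched for mechanically. Here is a small Python sketch; the helper `find_collision` is hypothetical, not from the text. Keep in mind that it only tests a finite sample: finding a collision proves a function is not one-to-one, but finding none proves nothing, which is why the algebraic argument is needed for $x^3$.

```python
def find_collision(f, sample):
    """Return (x1, x2) with x1 != x2 and f(x1) == f(x2), or None if the sample has none."""
    seen = {}                     # maps each output value to the first input producing it
    for x in sample:
        y = f(x)
        if y in seen:
            if seen[y] != x:      # ignore repeated copies of the same input
                return seen[y], x
        else:
            seen[y] = x
    return None

print(find_collision(lambda x: x ** 2, range(-3, 4)))    # (-1, 1)
print(find_collision(lambda x: x ** 3, range(-10, 11)))  # None
```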

Why do we care about one-to-one functions? The reason is that if $y=f(x)$ is one-to-one, it has an inverse function $y=f^{-1}(x)$.
\begin{align*}
x&\stackrel{f}{\longrightarrow} y\\
x&\stackrel{f^{-1}}{\longleftarrow} y
\end{align*}
Given a one-to-one function $y=f(x)$, here is how to find its inverse function $y=f^{-1}(x)$.

STEP 1. Swap $x$ and $y$ in $y=f(x)$. The reason we are doing this is that $\mathrm{Dom}(f)=\mathrm{Range}(f^{-1})$ and $\mathrm{Dom}(f^{-1})=\mathrm{Range}(f)$.

STEP 2. Solve the resulting equation $x=f(y)$ for $y$. The result is the inverse function $y=f^{-1}(x)$.

Example. Find the inverse function of $f(x)=\frac{2x+3}{x-1}$. (It is a one-to-one function.)

Solution. STEP 1. Let $y=\frac{2x+3}{x-1}$ and swap $x$ and $y$. Then we have
$$x=\frac{2y+3}{y-1}$$

STEP 2. Let us solve $x=\frac{2y+3}{y-1}$ for $y$. First multiply both sides of $x=\frac{2y+3}{y-1}$ by $y-1$. Then we have $x(y-1)=2y+3$, or $xy-x=2y+3$. Isolating the terms that contain $y$ on the LHS, we get $xy-2y=x+3$, or $(x-2)y=x+3$. Finally, dividing by $x-2$, we find $y=\frac{x+3}{x-2}$. This is the inverse function.

$y=f(x)$ and its inverse $y=f^{-1}(x)$ satisfy the following properties.
$$(f\circ f^{-1})(x)=x,\ (f^{-1}\circ f)(x)=x$$
The reason these properties hold is clear from the definition of an inverse function. We can check the properties using the above example. I will do $(f\circ f^{-1})(x)=x$ and leave the other as an exercise.
\begin{align*}
(f\circ f^{-1})(x)&=f(f^{-1}(x))\\
&=f\left(\frac{x+3}{x-2}\right)\\
&=\frac{2\left(\frac{x+3}{x-2}\right)+3}{\left(\frac{x+3}{x-2}\right)-1}\\
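&=\frac{2(x+3)+3(x-2)}{(x+3)-(x-2)}\quad\mbox{(multiply numerator and denominator by $x-2$)}\\
&=\frac{5x}{5}\\
&=x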
\end{align*}
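The two identities can be spot-checked with exact rational arithmetic. This Python sketch uses the standard `fractions` module; the function names `f` and `f_inv` and the sample points are illustrative. It also checks the graph symmetry: if $(a,b)$ lies on the graph of $f$, then $(b,a)$ lies on the graph of $f^{-1}$.

```python
from fractions import Fraction

def f(x):
    return (2 * x + 3) / (x - 1)        # defined for x != 1

def f_inv(x):
    return (x + 3) / (x - 2)            # defined for x != 2

for n in range(-5, 6):
    x = Fraction(n, 3)                  # exact sample points such as -5/3, 0, 4/3
    if x != 2:
        assert f(f_inv(x)) == x         # (f o f^{-1})(x) = x
    if x != 1:
        assert f_inv(f(x)) == x         # (f^{-1} o f)(x) = x
        a, b = x, f(x)
        assert f_inv(b) == a            # (a, b) on y = f(x)  =>  (b, a) on y = f^{-1}(x)
print("both compositions return x at every sample point")
```

Using `Fraction` instead of floats keeps the equality tests exact, so no tolerance is needed.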
The graph of $y=f(x)$ and the graph of its inverse $y=f^{-1}(x)$ have a nice symmetry: they are symmetric about the line $y=x$. This symmetry helps us obtain the graph of $y=f^{-1}(x)$ when an explicit expression for $f^{-1}(x)$ is not available. You will see such a case later when you study logarithmic functions. Figure 3 shows the symmetry with $y=x^2$ ($x\geq 0$) and its inverse $y=\sqrt{x}$.

Figure 3. The symmetry between the graphs of y=x^2 (red) and y=sqrt(x) (blue) about y=x (green)

Combining Functions

It’s quite interesting that functions can be treated like numbers: you can define $+$, $-$, $\times$, and $\div$ on a collection of functions. How do we do this? For instance, given two functions $f$ and $g$, we can define a new function $f+g$ by
$$(f+g)(x)=f(x)+g(x)$$
for all $x$ in the domain (for the sake of simplicity, we assume that $f$ and $g$ have the same domain; if not, we simply take the intersection of the domains of $f$ and $g$). In a similar manner, we can also define $f-g$, $fg$, and $\frac{f}{g}$ respectively as
\begin{align*}
(f-g)(x)&=f(x)-g(x)\\
(fg)(x)&=f(x)g(x)\\
\left(\frac{f}{g}\right)(x)&=\frac{f(x)}{g(x)}\ \mbox{provided}\ g(x)\ne 0
\end{align*}

Example. Let $f(x)=\frac{1}{x-2}$ and $g(x)=\sqrt{x}$.

(a) Find the functions $f+g$, $f-g$, $fg$ and $\frac{f}{g}$ and their domains.

(b) Find $(f+g)(4)$, $(f-g)(4)$, $(fg)(4)$, $\left(\frac{f}{g}\right)(4)$

Solution. (a) $\mathrm{Dom}(f)=\{x|x\ne 2\}$ and $\mathrm{Dom}(g)=\{x|x\geq 0\}$. So the intersection is $\{x|0\leq x<2\}\cup\{x|x>2\}=[0,2)\cup(2,\infty)$, and this is the domain of $f+g$, $f-g$ and $fg$. For $\frac{f}{g}$ we must also exclude $x=0$, since $g(0)=0$, so its domain is $(0,2)\cup(2,\infty)$.
\begin{align*}
(f+g)(x)&=\frac{1}{x-2}+\sqrt{x}\\
(f-g)(x)&=\frac{1}{x-2}-\sqrt{x}\\
(fg)(x)&=\frac{\sqrt{x}}{x-2}\\
\left(\frac{f}{g}\right)(x)&=\frac{1}{(x-2)\sqrt{x}}
\end{align*}

(b) I will do only $(f+g)(4)$. One way to evaluate $(f+g)(4)$ is to use the formula for $(f+g)(x)$ obtained in part (a), i.e. $(f+g)(4)=\frac{1}{4-2}+\sqrt{4}=\frac{5}{2}$. Another way is to evaluate $f(4)$ and $g(4)$ first: $f(4)=\frac{1}{2}$ and $g(4)=2$. Then $(f+g)(4)=f(4)+g(4)=\frac{5}{2}$.
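Treating functions like numbers translates directly into code: each operation takes two functions and returns a new one. The Python sketch below uses hypothetical helper names (`add`, `sub`, `mul`, `div`). Domains are not tracked symbolically, so evaluating outside a domain simply raises an error, for example $\frac{f}{g}$ at $x=0$, where $g(0)=0$.

```python
import math

def f(x):
    return 1 / (x - 2)          # Dom(f) = {x | x != 2}

def g(x):
    return math.sqrt(x)         # Dom(g) = {x | x >= 0}

def add(p, q):
    return lambda x: p(x) + q(x)

def sub(p, q):
    return lambda x: p(x) - q(x)

def mul(p, q):
    return lambda x: p(x) * q(x)

def div(p, q):
    def quotient(x):
        qx = q(x)
        if qx == 0:             # p/q is excluded wherever q vanishes
            raise ZeroDivisionError("p/q is undefined where q(x) = 0")
        return p(x) / qx
    return quotient

# Part (b): evaluate all four combinations at x = 4.
for name, h in [("f+g", add(f, g)), ("f-g", sub(f, g)), ("fg", mul(f, g)), ("f/g", div(f, g))]:
    print(name, h(4))
```

This prints 2.5, -1.5, 1.0 and 0.25, in agreement with $f(4)=\frac{1}{2}$ and $g(4)=2$.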

Composite Functions

Given two functions $f$ and $g$, if the range of $f$ is a subset of the domain of $g$, then we can combine the two functions to create a new function which we will denote by $g\circ f$.
$$x\stackrel{f}{\longmapsto} f(x)\stackrel{g}{\longmapsto} g(f(x))$$
The above diagram suggests that we can define a new function $g\circ f$ by
$$(g\circ f)(x)=g(f(x))$$
We call $g\circ f$ “$f$ followed by $g$.”

Example. Let $f(x)=x^2$ and $g(x)=x-3$.

(a) Find the composite functions $f\circ g$ and $g\circ f$ and their domains.

(b) Find $(f\circ g)(5)$ and $(g\circ f)(5)$.

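Solution. (a) By definition $(f\circ g)(x)=f(g(x))=f(x-3)=(x-3)^2$. Also by definition $(g\circ f)(x)=g(f(x))=g(x^2)=x^2-3$. From these formulas we see that both domains are $(-\infty,\infty)$. In general, $\mathrm{Dom}(f\circ g)=\{x\in\mathrm{Dom}(g)|g(x)\in\mathrm{Dom}(f)\}$, which equals $\mathrm{Dom}(g)$ when the range of $g$ is a subset of the domain of $f$; similarly, $\mathrm{Dom}(g\circ f)=\mathrm{Dom}(f)$ when the range of $f$ is a subset of the domain of $g$.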

(b) $(f\circ g)(5)$ can be evaluated using $(f\circ g)(x)$ we obtained in part (a).
$$(f\circ g)(5)=(5-3)^2=4$$
There is another way to do this. If you only need $(f\circ g)(5)$ and not the formula for $(f\circ g)(x)$, this may be simpler. First note that $(f\circ g)(5)=f(g(5))$. Since $g(5)=5-3=2$, we get $f(g(5))=f(2)=2^2=4$. Similarly, $(g\circ f)(5)=g(f(5))=g(25)=22$. In general, $(f\circ g)(x)\ne (g\circ f)(x)$.
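In code, composition is a higher-order function. This is a minimal Python sketch; the name `compose` is a hypothetical helper, with `compose(g, f)` standing for $g\circ f$, i.e. "$f$ followed by $g$."

```python
def compose(outer, inner):
    """Return the composite outer ∘ inner, i.e. x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

def f(x):
    return x ** 2

def g(x):
    return x - 3

f_of_g = compose(f, g)        # (f ∘ g)(x) = (x - 3)^2
g_of_f = compose(g, f)        # (g ∘ f)(x) = x^2 - 3
print(f_of_g(5), g_of_f(5))   # 4 22
```

The argument order mirrors the notation: the function written on the left is applied last.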

Time Dilation and Time Travel

In this note, we discuss one of the relativistic effects, called time dilation: a clock that is moving relative to an observer is measured to tick more slowly than a clock that is at rest in the observer’s reference frame. This is pretty intriguing for those who are familiar with the Newtonian notion of time as a universal parameter for motion. Let us do a thought experiment. Consider a frame $K$ at rest and suppose that a light ray is emitted by the light source $Q$, reflected by the mirror $S$ at distance $l$, and received at $E$. See Figure 1.

Figure 1. Time Dilation

The time interval measured in the frame $K$ is $\Delta t=t_2-t_1=\frac{2l}{c}$. Now consider a frame $K'$ moving at a constant speed $v$ to the right. An observer at rest in $K'$ sees the light ray emerging from $Q$, hitting the mirror (at rest in $K$) at $M$ and reaching the $x'$-axis again at $E$. This observer measures a longer time interval, because the light has to travel a longer path to reach the receiver while, according to Einstein’s postulate, the speed of light remains the same. How much longer? The time $\Delta t'$ measured by an observer at rest in the frame $K'$ can be easily calculated by applying the Pythagorean theorem to half of the isosceles triangle seen in Figure 1. We find
$$\left(\frac{c\Delta t'}{2}\right)^2=l^2+\left(\frac{v\Delta t'}{2}\right)^2$$
Solving this for $\Delta t'$, i.e. $(\Delta t')^2(c^2-v^2)=4l^2$, and using $\Delta t=\frac{2l}{c}$, we find
\begin{equation}
\label{eq:timedilation}
\Delta t'=\frac{\Delta t}{\sqrt{1-\frac{v^2}{c^2}}}
\end{equation}
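Here is a quick numerical check that \eqref{eq:timedilation} really solves the Pythagorean relation. In this Python sketch the values of $l$ and $v$ are arbitrary illustrative choices, and $c=3\times 10^5$ km/sec is the rounded value used in the examples of this note.

```python
import math

C = 3.0e5                      # speed of light in km/sec (rounded value)

def dilated_interval(l_km, v):
    """Return (dt, dt_prime): the interval in K and the dilated interval in K'."""
    dt = 2 * l_km / C
    dt_prime = dt / math.sqrt(1 - v**2 / C**2)
    return dt, dt_prime

l_km, v = 1.0, 0.6 * C
dt, dtp = dilated_interval(l_km, v)
lhs = (C * dtp / 2) ** 2                       # (c dt'/2)^2, the hypotenuse squared
rhs = l_km ** 2 + (v * dtp / 2) ** 2           # l^2 + (v dt'/2)^2
print(dtp / dt, math.isclose(lhs, rhs))        # ratio is about 1.25 for v = 0.6c
```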
Note that \eqref{eq:timedilation} also follows from the Lorentz transformation into the system $K'$:
$$\Delta t'=t_2'-t_1',$$
where
$$t_i'=\frac{t_i-\frac{v}{c^2}x_i}{\sqrt{1-\frac{v^2}{c^2}}},\ i=1,2$$
Since $x_1=x_2$ (emission and reception happen at the same place in $K$), we obtain \eqref{eq:timedilation}. In case this whole frame thing is confusing, imagine that you are sitting in a train running at a constant speed. Since there is no acceleration, you do not feel that you are moving, so inside the train you are at rest (frame $K$). An observer outside, at rest in frame $K'$, sees you moving and measures a longer interval $\Delta t'$ between two ticks of your clock than the interval $\Delta t$ you measure yourself; that is, for the observer your clock ticks more slowly. In physics $\Delta t$ is called proper time.

Simply put, proper time is the time measured by a clock that moves along with the object, i.e. a clock at rest in the object's inertial frame. Mathematically, proper time can be calculated from the line element $ds^2$ along a worldline, the trajectory of a particle or an object in spacetime. With the convention $ds^2=-c^2dt^2+dx^2+dy^2+dz^2$, a worldline is timelike when it leans more toward time, that is, $c^2dt^2>dx^2+dy^2+dz^2$. Denote by $\tau$ the proper time. Then along a timelike worldline $ds^2=-c^2d\tau^2$, so the proper time interval is given by
\begin{equation}
\begin{aligned}
\Delta\tau&=\frac{1}{c}\int\sqrt{-ds^2}\\
&=\frac{1}{c}\int\sqrt{c^2dt^2-dx^2-dy^2-dz^2}\\
&=\int\sqrt{1-\frac{v(t)^2}{c^2}}dt
\end{aligned}\label{eq:propertime}
\end{equation}
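If the speed is constant, $v(t)=v$, then \eqref{eq:propertime} gives $\Delta\tau=\sqrt{1-\frac{v^2}{c^2}}\,\Delta t$, which is \eqref{eq:timedilation} with the proper time $\Delta\tau$ in place of $\Delta t$ and the observer's time interval $\Delta t$ in place of $\Delta t'$.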
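Formula \eqref{eq:propertime} is easy to evaluate numerically when $v(t)$ varies. The Python sketch below uses the midpoint rule; the function name `proper_time`, the step count, and the sample speed profiles are illustrative assumptions. For a constant speed it reproduces \eqref{eq:timedilation}, and for the profile $v(t)/c=t/T$ on $[0,T]$ the exact value is $\frac{\pi}{4}T$.

```python
import math

def proper_time(beta, t0, t1, n=100_000):
    """Midpoint-rule approximation of the integral of sqrt(1 - v(t)^2/c^2) dt over [t0, t1].

    beta(t) returns v(t)/c and must satisfy |beta(t)| <= 1.
    """
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        b = beta(t0 + (i + 0.5) * h)       # speed at the midpoint of the i-th subinterval
        total += math.sqrt(1.0 - b * b)
    return total * h

T = 10.0
print(proper_time(lambda t: 0.6, 0.0, T))      # constant speed: about sqrt(1 - 0.36) * 10 = 8
print(proper_time(lambda t: t / T, 0.0, T))    # linear ramp: close to (pi/4) * 10
```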
The time dilation effect in \eqref{eq:timedilation} suggests that time travel to the future may be possible. Here is how. The exoplanet Proxima b is interesting because it orbits within the habitable zone of the red dwarf star Proxima Centauri, which is part of the triple star system Alpha Centauri in the constellation Centaurus, and also because it is relatively close to us. It is located about 4.2 light-years, or 40 trillion km, from Earth. In fact, it is the closest known exoplanet to the Solar System.

Artist’s depiction of Proxima b

Let us say we are sending a manned spaceship to Proxima b. Also let us assume that the spaceship can travel at 90% of the speed of light, i.e. $v=2.7\times 10^5$ km/sec. (It is actually impossible to achieve this due to a physical limitation, which I will discuss in another note. In reality, the best we can achieve using nuclear propulsion is about 0.067% of the speed of light.) For people on Earth it would take $\Delta t'=\frac{4\times 10^{13}\mbox{km}}{2.7\times 10^5\mbox{km/sec}}=1.\overline{481}\times 10^8\mbox{sec}$ for the spaceship to get to Proxima b. Since $1\mbox{sec}=3.17\times 10^{-8}\mbox{years}$, this is about 4.7 years. Since the return trip from Proxima b to Earth takes the same time, the overall travel time for people on Earth is 9.4 years. In reality, we would have to take other factors into consideration: it takes time for the spaceship to accelerate to 90% of the speed of light, once it is near Proxima b it has to slow down to stop or turn around, etc. But for the sake of simplicity we will disregard those factors. For the crew members it would take only
\begin{align*}
\Delta t&=\sqrt{1-\frac{v^2}{c^2}}\Delta t'\\
&=\sqrt{1-(0.9)^2}\cdot 1.\overline{481}\times 10^8\mbox{sec}\\
&\approx 0.65\times 10^8\mbox{sec}\\
&\approx 2\mbox{yrs}
\end{align*}
to get to Proxima b. So the round trip takes about 4 years for the crew but 9.4 years on Earth, and when they come back home, it is as if they had traveled more than 5 years forward in time. I know this is probably not what you had in mind, and yes, I admit that this is a rather boring kind of time travel. Can one travel backward in time? This is one of the most intriguing questions. I will come back to it at another time.
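The arithmetic of the Proxima b trip fits in a few lines of Python. This sketch simply reuses the numbers from the text (distance $4\times 10^{13}$ km, $c=3\times 10^5$ km/sec, and the conversion $1\mbox{sec}=3.17\times 10^{-8}$ years); the function name `trip_times` is hypothetical, and acceleration and deceleration are ignored, as above.

```python
import math

SEC_TO_YEARS = 3.17e-8      # conversion factor used in the text
C_KM_S = 3.0e5              # speed of light in km/sec

def trip_times(distance_km, beta):
    """One-way travel time in years: (Earth frame, crew proper time) at speed beta*c."""
    earth_sec = distance_km / (beta * C_KM_S)
    crew_sec = math.sqrt(1 - beta**2) * earth_sec      # eq:timedilation solved for dt
    return earth_sec * SEC_TO_YEARS, crew_sec * SEC_TO_YEARS

earth, crew = trip_times(4e13, 0.9)
print(f"one way: Earth {earth:.2f} yr, crew {crew:.2f} yr")
print(f"round trip gap: {2 * (earth - crew):.2f} yr")
```

With these inputs the one-way times come out near 4.7 and 2.05 years, and the round-trip gap is a little over 5 years.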

I will finish this note with an example as an application of \eqref{eq:timedilation}. This example was taken from [1].

Example. Muon Decay

The Earth is surrounded by an atmosphere about 30 km thick that screens us off from cosmic radiation. If a proton from the cosmic radiation hits the atmosphere, $\pi$-mesons are produced, and several of them decay further into a muon and a neutrino. The muon has a mean lifetime of $\Delta t=2\times 10^{-6}\mbox{sec}$ in its rest system. Classically, even if it moved at the speed of light (which it cannot, since only massless particles travel at the speed of light), it would travel only
\begin{align*}
s&=c\Delta t\\
&=3\times 10^5\mbox{km/sec}\cdot 2\times 10^{-6}\mbox{sec}\\
&=0.6\mbox{km}
\end{align*}
or 600 m before decaying. If this were true, muons would never reach the surface, yet they are detected there. In the relativistic approach, the distance traveled as measured on Earth is
$$s'=v\Delta t'=\frac{v\Delta t}{\sqrt{1-\frac{v^2}{c^2}}}$$
Muons at rest have a mass of $m_0c^2=10^8$ eV (strictly speaking this is an energy, but because of mass-energy equivalence physicists customarily call it mass). The cosmic muons are created at an altitude of about 10 km with a total energy of $E=5\times 10^9$ eV. In order to apply this information, we rewrite $s'$ as
\begin{align*}
s'&=\frac{vm_0c^2}{m_0c^2\sqrt{1-\frac{v^2}{c^2}}}\Delta t\\
&=\frac{v}{m_0c^2}E\Delta t\\
&\leq\frac{c}{m_0c^2}E\Delta t\\
&=\frac{3\times 10^5\mbox{km/sec}}{10^8\mbox{eV}}\cdot 5\times 10^9\mbox{eV}\cdot 2\times 10^{-6}\mbox{sec}\\
&=30\mbox{km}
\end{align*}
Here we used $E=mc^2=\frac{m_0c^2}{\sqrt{1-\frac{v^2}{c^2}}}$. We will discuss this later in another post. The actual measurement gives a value of 38km.
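Here is the same estimate as a small Python sketch, using the rounded values from the example ($\Delta t=2\times 10^{-6}$ sec, $m_0c^2=10^8$ eV, $E=5\times 10^9$ eV, $c=3\times 10^5$ km/sec); the function name `muon_ranges` is hypothetical. Besides the bound $\frac{c}{m_0c^2}E\Delta t$, it computes $s'=v\Delta t'$ itself by recovering $v$ from $\frac{E}{m_0c^2}=\frac{1}{\sqrt{1-v^2/c^2}}$.

```python
import math

def muon_ranges(lifetime_s=2e-6, rest_energy_ev=1e8, total_energy_ev=5e9, c_km_s=3e5):
    """Return (classical distance, relativistic distance s', upper bound) in km."""
    classical = c_km_s * lifetime_s                     # s = c * dt, about 0.6 km
    gamma = total_energy_ev / rest_energy_ev            # E/(m0 c^2) = 1/sqrt(1 - v^2/c^2)
    beta = math.sqrt(1 - 1 / gamma**2)                  # v/c recovered from gamma
    relativistic = beta * c_km_s * gamma * lifetime_s   # s' = v * dt'
    bound = c_km_s * gamma * lifetime_s                 # (c / m0 c^2) * E * dt
    return classical, relativistic, bound

print(muon_ranges())
```

Since $v$ is extremely close to $c$ here, the computed $s'$ is only slightly below the 30 km bound.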

References:

[1] Walter Greiner, Classical Mechanics: Point Particles and Relativity, Springer, 2004

[2] Paul A. Tipler and Ralph A. Llewellyn, Modern Physics, 5th Edition, W. H. Freeman and Company, 2008

Heisenberg Relation

This morning I saw a seemingly random tweet from Sam Walters @SamuelGWalters, a mathematician at the University of Northern British Columbia.

I have no clue as to the motivation behind the tweet, but the mathematical statement in it is interesting. Its proof is pretty easy, though. Before we prove the statement, note that what he referred to as the Heisenberg relation (also called the Heisenberg commutator) originates in quantum mechanics, where the relation exhibits the noncommutativity of the position and momentum operators $\hat x$ and $\hat p$:
$$[\hat x,\hat p]=\hat x\hat p-\hat p\hat x=i\hbar$$
in contrast to the classical case ($\hbar\to 0$) where the position and the momentum commute.

Let $\alpha$ and $\beta$ be scalars and let $x,y,z$ be elements of an associative algebra (so that products such as $x^ny$ make sense), with the commutator $[x,y]=xy-yx$. Then it is straightforward to show that
$$[\alpha x+\beta y,z]=\alpha[x,z]+\beta[y,z]$$
i.e. the commutator is linear in the first slot. It is also linear in the second slot. Hence the commutator is bilinear. Therefore, it suffices to show that, for $x$ and $y$ satisfying the Heisenberg relation $[x,y]=xy-yx=1$,
$$[x^n,y]=nx^{n-1}$$
for all integers $n\geq 0$. We prove this by induction. It is trivial for $n=0$ (both sides are $0$, reading $0\cdot x^{-1}$ as $0$) and for $n=1$ (it is the Heisenberg relation itself). Let $n=2$. Then
\begin{align*}
[x^2,y]&=x^2y-yx^2\\
&=x^2y-(yx)x\\
&=x^2y-(xy-1)x\\
&=x^2y-xyx+x\\
&=x(xy-yx)+x\\
&=2x
\end{align*}
Now we assume that the statement is true for $n=k$, i.e.
$$[x^k,y]=kx^{k-1}$$
For $n=k+1$, using the induction hypothesis in the sixth line,
\begin{align*}
[x^{k+1},y]&=x^{k+1}y-yx^{k+1}\\
&=x^{k+1}y-(yx)x^k\\
&=x^{k+1}y-(xy-1)x^k\\
&=x^{k+1}y-xyx^k+x^k\\
&=x(x^ky-yx^k)+x^k\\
&=x(kx^{k-1})+x^k\\
&=(k+1)x^k
\end{align*}
This completes the proof.
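For a concrete sanity check, one can realize the Heisenberg relation with operators on polynomials: take $x=D$ (differentiation) and $y=T$ (multiplication by $t$). By the product rule, $(DT-TD)p=p$, so $[D,T]=1$. This realization is a standard example chosen here for illustration, not something from the tweet. The Python sketch below represents a polynomial by its coefficient list $[a_0,a_1,a_2,\dots]$ (all helper names are hypothetical) and checks $[D^n,T]p=nD^{n-1}p$ on a sample polynomial.

```python
def D(p):
    """Differentiate a polynomial given by coefficients [a0, a1, a2, ...]."""
    return [i * p[i] for i in range(1, len(p))]

def T(p):
    """Multiply by t: every coefficient moves up one degree."""
    return [0] + list(p)

def apply_n(op, n, p):
    """Apply the operator op to p, n times."""
    for _ in range(n):
        p = op(p)
    return p

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def sub(p, q):
    """Coefficient-wise difference p - q, trimmed."""
    m = max(len(p), len(q))
    p = list(p) + [0] * (m - len(p))
    q = list(q) + [0] * (m - len(q))
    return trim([a - b for a, b in zip(p, q)])

def commutator_Dn_T(n, p):
    """[D^n, T] p = D^n(T p) - T(D^n p)."""
    return sub(apply_n(D, n, T(p)), T(apply_n(D, n, p)))

p = [3, -1, 4, 1, 5]               # 3 - t + 4t^2 + t^3 + 5t^4
for n in range(1, 7):
    expected = trim([n * a for a in apply_n(D, n - 1, p)])
    assert commutator_Dn_T(n, p) == expected
print("[D^n, T] = n D^(n-1) holds on the sample polynomial for n = 1..6")
```

A finite check like this does not replace the induction, but it is a quick way to catch a sign or index slip in the commutator computation.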