Group Theory 4: Examples of Groups

In the first lecture here, we defined a group. We have also seen an example of a group here, which is a symmetric group. In this lecture, we study some well-known examples of groups of finite and infinite orders. Recall that the order of a group os the number of elements in the group.

Examples of Groups

  1. $(\mathbb{Z},+)$, $(\mathbb{Q},+)$, $(\mathbb{R},+)$, and $(\mathbb{C},+)$ are abelian groups of infinite order.
  2. $(\mathbb{Q}\setminus\{0\},\cdot)$ is an abelian group of infinite order.
  3. $(\mathbb{R}^+,\cdot)$ is an abelian group of infinite order. Here $\mathbb{R}^+$ denotes the set of all positive real numbers.
  4. Let $E_n=\{e^{\frac{2k\pi i}{n}}: k=0,1,2,\cdots,n-1\}$. $e^{\frac{2k\pi i}{n}}$, $k=0,1,2,\cdots,n-1$ are the $n$-th roots of unity i.e. the zeros of $z^n=1$. $E_n$ forms an abelian group of order $n$. It is generated by a single element $e^{\frac{2\pi i}{n}}$. Such a group is called a cyclic group.

Definition. A group $G$ is said to be a finite group if it has a finite number of elements. The number of elements in $G$ is called the order of $G$ as mentioned previously and it is denoted by $|G|$ or $\mathrm{ord}(G)$. We use the notation $|G|$ for the order of the group $G$.

Example.Let $\mathbb{Z}_n=\{0,1,2,\cdots,n-1\}$. $\mathbb{Z}_n$ is the set of all possible remainders when an integer is divided by 2. The addition $+$ on $\mathbb{Z}_n$ is defined naturally as follows.
$$\begin{aligned}
\mathbb{Z}_2\\
\begin{array}{|c||c|c|}
\hline
+ & 0 & 1\\
\hline
\hline
0 & 0 & 1\\
\hline
1 & 1 & 0\\
\hline
\end{array}
\end{aligned}$$
$$\begin{aligned}
\mathbb{Z}_3\\
\begin{array}{|c||c|c|c|}
\hline
+ & 0 & 1 & 2\\
\hline
\hline
0 & 0 & 1 & 2\\
\hline
1 & 1 & 2 & 0\\
\hline
2 & 0 & 1 & 2\\
\hline
\end{array}
\end{aligned}$$
$(\mathbb{Z}_n,+)$ is an abelian group of order $n$. We will see later that the group $E_n$ and $\mathbb{Z}_n$ are indeed the same group.

Example. The set $M_{m\times n}(\mathbb{R})$ of all $m\times n$ matrices of real entries under matrix addition is an abelian group of infinite order.

Example. The set $\mathrm{GL}(n,\mathbb{R})$ of alll $n\times n$ non-singular matrices of real entries under matrix multiplication is a non-abelian group of infinite order. $\mathrm{GL}(n,\mathbb{R})$ is called the general linear group of degree $n$. It can be viewed as the set of all invertible linear transformations $T: \mathbb{R}^n\longrightarrow\mathbb{R}^n$ i.e. linear isomorphisms from $\mathbb{R}^n$ onto itself.

Example. [Klein four-group (Vierergruppe in German)]

Klein four-group

Klein four-group

Let $V=\{e,a,b,c\}$ where $a$ is counterclockwise rotation of the rectangle in the picture about the $x$-axis by $180^\circ$, $b$ is counterclockwise rotation of the rectangle about the $y$-axis by $180^\circ$ and $c$ is counterclockwise rotation of the rectangle about the $z$-axis (the axis coming out of the origin toward you) by $180^\circ$. Then $a$, $b$ and $c$ satisfy the following relationship
\begin{align*}
a^2=b^2=c^2=e,\ ab&=ba=c,\ bc=cb=a,\\
ca&=ac=b.
\end{align*}
Here, $\cdot$ is the function composition i.e. successive application of rotations. By labling the vertices of the rectangle as 1, 2, 3, 4 as seen in the picture, we can define each rotation as a permutation of $\{1,2,3,4\}$.
\begin{align*}
e&=\begin{pmatrix}
1 & 2 & 3 & 4\\
1 & 2 & 3 & 4
\end{pmatrix},\ a=\begin{pmatrix}
1 & 2 & 3 & 4\\
2 & 1 & 4 &3
\end{pmatrix},\\
b&=\begin{pmatrix}
1 & 2 & 3 & 4\\
4 & 3 & 2 & 1
\end{pmatrix},\ c=\begin{pmatrix}
1 & 2 & 3 & 4\\
3 & 4 & 1 &2
\end{pmatrix}.
\end{align*}
For instance, we calculate $ab$:
\begin{align*}
ab&=\begin{pmatrix}
1 & 2 & 3 & 4\\
2 & 1 & 4 &3
\end{pmatrix}\begin{pmatrix}
1 & 2 & 3 & 4\\
4 & 3 & 2 & 1
\end{pmatrix}\\
&=\begin{pmatrix}
1 & 2 & 3 & 4\\
2 & 1 & 4 &3
\end{pmatrix}\begin{pmatrix}
4 & 3 & 2 & 1\\
3 & 4 & 1 & 2
\end{pmatrix}\\
&=\begin{pmatrix}
1 & 2 & 3 & 4\\
3 & 4 & 1 &2
\end{pmatrix}\\
&=c.
\end{align*}

Example. [The $n$-th dihedral group $D_n$, $n\geq 3$]

Consider an equilateral triangle shown in the following picture.

Dihedral group D3

Dihedral group D3

$\rho$ is counterclockwise rotation about the axis coming out of the origin by $\frac{360^\circ}{3}=120^\circ$ and $\mu_i$, $i=1,2,3$ is counterclockwise rotation about the axis of rotation through each vertex $i$ by $180^\circ$. As permutations of the vertices $\{1,2,3\}$, $\rho$ and $\mu_i$ , $i=1,2,3$ are given by
\begin{align*}
\rho&=\begin{pmatrix}
1 & 2 & 3\\
2 & 3 & 1
\end{pmatrix},\ \mu_1=\begin{pmatrix}
1 & 2 & 3\\
1 & 3 & 2
\end{pmatrix}\\
\mu_2&=\begin{pmatrix}
1 & 2 & 3\\
3 & 2 & 1
\end{pmatrix},\ \mu_3=\begin{pmatrix}
1 & 2 & 3\\
2 & 1 &3
\end{pmatrix}.
\end{align*}
$D_3=\{e,\rho,\rho^2,\mu_1,\mu_2,\mu_3\}$ is a group called the 3rd dihedral group. $$\mu_1\mu_2=\rho^2\ne\rho=\mu_2\mu_1.$$
So, we see that $D_3$ is not an abelian group. $D_3$ is the group of symmetries of an equilateral triangle.

Now this time let us consider a square as shown in the following picture.

Dihedral group D4

Dihedral group D4

$\rho$ is counterclockwise rotation about the $z$-axis coming out of the origin by $\frac{360^\circ}{4}=90^\circ$. $\mu_i$, $i=1,2$ are counterclockwise rotations about $y$-axis and $x$-axis, respectively by $180^\circ$. $\delta_i$, $i=1,2$ are counterclockwise rotations about the axis through the vertices 2 and 4 and the vertices through 1 and 3, respectively by $180^\circ$. As permutations of the vertices $\{1,2,3,4\}$, $\rho$, $\mu_i$ and $\delta_i$, $i=1,2$ are given by
\begin{align*}
\rho&=\begin{pmatrix}
1 & 2 & 3 & 4\\
2 & 3 & 4 & 1
\end{pmatrix},\\
\mu_1&=\begin{pmatrix}
1 & 2 & 3 & 4\\
2 & 1 & 4 & 3
\end{pmatrix},\ \mu_2=\begin{pmatrix}
1 & 2 & 3 & 4\\
4 & 3 & 2 & 1
\end{pmatrix},\\
\delta_1&=\begin{pmatrix}
1 & 2 & 3 & 4\\
3 & 2 & 1 & 4
\end{pmatrix},\ \delta_2=\begin{pmatrix}
1 & 2 & 3 & 4\\
1 & 4 & 3 & 2
\end{pmatrix}.
\end{align*}
$D_4=\{e,\rho,\rho^2,\rho^3,\mu_1,\mu_2,\delta_1,\delta_2\}$ is the group of symmetries of a square, called the 4th dihedral group. It is also called the octic group. Note that $|D_n|=2n$, $n\geq 3$.

Basic (Metric) Topology

Let $(X,d)$ be a metric space.

Definition. A subset $U\subset X$ is said to be open if $\forall x\in U$ $\exists\epsilon>0$ s.t. $B(x,\epsilon)\subset U$.

If $U\subset X$ is open then $U$ can be expressed as union of open balls $B(x,\epsilon)$. Hence, the set of all open balls in $X$, $\mathcal{B}=\{B(x,\epsilon): x\in X,\ \epsilon>0\}$ form a basis for a topology (a metric topology, the topology induced by the metric $d$) on $X$. Those who have not studied topology before may simply understand it as the set of all open sets in $X$.

Definition. A subset $F\subset X$ is said to be closed if its complement, $F^c=X\setminus F$ is open in $X$.

The following is the definition of a continuous function that you are familiar with from calculus. The definition is written in terms of metrics.

Definition. Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces. A mapping $T:X\longrightarrow Y$ is said to be continuous at $x_0\in X$ if $\forall\epsilon>0$ $\exists\delta>0$ s.t $d_Y(Tx,Tx_0)<\epsilon$ whenever $d_X(x,x_0)<\delta$.

$T$ is said to be continuous if it is continuous at every point of $X$.

The above definition can be generalized in terms of open sets as follows.

Theorem. A mapping $T: (X,d_X)\longrightarrow(Y,d_Y)$ is continuous if and only if $\forall$ open set $U$ in $Y$, $T^{-1}U$ is open in $X$.

Proof. (Only if, $\Rightarrow$) Suppose that $T:X\longrightarrow Y$ is continuous. Let $U$ be open in $Y$. Then we show that $T^{-1}U$ is open in $X$. Let $x_0\in T^{-1}U$. Then $Tx_0\in U$. Since $U$ is open in $Y$, $\exists\epsilon>0$ s.t. $B(Tx_0,\epsilon)\subset U$. By the continuity of $T$, for this $\epsilon>0$ $\exists\delta>0$ s.t. whenever $d(x,x_0)<\delta$, $d(Tx,Tx_0)<\epsilon$. This means that
$$TB(x_0,\delta)\subset B(Tx_0,\epsilon)\subset U\Longrightarrow B(x_0,\delta)\subset T^{-1}(TB(x_0,\delta))\subset T^{-1}U.$$ Hence, $T^{-1}U$ is open in $X$.

(If, $\Leftarrow$) Suppose that $\forall$ open set $U$ in $Y$, $T^{-1}U$ is open in $X$. We show that $T$ is continuous. Let $x_0\in X$ and let $\epsilon>0$ be given. Then $B(Tx_0,\epsilon)$ is open in $Y$. So by the assumption, $x_0\in T^{-1}B(Tx_0,\epsilon)$ is open in $X$. This means that $\exists\delta>0$ s.t.
$$B(x_0,\delta)\subset T^{-1}B(Tx_0,\epsilon)\Longrightarrow TB(x_0,\delta)\subset T(T^{-1}B(Tx_0,\epsilon))\subset B(Tx_0,\epsilon).$$ This is equivalent to saying that $\exists\delta>0$ s.t. whenever $d(x,x_0)<\delta$, $d(Tx,Tx_0)<\epsilon$. That is, $T$ is continuous at $x_0$. Since the choice $x_0\in X$ was arbitrary, the proof is complete.

Let $A\subset X$. $x\in X$ is called an accumulation point or a limit point of $A$ if $\forall$ open set $U(x)$ in $X$, $(U(x)-\{x\})\cap A\ne\emptyset$. Here the notation $U(x)$ means that it contains $x$. The set of all accumulation points of $A$ is denoted by $A’$ and is called the derived set of $A$. $\bar A:=A\cup A’$ is called the closure of $A$. $\bar A$ is the smallest closed set containing $A$.

Theorem. Let $A\subset X$. Then $x\in\bar A$ if and only if $\forall$ open set $U(x)$, $U(x)\cap A\ne\emptyset$.

Definition. $D\subset X$ is said to be dense if $\bar D=X$. This means that $\forall$ open set $U$ in $X$, $U\cap D\ne\emptyset$.

Definition. $X$ is said to be separable if it has a countable dense subset.

Examples. The real line $\mathbb{R}$ is separable. The complex plane $\mathbb{C}$ is also separable.

Theorem. The space $\ell^\infty$ is not separable.

Proof. Let $y=(\eta_1,\eta_2,\eta_3,\cdots)$ be a sequence of zeros and ones. Then $y\in\ell^\infty$. We can then associate $y$ with the binary representation
$$\hat y=\frac{\eta_1}{2}+\frac{\eta_2}{2^2}+\frac{\eta_3}{2^3}+\cdots\in [0,1].$$ Each $\hat y\in [0,1]$ has a binary representation and different $\hat y$’s have different binary representations. So, there are uncountably many sequences of zeros and ones. If $y$ and $z$ are sequences of zeros and ones and $y\ne z$, then $d(y,z)=1$. This means that for any two distinct sequences $y$ and $z$ of zeros and ones, $B\left(y,\frac{1}{3}\right)\cap B\left(z,\frac{1}{3}\right)=\emptyset$. Let $A$ be a dense subset of $\ell^\infty$. Then for each sequence $y$ of zeros and ones, $B\left(y,\frac{1}{3}\right)$ has at least one element of $A$. This means that $A$ cannot be countable.

Theorem. The space $\ell^p$ with $1\leq p<\infty$ is separable.

Proof. Let $A$ be the set of all sequences $y$ of the form
$$y=(\eta_1,\eta_2,\cdots,\eta_n,0,0,\cdots,0),$$ where $n$ is a positive integer and the $\eta_j$’s are rational. For each $n=1,2,\cdots$, the number of sequences of the form $y=(\eta_1,\eta_2,\cdots,\eta_n,0,0,\cdots,0)$ is the same as the number of functions from $\{1,2,3,\cdots,n\}$ to $\mathbb{Q}$, the set of all rational numbers. $\mathbb{Q}$ has the cardinality $\aleph_0$ and so the number is $\aleph_0^n=\aleph_0$. The cardinality of $A$ is then $\aleph_0\cdot\aleph_0=\aleph_0$ i.e. $A$ is countable. Now we show that $A$ is dense in $\ell^p$. Let $x=(\xi_j)\in\ell^p$. Let $\epsilon>0$ be given. Since $\displaystyle\sum_{j=1}^\infty|\xi_j|^p<\infty$, $\exists$ a positive integer $N$ s.t. $\displaystyle\sum_{j=N+1}^\infty|\xi_j|^p<\frac{\epsilon^p}{2}$. Since rationals are dense in $\mathbb{R}$, one can find $y=(\eta_1,\eta_2,\cdots,\eta_N,0,0,\cdots)\in A$ s.t. $\displaystyle\sum_{j=1}^N|\xi_j-\eta_j|^p<\frac{\epsilon^p}{2}$. Hence,
$$[d(x,y)]^p=\sum_{j=1}^N|\xi_j-\eta_j|^p+\sum_{j=N+1}^\infty|\xi_j|^p<\epsilon^p,$$
i.e. $d(x,y)<\epsilon$. This means that $y\in B(x,\epsilon)\cap A\ne\emptyset$. This completes the proof.

Group Theory 3: Preliminaries (Basic Number Theory)

In this lecture, we study some basic number theory as it is needed to study group theory.

Let $\mathbb{Z}$ denote the set of integers. $\mathbb{Z}$ satisfies well-ordering principle, namely any non-empty set of nonnegative integers has a smallest member.

One of the most fundamental theorems regarding numbers is Euclid’s Algorithm. Although we will not discuss its proof, it can be proved using well-ordering principle.

Theorem. [Euclid’s Algorithm] If $m$ and $n$ are integers with $n>0$, then $\exists$ integers $q$ and $r$ with $0\leq r<n$ such that $m=qn+r$.

Euclid’s algorithm hints us how we can define the notion that one integer divides another.

Definition. Given $m\ne 0, n\in\mathbb{Z}$, we say $m$ divides $n$ and write $m|n$ if $n=cm$ for some $c\in\mathbb{Z}$.

Examples. $2|14$, $(-7)|14$, $4|(-16)$.

If $m|n$, we call $m$ a divisor or a factor of $n$, and $n$ a multiple of $m$. To indicate $m$ is not a divisor of $n$, we write $m\not|n$. For example, $3\not|5$.

Lemma. The following properties hold.

(a) $1|n$ $\forall n$.

(b) If $m\ne 0$ then $m|0$.

(c) If $m|n$ and $n|q$, then $m|q$.

(d) If $m|n$ and $m|q$ then $m|(\mu n+\nu q)$ $\forall \mu,\nu$.

(e) If $m|1$ then $m=\pm 1$.

(f) If $m|n$ and $n|m$ then $m=\pm n$.

Definition. Given $a,b$ (not both 0), their greatest common divisor (in short gcd) $c$ is defined by the following properties:

(a) $c>0$

(b) $c|a$ and $c|b$

(c) If $d|a$ and $d|b$ then $d|c$.

If $c$ is the gcd of $a$ and $b$, we write $c=(a,b)$.

$(24,9)=3$. Note that the gcd 3 can be written in terms of 24 and 9 as $3\cdot 9+1\cdot (-24)$ or $(-5)9+2\cdot 24$. In general, we have the following theorem holds.

Theorem. If $a,b$ are not both 0, their gcd exists uniquely. Moreover, $\exists m,n\in\mathbb{Z}$ s.t. $c=ma+nb$.

Now let us talk about how to find the gcd of two positive numbers $a$ and $b$. W.L.O.G. (Without Loss Of Generality), we may assume that $b<a$. Then by Euclid’s algorithm we have
$$a=bq+r,\ \mbox{where}\ 0\leq r<b.$$
Let $c=(a,b)$. Then $c|r$, so $c$ is a common divisor of $b$ and $r$. If $d$ is a common divisor of $b$ and $r$, it is also a common divisor of $a$ and $b$. This implies that $d\leq c$ and so $c=(b,r)$. Finding $(b,r)$ is of course easier because one of the numbers is smaller than before.

Example. [Finding GCD]
$$
\begin{aligned}
(100,28)&=(28,16)\ &(100&=28\cdot 3+16)\\
&=(16,12)\ &(28&=16\cdot 1+12)\\
&=(12,4)\ &(16&=12\cdot 1+14)\\
&=4.
\end{aligned}
$$
By working backward, we can also find integers $m$ and $n$ such that
$$4=m\cdot 100+n\cdot 28.$$
\begin{align*}
4&=16+12(-1)\\
&=16+(-1)[28+(-1)16]\\
&=(-1)28+2\cdot 16\\
&=(-1)28+2[100+(-3)28]\\
&=2\cdot 100+(-7)28.
\end{align*}
Therefore, $m=2$ and $n=-7$.

Definition. We say that $a$ and $b$ are relatively prime if $(a,b)=1$.

Theorem. The integers $a$ and $b$ are relatively prime if and only if $1=ma+nb$ for some $m$ and $n$.

Theorem. If $(a,b)=1$ and $a|bc$ then $a|c$.

Theorem. If $b$ and $c$ are both relatively prime to $a$, then $bc$ is also relatively prime to $a$.

Definition. A prime number, or shortly prime, is an integer $p>1$ such that $\forall a\in\mathbb{Z}$, either $p|a$ or $(p,a)=1$.

Suppose that $p$ is a prime as defined above and $p=ab$, where $1\leq a<p$. Then $p\not|a$ since $a<p$, so $(p,a)=1$. This implies that $p|b$. On the other hand, $b|p(=ab)$ and hence $p=b$ and $a=1$. So, the above definition coincides with the definition of a prime we are familiar with.

Theorem. If $p$ is a prime and $p|a_1a_2\cdots a_n$, then $p|a_i$ for some $i$ with $1\leq i\leq n$.

Proof. If $p|a_1$, we are done. If not, $(p,a_1)=1$ and so $p|a_2a_3\cdots a_n$. Continuing this, we see that $p|a_i$ for some $i$.

Regarding primes, we have the following theorems.

Theorem. If $n>1$, then either $n$ is a prime or the product of primes.

Theorem. [Unique Factorization Theorem] Given $n>1$, there is a unique way to write $n$ in the form $n=p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k}$, where $p_1<p_2<\cdots<p_k$ are primes and the exponents $a_1,\cdots,a_k$ are all positive.

Theorem. [Euclid] There is an infinite number of primes.

$\ell^p$ and $L^p$ as Metric Spaces

Let $p\geq 1$ be a fixed number and let
$$\ell^p=\left\{x=(\xi_j): \sum_{j=1}^\infty|\xi_j|^p<\infty\right\}.$$
Define $d:\ell^p\times\ell^p\longrightarrow\mathbb{R}^+\cup\{0\}$ by
$$d(x,y)=\left(\sum_{j=1}^\infty|\xi_j-\eta_j|^p\right)^{\frac{1}{p}}.$$
Then $(\ell^p,d)$ is a metric space. The properties (M1) and (M2) are clearly satisfied. We prove the remaining property (M3) the triangle inequality. $p=1$ case can be easily shown by the triangle inequality of numbers. We need a few steps to do this. First we prove the following inequality: $\forall\alpha>0,\beta>0$,
$$\alpha\beta\leq\frac{\alpha^p}{p}+\frac{\beta^q}{q},$$
where $p>1$ and $\frac{1}{p}+\frac{1}{q}=1$. The numbers $p$ and $q$ are called conjugate exponents. It follows from $\frac{1}{p}+\frac{1}{q}=1$ that $(p-1)(q-1)=1$ i.e. $\frac{1}{p-1}=q-1$. If we let $u=t^{p-1}$ then $t=u^{\frac{1}{p-1}}=u^{q-1}$. By comparing areas, we obtain
$$\alpha\beta\leq\int_0^{\alpha}t^{p-1}dt+\int_0^{\beta}u^{q-1}du=\frac{\alpha^p}{p}+\frac{\beta^q}{q}.$$
Next, using this inequality we prove the Hölder inequality
$$\sum_{j=1}^\infty|\xi_j\eta_j|\leq\left(\sum_{k=1}^\infty|\xi_k|^p\right)^{\frac{1}{p}}\left(\sum_{m=1}^\infty|\eta_m|^q\right)^{\frac{1}{q}}$$
where $p>1$ and $\frac{1}{p}+\frac{1}{q}=1$. When $p=2$ and $q=2$, we obtain the well-known Cauchy-Schwarz inequality.

Proof. Let $(\tilde\xi_j)$ and $(\tilde\eta_j)$ be two sequences such that
$$\sum_{j=1}^\infty|\tilde\xi_j|^p=1,\ \sum_{j=1}^\infty|\tilde\eta_j|^q=1.$$
Let $\alpha=|\tilde\xi_j|$ and $\beta=|\tilde\eta_j|$. Then by the inequality we proved previously,
$$|\tilde\xi_j\tilde\eta_j|\leq\frac{|\tilde\xi_j|^p}{p}+\frac{|\tilde\eta_j|^q}{q}$$
and so we obtain
$$\sum_{j=1}^\infty|\tilde\xi_j\tilde\eta_j|\leq\sum_{j=1}^\infty\frac{|\tilde\xi_j|^p}{p}+\sum_{j=1}^\infty\frac{|\tilde\eta_j|^q}{q}=1.$$
Now take any nonzero $x=(\xi_j)\in\ell^p$, $y=(\eta_j)\in\ell^q$. Setting
$$\tilde\xi_j=\frac{\xi_j}{\left(\displaystyle\sum_{k=1}^\infty|\xi_k|^p\right)^{\frac{1}{p}}},\ \tilde\eta_j=\frac{\eta_j}{\left(\displaystyle\sum_{m=1}^\infty|\eta_m|^q\right)^{\frac{1}{q}}}$$
results in the Hölder inequality.

Next, we prove the Minkowski inequality
$$\left(\sum_{j=1}^\infty|\xi_j+\eta_j|^p\right)^{\frac{1}{p}}\leq\left(\sum_{k=1}^\infty|\xi_k|^p\right)^{\frac{1}{p}}+\left(\sum_{m=1}^\infty|\eta_m|^p\right)^{\frac{1}{p}}$$
where $x=(\xi_j)\,y=(\eta_j)\in\ell^p$ and $p\geq 1$. $p=1$ case comes from the triangle inequality for numbers. Let $p>1$. Then
\begin{align*}
|\xi_j+\eta_j|^p&=|\xi_j+\eta_j||\xi_j+\eta_j|^{p-1}\\
&\leq(|\xi_j|+|\eta_j|)|\xi_j+\eta_j|^{p-1}\ (\mbox{triangle inequality for numbers}).
\end{align*}
For a fixed $n$, we have
$$\sum_{j=1}^n|\xi_j+\eta_j|^p\leq\sum_{j=1}^n|\xi_j||\xi_j+\eta_j|^{p-1}+\sum_{j=1}^n|\eta_j||\xi_j+\eta_j|^{p-1}.$$
Using the Hölder inequality, we get the following inequality
\begin{align*}
\sum_{j=1}^n|\xi_j||\xi_j+\eta_j|^{p-1}&\leq \sum_{j=1}^\infty |\xi_j||\xi_j+\eta_j|^{p-1}\\
&\leq\left(\sum_{k=1}^\infty |\xi_k|^p\right)^{\frac{1}{p}}\left(\sum_{m=1}^\infty(|\xi_m+\eta_m|^{p-1})^q\right)^{\frac{1}{q}}\ (\mbox{Hölder})\\
&=\left(\sum_{k=1}^\infty|\xi_k|^p\right)^{\frac{1}{p}}\left(\sum_{m=1}^\infty|\xi_m+\eta_m|^p\right)^{\frac{1}{q}}.
\end{align*}
Similarly, we also get the inequality
$$\sum_{j=1}^n|\eta_j||\xi_j+\eta_j|^{p-1}\leq \left(\sum_{k=1}^\infty|\eta_k|^p\right)^{\frac{1}{p}}\left(\sum_{m=1}^\infty|\xi_m+\eta_m|^p\right)^{\frac{1}{q}}.$$
Combining these two inequalities, we get
$$\sum_{j=1}^n|\xi_j+\eta_j|^p\leq\left\{\left(\sum_{k=1}^\infty|\xi_k|^p\right)^{\frac{1}{p}}+\left(\sum_{k=1}^\infty|\eta_k|^p\right)^{\frac{1}{p}}\right\}\left(\sum_{m=1}^\infty|\xi_m+\eta_m|^p\right)^{\frac{1}{q}}$$
and by taking the limit $n\to \infty$ on the left hand side, we get
$$\sum_{j=1}^\infty|\xi_j+\eta_j|^p\leq\left\{\left(\sum_{k=1}^\infty|\xi_k|^p\right)^{\frac{1}{p}}+\left(\sum_{k=1}^\infty|\eta_k|^p\right)^{\frac{1}{p}}\right\}\left(\sum_{m=1}^\infty|\xi_m+\eta_m|^p\right)^{\frac{1}{q}}.$$
Finally, dividing this inequality by $\displaystyle\left(\sum_{m=1}^\infty|\xi_m+\eta_m|^p\right)^{\frac{1}{q}}$ results in the Minkowski inequality. The Minkowski inequality tells that
$$d(x,y)=\left(\sum_{j=1}^\infty|\xi_j-\eta_j|^p\right)^{\frac{1}{p}}<\infty$$
for $x,y\in\ell^p$. Let $x=(\xi_j), y=(\eta_j),\ z=(\zeta_j)\in\ell^p$. Then
\begin{align*}
d(x,y)&=\left(\sum_{j=1}^\infty|\xi_j-\eta_j|^p\right)^{\frac{1}{p}}\\
&\leq\left(\sum_{j=1}^\infty[|\xi_j-\zeta_j|+|\zeta_j-\eta_j|]^p\right)^{\frac{1}{p}}\\
&\leq\left(\sum_{j=1}^\infty|\xi_j-\zeta_j|^p\right)^{\frac{1}{p}}+\left(\sum_{j=1}^\infty|\zeta_j-\eta_j|^p\right)^{\frac{1}{p}}\\
&=d(x,z)+d(z,y).
\end{align*}The inequality that is second to the last expression is obtained by Minkowski inequality.

A measurable function $f$ on a closed interval $[a,b]$ is said to belong to $L^p$ if $\int_a^b|f(t)|^p dt<\infty$. $L^p$ is a vector space. For functions $f,g\in L^p$, we define
$$d(f,g)=\left\{\int_a^b|f(t)-g(t)|^pdt\right\}^{\frac{1}{p}}.$$
Then clearly (M2) symmetry is satisfied and one can also prove that (M3) triangle inequality holds. However, (M1) is not satisfied since what we have is that if $d(f,g)=0$ then $f=g$ a.e. (almost everywhere) i.e. the set $\{t\in[a,b]: f(t)\ne g(t)\}$ has measure $0$. It turns out that $=$ a.e. is an equivalence relation on $L^p$, so by considering $f\in L^p$ as its equivalence class $[f]$, $d$ can be defined as a metric on $L^p$ (actually the quotient space of $L^p$). Later, we will be particularly interested in the case when $p=2$ in which case $L^p$ as well as $\ell^p$ become Hilbert spaces. Those of you who want to know details about $L^p$ space are referred to

Real Analysis, H. L. Royden, 3rd Edition. Macmillan Publishing Company, 1988

Group Theory 2: Preliminaries (Functions)

In my previous notes here, I mentioned some about logical symbols. The logical symbols I will use often are $\forall$ which means “for all”, “for any”, “for each”, or “for every” depending on the context, $\exists$ which means “there exists”, and $\ni$ which means “such that” (don’t be confused with $\in$ which means “be an element of”). We also use s.t. for “such that.” There are also $\Longrightarrow$ which means “implies” and $\Longleftrightarrow$ which means “if and only if.” I guess these pretty much cover what we use most of time.

Now lets review about functions in a more formal way. Let $X$ and $Y$ be two non-empty sets. The the Cartesian product $X\times Y$ of $X$ and $Y$ is defined as the set
$$X\times Y=\{(x,y): x\in X,\ y\in Y\}.$$
A subset $f$ of the Cartesian product $X\times Y$ (we write $f\subset X\times Y$) is called a graph from $X$ to $Y$. A graph $f\subset X\times Y$ is called a function from $X$ to $Y$ (we write $f: X\longrightarrow Y$) if whenever $(x,y_1),(x,y_2)\in f$, $y_1=y_2$. If $f: X\longrightarrow Y$ and $(x,y)\in f$, we also write $y=f(x)$. A function $f: X\longrightarrow Y$ is said to be one-to-one or injective if whenever $(x_1,y),(x_2,y)\in f$, $x_1=x_2$. This is equivalent to saying $f(x_1)=f(x_2)$ implies $x_1=x_2$. A function $f: X\longrightarrow Y$ is said to be onto or surjective if $\forall y\in Y$ $\exists x\in X$ s.t. $(x,y)\in f$. A function $f: X\longrightarrow Y$ is said to be one-to-one and onto (or bijective) if it is both one-to-one and onto (or both injective and surjective).

Let $f: X\longrightarrow Y$ and $g: Y\longrightarrow Z$ be two functions. Then the composition or the composite function $g\circ f: X\longrightarrow Z$ is defined by $g\circ f(x)=g(f(x))$ $\forall x\in X$. The function composition $\circ$ may be considered as an operation and it is associative.

Lemma. If $h: X\longleftrightarrow Y$, $g:Y\longleftrightarrow Z$ and $f:Z\longleftrightarrow W$, then $f\circ(g\circ h)=(f\circ g)\circ h$.

Note that $\circ$ is not commutative i.e. it is not necessarily true that $f\circ g=g\circ f$ even when both $f\circ g$ and $g\circ f$ are defined.

The following lemmas will be useful when we study group theory later.

Lemma. If both $f: X\longrightarrow Y$ and $g: Y\longrightarrow Z$ are one-to-one, so is $g\circ f: X\longrightarrow Z$.

Lemma. If both $f: X\longrightarrow Y$ and $g: Y\longrightarrow Z$ are onto, so is $g\circ f: X\longrightarrow Z$.

As an immediate consequence of combining these two lemmas, we obtain

Lemma. If both $f: X\longrightarrow Y$ and $g: Y\longrightarrow Z$ are bijective, so is $g\circ f: X\longrightarrow Z$.

If $f\subset X\times Y$, then the inverse graph $f^{-1}\subset Y\times X$ is defined by
$$f^{-1}=\{(y,x)\in Y\times X: (x,y)\in f\}.$$
If $f: X\longrightarrow Y$ is one-to-one and onto (bijective) then its inverse graph $f^{-1}$ is a function $f^{-1}: Y\longrightarrow X$. The inverse $f^{-1}$ is also one-to-one and onto.

Lemma. If $f: X\longrightarrow Y$ is a bijection, then $f\circ f^{-1}=\imath_Y$ and $f^{-1}\circ f=\imath_X$, where $\imath_X$ and $\imath_Y$ are the identity mappings of $X$ and $Y$, respectively.

Let $A(X)$ be the set of all one-to-one functions of $X$ onto $X$ itself. Then $(A(X),\circ)$ is a group. If $X$ is a finite set of $n$-elements (we may conveniently say $X=\{1,2,\cdots,n\})$, then $(A(X),\circ)$ is a finite group of order $n!$, called the symmetric group of degree $n$. The symmetric group of degree $n$ is denoted by $S_n$ and the elements of $S_n$ are called permutations.