Let $(\Omega,\mathscr{U},P)$ be a probability space and $X:\Omega\longrightarrow\mathbb{R}^n$ a randome variable. We define an ordering between two vectors in $\mathbb{R}^n$ as follows: Let $x=(x_1,\cdots,x_n),y=(y_1,\cdots,y_n)\in\mathbb{R}^n$. Then $x\leq y$ means $x_i\leq y_i$ for $i=i,\cdots,n$.
Definition. The distribution function of $X$ is the function $F_X: \mathbb{R}^n\longrightarrow[0,1]$ defined by
$$F_X(x):=P(X\leq x)$$
for all $x\in\mathbb{R}^n$. If $X_1,\cdots,X_m:\Omega\longrightarrow\mathbb{R}^n$ are random variables, their joint distribution function $F_{X_1,\cdots,X_m}:(\mathbb{R}^n)^m\longrightarrow[0,1]$ is defined by
$$F_{X_1,\cdots,X_m}(x_1,\cdots,x_m):=P(X_1\leq x_1,\cdots,X_m\leq x_m)$$
for all $x_i\in\mathbb{R}^n$ and for all $i=1,\cdots,n$.
Definition. Let $X$ be a random variable, $F=F_X$ its distribution function. If there exists a nonnegative integrable function $f:\mathbb{R}^n\longrightarrow\mathbb{R}$ such that
$$F(x)=F(x_1,\cdots,x_n)=\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n}f(y_1,\cdots,y_n)dy_1\cdots dy_n$$
then $f$ is called the density function for $X$. More generally,
$$P(X\in B)=\int_B f(x)dx$$
for all $B\in\mathscr{B}$ where $\mathscr{B}$ is the Borel $\sigma$-algebra.
Example. If $X:\Omega\longrightarrow\mathbb{R}$ has the density function
$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{|x-m|^2}{2\sigma^2}},\ x\in\mathbb{R}$$
then we say $X$ has a Gaussian or normal distribution with mean $m$ and variance $\sigma^2$. In this case, we write “$X$ is an $N(m,\sigma^2)$ random variable.”
Example. If $X: \Omega\longrightarrow\mathbb{R}^n$ has the density
$$f(x)=\frac{1}{\sqrt{(2\pi)^n\det C}}e^{-\frac{1}{2}(x-m)C^{-1}(x-m)^t},\ x\in\mathbb{R}^n$$
for some $m\in\mathbb{R}^n$ and some positive definite symmetric matrix $C$, we say that “$X$ has a Gaussian or normal distribution with mean $m$ and covariance matrix $C$.” We write $X$ is an $N(m,C)$ random variable. The covariance matrix is given by
\begin{equation}\label{eq:covmatrix}C=E[(X-E(X))^t(X-E(X))]\end{equation}
where $X=(X_1,\cdots,X_n)$, i.e. each $C$ is the matrix whose $(i,j)$ entry is the covariance
$$C_{ij}=\mathrm{cov}(X_i,X_j)=E[(X_i-E(X_i))(X_j-E(X_j))]=E(X_iX_j)-E(X_i)E(X_j)$$
Clearly $C$ is a symmetric matrix. Recall that for a real-valued random matrix $X$ the variance $\sigma^2$ is given by
$$\sigma^2=V(X)=E[(X-E(X))^2]=E[(X-E(X))\cdot (X-E(X))]$$
So one readily sees that \eqref{eq:covmatrix} is a generalization of variance to higher dimensions. It follows from \eqref{eq:covmatrix} that for a vector $b\in\mathbb{R}^n$,
$$V(Xb^t)=bV(X)b^t$$
Since the variance is nonnegative, we see that the covariance matrix is a positive definite matrix. Since $C$ is symmetric, $PCP^{-1}=D$ where $P$ is an orthogonal matrix and $D$ is a diagonal matrix whose main diagonal contains the eigenvalues of $C$. Recall that for two $n\times n$ matrices $A$ and $B$, $\det(AB)=\det(A)\det(B)$ so we see that $\det(C)=\det(D)$. Since all the eigenvalues of a positive definite matrix are positive, $\det(C)>0$.
Lemma. Let $X:\Omega\longrightarrow\mathbb{R}^n$ be a random variable and assume that its distribution function $F=F_X$ has the density $f$. Suppose $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ and $Y=g(X)$ is integrable. Then
$$E(Y)=\int_{\mathbb{R}^n}g(x)f(x)dx$$
Proof. Suppose first that $g$ is a simple function on $\mathbb{R}^n$.
$$g=\sum_{i=1}^mb_iI_{B_i}\ (B_i\in\mathscr{B})$$
\begin{align*}E(g(X))&=\sum_{i=1}^mb_i\int_{\Omega}I_{B_i}(X)dP\\&=\sum_{i=1}^mb_iP(X\in B_i).\end{align*}
But
\begin{align*}\int_{\mathbb{R}^n}g(x)f(x)dx&=\sum_{i=1}^mb_i\int_{\mathbb{R}^n}I_{B_i}f(x)dx\\&=\sum_{i=1}^nb_i\int_{B_i}f(x)dx\\&=\sum_{i=1}^mb_iP(X\in B_i)\end{align*}
Hence proves the lemma for the case $g$ is a simple function. The rest of the argument extends to general $g$ straightforwardly.
Corollary. If $X:\Omega\longrightarrow\mathbb{R}^n$ is a random variable and its distribution function $F=F_X$ has the density $f$, then
$$V(X)=\int_{\mathbb{R}^n}|x-E(X)|^2f(x)dx$$
Proof. Recall that $V(X)=E(|X-E(X)|^2)$. Define $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ by
$$g(x)=|x-E(X)|^2$$
for all $x\in\mathbb{R}^n$. Then by the Lemma we have
$$V(X)=\int_{\mathbb{R}^n}|x-E(X)|^2f(x)dx$$
Corollary. If $X:\Omega\longrightarrow\mathbb{R}$ is a random variable and its distribution function $F=F_X$ has the density $f$, then $E(X)=\int_{-\infty}^\infty xf(x)dx$ and $V(X)=\int_{-\infty}^\infty |x-E(X)|^2f(x)dx$.
Proof. Trivial from the Lemma by taking $g:\mathbb{R}\longrightarrow\mathbb{R}$ the identity map.
Corollary. If $X:\Omega\longrightarrow\mathbb{R}^n$ is a random variable and its distribution function $F=F_X$ has the density $f$, then
$$E(X_1\cdots X_n)=\int_{\mathbb{R}^n}x_1\cdots x_nf(x)dx$$
Proof. Define $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ by
$$g(x)=x_1\cdots x_n\ \mbox{for all}\ x=(x_1,\cdots,x_n)\in\mathbb{R}^n$$
Then the rest follows by the Lemma.
Example. If $X$ is $N(m,\sigma^2)$ then
\begin{align*}
E(X)&=\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty xe^{-\frac{(x-m)^2}{2\sigma^2}}dx\\
&=m\\
V(X)&=\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty (x-m)^2e^{-\frac{(x-m)^2}{2\sigma^2}}dx\\
&=\sigma^2
\end{align*}
Therefore, $m$ is the mean and $\sigma^2$ is the variance.
References:
Lawrence C. Evans, An Introduction to Stochastic Differential Equations, Lecture Notes