Locally Lipschitz implies Lipschitz on Compact Set Proof

Assume \phi is locally Lipschitz on \mathbb{R}^n, that is, for any x\in \mathbb{R}^n, there exists \delta, L>0 (depending on x) such that |\phi(z)-\phi(y)|\leq L|z-y| for all z,y\in B_\delta(x)=\{t\in\mathbb{R}^n: |x-t|<\delta\}.

Then, for any compact set K\subset\mathbb{R}^n, there exists a constant M>0 (depending on K) such that |\phi(x)-\phi(y)|\leq M|x-y| for all x,y\in K. That is, \phi is Lipschitz on K.


Suppose to the contrary that \phi is not Lipschitz on K, so that for every M>0 there exist x,y\in K with x\neq y such that \displaystyle \frac{|\phi(x)-\phi(y)|}{|x-y|}>M.

Then there exist two sequences x_n, y_n\in K such that \displaystyle \frac{|\phi(x_n)-\phi(y_n)|}{|x_n-y_n|}\to\infty.

Since \phi is locally Lipschitz, it is continuous, so \phi is bounded on K by the Extreme Value Theorem. The numerators |\phi(x_n)-\phi(y_n)| are therefore bounded, hence |x_n-y_n|\to 0.

By sequential compactness of K, there exists a convergent subsequence x_{n_k}\to x\in K, and thus y_{n_k}\to x as well.

Let \delta, L>0 be the constants given by the local Lipschitz condition at x. For all sufficiently large k we have x_{n_k},y_{n_k}\in B_\delta(x) and \displaystyle \frac{|\phi(x_{n_k})-\phi(y_{n_k})|}{|x_{n_k}-y_{n_k}|}>L, which contradicts the local Lipschitz estimate on B_\delta(x).

Sufficient condition for “Weak Convergence”

This is a sufficient condition for something that resembles “Weak convergence”: \int f_kg\to \int fg for all g\in L^{p'}
Suppose that f_k\to f a.e.\ and that f_k, f\in L^p, 1<p\leq\infty. If \|f_k\|_p\leq M<\infty, we have \int f_kg\to\int fg for all g\in L^{p'}, 1/p+1/p'=1. Note that the result is false if p=1.

(Case: |E|<\infty, where E is the domain of integration).

We may assume |E|>0, M>0, \|g\|_{p'}>0 otherwise the result is trivially true. Also, by Fatou’s Lemma, \displaystyle \|f\|_p\leq\liminf_{k\to\infty}\|f_k\|_p\leq M.

Let \epsilon>0. Since g\in L^{p'}, we have |g|^{p'}\in L^1, so there exists \delta>0 such that for any measurable subset A\subseteq E with |A|<\delta, \int_A |g|^{p'}<\epsilon^{p'}.

Since f_k\to f a.e.\ (f is finite a.e.\ since f\in L^p), by Egorov’s Theorem there exists closed F\subseteq E such that |E\setminus F|<\delta and \{f_k\} converge uniformly to f on F. That is, there exists N(\epsilon) such that for k\geq N, |f_k(x)-f(x)|<\epsilon for all x\in F.

Then for k\geq N,
\begin{aligned}  \left|\int_E f_kg-fg\right|&\leq\int_E|f_k-f||g|\\  &=\int_{E\setminus F}|f_k-f||g|+\int_F|f_k-f||g|\\  &\leq\left(\int_{E\setminus F}|f_k-f|^p\right)^\frac{1}{p}\left(\int_{E\setminus F}|g|^{p'}\right)^\frac{1}{p'}+\epsilon\int_F |g|\\  &<\|f_k-f\|_p(\epsilon)+\epsilon\left(\int_F|g|^{p'}\right)^\frac{1}{p'}\left(\int_F |1|^p\right)^\frac{1}{p}\\  &\leq 2M\epsilon+\epsilon\|g\|_{p'}|E|^\frac{1}{p}\\  &=\epsilon(2M+\|g\|_{p'}|E|^\frac{1}{p}).  \end{aligned}

Since \epsilon>0 is arbitrary, this means \int_E f_kg\to \int_E fg.

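(Case: |E|=\infty). Error: the argument below confuses the radius N with the index k (the finite measure case gives a threshold in k for each fixed N, not a threshold in N). See the correction below.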

Define E_N=E\cap B_N(0), where B_N(0) is the ball with radius N centered at the origin. Then |E_N|<\infty, so there exists N_1>0 such that for N\geq N_1, \int_{E_N}|f_k-f||g|<\epsilon.

Since |g|^{p'}\chi_{E_N}\nearrow|g|^{p'} on E, by Monotone Convergence Theorem, \displaystyle \lim_{N\to\infty}\int_{E_N}|g|^{p'}=\int_E |g|^{p'}<\infty.
Thus there exists N_2>0 such that for N\geq N_2, \int_{E\setminus E_N} |g|^{p'}<\epsilon^{p'}.

Then for N\geq\max\{N_1, N_2\},
\begin{aligned}  \int_E |f_kg-fg|&=\int_{E_N}|f_k-f||g|+\int_{E\setminus E_N}|f_k-f||g|\\  &<\epsilon+\left(\int_{E\setminus E_N}|f_k-f|^p\right)^\frac{1}{p}\left(\int_{E\setminus E_N}|g|^{p'}\right)^\frac{1}{p'}\\  &<\epsilon+\|f_k-f\|_p(\epsilon)\\  &\leq\epsilon+2M\epsilon\\  &=\epsilon(1+2M).  \end{aligned}
so that \int_E f_kg\to\int_E fg.

(Show that the result is false if p=1).

Let f_k:=k\chi_{[0,\frac 1k]}. Then f_k\to f a.e., where f\equiv 0. Note that \int_\mathbb{R} |f_k|=1, \int_\mathbb{R} |f|=0 so that f_k, f\in L^1(\mathbb{R}). In particular, \|f_k\|_1\leq M=1.

However if g\equiv 1\in L^\infty, \int_\mathbb{R} f_kg=1 for all k but \int_\mathbb{R} fg=0.

Correction for the case |E|=\infty:

Define E_N=E\cap B_N(0), where B_N(0) is the ball with radius N centered at the origin.

Since |g|^{p'}\chi_{E_N}\nearrow |g|^{p'} on E, by Monotone Convergence Theorem, \displaystyle \lim_{N\to\infty}\int_{E_N}|g|^{p'}=\int_E|g|^{p'}<\infty.

Thus there exists N_1>0 such that \int_{E\setminus E_{N_1}}|g|^{p'}<\epsilon^{p'}.

Since |E_{N_1}|<\infty, by the finite measure case there exists N_2 such that for k\geq N_2, \displaystyle \int_{E_{N_1}}|f_k-f||g|<\epsilon.

So for k\geq N_2,
\begin{aligned}  \int_E|f_kg-fg|&=\int_{E_{N_1}}|f_k-f||g|+\int_{E\setminus E_{N_1}}|f_k-f||g|\\  &<\epsilon+\left(\int_{E\setminus E_{N_1}}|f_k-f|^p\right)^{1/p}\left(\int_{E\setminus E_{N_1}}|g|^{p'}\right)^{1/p'}\\  &<\epsilon+\|f_k-f\|_p(\epsilon)\\  &\leq\epsilon+2M\epsilon\\  &=\epsilon(1+2M).  \end{aligned}

so that \int_Ef_kg\to\int_E fg.

Relationship between L^p convergence and a.e. convergence

It turns out that convergence in L^p implies that the norms converge. Conversely, a.e. convergence together with convergence of the norms implies L^p convergence. Amazing!

Relationship between L^p convergence and a.e. convergence:
Let f, \{f_k\}\in L^p, 0<p\leq\infty. If \|f-f_k\|_p\to 0, then \|f_k\|_p\to\|f\|_p. Conversely, if f_k\to f a.e.\ and \|f_k\|_p\to\|f\|_p, 0<p<\infty, then \|f-f_k\|_p\to 0. Note that the converse may fail for p=\infty.

Assume \|f-f_k\|_p\to 0.

(Case: 0<p<1).
Lemma 1:
If 0<p<1, |a+b|^p\leq|a|^p+|b|^p for all a,b\in\mathbb{R}.
Proof of Lemma 1:
\displaystyle 1=\frac{|a|}{|a|+|b|}+\frac{|b|}{|a|+|b|}\leq\left(\frac{|a|}{|a|+|b|}\right)^p+\left(\frac{|b|}{|a|+|b|}\right)^p=\frac{|a|^p+|b|^p}{(|a|+|b|)^p}.
Hence |a+b|^p\leq(|a|+|b|)^p\leq|a|^p+|b|^p.
End Proof of Lemma 1.
Hence, using |a|^p\leq|a-b|^p+|b|^p and |b|^p\leq|a-b|^p+|a|^p we see that \displaystyle ||a|^p-|b|^p|\leq|a-b|^p.

\begin{aligned}  \left|\|f_k\|_p^p-\|f\|_p^p\right|&=\left|\int(|f_k|^p-|f|^p)\right|\\  &\leq\int\left||f_k|^p-|f|^p\right|\\  &\leq\int|f_k-f|^p\\  &=\|f-f_k\|_p^p\to 0\ \ \ \text{as}\ k\to\infty.  \end{aligned}

Hence \|f_k\|_p\to\|f\|_p.

(Case: 1\leq p\leq\infty.)

By Minkowski’s inequality, \|f\|_p\leq\|f-f_k\|_p+\|f_k\|_p and \|f_k\|_p\leq\|f-f_k\|_p+\|f\|_p so that \displaystyle \left|\|f_k\|_p-\|f\|_p\right|\leq\|f-f_k\|_p\to 0 as k\to\infty. Done.


Assume f_k\to f a.e.\ and \|f_k\|_p\to\|f\|_p, 0<p<\infty.
Lemma 2:
For a,b\in\mathbb{R}, |a+b|^p\leq 2^{p-1}(|a|^p+|b|^p) for 1\leq p<\infty.
Proof of Lemma 2:
By convexity of |x|^p for 1\leq p<\infty, \displaystyle \left|\frac 12 a+\frac 12 b\right|^p\leq\frac 12 |a|^p+\frac 12 |b|^p.
Multiplying throughout by 2^p gives \displaystyle |a+b|^p\leq 2^{p-1}(|a|^p+|b|^p).

Thus together with Lemma 1, for 0<p<\infty we have |f-f_k|^p\leq c(|f|^p+|f_k|^p) with c=\max\{2^{p-1}, 1\}.

Note that |f-f_k|^p\to 0 a.e.\ and \phi_k:=c(|f|^p+|f_k|^p)\to\phi:=2c|f|^p a.e.\ which is integrable. Also, \int\phi_k\to\int\phi since \|f_k\|_p^p\to\|f\|_p^p. By Generalized Lebesgue’s DCT, we have \int |f-f_k|^p\to 0 thus \displaystyle \|f-f_k\|_p\to 0.

(Show that the converse may fail for p=\infty):

Consider f_k=\chi_{[-k,k]}\in L^\infty(\mathbb{R}). Then f_k\to f a.e.\ where f(x)\equiv 1, and \|f_k\|_\infty\to\|f\|_\infty=1. However \|f-f_k\|_\infty=1\not\to 0.

Wheeden Zygmund Measure and Integration Solutions

Here are some solutions to exercises in the book: Measure and Integral, An Introduction to Real Analysis by Richard L. Wheeden and Antoni Zygmund.

Chapter 1,2: analysis1

Chapter 3: analysis2

Chapter 4, 5: analysis3

Chapter 5,6: analysis4

Chapter 6,7: analysis5

Chapter 8: analysis6

Chapter 9: analysis7

Measure and Integral: An Introduction to Real Analysis, Second Edition (Chapman & Hall/CRC Pure and Applied Mathematics)

Other than this book by Wheeden, also check out other highly recommended undergraduate/graduate math books.

Also check out popular Measure Theory exam question topics here:

Absolute Continuity of Lebesgue Integral

The following is a wonderful property of the Lebesgue Integral, also known as absolute continuity of Lebesgue Integral. Basically, it means that whenever the domain of integration has small enough measure, then the integral will be arbitrarily small.

Suppose f is integrable.
Given \epsilon>0, there exists \delta>0 such that for all measurable sets B\subseteq E with |B|<\delta, |\int_B f\,dx|<\epsilon.

Define A_k=\{x\in E: \frac 1k\leq|f(x)|<k\} for k\in\mathbb{N}. Each A_k is measurable and A_k\nearrow A:=\bigcup_{k=1}^\infty A_k=\{0<|f|<\infty\}. Since f is integrable, \{|f|=\infty\} has measure zero, so \displaystyle \int_E |f|=\int_{\{f=0\}}|f|+\int_A |f|+\int_{\{|f|=\infty\}}|f|=\int_A |f|.

Let f_k=|f|\chi_{A_k}. Then \{f_k\} is a sequence of non-negative functions such that f_k\nearrow |f|\chi_A. By Monotone Convergence Theorem, \lim_{k\to\infty}\int_E f_k=\int_E |f|\chi_A, that is, \displaystyle \lim_{k\to\infty}\int_{A_k}|f|\,dx=\int_A |f|\,dx=\int_E |f|\,dx.

Let N>0 be sufficiently large such that \int_{E\setminus A_N}|f|\,dx<\epsilon/2.

Let \delta=\frac{\epsilon}{2N}, and suppose |B|<\delta. Then
\begin{aligned}  |\int_B f\,dx|&\leq\int_B |f|\,dx\\  &=\int_{(E\setminus A_N)\cap B}|f|\,dx+\int_{A_N\cap B}|f|\,dx\\  &\leq\int_{E\setminus A_N}|f|\,dx+\int_{A_N\cap B}N\,dx\\  &<\epsilon/2+N\cdot|A_N\cap B|\\  &\leq\epsilon/2+N\cdot|B|\\  &<\epsilon/2+N\cdot\frac{\epsilon}{2N}\\  &=\epsilon.  \end{aligned}

Inequalities for pth powers, where 0<p<infinity

There are some useful inequalities for |x+y|^p, where p is a number ranging from 0 to infinity. These are the top 3 useful inequalities (note some of them only work for p less than 1, or p greater than 1).

For a,b\in\mathbb{R}, |a+b|^p\leq 2^p(|a|^p+|b|^p), where 0<p<\infty.

\begin{aligned}  |a+b|^p&\leq(|a|+|b|)^p\\  &\leq(2\max\{|a|,|b|\})^p\\  &=2^p(\max\{|a|,|b|\})^p\\  &\leq 2^p(|a|^p+|b|^p).  \end{aligned}

If 0<p<1, |a+b|^p\leq|a|^p+|b|^p for all a,b\in\mathbb{R}.

\displaystyle 1=\frac{|a|}{|a|+|b|}+\frac{|b|}{|a|+|b|}\leq\left(\frac{|a|}{|a|+|b|}\right)^p+\left(\frac{|b|}{|a|+|b|}\right)^p=\frac{|a|^p+|b|^p}{(|a|+|b|)^p}.
Hence |a+b|^p\leq(|a|+|b|)^p\leq|a|^p+|b|^p.

For a,b\in\mathbb{R}, |a+b|^p\leq 2^{p-1}(|a|^p+|b|^p) for 1\leq p<\infty.

By convexity of |x|^p for 1\leq p<\infty, \displaystyle \left|\frac 12 a+\frac 12 b\right|^p\leq\frac 12 |a|^p+\frac 12 |b|^p.
Multiplying throughout by 2^p gives \displaystyle |a+b|^p\leq 2^{p-1}(|a|^p+|b|^p).
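
As a quick sanity check (not a proof), the three inequalities above can be tested on random samples. The following is a minimal Python sketch; the function name check_power_inequalities, the sampling range and the tolerance are illustrative assumptions, not part of the original post.

```python
import random

def check_power_inequalities(p, trials=10_000, seed=0):
    """Test the three p-th power inequalities on random real pairs (a, b)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.uniform(-10, 10)
        b = rng.uniform(-10, 10)
        lhs = abs(a + b) ** p
        s = abs(a) ** p + abs(b) ** p
        tol = 1e-9 * (1 + s)  # small slack for floating-point rounding
        # Inequality valid for every 0 < p < infinity.
        if lhs > 2 ** p * s + tol:
            return False
        # Sharper inequality for 0 < p < 1.
        if p < 1 and lhs > s + tol:
            return False
        # Sharper inequality for 1 <= p < infinity.
        if p >= 1 and lhs > 2 ** (p - 1) * s + tol:
            return False
    return True

results = {p: check_power_inequalities(p) for p in (0.3, 0.5, 1.0, 2.0, 3.5)}
print(results)
```

Note that for p<1 the constant 1 is better than 2^{p-1}<1 would suggest: taking b=0 shows that no constant below 1 can work.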

Composition of Continuously Differentiable Function and Function of Bounded Variation

Assume \phi is a continuously differentiable function on \mathbb{R} and f is a function of bounded variation on [0,1]. Then \phi(f) is also a function of bounded variation on [0,1].


\displaystyle V_a^b(\phi(f))=\sup_{P\in\mathcal{P}}\sum_{i=0}^{N_P-1}|\phi(f(x_{i+1}))-\phi(f(x_i))| where \displaystyle \mathcal{P}=\{P|P:a=x_0<x_1<\dots<x_{N_P}=b\ \text{is a partition of}\ [a,b]\}.

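By the Mean Value Theorem, \displaystyle |\phi(f(x_{i+1}))-\phi(f(x_i))|=|f(x_{i+1})-f(x_i)||\phi'(c)| for some c between f(x_i) and f(x_{i+1}).

Since f is of bounded variation, it is bounded, say |f(x)|\leq R for all x\in[0,1]. Since \phi' is continuous, it is bounded on [-R,R], say |\phi'(y)|\leq K for all y\in[-R,R]. Thus, with a=0 and b=1,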
\begin{aligned}  V_a^b(\phi(f))&=\sup_{P\in\mathcal{P}}\sum_{i=0}^{N_P-1}|\phi(f(x_{i+1}))-\phi(f(x_i))|\\  &\leq K\sup_{P\in\mathcal{P}}\sum_{i=0}^{N_P-1}|f(x_{i+1})-f(x_i)|\\  &=KV_a^b(f)\\  &<\infty.  \end{aligned}
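
The estimate can be illustrated numerically on a single partition. The sketch below is only an illustration under assumed choices: f(x)=\sin(6\pi x), \phi(y)=y^2, and K=2, which bounds |\phi'(y)|=|2y| on [-1,1], an interval containing the range of f. None of these choices come from the original post.

```python
import math

def variation_sum(values):
    """Sum of |v_{i+1} - v_i| over consecutive sample values."""
    return sum(abs(v1 - v0) for v0, v1 in zip(values, values[1:]))

def f(x):
    # Hypothetical example of a function of bounded variation on [0, 1].
    return math.sin(6 * math.pi * x)

def phi(y):
    # Hypothetical continuously differentiable outer function.
    return y * y

n = 1000
xs = [i / n for i in range(n + 1)]      # uniform partition of [0, 1]
f_vals = [f(x) for x in xs]

K = 2.0                                  # bound on |phi'(y)| = |2y| for |y| <= 1
var_f = variation_sum(f_vals)
var_phi_f = variation_sum([phi(v) for v in f_vals])

print(var_phi_f, "<=", K * var_f)
```

The inequality holds for every partition, so taking the supremum over partitions gives the bound on the total variation.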

Fatou’s Lemma for Convergence in Measure

Suppose f_k\to f in measure on a measurable set E such that f_k\geq 0 for all k, then \displaystyle\int_E f\,dx\leq\liminf_{k\to\infty}\int_E f_k\,dx.

The proof is short but slightly tricky:

Suppose to the contrary \int_E f\,dx>\liminf_{k\to\infty}\int_E f_k\,dx. Let \{f_{k_l}\} be a subsequence such that \displaystyle \lim_{l\to\infty}\int f_{k_l}=\liminf_{k\to\infty}\int_E f_k<\int_E f
(using the fact that for any sequence there is a subsequence converging to \liminf).

Since f_{k_l}\xrightarrow{m}f, there exists a further subsequence f_{k_{l_m}}\to f a.e. By Fatou’s Lemma, \displaystyle \int_E f\leq\liminf_{m\to\infty}\int_E f_{k_{l_m}}=\lim_{l\to\infty}\int f_{k_l}<\int_E f, a contradiction.

The last equation above uses the fact that if a sequence converges, all subsequences converge to the same limit.

Lebesgue’s Dominated Convergence Theorem for Convergence in Measure

If \{f_k\} satisfies f_k\xrightarrow{m}f on E and |f_k|\leq\phi\in L(E), then f\in L(E) and \int_E f_k\to\int_E f.


Let \{f_{k_j}\} be any subsequence of \{f_k\}. Then f_{k_j}\xrightarrow{m}f on E. Thus there is a subsequence f_{k_{j_l}}\to f a.e.\ in E. Clearly |f_{k_{j_l}}|\leq\phi\in L(E).

By the usual Lebesgue’s DCT, f\in L(E) and \int_E f_{k_{j_l}}\to\int_E f.

Since every subsequence of \{\int_E f_k\} has a further subsequence that converges to \int_E f, we have \int_E f_k\to\int_E f.

Generalized Lebesgue Dominated Convergence Theorem Proof

This key theorem showcases the full power of Lebesgue Integration Theory.

Generalized Lebesgue Dominated Convergence Theorem

Let \{f_k\} and \{\phi_k\} be sequences of measurable functions on E satisfying f_k\to f a.e. in E, \phi_k\to \phi a.e. in E, and |f_k|\leq\phi_k a.e. in E. If \phi\in L(E) and \int_E \phi_k\to\int_E \phi, then \int_E |f_k-f|\to 0.


We have |f_k-f|\leq|f_k|+|f|\leq\phi_k+\phi. Applying Fatou’s lemma to the non-negative sequence \displaystyle h_k=\phi_k+\phi-|f_k-f|, we get \displaystyle 2\int_E\phi\leq\liminf_{k\to\infty}\int_E (\phi_k+\phi-|f_k-f|).
That is, \displaystyle 2\int_E \phi\leq2\int_E\phi-\limsup_{k\to\infty}\int_E |f_k-f|.

Since \int_E\phi<\infty, we get \limsup_{k\to\infty}\int_E |f_k-f|\leq 0. Since \liminf_{k\to\infty}\int_E |f_k-f|\geq 0, this implies \lim_{k\to\infty}\int_E |f_k-f|=0.

Leibniz Integral Rule (Differentiating under Integral) + Proof

“Differentiating under the Integral” is a useful trick, and here we describe and prove a sufficient condition where we can use the trick. This is the Measure-Theoretic version, which is more general than the usual version stated in calculus books.

Let X be an open subset of \mathbb{R}, and \Omega be a measure space. Suppose f:X\times\Omega\to\mathbb{R} satisfies the following conditions:
1) f(x,\omega) is a Lebesgue-integrable function of \omega for each x\in X.
2) For almost all \omega\in\Omega, the derivative \frac{\partial f}{\partial x}(x,\omega) exists for all x\in X.
3) There is an integrable function \Theta: \Omega\to\mathbb{R} such that \displaystyle \left|\frac{\partial f}{\partial x}(x,\omega)\right|\leq\Theta(\omega) for all x\in X and almost all \omega\in\Omega.

Then for all x\in X, \displaystyle \frac{d}{dx}\int_\Omega f(x,\omega)\,d\omega=\int_\Omega\frac{\partial}{\partial x} f(x,\omega)\,d\omega.

By definition, \displaystyle \frac{\partial f}{\partial x}(x,\omega)=\lim_{h\to 0}\frac{f(x+h,\omega)-f(x,\omega)}{h}.

Let h_n be a sequence of nonzero numbers tending to 0, and define \displaystyle \phi_n(x,\omega)=\frac{f(x+h_n,\omega)-f(x,\omega)}{h_n}.

It follows that \displaystyle \frac{\partial f}{\partial x}(x,\omega)=\lim_{n\to\infty}\phi_n(x,\omega) is measurable, being a pointwise limit of measurable functions.

Using the Mean Value Theorem (for n large enough that the segment between x and x+h_n lies in X), we have \displaystyle |\phi_n(x,\omega)|\leq\sup_{t\in X}\left|\frac{\partial f}{\partial x}(t,\omega)\right|\leq\Theta(\omega) for each x\in X.

Thus for each x\in X, by the Dominated Convergence Theorem, we have \displaystyle \lim_{n\to\infty}\int_\Omega \phi_n(x,\omega)\,d\omega=\int_\Omega\lim_{n\to\infty}\phi_n(x,\omega)\,d\omega which implies \displaystyle \lim_{h_n\to 0}\frac{\int_\Omega f(x+h_n,\omega)\,d\omega-\int_\Omega f(x,\omega)\,d\omega}{h_n}=\int_\Omega \frac{\partial f}{\partial x}(x,\omega)\,d\omega.

That is, \displaystyle \frac{d}{dx}\int_\Omega f(x,\omega)\,d\omega=\int_\Omega \frac{\partial}{\partial x}f(x,\omega)\,d\omega.
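
To see the rule in action, here is a small numerical sketch in Python. It assumes the example f(x,\omega)=\sin(x\omega) on \Omega=[0,1] with Lebesgue measure, for which \Theta(\omega)=\omega is an integrable bound on the partial derivative. The example, the helper midpoint_integral and the step sizes are my own illustrative choices, not from the original post.

```python
import math

def midpoint_integral(func, lo, hi, n=2000):
    """Approximate the integral of func over [lo, hi] by the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

def f(x, w):
    # Hypothetical integrand f(x, omega) = sin(x * omega).
    return math.sin(x * w)

def df_dx(x, w):
    # Partial derivative in x; |df_dx| <= w, which is integrable on [0, 1].
    return w * math.cos(x * w)

def F(x):
    # F(x) = integral of f(x, omega) over omega in [0, 1].
    return midpoint_integral(lambda w: f(x, w), 0.0, 1.0)

x0 = 0.7
h = 1e-5
lhs = (F(x0 + h) - F(x0 - h)) / (2 * h)                      # d/dx of the integral
rhs = midpoint_integral(lambda w: df_dx(x0, w), 0.0, 1.0)    # integral of d/dx

print(lhs, rhs)
```

The two numbers agree up to the error of the central difference quotient, as the theorem predicts.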

Laurent Series with WolframAlpha

WolframAlpha can compute (simple) Laurent series:

Series[Sin[z^(-1)], {z, 0, 5}]

1/z-1/(6 z^3)+1/(120 z^5)+O((1/z)^6)
(Laurent series)
(converges everywhere away from origin)

Unfortunately, more “complex” (pun intended) Laurent series are not possible for WolframAlpha.

Laurent Series (Example)

The Laurent series is something like the Taylor series, but with terms with negative exponents, e.g. z^{-1}. The Laurent series formula below may not be the most practical way to compute the coefficients; usually we use known series expansions instead, as the example below shows.

Laurent Series

The Laurent series for a complex function f(z) about a point c is given by: \displaystyle f(z)=\sum_{n=-\infty}^\infty a_n(z-c)^n where \displaystyle a_n=\frac{1}{2\pi i}\oint_\gamma\frac{f(z)\, dz}{(z-c)^{n+1}}.

The path of integration \gamma is anticlockwise around a closed, rectifiable path containing no self-intersections, enclosing c and lying in an annulus A in which f(z) is holomorphic. The expansion for f(z) will then be valid anywhere inside the annulus.


Consider f(z)=\frac{e^z}{z}+e^\frac{1}{z}. This function is holomorphic everywhere except at z=0. Using the Taylor series of the exponential function \displaystyle e^z=\sum_{k=0}^\infty\frac{z^k}{k!}, we get
\begin{aligned}  \frac{e^z}{z}&=z^{-1}+1+\frac{z}{2!}+\frac{z^2}{3!}+\dots\\  e^\frac{1}{z}&=1+z^{-1}+\frac{1}{2!}z^{-2}+\frac{1}{3!}z^{-3}+\dots\\  \therefore f(z)&=\dots+(\frac{1}{3!})z^{-3}+(\frac{1}{2!})z^{-2}+2z^{-1}+2+(\frac{1}{2!})z+(\frac{1}{3!})z^2+\dots  \end{aligned}
Note that the residue (coefficient of z^{-1}) is 2.
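
The coefficients found above can be cross-checked against the contour integral formula. Parametrising the circle as z=c+re^{i\theta} turns the formula into a_n=\frac{1}{2\pi}\int_0^{2\pi}f(c+re^{i\theta})(re^{i\theta})^{-n}\,d\theta, which the sketch below approximates by an equally spaced sum. The function name laurent_coefficient, the radius 1 and the 64 sample points are assumptions made for illustration.

```python
import cmath
import math

def f(z):
    # The example function e^z / z + e^(1/z), holomorphic away from z = 0.
    return cmath.exp(z) / z + cmath.exp(1 / z)

def laurent_coefficient(func, n, center=0j, radius=1.0, samples=64):
    """Approximate a_n by averaging func(z) / (z - center)^n over a circle."""
    total = 0j
    for j in range(samples):
        theta = 2 * math.pi * j / samples
        w = radius * cmath.exp(1j * theta)   # w = z - center on the circle
        total += func(center + w) / w ** n
    return total / samples

coeffs = {n: laurent_coefficient(f, n) for n in range(-3, 3)}
for n, a_n in coeffs.items():
    print(n, round(a_n.real, 10), round(a_n.imag, 10))
```

The value for n=-1 approximates the residue 2, in agreement with the series computation.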

Implicit Function Theorem

The implicit function theorem is a strong theorem that allows us to express a variable as a function of another variable. For instance, if x^2y+y^3x+9xy=0, can we make y the subject, i.e. write y as a function of x? The implicit function theorem allows us to answer such questions, though like most Pure Math theorems, it only guarantees existence; the theorem does not explicitly tell us how to write out such a function.

The material below is taken from Wikipedia.

Implicit function theorem

Let f:\mathbb{R}^{n+m}\to\mathbb{R}^m be a continuously differentiable function, and let \mathbb{R}^{n+m} have coordinates (\mathbf{x},\mathbf{y})=(x_1,\dots,x_n,y_1,\dots,y_m). Fix a point (\mathbf{a},\mathbf{b})=(a_1,\dots,a_n,b_1,\dots,b_m) with f(\mathbf{a},\mathbf{b})=\mathbf{c}, where \mathbf{c}\in\mathbb{R}^m. If the matrix \displaystyle [(\partial f_i/\partial y_j)(\mathbf{a},\mathbf{b})] is invertible, then there exists an open set U containing \mathbf{a}, an open set V containing \mathbf{b}, and a unique continuously differentiable function g:U\to V such that \displaystyle \{(\mathbf{x},g(\mathbf{x}))\mid\mathbf{x}\in U\}=\{(\mathbf{x},\mathbf{y})\in U\times V\mid f(\mathbf{x},\mathbf{y})=\mathbf{c}\}.


Abbreviating (a_1,\dots,a_n,b_1,\dots,b_m) to (\mathbf{a},\mathbf{b}), the Jacobian matrix is
\displaystyle (Df)(\mathbf{a},\mathbf{b})=\begin{pmatrix}  \frac{\partial f_1}{\partial x_1}(\mathbf{a},\mathbf{b}) & \dots &\frac{\partial f_1}{\partial x_n}(\mathbf{a},\mathbf{b}) & \frac{\partial f_1}{\partial y_1}(\mathbf{a},\mathbf{b}) & \dots & \frac{\partial f_1}{\partial y_m}(\mathbf{a},\mathbf{b})\\  \vdots & \ddots &\vdots & \vdots & \ddots &\vdots\\  \frac{\partial f_m}{\partial x_1}(\mathbf{a},\mathbf{b}) & \dots & \frac{\partial f_m}{\partial x_n}(\mathbf{a}, \mathbf{b}) & \frac{\partial f_m}{\partial y_1}(\mathbf{a}, \mathbf{b}) & \dots & \frac{\partial f_m}{\partial y_m}(\mathbf{a}, \mathbf{b})  \end{pmatrix}  =(X\mid Y)
where X is the matrix of partial derivatives in the variables x_i and Y is the matrix of partial derivatives in the variables y_j.

The implicit function theorem says that if Y is an invertible matrix, then there are U, V, and g as desired.

Example (Unit circle)

In this case n=m=1 and f(x,y)=x^2+y^2-1.

\displaystyle (Df)(a,b)=(\frac{\partial f}{\partial x}(a,b)\ \frac{\partial f}{\partial y}(a,b))=(2a\ 2b).

Note that Y=(2b) is invertible iff b\neq 0. By the implicit function theorem, we see that we can locally write the circle in the form y=g(x) for all points where y\neq 0.
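
For the unit circle the local function g can be written down explicitly, which makes a numerical check easy. The sketch below assumes the point (a,b)=(0.6,0.8) on the upper half of the circle, so g(x)=\sqrt{1-x^2}, and compares a finite-difference slope with the standard implicit differentiation formula g'(a)=-\frac{\partial f/\partial x}{\partial f/\partial y}(a,b). That formula is not stated in the text above; it is the usual consequence of differentiating f(x,g(x))=0.

```python
import math

def f(x, y):
    # Defining function of the unit circle.
    return x * x + y * y - 1

def g(x):
    # Explicit local solution y = g(x) on the upper half circle (valid near b > 0).
    return math.sqrt(1 - x * x)

a, b = 0.6, 0.8          # assumed base point with f(a, b) = 0 and b != 0
h = 1e-6

residual = f(a, g(a))                               # should be (numerically) zero
numeric_slope = (g(a + h) - g(a - h)) / (2 * h)     # central difference for g'(a)
implicit_slope = -(2 * a) / (2 * b)                 # -f_x / f_y at (a, b)

print(residual, numeric_slope, implicit_slope)
```

At the points (\pm 1,0), where Y=(2b) fails to be invertible, the slope formula breaks down, matching the fact that the circle is not locally a graph over x there.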

lim sup & lim inf of Sets

The concept of lim sup and lim inf can be applied to sets too. Here is a nice characterisation of lim sup and lim inf of sets:

For a sequence of sets \{E_k\}, \limsup E_k consists of those points that belong to infinitely many E_k, and \liminf E_k consists of those points that belong to all E_k from some k on (i.e. belong to all but finitely many E_k).

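Recall that \limsup E_k=\bigcap_{j=1}^\infty\bigcup_{k=j}^\infty E_k and \liminf E_k=\bigcup_{j=1}^\infty\bigcap_{k=j}^\infty E_k. Note that
\begin{aligned}  x\in\limsup E_k&\iff x\in\bigcup_{k=j}^\infty E_k\ \text{for all}\ j\in\mathbb{N}\\  &\iff\text{for all}\ j\in\mathbb{N}, \text{there exists}\ i\geq j\ \text{such that}\ x\in E_i\\  &\iff x\ \text{belongs to infinitely many}\ E_k.  \end{aligned}
\begin{aligned}  x\in\liminf E_k&\iff x\in\bigcap_{k=j}^\infty E_k\ \text{for some}\ j\in\mathbb{N}\\  &\iff \text{there exists}\ j\in\mathbb{N}\ \text{such that}\ x\in E_k\ \text{for all}\ k\geq j\\  &\iff x\ \text{belongs to all but finitely many}\ E_k.  \end{aligned}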
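
Here is a small Python sketch of the characterisation for a concrete sequence of finite sets. The sequence E (alternating between \{0,1\} and \{0,2\}, with an extra point 3 in E_0 only) is a hypothetical example. Since only finitely many sets can be computed, the code truncates the sequence and only uses tails starting in the first half; this is a valid shortcut here only because the example sequence is periodic after the first term.

```python
def E(k):
    """Hypothetical sequence: {0, 1} for even k, {0, 2} for odd k, plus 3 in E_0."""
    base = {0, 1} if k % 2 == 0 else {0, 2}
    return (base | {3}) if k == 0 else base

def tail_union(sets, j):
    # Union of E_j, E_{j+1}, ... within the truncated list.
    return set().union(*sets[j:])

def tail_intersection(sets, j):
    # Intersection of E_j, E_{j+1}, ... within the truncated list.
    return set.intersection(*sets[j:])

n = 20
sets = [E(k) for k in range(n)]
half = n // 2   # keep every tail long enough to contain both parities

limsup = set.intersection(*(tail_union(sets, j) for j in range(half)))
liminf = set().union(*(tail_intersection(sets, j) for j in range(half)))

print("limsup:", limsup)   # points lying in infinitely many E_k
print("liminf:", liminf)   # points lying in all but finitely many E_k
```

The point 3 belongs to only one set, so it is in neither; the points 1 and 2 each belong to infinitely many sets but also miss infinitely many, so they are in the lim sup only.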

Fundamental Theorem of Calculus

The Fundamental Theorem of Calculus is one of the most amazing and important theorems in analysis. It is a non-trivial result that links the concept of area and gradient, two seemingly unrelated concepts.

Fundamental Theorem of Calculus

The first part deals with the derivative of an antiderivative, while the second part deals with the relationship between antiderivatives and definite integrals.

First part

Let f be a continuous real-valued function defined on a closed interval [a,b]. Let F be the function defined, for all x in [a,b], by \displaystyle F(x)=\int_a^x f(t)\,dt.

Then F is uniformly continuous on [a,b], differentiable on the open interval (a,b), and \displaystyle F'(x)=f(x) for all x in (a,b).

Second part

Let f and F be real-valued functions defined on [a,b] such that F is continuous and for all x\in (a,b), \displaystyle F'(x)=f(x).

If f is Riemann integrable on [a,b], then \displaystyle \int_a^b f(x)\,dx=F(b)-F(a).

Gradient Theorem (Proof)

This amazing theorem is also called the Fundamental Theorem of Calculus for Line Integrals. It is quite a powerful theorem that sometimes allows fast computations of line integrals.

Gradient Theorem (Fundamental Theorem of Calculus for Line Integrals)

Let C be a differentiable curve given by the vector function \mathbf{r}(t), a\leq t\leq b.

Let f be a differentiable function of n variables whose gradient vector \nabla f is continuous on C. Then \displaystyle \int_C \nabla f\cdot d\mathbf{r}=f(\mathbf{r}(b))-f(\mathbf{r}(a)).


\begin{aligned}  \int_C\nabla f\cdot d\mathbf{r}&=\int_a^b\nabla f(\mathbf{r}(t))\cdot \mathbf{r}'(t)\,dt\ \ \ \text{(Definition of line integral)}\\  &=\int_a^b (\frac{\partial f}{\partial x_1}\frac{dx_1}{dt}+\frac{\partial f}{\partial x_2}\frac{dx_2}{dt}+\dots+\frac{\partial f}{\partial x_n}\frac{dx_n}{dt})\,dt\\  &=\int_a^b \frac{d}{dt}f(\mathbf{r}(t))\,dt\ \ \ \text{(by Multivariate Chain Rule)}\\  &=f(\mathbf{r}(b))-f(\mathbf{r}(a))\ \ \ \text{(by Fundamental Theorem of Calculus)}  \end{aligned}
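
A numerical illustration of the theorem, as a minimal Python sketch. The function f(x,y)=x^2y+y and the curve \mathbf{r}(t)=(t,t^2), 0\leq t\leq 1, are assumed examples chosen for this sketch, and the line integral is approximated by the midpoint rule.

```python
def f(x, y):
    # Hypothetical potential function.
    return x * x * y + y

def grad_f(x, y):
    # Gradient of f computed by hand: (2xy, x^2 + 1).
    return (2 * x * y, x * x + 1)

def r(t):
    # Hypothetical curve: a parabolic arc from (0, 0) to (1, 1).
    return (t, t * t)

def r_prime(t):
    return (1.0, 2 * t)

def line_integral(n=1000, a=0.0, b=1.0):
    """Midpoint-rule approximation of the integral of grad f . r'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        gx, gy = grad_f(*r(t))
        dx, dy = r_prime(t)
        total += (gx * dx + gy * dy) * h
    return total

integral = line_integral()
endpoint_difference = f(*r(1.0)) - f(*r(0.0))
print(integral, endpoint_difference)
```

Both quantities are approximately 2: the integrand simplifies to 4t^3+2t, whose integral over [0,1] is 2, and f(1,1)-f(0,0)=2.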

Multivariable Version of Taylor’s Theorem

Multivariable calculus is an interesting topic that is often neglected in the curriculum. Furthermore it is hard to learn since the existing textbooks are either too basic/computational (e.g. Multivariable Calculus, 7th Edition by Stewart) or too advanced. Many analysis books skip multivariable calculus altogether and just focus on measure and integration.

If anyone has a good book that covers multivariable calculus (preferably rigorously with proofs), do post it in the comments!

The following is a useful multivariable version of Taylor’s Theorem, using the multi-index notation which is regarded as the most efficient way of writing the formula.

Multivariable Version of Taylor’s Theorem

Let f:\mathbb{R}^n\to\mathbb{R} be a k times differentiable function at the point \mathbf{a}\in\mathbb{R}^n. Then there exists h_\alpha:\mathbb{R}^n\to\mathbb{R} such that \displaystyle f(\mathbf{x})=\sum_{|\alpha|\leq k}\frac{D^\alpha f(\mathbf{a})}{\alpha!}(\mathbf{x}-\mathbf{a})^\alpha+\sum_{|\alpha|=k}h_\alpha(\mathbf{x})(\mathbf{x}-\mathbf{a})^\alpha, and \lim_{\mathbf{x}\to\mathbf{a}}h_\alpha(\mathbf{x})=0.

Example (n=2, k=1)

Write \mathbf{x}-\mathbf{a}=\mathbf{v}.
\displaystyle f(x,y)=f(\mathbf{a})+\frac{\partial f}{\partial x}(\mathbf{a})v_1+\frac{\partial f}{\partial y}(\mathbf{a})v_2+h_{(1,0)}(x,y)v_1+h_{(0,1)}(x,y)v_2.

Pasting Lemma (Elaboration of Wikipedia’s proof)

The proof of the Pasting Lemma at Wikipedia is correct, but a bit unclear. In particular, it does not clearly show how the hypothesis that X, Y are both closed is being used. It actually has something to do with subspace topology.

I have added some clarifications here:

Pasting Lemma (Statement)

Let X, Y be both closed (or both open) subsets of a topological space A such that A=X\cup Y, and let B also be a topological space. If both f|_X: X\to B and f|_Y: Y\to B are continuous, then f:A \to B is continuous.


Let U be a closed subset of B. Then f^{-1}(U)\cap X is closed in X since it is the preimage of U under the function f|_X:X\to B, which is continuous. Hence f^{-1}(U)\cap X=F\cap X for some set F closed in A. Since X is closed in A, f^{-1}(U)\cap X is closed in A.

Similarly, f^{-1}(U)\cap Y is closed (in A). Then, their union f^{-1}(U) is also closed (in A), being a finite union of closed sets.

Mertens’ Theorem

Let (a_n) and (b_n) be real or complex sequences.

If the series \sum_{n=0}^\infty a_n converges to A and \sum_{n=0}^\infty b_n converges to B, and at least one of them converges absolutely, then their Cauchy product converges to AB.

An immediate corollary of Mertens’ Theorem is that if a power series f(x)=\sum a_kx^k has radius of convergence R_a, and another power series g(x)=\sum b_kx^k has radius of convergence R_b, then their Cauchy product converges to f\cdot g and has radius of convergence at least the minimum of R_a, R_b.

Note that a power series converges absolutely within its radius of convergence so Mertens’ Theorem applies.
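
As a concrete check, take both series to be the geometric series \sum x^n with x=1/2, which converges absolutely to A=B=2. The Cauchy product has terms c_m=\sum_{k=0}^m a_kb_{m-k}=(m+1)x^m and should sum to AB=4. The Python sketch below computes this with truncated lists; the helper name cauchy_product and the truncation length are illustrative assumptions.

```python
def cauchy_product(a, b):
    """Terms c_m = sum_{k=0}^{m} a_k * b_{m-k} for m below the shorter length."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[m - k] for k in range(m + 1)) for m in range(n)]

x = 0.5
N = 60                              # truncation length (assumed, for illustration)
a = [x ** n for n in range(N)]      # geometric series terms
c = cauchy_product(a, a)

A = 1 / (1 - x)                     # sum of the geometric series
partial = sum(c)                    # partial sum of the Cauchy product

print(partial, A * A)
```

The partial sum agrees with AB=4 up to a remainder that is negligible at this truncation length.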

Tietze Extension Theorem and Pasting Lemma

Tietze Extension Theorem

If X is a normal topological space and \displaystyle f:A\to\mathbb{R} is a continuous map from a closed subset A\subseteq X, then there exists a continuous map \displaystyle F:X\to\mathbb{R} with F(a)=f(a) for all a in A.

Moreover, F may be chosen such that \sup\{|f(a)|:a\in A\}=\sup\{|F(x)|:x\in X\}, i.e., if f is bounded, F may be chosen to be bounded (with the same bound as f). F is called a continuous extension of f.

Pasting Lemma

Let X, Y be both closed (or both open) subsets of a topological space A such that A=X\cup Y, and let B also be a topological space. If both f|_X: X\to B and f|_Y: Y\to B are continuous, then f is continuous.


Let U be a closed subset of B. Then f^{-1}(U)\cap X is closed since it is the preimage of U under the function f|_X:X\to B, which is continuous. Similarly, f^{-1}(U)\cap Y is closed. Then, their union f^{-1}(U) is also closed, being a finite union of closed sets.

Lusin’s Theorem and Egorov’s Theorem

Lusin’s Theorem and Egorov’s Theorem are the second and third of Littlewood’s famous Three Principles.

There are many variations and generalisations, the most basic of which I think are found in Royden’s book.

Lusin’s Theorem:

Informally, “every measurable function is nearly continuous.”

(Royden) Let f be a real-valued measurable function on E. Then for each \epsilon>0, there is a continuous function g on \mathbb{R} and a closed set F\subseteq E for which \displaystyle f=g\ \text{on}\ F\ \text{and}\ m(E\setminus F)<\epsilon.

Egorov’s Theorem

Informally, “every convergent sequence of functions is nearly uniformly convergent.”

(Royden) Assume m(E)<\infty. Let \{f_n\} be a sequence of measurable functions on E that converges pointwise on E to the real-valued function f.

Then for each \epsilon>0, there is a closed set F\subseteq E for which \displaystyle f_n\to f\ \text{uniformly on}\ F\ \text{and}\ m(E\setminus F)<\epsilon.

A holomorphic and injective function has nonzero derivative

This post proves that if f:U\to V is a function that is holomorphic (analytic) and injective, then f'(z)\neq 0 for all z in U. The condition of having nonzero derivative is equivalent to the condition of conformal (preserves angles). Hence, this result can be stated as “A holomorphic and injective function is conformal.”

(Proof modified from Stein-Shakarchi Complex Analysis)

We prove by contradiction. Suppose to the contrary f'(z_0)=0 for some z_0\in U. Using Taylor series, \displaystyle f(z)=f(z_0)+f'(z_0)(z-z_0)+\frac{f''(z_0)}{2!}(z-z_0)^2+\dots

Since f'(z_0)=0, \displaystyle f(z)-f(z_0)=a(z-z_0)^k+G(z) for all z near z_0, with a\neq 0, k\geq 2 and G(z)=(z-z_0)^{k+1}H(z) where H is analytic.

For sufficiently small w\neq 0, we write \displaystyle f(z)-f(z_0)-w=F(z)+G(z), where F(z)=a(z-z_0)^k-w.

Since |G(z)|<|F(z)| on a small circle centered at z_0, and F has at least two zeroes inside that circle, Rouche’s theorem implies that f(z)-f(z_0)-w has at least two zeroes there.

Since the zeroes of a non-constant holomorphic function are isolated, f'(z)\neq 0 for all z\neq z_0 but sufficiently close to z_0.

Let z_1, z_2 be the two roots of f(z)-f(z_0)-w. Note that since w\neq 0, z_1\neq z_0, z_2\neq z_0. If z_1=z_2, then f(z)-f(z_0)-w=(z-z_1)^2h(z) for some analytic function h. This means f'(z_1)=0, which contradicts f'(z)\neq 0 for z\neq z_0 sufficiently close to z_0.

Thus z_1\neq z_2, which implies that f is not injective.

Underrated Complex Analysis Theorem: Schwarz Lemma

The Schwarz Lemma is a relatively basic lemma in Complex Analysis that can be said to be of greater importance than it seems. There is a whole article written on it.

The conditions and results of the Schwarz Lemma are rather difficult to memorize offhand. Some tips I gathered from the net on how to memorize the Schwarz Lemma are:

Conditions: f:D\to D holomorphic and fixes zero.

Result 1: |f(z)|\leq|z| can be remembered as “Range of f” subset of “Domain”.

|f'(0)|\leq 1 can be remembered as some sort of “Contraction Mapping”.

Result 2: If |f(z)|=|z| for some nonzero z, or |f'(0)|=1, then f(z)=az where |a|=1. Remember it as “f is a rotation”.

If you have other tips on how to remember or intuitively understand Schwarz Lemma, please let me know by posting in the comments below.

Finally, we proceed to prove the Schwarz Lemma.

Schwarz Lemma

Let D=\{z:|z|<1\} be the open unit disk in the complex plane \mathbb{C} centered at the origin and let f:D\to D be a holomorphic map such that f(0)=0.

Then, |f(z)|\leq |z| for all z\in D and |f'(0)|\leq 1.

Moreover, if |f(z)|=|z| for some non-zero z or |f'(0)|=1, then f(z)=az for some a\in\mathbb{C} with |a|=1 (i.e.\ f is a rotation).


Consider g(z)=\begin{cases}  \dfrac{f(z)}{z} &\text{if }z\neq 0,\\  f'(0) &\text{if }z=0.  \end{cases}
Since f is analytic, f(z)=0+a_1z+a_2z^2+\dots on D, and f'(0)=a_1. Note that g(z)=a_1+a_2z+\dots on D, so g is analytic on D.

Let D_r=\{z:|z|\leq r\} denote the closed disk of radius r centered at the origin. The Maximum Modulus Principle implies that, for r<1, given any z\in D_r, there exists z_r on the boundary of D_r such that \displaystyle |g(z)|\leq|g(z_r)|=\frac{|f(z_r)|}{|z_r|}\leq\frac{1}{r}.

As r\to 1 we get |g(z)|\leq 1, thus |f(z)|\leq|z|. Thus
\begin{aligned}  |f'(0)|&=|\lim_{z\to 0}\frac{f(z)}{z}|\\  &=\lim_{z\to 0}|\frac{f(z)}{z}|\\  &\leq1.  \end{aligned}
Moreover, if |f(z)|=|z| for some non-zero z\in D or |f'(0)|=1, then |g(z)|=1 at some point of D. By the Maximum Modulus Principle, g(z)\equiv a where |a|=1. Therefore, f(z)=az.

Groups of order pq

In this post, we will classify groups of order pq, where p and q are primes with p<q. It turns out there are only two isomorphism classes of such groups, one being a cyclic group the other being a semidirect product.

Let G be a group of order pq.

Case 1: p does not divide q-1.

By Sylow’s Third Theorem, we have n_p\equiv 1\pmod p, n_p\mid q, n_q\equiv 1\pmod q, n_q\mid p.

Since n_q\mid p, n_q=1 or p. Since p<q and n_q\equiv 1\pmod q, we conclude n_q=1. Similarly, since n_p\mid q, n_p=1 or q. Since p\nmid q-1, n_p\equiv 1\pmod p implies n_p=1.

Let P, Q be the Sylow p-subgroup and Sylow q-subgroup respectively. By Lagrange’s Theorem, P\cap Q=\{1_G\}. Thus |P\cup Q|=p+q-1. Since \displaystyle pq\geq 2q>p+q>p+q-1, there is a non-identity element in G which is not in P\cup Q. Since n_p=n_q=1, every element of order p lies in P and every element of order q lies in Q, so the order of this element has to be pq. Thus G is cyclic, and therefore G\cong\mathbb{Z}_{pq}.

Case 2: p divides q-1.

As in Case 1, n_q=1 (that step did not use the hypothesis p\nmid q-1), hence Q is normal. Thus QP=PQ, so PQ is a subgroup of G. Since \displaystyle |PQ|=\frac{|P||Q|}{|P\cap Q|}=pq, we get G=PQ. Now \text{Aut}(Q)\cong(\mathbb{Z}/q\mathbb{Z})^*\cong\mathbb{Z}_{q-1} is cyclic and p\mid q-1, thus it has a unique subgroup P' of order p, namely P'=\{x\mapsto x^i\mid i\in\mathbb{Z}_q, i^p=1\}.

Let a and b be generators for P and Q respectively. Suppose the action of a on Q by conjugation is x\mapsto x^{i_0}, where i_0^p=1 in \mathbb{Z}_q. (We may conclude this since the action of a on Q by conjugation is an automorphism which has order 1 or p, thus it lies in P'.)

If i_0=1, then G=P\times Q\cong\mathbb{Z}_{pq}.

If i_0\neq 1, then \displaystyle G=PQ=\langle P,Q\rangle=\langle a,b\mid a^p=b^q=1, aba^{-1}=b^{i_0}\rangle. Choosing a different i_0 amounts to choosing a different generator a for P, and hence does not result in a new isomorphism class.
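To make the presentation concrete, here is a minimal sketch that builds the non-cyclic group for p=3, q=7 and checks the defining relation. The function name `nonabelian_group` and the encoding of b^t a^s as the pair (t, s) are assumptions made for this illustration.

```python
from itertools import product


def nonabelian_group(p, q):
    """Model <a, b | a^p = b^q = 1, a b a^-1 = b^i0> for primes with p | q - 1.

    The pair (t, s) stands for the element b^t a^s.
    Raises StopIteration when no i0 != 1 with i0^p = 1 (mod q) exists.
    """
    # Smallest nontrivial p-th root of unity modulo q.
    i0 = next(i for i in range(2, q) if pow(i, p, q) == 1)
    elements = list(product(range(q), range(p)))

    def mul(x, y):
        (t1, s1), (t2, s2) = x, y
        # (b^t1 a^s1)(b^t2 a^s2) = b^(t1 + i0^s1 * t2) a^(s1 + s2)
        return ((t1 + pow(i0, s1, q) * t2) % q, (s1 + s2) % p)

    return elements, mul, i0


elements, mul, i0 = nonabelian_group(3, 7)
a, b = (0, 1), (1, 0)
a_inv = (0, 2)  # a^-1 = a^(p-1) for p = 3

conjugate = mul(mul(a, b), a_inv)  # equals b^i0, i.e. the pair (i0, 0)
is_abelian = all(mul(x, y) == mul(y, x) for x in elements for y in elements)
```

The multiplication rule is just the relation a^s b^t a^{-s}=b^{i_0^s t} applied once, which is why the exponent of b picks up the factor i_0^{s_1}.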

Rouche’s Theorem

Rouche’s Theorem

If the complex-valued functions f and g are holomorphic inside and on some closed contour K, with |g(z)|<|f(z)| on K, then f and f+g have the same number of zeroes inside K, where each zero is counted as many times as its multiplicity.


Consider the polynomial z^5+3z^3+7 in the disk |z|<2. Let g(z)=3z^3+7, f(z)=z^5, then

\begin{aligned}  |3z^3+7|&\leq 3|z|^3+7\\  &=3(8)+7\\  &=31\\  &<32\\  &=|z^5|  \end{aligned}
for every |z|=2.
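By Rouche’s Theorem, f+g has the same number of zeroes as f(z)=z^5 in the disk |z|<2. Since z^5 has a zero of multiplicity 5 at the origin, the polynomial has exactly 5 zeroes in |z|<2, counted with multiplicity.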
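The example can be checked numerically. The sketch below samples the circle |z|=2 to confirm that |g|<|f| there, then counts zeroes of f+g inside the circle with the argument principle (the winding number of the image curve around 0). The function names and the sample count are assumptions for this illustration, not part of the theorem.

```python
import cmath
import math


def f(z):
    return z ** 5


def g(z):
    return 3 * z ** 3 + 7


def circle(radius, samples):
    """Equally spaced points on the circle |z| = radius."""
    return [radius * cmath.exp(2j * math.pi * k / samples) for k in range(samples)]


def min_margin(radius=2.0, samples=20000):
    """Smallest sampled value of |f(z)| - |g(z)| on |z| = radius."""
    return min(abs(f(z)) - abs(g(z)) for z in circle(radius, samples))


def count_zeros(h, radius=2.0, samples=20000):
    """Winding number of h(z) around 0 as z runs once around |z| = radius.

    Assumes h has no zero on the circle and that the sampling is fine
    enough for each phase increment to stay well below pi.
    """
    points = circle(radius, samples)
    total = 0.0
    for k in range(samples):
        z0, z1 = points[k], points[(k + 1) % samples]
        total += cmath.phase(h(z1) / h(z0))
    return round(total / (2 * math.pi))


zeros_of_sum = count_zeros(lambda z: f(z) + g(z))
zeros_of_f = count_zeros(f)
```

Both counts agree, as Rouche’s Theorem predicts. The margin is smallest where z^3 is a positive real number, for example at z=2, where |f|-|g|=32-31=1.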

The most Striking Theorem in Real Analysis

Lebesgue’s Theorem (see below) has been called one of the most striking theorems in real analysis. Indeed it is a very surprising result.

Lebesgue’s Theorem (Monotone functions)

If the function f is monotone on the open interval (a,b), then it is differentiable almost everywhere on (a,b).

Absolutely Continuous Functions


A real-valued function f on a closed, bounded interval [a,b] is said to be absolutely continuous on [a,b] provided for each \epsilon>0, there is a \delta>0 such that for every finite disjoint collection \{(a_k,b_k)\}_{k=1}^n of open intervals in (a,b), if \displaystyle \sum_{k=1}^n(b_k-a_k)<\delta, then \displaystyle \sum_{k=1}^n|f(b_k)-f(a_k)|<\epsilon.
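As a quick illustration of the definition, every Lipschitz function on [a,b] is absolutely continuous. Assuming |f(x)-f(y)|\leq L|x-y| on [a,b] for some constant L>0 (a symbol introduced only for this example), the choice \delta=\epsilon/L works, since for any finite disjoint collection of open intervals with total length less than \delta:

```latex
\begin{aligned}
\sum_{k=1}^n |f(b_k)-f(a_k)| &\leq L\sum_{k=1}^n (b_k-a_k)\\
&< L\delta\\
&= \epsilon.
\end{aligned}
```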

Equivalent Conditions

The following conditions on a real-valued function f on a compact interval [a,b] are equivalent:
(i) f is absolutely continuous;

(ii) f has a derivative f' almost everywhere, the derivative is Lebesgue integrable, and \displaystyle f(x)=f(a)+\int_a^x f'(t)\,dt for all x on [a,b];

(iii) there exists a Lebesgue integrable function g on [a,b] such that \displaystyle f(x)=f(a)+\int_a^x g(t)\,dt for all x on [a,b].

Equivalence between (i) and (iii) is known as the Fundamental Theorem of Lebesgue integral calculus.

Inner and Outer Approximation of Lebesgue Measurable Sets

Let E\subseteq\mathbb{R}. Then each of the following four assertions is equivalent to the measurability of E.

(Outer Approximation by Open Sets and G_\delta Sets)

(i) For each \epsilon>0, there is an open set G containing E for which m^*(G\setminus E)<\epsilon.

(ii) There is a G_\delta set G containing E for which m^*(G\setminus E)=0.

(Inner Approximation by Closed Sets and F_\sigma Sets)

(iii) For each \epsilon>0, there is a closed set F contained in E for which m^*(E\setminus F)<\epsilon.

(iv) There is an F_\sigma set F contained in E for which m^*(E\setminus F)=0.

(E measurable implies (i)):

Assume E is measurable. Let \epsilon>0. First we consider the case where m^*(E)<\infty. By the definition of outer measure, there is a countable collection of open intervals \{I_k\}_{k=1}^\infty which covers E and satisfies \displaystyle \sum_{k=1}^\infty l(I_k)<m^*(E)+\epsilon.

Define G=\bigcup_{k=1}^\infty I_k. Then G is an open set containing E. By definition of the outer measure of G, \displaystyle m^*(G)\leq\sum_{k=1}^\infty l(I_k)<m^*(E)+\epsilon.

Since E is measurable and has finite outer measure, by the excision property, \displaystyle m^*(G\setminus E)=m^*(G)-m^*(E)<\epsilon.

Now consider the case that m^*(E)=\infty. Since \mathbb{R} is \sigma-finite, E may be expressed as the disjoint union of a countable collection \{E_k\}_{k=1}^\infty of measurable sets, each of which has finite outer measure.

By the finite measure case, for each k\in\mathbb{N}, there is an open set G_k containing E_k for which m^*(G_k\setminus E_k)<\epsilon/2^k. The set G=\bigcup_{k=1}^\infty G_k is open, it contains E and \displaystyle G\setminus E=(\bigcup_{k=1}^\infty G_k)\setminus E\subseteq\bigcup_{k=1}^\infty (G_k\setminus E_k).

\begin{aligned}  m^*(G\setminus E)&\leq\sum_{k=1}^\infty m^*(G_k\setminus E_k)\\  &<\sum_{k=1}^\infty\epsilon/2^k\\  &=\epsilon.  \end{aligned}
Thus property (i) holds for E.
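A small worked instance of the \epsilon/2^k device may help. Take E=\mathbb{Q}\cap[0,1] with an enumeration q_1,q_2,\dots (both the set and the intervals I_k below are choices made only for this example). Covering each q_k by an open interval of length \epsilon/2^{k+1} gives an open set G\supseteq E with

```latex
\begin{aligned}
I_k &= \left(q_k-\frac{\epsilon}{2^{k+2}},\ q_k+\frac{\epsilon}{2^{k+2}}\right),
\qquad G=\bigcup_{k=1}^\infty I_k,\\
m^*(G\setminus E) &\leq m^*(G)
\leq \sum_{k=1}^\infty l(I_k)
= \sum_{k=1}^\infty \frac{\epsilon}{2^{k+1}}
= \frac{\epsilon}{2}
< \epsilon.
\end{aligned}
```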

((i) implies (ii)):

Assume property (i) holds for E. For each k\in\mathbb{N}, choose an open set O_k that contains E such that m^*(O_k\setminus E)<1/k. Define G=\bigcap_{k=1}^\infty O_k. Then G is a G_\delta set that contains E. Note that for each k, \displaystyle G\setminus E\subseteq O_k\setminus E.

By monotonicity of outer measure, \displaystyle m^*(G\setminus E)\leq m^*(O_k\setminus E)<1/k.

Thus m^*(G\setminus E)=0 and hence (ii) holds.

((ii)\implies E is measurable):

Now assume property (ii) holds for E. Since a set of measure zero is measurable, G\setminus E is measurable. G is a G_\delta set and thus measurable. Since measurable sets form a \sigma-algebra, E=G\cap(G\setminus E)^c is measurable.


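((i) implies (iii)):

Assume condition (i) holds for E, so that E is measurable by the above, and let \epsilon>0. Note that E^c is measurable iff E is measurable, so condition (i) also holds for E^c. Thus there exists an open set G\supseteq E^c such that m^*(G\setminus E^c)<\epsilon.

Define F=\mathbb{R}\setminus G, which is closed. Note that F\subseteq E, and since E\setminus F=E\cap G=G\setminus E^c, we get m^*(E\setminus F)=m^*(G\setminus E^c)<\epsilon.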




(Condition (iv)):

Similar idea: apply (ii) to E^c and take complements. Note that a set is G_\delta iff its complement is F_\sigma.

How to remember the Divergence Theorem

The Divergence Theorem:
\displaystyle \int_U\nabla\cdot\mathbf{F}\,dV_n=\oint_{\partial U}\mathbf{F}\cdot\mathbf{n}\,dS_{n-1}

is a rather formidable looking formula that is not so easy to memorise.

One trick to remember it is to recall the simpler-looking general Stokes’ Theorem.

One can use the general Stokes’ Theorem (\int_{\Omega}d\omega=\int_{\partial\Omega}\omega) to equate the n-dimensional volume integral of the divergence of a vector field \mathbf{F} over a region U to the (n-1)-dimensional surface integral of \mathbf{F} over the boundary of U.
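For n=3 the passage from the general theorem to the Divergence Theorem can be written out as follows. This is a sketch assuming \mathbf{F}=(F_1,F_2,F_3) is smooth and \partial U carries the outward orientation; the 2-form \omega is the usual flux form of \mathbf{F}, whose restriction to \partial U is \mathbf{F}\cdot\mathbf{n}\,dS.

```latex
\begin{aligned}
\omega &= F_1\,dy\wedge dz + F_2\,dz\wedge dx + F_3\,dx\wedge dy,\\
d\omega &= \left(\frac{\partial F_1}{\partial x}+\frac{\partial F_2}{\partial y}+\frac{\partial F_3}{\partial z}\right)dx\wedge dy\wedge dz
= (\nabla\cdot\mathbf{F})\,dV,\\
\int_U \nabla\cdot\mathbf{F}\,dV &= \int_U d\omega = \int_{\partial U}\omega = \oint_{\partial U}\mathbf{F}\cdot\mathbf{n}\,dS.
\end{aligned}
```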

Markov’s Inequality: No more than 1/5 of the population can have more than 5 times the average income

One way to remember Markov’s Inequality (also called Chebyshev’s Inequality) is to remember this application: No more than 1/5 of the population can have more than 5 times the average income. For instance, if the average income of a certain country is USD $3000 per month, no more than 20% of the citizens can earn more than $15 000!

Brief Explanation

\mu(\{x\in X: f(x)\geq\epsilon\})\leq\frac{1}{\epsilon}\int_X f\,d\mu is Markov’s Inequality. Here \mu is a probability measure on the population X, f\geq 0 is the income, and A=\int_X f\,d\mu is the average income. Taking \epsilon=5A to be 5 times the average income, the left hand side represents the probability of having at least 5 times the average income (and so bounds the probability of having more than that). The right hand side is \frac{1}{5A}\cdot A=\frac 15.
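The income application is easy to test on data. The sketch below uses two made-up income lists (purely hypothetical numbers) and checks that the fraction of people earning at least 5 times the mean never exceeds 1/5; the function names are likewise assumptions.

```python
def fraction_at_least(incomes, threshold):
    """Fraction of the population whose income is at least `threshold`."""
    return sum(1 for x in incomes if x >= threshold) / len(incomes)


def markov_bound_holds(incomes, factor=5):
    """Check Markov's Inequality with epsilon = factor * mean.

    Assumes all incomes are non-negative and the mean is positive.
    """
    mean = sum(incomes) / len(incomes)
    return fraction_at_least(incomes, factor * mean) <= 1 / factor


# Hypothetical population: 90 people earn 1000, 10 people earn 30000.
skewed = [1000] * 90 + [30000] * 10
# Extremal population: one person holds all the income.
extremal = [0, 0, 0, 0, 5]

skewed_fraction = fraction_at_least(skewed, 5 * sum(skewed) / len(skewed))
extremal_fraction = fraction_at_least(extremal, 5 * sum(extremal) / len(extremal))
```

The second list shows the bound is sharp: when one person in five holds everything, exactly 1/5 of the population earns 5 times the average.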

Chebyshev’s/Markov’s Inequality (Proof):
If (X,\Sigma,\mu) is a measure space, f is a non-negative measurable extended real-valued function, and \epsilon>0, then \displaystyle \mu(\{x\in X: f(x)\geq\epsilon\})\leq\frac{1}{\epsilon}\int_X f\,d\mu.

Define \displaystyle s(x)=\begin{cases}  \epsilon, &\text{if}\ f(x)\geq\epsilon\\  0, &\text{if}\ f(x)<\epsilon.  \end{cases}
Then 0\leq s(x)\leq f(x). Thus \int_X f(x)\,d\mu\geq\int_X s(x)\,d\mu=\epsilon\mu(\{x\in X: f(x)\geq\epsilon\}). Dividing both sides by \epsilon>0 gives the result.

Fatou’s Lemma

Fatou’s Lemma
Let (f_n) be a sequence of nonnegative measurable functions, then \displaystyle\int\liminf_{n\to\infty}f_n\,d\mu\leq\liminf_{n\to\infty}\int f_n\,d\mu.

A brilliant graphical way to remember Fatou’s Lemma (taken from the site http://math.stackexchange.com/questions/242920/what-are-some-tricks-to-remember-fatous-lemma).

The first two pictures show f_1 and f_2 respectively, but even the smaller of these two areas is larger than the area in the third picture, which shows \inf f_n.
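The same idea gives a concrete example where the inequality in Fatou’s Lemma is strict. The functions below, on [0,1] with Lebesgue measure, are chosen only for illustration: they alternate between the indicator of the left half and the indicator of the right half, so each has integral 1/2 while the pointwise \liminf is 0 everywhere.

```latex
\begin{aligned}
f_n &= \begin{cases} \mathbf{1}_{[0,1/2]} &\text{if } n \text{ is even},\\ \mathbf{1}_{(1/2,1]} &\text{if } n \text{ is odd}, \end{cases}\\
\int_0^1 \liminf_{n\to\infty} f_n\,dx &= \int_0^1 0\,dx = 0
< \frac{1}{2} = \liminf_{n\to\infty}\int_0^1 f_n\,dx.
\end{aligned}
```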