Abstract
The derivative is one of the two fundamental concepts introduced in calculus. The other one is, of course, the (Riemann) integral. For a real-valued function of a real variable, the derivative may be interpreted as an extension of the notion of slope defined for (nonvertical) straight lines. Recall that a (nonvertical) straight line is the graph of an affine function x ↦ ax + b, where a, b are real constants and a is the slope of the line.
Keywords
- Intermediate Value Property
- Newton-Raphson Process
- Arithmetic-harmonic Mean Inequality
- Characterizing Convex Functions
- Inequality Means
The derivative is one of the two fundamental concepts introduced in calculus. The other one is, of course, the (Riemann) integral. For a real-valued function of a real variable, the derivative may be interpreted as an extension of the notion of slope defined for (nonvertical) straight lines. Recall that a (nonvertical) straight line is the graph of an affine function x ↦ ax + b, where a, b are real constants and a is the slope of the line. Now, if \(f(x)\!:= ax + b\ \forall x \in \mathbb{R},\) then, for any \(x,\;x_{0} \in \mathbb{R},\;x\neq x_{0},\) we have
$$\displaystyle{\frac{f(x) - f(x_{0})}{x - x_{0}} = \frac{(ax + b) - (ax_{0} + b)}{x - x_{0}} = a.\qquad (*)}$$
In other words, the slope a is the (average) rate of change of the dependent variable \(y\!:= f(x) = ax + b\) with respect to the independent variable x. For a general function \(f: I \rightarrow \mathbb{R},\) where \(I \subset \mathbb{R}\) is an interval, the quotient in (*) is no longer constant because the graph is a curve. Now, using a graphing calculator, quite a popular tool these days, if we zoom in repeatedly at a point \((x_{0},f(x_{0}))\) where the graph is smooth, we observe that the graph becomes practically a straight line segment. In other words, at least locally [i.e., in a small neighborhood of a (smooth) point], the graph is linear. Therefore, in that small neighborhood, the graph of f and the tangent line to this graph at \((x_{0},f(x_{0}))\) are practically identical. This suggests, once again, the “analytical” approach of divide and conquer. Our goal in this chapter will be to carry out this analysis by making the above intuitive approach mathematically rigorous. Throughout the chapter, I, J will always denote intervals in \(\mathbb{R}\) with nonempty interior.
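The zooming intuition can be tested numerically. The short Python sketch below (the function x ↦ x² and the base point are illustrative choices, not taken from the text) computes average rates of change over shrinking intervals and watches them settle toward a single slope:

```python
# Average rate of change of f(x) = x**2 over [x0, x0 + h] for shrinking h.
# The quotients approach 2*x0, the slope of the tangent line at x0,
# illustrating that the graph is "practically linear" near a smooth point.
def avg_rate(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x * x
x0 = 1.0
quotients = [avg_rate(f, x0, 10.0 ** (-k)) for k in range(1, 7)]
# the quotients drift toward the slope 2 as h = 10**-k shrinks
```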
6.1 Differentiability
In this section we define the derivative of a real-valued function of a real variable and investigate its basic properties. We begin with the following definition.
Definition 6.1.1 (Differentiable, Derivative, Tangent Line).
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I.
-
1.
We say that f is differentiable at x 0 if the limit
$$\displaystyle{f^{{\prime}}(x_{ 0})\!:=\lim _{x\rightarrow x_{0}} \frac{f(x) - f(x_{0})} {x - x_{0}} }$$exists. The number f ′(x 0) is then called the derivative of f at x 0. This number is also called the slope of the tangent line to the graph of f at the point (x 0, f(x 0)). The equation of this tangent line is then
$$\displaystyle{y - f(x_{0}) = f^{{\prime}}(x_{ 0})(x - x_{0}).}$$
2.
If A ⊂ I and if f ′(x) exists for every x ∈ A, then we say that f is differentiable on A. We say that f is differentiable if it is differentiable on I. If f is differentiable on I, the function x ↦ f ′(x) (defined on I) is called the derivative of f.
Remark 6.1.2.
-
1.
The difference quotient \((f(x) - f(x_{0}))/(x - x_{0})\) is the slope of the line segment joining the points (x 0, f(x 0)) and (x, f(x)) of the graph of f. It can also be interpreted as the average rate of change of y: = f(x) with respect to x on the interval with endpoints x 0 and x. The derivative f ′(x 0), if it exists, is then the instantaneous rate of change of y = f(x) with respect to x at x 0.
-
2.
If we set \(h = x - x_{0}\) in the above definition, we may also write
$$\displaystyle{f^{{\prime}}(x_{ 0})\!:=\lim _{h\rightarrow 0}\frac{f(x_{0} + h) - f(x_{0})} {h},}$$if the limit exists.
-
3.
It follows from the above definition that differentiability of a function is a local property. In other words, if \(f: I \rightarrow \mathbb{R},\ x_{0} \in I,\) and if g is a function such that \(f(x) = g(x)\ \forall x \in (x_{0}-\delta,x_{0}+\delta ) \cap I\) is satisfied for some δ > 0, then f is differentiable at x 0 if and only if g is differentiable at x 0, and \(f^{{\prime}}(x_{0}) = g^{{\prime}}(x_{0})\).
Notation 6.1.3.
There are several commonly used forms to denote the derivative of a function. Depending on the situation, one may prefer one form to another. For example, there are cases in which Leibniz’s df∕dx is more convenient than Newton’s (in fact, Lagrange’s!) prime notation f ′(x). There are also situations where Arbogast’s operator notation Df(x) (or D x f(x)) has definite advantages. Most of these forms will be used in this book. Thus, if f is differentiable at x 0, we write
$$\displaystyle{f^{{\prime}}(x_{0}) = \frac{df}{dx}(x_{0}) = Df(x_{0}).}$$
In fact, we shall even abuse the notation and write (f(x))′ instead of f ′(x), if this simplifies the exposition. For example, if f(x) = x n, we may write (x n)′ instead of f ′(x).
Example 6.1.4.
The function \(f(x)\!:= x^{3}\ \forall x \in \mathbb{R}\) is differentiable on \(\mathbb{R}\) with derivative \(f^{{\prime}}(x) = 3x^{2}\ \forall x \in \mathbb{R}\).
To see this, note that for each \(x_{0} \in \mathbb{R}\) we have
$$\displaystyle{\lim _{x\rightarrow x_{0}}\frac{x^{3} - x_{0}^{3}}{x - x_{0}} =\lim _{x\rightarrow x_{0}}(x^{2} + xx_{0} + x_{0}^{2}) = 3x_{0}^{2}.}$$
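Example 6.1.4 can be spot-checked numerically; in the sketch below the base point x₀ = 2 is an arbitrary choice:

```python
# Difference quotients of f(x) = x**3 at x0 = 2: algebraically they equal
# x**2 + x*x0 + x0**2, which tends to 3*x0**2 = 12 as x -> x0.
def diff_quot(x, x0):
    return (x ** 3 - x0 ** 3) / (x - x0)

x0 = 2.0
vals = [diff_quot(x0 + 10.0 ** (-k), x0) for k in range(1, 6)]
limit = 3 * x0 ** 2  # f'(2) = 12
```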
Definition 6.1.5.
- (Left (Right) Derivative, Angular Point):
-
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I. If x 0 is not the right endpoint of I, then we say that f is right differentiable at x 0 if the limit
$$\displaystyle{f_{+}^{{\prime}}(x_{ 0})\!:=\lim _{x\rightarrow x_{0}+}\frac{f(x) - f(x_{0})} {x - x_{0}} }$$exists. The number f + ′(x 0) is then called the right derivative of f at x 0. Similarly, if x 0 is not the left endpoint of I, then we say that f is left differentiable at x 0 if the limit
$$\displaystyle{f_{-}^{{\prime}}(x_{ 0})\!:=\lim _{x\rightarrow x_{0}-}\frac{f(x) - f(x_{0})} {x - x_{0}} }$$exists. The number f − ′(x 0) is then called the left derivative of f at x 0. If the left and right derivatives of f are both defined at x 0 ∈ I ∘ but are not equal, then we say that (x 0, f(x 0)) is an angular point of the graph of f.
- (Infinite Derivative, Vertical Tangent):
-
We say that f has an infinite derivative at x 0, and write f ′(x 0) = ±∞, if
$$\displaystyle{\lim _{h\rightarrow 0}\frac{f(x_{0} + h) - f(x_{0})} {h} = \pm \infty.}$$If f has an infinite derivative at x 0, we say that the graph of f has a vertical tangent at (x 0, f(x 0)). The equation of this line is, of course, x = x 0.
Remark 6.1.6.
If \(f: I \rightarrow \mathbb{R}\) and x 0 ∈ I ∘, then it is obvious that f is differentiable at x 0 if and only if it is both right and left differentiable at x 0 and we have \(f_{-}^{{\prime}}(x_{0}) = f_{+}^{{\prime}}(x_{0})\). This common value is then f ′(x 0). Also, if f ′(x 0) exists and x 0 is the left endpoint (resp., right endpoint) of the interval I, then we automatically have \(f^{{\prime}}(x_{0}) = f_{+}^{{\prime}}(x_{0})\) (resp., \(f^{{\prime}}(x_{0}) = f_{-}^{{\prime}}(x_{0}))\).
Example 6.1.7.
-
(a)
The function \(f(x)\!:= \vert x\vert \ \forall x \in \mathbb{R}\) is differentiable on \(\mathbb{R}\setminus \{0\}\) with derivative
$$\displaystyle{ f^{{\prime}}(x) = \left \{\begin{array}{@{}l@{\quad }l@{}} -1\quad &\mbox{ if $x < 0$}, \\ 1 \quad &\mbox{ if $x > 0$}.\end{array} \right. }$$Indeed, this is an immediate consequence of the definition:
$$\displaystyle{ f^{{\prime}}(x_{ 0}) =\lim _{x\rightarrow x_{0}} \frac{\vert x\vert -\vert x_{0}\vert } {x - x_{0}} = \left \{\begin{array}{@{}l@{\quad }l@{}} \lim _{x\rightarrow x_{0}} \frac{x - x_{0}} {x - x_{0}} = 1 \quad &\mbox{ if $x_{0} > 0$}, \\ \lim _{x\rightarrow x_{0}} \frac{-(x - x_{0})} {x - x_{0}} = -1\quad &\mbox{ if $x_{0} < 0$}. \end{array} \right. }$$Note also that
$$\displaystyle{f^{{\prime}}(0) =\lim _{ x\rightarrow 0}\frac{\vert x\vert -\vert 0\vert } {x - 0} =\lim _{x\rightarrow 0}\frac{\vert x\vert } {x} }$$does not exist. In fact, \(f_{-}^{{\prime}}(0) = -1\neq 1 = f_{+}^{{\prime}}(0)\). The point (0, 0) is therefore an angular point of the graph.
-
(b)
The function \(f(x)\!:= x^{1/3}\ \forall x \in \mathbb{R}\) is differentiable on \(\mathbb{R}\setminus \{0\}\) with derivative
$$\displaystyle{f^{{\prime}}(x) = \frac{1} {3}x^{-2/3}\quad \;\forall x\neq 0.}$$Also, f has a vertical tangent at x 0 = 0. Indeed, for each \(x_{0} \in \mathbb{R}\setminus \{ 0\},\)
$$\displaystyle\begin{array}{rcl} f^{{\prime}}(x_{ 0})& =& \lim _{x\rightarrow x_{0}} \frac{\root{3}\of{x} -\root{3}\of{x_{0}}} {x - x_{0}} =\lim _{x\rightarrow x_{0}} \frac{\root{3}\of{x} -\root{3}\of{x_{0}}} {(\root{3}\of{x} -\root{3}\of{x_{0}})(\root{3}\of{x^{2}} + \root{3}\of{xx_{0}} + \root{3}\of{x_{0}^{2}})} {}\\ & =& \lim _{x\rightarrow x_{0}} \frac{1} {\root{3}\of{x^{2}} + \root{3}\of{xx_{0}} + \root{3}\of{x_{0}^{2}}} = \frac{1} {3\root{3}\of{x_{0}^{2}}}, {}\\ \end{array}$$which also implies that \(\lim _{h\rightarrow 0}\frac{f(h)-f(0)} {h} = +\infty \). (Why?)
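Both examples can be checked numerically; in the quick sketch below the sample step sizes are arbitrary. For |x| the one-sided quotients at 0 disagree (an angular point), while for the cube root the difference quotient at 0 blows up (a vertical tangent):

```python
import math

# One-sided difference quotients of |x| at 0: identically +1 from the
# right and -1 from the left, so (0, 0) is an angular point.
right = [(abs(h) - abs(0)) / h for h in (0.1, 0.01, 0.001)]
left = [(abs(-h) - abs(0)) / (-h) for h in (0.1, 0.01, 0.001)]

# Difference quotients of x**(1/3) at 0 equal h**(-2/3) and grow
# without bound as h -> 0+, signaling a vertical tangent.
def cbrt(x):
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

vertical = [(cbrt(h) - cbrt(0.0)) / h for h in (1e-2, 1e-4, 1e-6)]
```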
Exercise 6.1.8.
Consider the function \(f(x)\!:= \sqrt{\vert x\vert },\ \forall x \in \mathbb{R}\).
-
(a)
Using the definition (and considering the cases x 0 > 0 and x 0 < 0 separately), find f ′(x 0) for all x 0 ≠ 0.
-
(b)
Show that f ′(0) does not exist (even as an infinite derivative). In fact, show that the left derivative at x 0 = 0 is −∞ while the right derivative is + ∞. This shows that the graph of f does not have a vertical tangent at (0, 0) in the sense of the above definition.
Exercise 6.1.9.
Given a finite set \(\{a_{1},a_{2},\ldots,a_{n}\} \subset \mathbb{R},\) use an appropriate (algebraic) combination of functions of the form x ↦ | x − c | to construct a continuous function \(f:\mathbb{R}\rightarrow \mathbb{R}\) such that f ′(a k ) does not exist for k = 1, 2, …, n.
Remark 6.1.10.
In fact, it is even possible to construct functions that are continuous on \(\mathbb{R}\) but are nowhere differentiable. We shall construct such a function later, when we study sequences and series of functions.
The following characterization of differentiability will be useful in many proofs. Before stating it, we briefly recall the definitions of equivalent functions and of Landau’s little “oh” (see Sect. 3.5 for details). We say that two functions f and g (defined near a point x 0) are equivalent at x 0, and we write f ∼ g (x → x 0), if there is a function u (defined near x 0) such that f(x) = g(x)u(x) and \(\lim _{x\rightarrow x_{0}}u(x) = 1\). Also, we say that f is negligible compared to g as x → x 0, and write f = o(g) (x → x 0), if there is a function ζ (defined near x 0) such that f(x) = g(x)ζ(x) and \(\lim _{x\rightarrow x_{0}}\zeta (x) = 0\).
Proposition 6.1.11 (Carathéodory).
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I. Then f is differentiable at x 0 if and only if there exists a function \(\phi: I \rightarrow \mathbb{R}\) such that ϕ is continuous at x 0 and we have
$$\displaystyle{f(x) - f(x_{0}) =\phi (x)(x - x_{0})\qquad (\forall x \in I).}$$
In this case, we have f ′ (x 0 ) = ϕ(x 0 ).
Proof.
If ϕ exists, then \(\phi (x) = (f(x) - f(x_{0}))/(x - x_{0}),\ x\neq x_{0},\) and, since ϕ is continuous at x 0, we have \(\lim _{x\rightarrow x_{0}}(f(x) - f(x_{0}))/(x - x_{0}) =\phi (x_{0});\) i.e., \(f^{{\prime}}(x_{0}) =\phi (x_{0})\). Conversely, if f ′(x 0) exists, then we define
$$\displaystyle{\phi (x)\!:= \left \{\begin{array}{@{}l@{\quad }l@{}} \dfrac{f(x) - f(x_{0})}{x - x_{0}}\quad &\mbox{ if $x\neq x_{0}$}, \\ f^{{\prime}}(x_{0}) \quad &\mbox{ if $x = x_{0}$}.\end{array} \right. }$$
It is then easily seen (why?) that ϕ satisfies the conditions of the proposition. □
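Carathéodory's characterization can be made concrete with a small numeric sketch (the choice f(x) = x³ and the sample points are illustrative): the slope function ϕ is an ordinary polynomial, the factorization identity holds everywhere, and ϕ(x₀) recovers the derivative.

```python
# Caratheodory's characterization for f(x) = x**3 at x0: the slope
# function phi(x) = x**2 + x*x0 + x0**2 satisfies
# f(x) - f(x0) = (x - x0)*phi(x) for all x, is continuous, and
# phi(x0) = 3*x0**2 = f'(x0).
x0 = 1.5

def f(x):
    return x ** 3

def phi(x):
    return x * x + x * x0 + x0 * x0

# the identity holds identically (spot-checked at a few points)
checks = [abs(f(x) - f(x0) - (x - x0) * phi(x)) for x in (-2.0, 0.0, 3.0)]
deriv = phi(x0)  # 3 * 1.5**2 = 6.75
```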
Remark 6.1.12.
By Remark 6.1.2(3), for f ′(x 0) = ϕ(x 0) to exist, the continuous function ϕ in the above proposition need not be defined on all of I. We only need ϕ to be defined on a (nondegenerate) subinterval J ⊂ I with x 0 ∈ J.
As we saw above, the function \(f(x)\!:= \vert x\vert \ \forall x \in \mathbb{R}\) is not differentiable at \(x_{0} = 0\) even though it is obviously continuous there. The following corollary shows that differentiability is the stronger condition: it always implies continuity, while the converse may fail.
Corollary 6.1.13 (Differentiable \(\boldsymbol{\Longrightarrow}\) Continuous).
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I. If f is differentiable at x 0 , then it is continuous at x 0 . In fact, if f is right (resp., left) differentiable at x 0 , then it is right (resp., left) continuous at x 0 . In particular, f is continuous at its angular points.
Proof.
Well, let ϕ be as in Proposition 6.1.11. Then
$$\displaystyle{f(x) = f(x_{0}) + (x - x_{0})\phi (x)\qquad (\forall x \in I),}$$
so that \(\lim _{x\rightarrow x_{0}}f(x) = f(x_{0})\) as desired. Alternatively, we have
$$\displaystyle{f(x) - f(x_{0}) = \frac{f(x) - f(x_{0})}{x - x_{0}}\,(x - x_{0})\qquad (x\neq x_{0}).}$$
So letting x → x 0 or x → x 0+ or x → x 0−, we obtain the continuity or right (resp., left) continuity of f at x 0. The last statement is then obvious. □
The next consequence is in fact a rewording of Proposition 6.1.11 itself:
Corollary 6.1.14 (Local Linearity).
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I. Then f is differentiable at x 0 if and only if there exists a number \(m \in \mathbb{R}\) such that
$$\displaystyle{f(x) = f(x_{0}) + m(x - x_{0}) + (x - x_{0})o(1)\qquad (x\rightarrow x_{0}),\qquad (*)}$$
and we then have m = f ′ (x 0 ). Thus, with the affine function \(g(x)\!:= mx + f(x_{0}) - mx_{0},\) whose graph is (by definition) the tangent line to the graph of f at (x 0 ,f(x 0 )), we have \(f(x) - g(x) = (x - x_{0})o(1)\) , as x → x 0 .
Proof.
It is obvious (from (*) and the definition of f ′(x 0)) that, if m exists, then we indeed have f ′(x 0) = m. Conversely, define \(\zeta (x)\!:=\phi (x) -\phi (x_{0})\) for x ∈ I, where ϕ is as in Proposition 6.1.11. Then \(\lim _{x\rightarrow x_{0}}\zeta (x) = 0\). Thus, ζ(x) = o(1) (x → x 0) and (∗) follows at once with \(m =\phi (x_{0}) = f^{{\prime}}(x_{0})\). □
Remark 6.1.15.
Note that, with the above notation, not only f(x) − g(x) → 0 as x → x 0, but even \([f(x) - g(x)]/(x - x_{0}) \rightarrow 0\) as x → x 0.
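This stronger statement is easy to see numerically. In the sketch below (the parabola f(x) = x² at x₀ = 1 and its tangent g(x) = 2x − 1 are illustrative choices), the error f − g not only tends to 0 but does so faster than x − x₀:

```python
# Tangent-line error for f(x) = x**2 at x0 = 1 with tangent g(x) = 2x - 1.
# Here f(x) - g(x) = (x - 1)**2 exactly, so the ratio (f - g)/(x - x0)
# equals x - x0 and tends to 0: the error is o(x - x0).
f = lambda x: x * x
g = lambda x: 2 * x - 1
x0 = 1.0
ratios = [(f(x0 + h) - g(x0 + h)) / h for h in (0.1, 0.01, 0.001)]
```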
Corollary 6.1.16.
Let \(f: I \rightarrow \mathbb{R}\) be differentiable at x 0 ∈ I. If f ′ (x 0 ) ≠ 0, then we have
$$\displaystyle{f(x_{0} + h) - f(x_{0}) \sim f^{{\prime}}(x_{0})\,h\qquad (h\rightarrow 0).}$$
Proof.
Since \(\lim _{h\rightarrow 0}(f^{{\prime}}(x_{0}) + o(1)) = f^{{\prime}}(x_{0})\neq 0,\) we have \(f^{{\prime}}(x_{0}) + o(1) \sim f^{{\prime}}(x_{0})\ (h \rightarrow 0)\). Also, we obviously have h ∼ h (h → 0). Thus, by Corollary 6.1.14 and Theorem 3.5.11, we have
$$\displaystyle{f(x_{0} + h) - f(x_{0}) = h\,(f^{{\prime}}(x_{0}) + o(1)) \sim f^{{\prime}}(x_{0})\,h\qquad (h\rightarrow 0).}$$
□
6.2 Derivatives of Elementary Functions
We are now going to find the derivatives of some of the most commonly used functions. These include the power functions, the trigonometric functions, and the exponential function. As we have mentioned before, the rigorous definitions of trigonometric and exponential functions will be given later. In fact, the definition of the general power function x ↦ x r, where \(x > 0\) and \(r \in \mathbb{R},\) also depends on the exponential function. Thus, we are going to assume some of the properties of these functions whose proofs will not be given in this section. Once these properties are assumed, however, the rest of the arguments are quite straightforward.
Beginning with constant functions, we have the following trivial result:
Proposition 6.2.1.
If \(f(x)\!:= c\ \forall x \in I\) for some constant \(c \in \mathbb{R},\) then \(f^{{\prime}}(x) = 0\ \forall x \in I\) . In other words, the derivative of a constant function (on an interval I) is the (identically) zero function (on I).
Proof.
Indeed, for every x 0 ∈ I, it follows from the definition that
$$\displaystyle{f^{{\prime}}(x_{0}) =\lim _{x\rightarrow x_{0}}\frac{c - c}{x - x_{0}} =\lim _{x\rightarrow x_{0}}0 = 0.}$$
□
Next, we look at power functions. Recall that, if \(r = m/n \in \mathbb{Q},\) where m, n are relatively prime integers (and, of course, n ≠ 0), then we have \(x^{r}\!:= \root{n}\of{x^{m}},\) where we assume x ≥ 0 if n is even and x ≠ 0 if r ≤ 0. Recall also that, for x ≠ 0, we have x 0: = 1 and that \(x^{-r}\!:= 1/x^{r},\) when x r is well defined.
Proposition 6.2.2 (Power Rule).
Given any rational number \(r \in \mathbb{Q},\) the function x ↦ x r is differentiable, and we have
$$\displaystyle{(x^{r})^{{\prime}} = rx^{r-1}}$$
for every x for which the two sides are defined. In fact, the rule remains valid for the function x ↦ x r where x > 0 and \(r \in \mathbb{R}\) is arbitrary.
Proof.
(For \(r \in \mathbb{Q}\)) For r = 0 (resp., r = 1) we have \(x^{0} = 1\ \forall x\neq 0\) (resp., \(x^{r} = x\ \forall x \in \mathbb{R}\)) and a direct application of the definition implies that (x 0)′ = 0 (resp., (x)′ = 1). Assume next that \(0 < r = m/n\neq 1,\) with relatively prime positive integers m and n. Also, assuming x r and x 0 r are both defined, let \(\xi \!:= \root{n}\of{x}\) and \(\xi _{0}\!:= \root{n}\of{x_{0}}\). Then we have
$$\displaystyle{\xi ^{m} -\xi _{0}^{m} = (\xi -\xi _{0})(\xi ^{m-1} +\xi ^{m-2}\xi _{0} + \cdots +\xi _{0}^{m-1}),}$$
as can be checked at once by expanding and simplifying the right-hand side. Similarly, we have
$$\displaystyle{x - x_{0} =\xi ^{n} -\xi _{0}^{n} = (\xi -\xi _{0})(\xi ^{n-1} +\xi ^{n-2}\xi _{0} + \cdots +\xi _{0}^{n-1}).}$$
Therefore, assuming x 0 ≠ 0 if r ∈ (0, 1), we have
$$\displaystyle{(x^{r})^{{\prime}}(x_{0}) =\lim _{\xi \rightarrow \xi _{0}}\frac{\xi ^{m} -\xi _{0}^{m}}{\xi ^{n} -\xi _{0}^{n}} = \frac{m\xi _{0}^{m-1}}{n\xi _{0}^{n-1}} = \frac{m}{n}\,x_{0}^{(m-n)/n} = rx_{0}^{r-1}.}$$
Finally, suppose that \(r = -m/n,\) with m, n as above. Then, using the previous case and assuming that all powers make sense, we have
$$\displaystyle{\frac{x^{r} - x_{0}^{r}}{x - x_{0}} = -\frac{1}{x^{m/n}x_{0}^{m/n}}\cdot \frac{x^{m/n} - x_{0}^{m/n}}{x - x_{0}}\rightarrow -\frac{(m/n)\,x_{0}^{m/n-1}}{x_{0}^{2m/n}} = rx_{0}^{r-1}\qquad (x\rightarrow x_{0}),}$$
and the proof is complete for the case \(r \in \mathbb{Q}\). For \(r \in \mathbb{R}\), the proof will be given later when the power function x ↦ x r is defined rigorously. □
Next, we consider the (natural) exponential function \(\exp (x) = e^{x}\ \forall x \in \mathbb{R}\). As we pointed out above, this function will be rigorously defined later. One of the consequences of that definition will be the well-known property
$$\displaystyle{e^{x+y} = e^{x}e^{y}\qquad (\forall x,\;y \in \mathbb{R}).}$$
Another important consequence is the following proposition. The proofs of these facts are postponed until the precise definition is given.
Proposition 6.2.3.
The (natural) exponential function \(x\mapsto \exp (x) = e^{x}\) satisfies
$$\displaystyle{\lim _{h\rightarrow 0}\frac{e^{h} - 1}{h} = 1.\qquad (*)}$$
In other words, since e 0 := 1, the function x ↦ exp (x) is differentiable at x = 0 and we have exp′ (0) = 1.
An immediate consequence is then the following.
Proposition 6.2.4.
The exponential function x ↦ exp (x) is differentiable on \(\mathbb{R}\) and we have
$$\displaystyle{\exp ^{{\prime}}(x) =\exp (x)\qquad (\forall x \in \mathbb{R}).}$$
Proof.
Using the limit property (*) in Proposition 6.2.3, we have that
$$\displaystyle{\exp ^{{\prime}}(x) =\lim _{h\rightarrow 0}\frac{e^{x+h} - e^{x}}{h} = e^{x}\lim _{h\rightarrow 0}\frac{e^{h} - 1}{h} = e^{x}.}$$
□
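The limit behind this proof is easy to probe numerically (a sketch; the step sizes and sample point are arbitrary):

```python
# The base limit (e**h - 1)/h -> 1 drives (e**x)' = e**x: a forward
# difference quotient of exp at x0 = 1 lands near e itself.
import math

base_limit = [(math.exp(h) - 1.0) / h for h in (1e-2, 1e-4, 1e-6)]
x0 = 1.0
deriv = (math.exp(x0 + 1e-7) - math.exp(x0)) / 1e-7  # close to e
```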
Finally, we look at the derivatives of the trigonometric functions sin and cos. Once again, the rigorous definitions of these functions will be given later when we discuss power series. It will follow from those definitions that, for any real numbers \(x,\;h \in \mathbb{R},\) we have
-
(i)
\(\sin (x + h) =\sin x\cos h +\cos x\sin h.\)
Similarly, for all \(x,\;h \in \mathbb{R},\) we have
-
(ii)
\(\cos (x + h) =\cos x\cos h -\sin x\sin h.\)
We also have the following limit properties:
Proposition 6.2.5.
The functions x ↦ sin x and x ↦ cos x are continuous on \(\mathbb{R}\) and we have
$$\displaystyle{\lim _{h\rightarrow 0}\frac{\sin h}{h} = 1\quad \mbox{and}\quad \lim _{h\rightarrow 0}\frac{\cos h - 1}{h} = 0.}$$
In other words, since sin 0 = 0 and cos 0 = 1, the functions sin and cos are both differentiable at x = 0 with (sin ) ′ (0) = 1 and (cos ) ′ (0) = 0.
Proof.
Postponed! □
We can now prove that the functions sin and cos are differentiable on \(\mathbb{R}\).
Proposition 6.2.6.
The functions x ↦ sin x and x ↦ cos x are differentiable on \(\mathbb{R}\) and we have
$$\displaystyle{\mathrm{(a)}\ \ (\sin x)^{{\prime}} =\cos x\qquad \mbox{and}\qquad \mathrm{(b)}\ \ (\cos x)^{{\prime}} = -\sin x\qquad (\forall x \in \mathbb{R}).}$$
Proof.
For (a), using the identity (i) above and Proposition 6.2.5, we have
$$\displaystyle{(\sin x)^{{\prime}} =\lim _{h\rightarrow 0}\frac{\sin x\cos h +\cos x\sin h -\sin x}{h} =\sin x\lim _{h\rightarrow 0}\frac{\cos h - 1}{h} +\cos x\lim _{h\rightarrow 0}\frac{\sin h}{h} =\cos x.}$$
For (b), we use the identity (ii) and Proposition 6.2.5 to get
$$\displaystyle{(\cos x)^{{\prime}} =\lim _{h\rightarrow 0}\frac{\cos x\cos h -\sin x\sin h -\cos x}{h} =\cos x\lim _{h\rightarrow 0}\frac{\cos h - 1}{h} -\sin x\lim _{h\rightarrow 0}\frac{\sin h}{h} = -\sin x.}$$
□
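The two base limits and the resulting derivative formulas can be spot-checked numerically (the sample point x₀ = 0.7 and the step size are arbitrary choices):

```python
# sin(h)/h -> 1 and (cos(h) - 1)/h -> 0 as h -> 0, and consequently
# difference quotients of sin and cos land near cos(x0) and -sin(x0).
import math

h = 1e-6
sin_limit = math.sin(h) / h                        # close to 1
cos_limit = (math.cos(h) - 1.0) / h                # close to 0
x0 = 0.7
sin_deriv = (math.sin(x0 + h) - math.sin(x0)) / h  # close to cos(0.7)
cos_deriv = (math.cos(x0 + h) - math.cos(x0)) / h  # close to -sin(0.7)
```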
6.3 The Differential Calculus
In this section we shall derive the fundamental rules of differentiation. Some of these rules allow the differentiation of functions constructed from differentiable functions by means of simple algebraic operations. The Chain Rule, which is the most important and powerful of these rules, will handle the differentiation of composite functions.
Theorem 6.3.1.
Let f and g be real-valued functions defined on an interval I and assume that both functions are differentiable at a point x 0 ∈ I. Then the functions f ± g, fg, cf (where \(c \in \mathbb{R}\) is any constant), and f∕g are differentiable at x 0 (for f∕g we obviously assume g(x 0 ) ≠ 0), and we have
-
(a)
\((f \pm g)^{{\prime}}(x_{0}) = f^{{\prime}}(x_{0}) \pm g^{{\prime}}(x_{0});\)
-
(b)
\((fg)^{{\prime}}(x_{0}) = f^{{\prime}}(x_{0})g(x_{0}) + f(x_{0})g^{{\prime}}(x_{0})\qquad (product\;rule)\) ;
-
(c)
\((cf)^{{\prime}}(x_{0}) = cf^{{\prime}}(x_{0});\)
-
(d)
\((f/g)^{{\prime}}(x_{0}) = \frac{f^{{\prime}}(x_{0})g(x_{0}) - f(x_{0})g^{{\prime}}(x_{0})} {(g(x_{0}))^{2}} \qquad (quotient\;rule)\) .
Proof.
These rules are immediate consequences of the definition of the derivative and the limit properties (cf. Theorem 3.3.3). Part (a) follows from the fact that
$$\displaystyle{\frac{(f \pm g)(x) - (f \pm g)(x_{0})}{x - x_{0}} = \frac{f(x) - f(x_{0})}{x - x_{0}} \pm \frac{g(x) - g(x_{0})}{x - x_{0}}.}$$
Also, (c) follows from (b) and Proposition 6.2.1 or from the obvious observation
$$\displaystyle{\frac{(cf)(x) - (cf)(x_{0})}{x - x_{0}} = c\,\frac{f(x) - f(x_{0})}{x - x_{0}}.}$$
To prove (b), note that
$$\displaystyle{\frac{f(x)g(x) - f(x_{0})g(x_{0})}{x - x_{0}} = f(x)\,\frac{g(x) - g(x_{0})}{x - x_{0}} + g(x_{0})\,\frac{f(x) - f(x_{0})}{x - x_{0}}.\qquad (*)}$$
Now, by Corollary 6.1.13, f is continuous at x 0 and we have \(\lim _{x\rightarrow x_{0}}f(x) = f(x_{0})\). Therefore, taking limits as x → x 0 in (∗), we obtain (b). Finally, to prove (d), we first observe that, if g(x 0) ≠ 0, then
$$\displaystyle{(1/g)^{{\prime}}(x_{0}) = -\frac{g^{{\prime}}(x_{0})}{(g(x_{0}))^{2}}.\qquad ({*}{*})}$$
Indeed,
$$\displaystyle{\frac{1/g(x) - 1/g(x_{0})}{x - x_{0}} = -\frac{1}{g(x)g(x_{0})}\cdot \frac{g(x) - g(x_{0})}{x - x_{0}}.\qquad (\dagger )}$$
But, by Corollary 6.1.13, g is continuous at x 0 and hence \(\lim _{x\rightarrow x_{0}}g(x) = g(x_{0})\). Taking limits in (†) as x → x 0, we obtain (**) as claimed. The property (d) is now an immediate consequence of (b) and (**). □
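A quick numeric spot check of the product and quotient rules (the functions f(x) = x², g(x) = x + 3 and the point x₀ = 2 are illustrative choices):

```python
# Product and quotient rules for f(x) = x**2, g(x) = x + 3 at x0 = 2,
# using the exact derivatives f'(x) = 2x and g'(x) = 1, compared against
# forward difference quotients of f*g and f/g.
x0 = 2.0
f, fp = lambda x: x * x, lambda x: 2 * x
g, gp = lambda x: x + 3, lambda x: 1.0

product_rule = fp(x0) * g(x0) + f(x0) * gp(x0)                   # 24
quotient_rule = (fp(x0) * g(x0) - f(x0) * gp(x0)) / g(x0) ** 2   # 16/25

h = 1e-6
num_prod = (f(x0 + h) * g(x0 + h) - f(x0) * g(x0)) / h
num_quot = (f(x0 + h) / g(x0 + h) - f(x0) / g(x0)) / h
```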
Corollary 6.3.2.
Let the functions \(f_{j}: I \rightarrow \mathbb{R}\ (j = 1,\;2,\ldots,\;n)\) be differentiable at x 0 ∈ I and let \(c_{1},\;c_{2},\;\ldots,\;c_{n} \in \mathbb{R}\) be arbitrary constants. Then the linear combination \(\sum _{j=1}^{n}c_{j}f_{j}\) is differentiable at x 0 and we have
$$\displaystyle{\Big(\sum _{j=1}^{n}c_{j}f_{j}\Big)^{{\prime}}(x_{0}) =\sum _{j=1}^{n}c_{j}f_{j}^{{\prime}}(x_{0}).}$$
Also, the product f 1 f 2 ⋯f n is differentiable at x 0 , with derivative
$$\displaystyle{(f_{1}f_{2}\cdots f_{n})^{{\prime}}(x_{0}) =\sum _{j=1}^{n}f_{1}(x_{0})\cdots f_{j}^{{\prime}}(x_{0})\cdots f_{n}(x_{0}).}$$
Exercise 6.3.3.
-
(a)
Prove the corollary. Hint: Use induction on n.
-
(b)
Deduce the following extension of the Power Rule for integral exponents: Let \(f: I \rightarrow \mathbb{R}\) and, for any integer \(n \in \mathbb{Z},\) consider the function \(g(x)\!:= [f(x)]^{n}\ \forall x \in I\) (where, for n ≤ 0, we have \(\mbox{ dom}(g) =\{ x \in I: f(x)\neq 0\}\)). If f is differentiable at x 0 ∈ I, then so is g and we have
$$\displaystyle{g^{{\prime}}(x_{ 0}) = n[f(x_{0})]^{n-1}f^{{\prime}}(x_{ 0}),}$$where the formula is interpreted as \(g^{{\prime}}(x_{0}) = f^{{\prime}}(x_{0})\) if n = 1, and we assume f(x 0) ≠ 0 if n < 1.
We are now going to state and prove the Chain Rule, which is the most important and powerful rule of differentiation. This rule, combined with the other rules and the well-known derivatives of the elementary functions, allows the differentiation of all functions one encounters in practice.
Theorem 6.3.4 (Chain Rule).
Let \(f: I \rightarrow \mathbb{R},f(I) \subset J,\) and \(g: J \rightarrow \mathbb{R}\) . If f is differentiable at a point x 0 ∈ I and g is differentiable at the point y 0 := f(x 0 ) ∈ J, then the composite function h:= g ∘ f is differentiable at x 0 and we have
$$\displaystyle{h^{{\prime}}(x_{0}) = g^{{\prime}}(f(x_{0}))\,f^{{\prime}}(x_{0}).}$$
Proof.
By Proposition 6.1.11, there is a function \(\phi: I \rightarrow \mathbb{R}\) such that ϕ is continuous at x 0 and \(f(x) - f(x_{0}) = (x - x_{0})\phi (x)\). Similarly, there exists a function \(\psi: J \rightarrow \mathbb{R}\) such that ψ is continuous at y 0: = f(x 0) and \(g(y) - g(y_{0}) = (y - y_{0})\psi (y)\). It follows that
$$\displaystyle{h(x) - h(x_{0}) = g(f(x)) - g(f(x_{0})) = (f(x) - f(x_{0}))\psi (f(x)) = (x - x_{0})\phi (x)\psi (f(x)).\qquad (*)}$$
Since products and composites of continuous functions are continuous, the function x ↦ ϕ(x)ψ(f(x)) is continuous at x 0 and the theorem follows from (*) and Proposition 6.1.11. □
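The Chain Rule can be spot-checked numerically (the pair f(x) = x² + 1, g(y) = y³ and the point x₀ = 1.5 are illustrative choices):

```python
# Chain Rule check for h = g o f with f(x) = x**2 + 1 and g(y) = y**3:
# h'(x0) = g'(f(x0)) * f'(x0), compared against a difference quotient.
x0 = 1.5
f, fp = lambda x: x * x + 1, lambda x: 2 * x
g, gp = lambda y: y ** 3, lambda y: 3 * y * y

chain = gp(f(x0)) * fp(x0)  # 3 * 3.25**2 * 3 = 95.0625
step = 1e-6
numeric = (g(f(x0 + step)) - g(f(x0))) / step
```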
Remark 6.3.5.
Given that differentiability means local linearity, the Chain Rule should come as no surprise. Indeed, if \(f(x) = ax + b\) and \(g(x) = cx + d\) are both affine functions, then so is the composite \((g \circ f)(x) = cax + cb + d,\) whose slope is precisely \(ca = g^{{\prime}}(f(x))f^{{\prime}}(x)\), valid for all x in this case.
Exercise 6.3.6 (General Power Rule).
Show that, if \(f: I \rightarrow \mathbb{R}\) is differentiable at x 0 ∈ I, then so is the function \(g: x\mapsto [f(x)]^{r},r \in \mathbb{R},\) and we have
$$\displaystyle{g^{{\prime}}(x_{0}) = r[f(x_{0})]^{r-1}f^{{\prime}}(x_{0}).}$$
Here, the domain of g depends on the exponent r. Thus, for arbitrary \(\;r \in \mathbb{R},\) we have dom(g) = { x ∈ I: f(x) > 0}.
Example 6.3.7.
-
(a)
The function
$$\displaystyle{ f(x)\!:= \left \{\begin{array}{@{}l@{\quad }l@{}} x\sin (1/x)\quad &\mbox{ if $x\neq 0$},\\ 0 \quad &\mbox{ if $x = 0$},\end{array} \right. }$$is continuous on \(\mathbb{R}\) and differentiable on \(\mathbb{R}\setminus \{0\}\). Indeed, the functions x ↦ x, x ↦ sinx, and x ↦ 1∕x are all continuous on \(\mathbb{R}\setminus \{0\}\) and hence so is f. To prove the continuity at x = 0, we note that \(\vert x\sin (1/x)\vert \leq \vert x\vert \ \forall x\neq 0\). Therefore, by the Squeeze Theorem, we have \(\lim _{x\rightarrow 0}f(x) =\lim _{x\rightarrow 0}\vert x\vert = 0 = f(0)\). Next, for each x ≠ 0, it follows from the Product Rule, the Quotient Rule, and the Chain Rule, that
$$\displaystyle{f^{{\prime}}(x) =\sin \left (\frac{1} {x}\right ) -\frac{1} {x}\cos \left (\frac{1} {x}\right )\quad \forall x\neq 0.}$$Therefore, f is indeed differentiable on \(\mathbb{R}\setminus \{0\}\) as stated and, in fact, f ′ is continuous there. At x = 0, we use the definition:
$$\displaystyle{f^{{\prime}}(0) =\lim _{ x\rightarrow 0}\frac{f(x) - f(0)} {x - 0} =\lim _{x\rightarrow 0}\frac{x\sin (1/x)} {x} =\lim _{x\rightarrow 0}\sin (1/x).}$$Since this limit does not exist (why?), f is not differentiable at x = 0.
-
(b)
The function
$$\displaystyle{ g(x)\!:= \left \{\begin{array}{@{}l@{\quad }l@{}} x^{2}\sin (1/x)\quad &\mbox{ if $x\neq 0$}, \\ 0 \quad &\mbox{ if $x = 0$},\end{array} \right. }$$is differentiable on \(\mathbb{R}\) and g ′(0) = 0. Moreover, g ′ is continuous at every \(x \in \mathbb{R}\) except x = 0. To see this, note first that, applying the differential calculus, we have
$$\displaystyle{g^{{\prime}}(x) = 2x\sin (1/x) -\cos (1/x)\quad \;\forall x\neq 0,}$$so that g ′ is indeed continuous on \(\mathbb{R}\setminus \{0\}\). At x = 0, we use the definition and obtain
$$\displaystyle{g^{{\prime}}(0) =\lim _{ x\rightarrow 0}\frac{g(x) - g(0)} {x - 0} =\lim _{x\rightarrow 0}\frac{x^{2}\sin (1/x)} {x} =\lim _{x\rightarrow 0}x\sin (1/x) = 0,}$$as was pointed out above. Finally, g ′ is not continuous at 0 because \(\lim _{x\rightarrow 0}g^{{\prime}}(x) =\lim _{x\rightarrow 0}(2x\sin (1/x) -\cos (1/x))\) does not exist. (Why?)
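A short numeric sketch of this example shows both phenomena at once: the difference quotients at 0 are squeezed to 0, while g′ keeps taking values near +1 and −1 arbitrarily close to 0 (the sample points below are illustrative):

```python
# The function g of Example 6.3.7(b): g'(0) = 0 even though g'(x)
# oscillates (through the -cos(1/x) term) with no limit as x -> 0.
import math

def g(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# difference quotients at 0 equal x*sin(1/x), squeezed between -|x|, |x|
quot_at_0 = [abs((g(x) - g(0)) / x) for x in (1e-2, 1e-4, 1e-6)]

def gp(x):  # g'(x) for x != 0, from the differential calculus
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

near_plus = gp(1.0 / math.pi)         # -cos(pi) = 1, plus a tiny term
near_minus = gp(1.0 / (2 * math.pi))  # -cos(2*pi) = -1, plus a tiny term
```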
Our next goal will be to look at the derivative of an inverse function. Recall that a function \(f: I \rightarrow \mathbb{R}\) is invertible if and only if it is injective (i.e., one-to-one). If this is the case, then the inverse function f −1 has domain f(I) and is characterized by
$$\displaystyle{f^{-1}(f(x)) = x\ \ \forall x \in I\quad \mbox{and}\quad f(f^{-1}(y)) = y\ \ \forall y \in f(I).}$$
When we are interested in differentiability, the natural question is whether or not injective, differentiable functions have differentiable inverses. The following theorem addresses this question.
Theorem 6.3.8 (Differentiability of Inverse Functions).
Let I ≠ ∅ be an open interval and let \(f: I \rightarrow \mathbb{R}\) be an injective, continuous function. If f is differentiable at x 0 ∈ I and f ′ (x 0 ) ≠ 0, then f −1 is differentiable at y 0 := f(x 0 ), and we have
$$\displaystyle{(f^{-1})^{{\prime}}(y_{0}) = \frac{1}{f^{{\prime}}(x_{0})}.}$$
In particular, if f is injective and differentiable on I and \(f^{{\prime}}(x)\neq 0\ \forall x \in I,\) then f −1 is differentiable on J:= f(I) and we have
$$\displaystyle{(f^{-1})^{{\prime}}(y) = \frac{1}{f^{{\prime}}(f^{-1}(y))}\qquad (\forall y \in J).}$$
Proof.
By Theorem 4.5.23, J: = f(I) is an interval and f is a homeomorphism of I onto J; in other words, f −1: J → I is also continuous. In particular, f and f −1 are either both strictly increasing or both strictly decreasing. Using the sequential definition of limit (Theorem 3.3.1), we must show that, given any sequence (y n ) in J ∖{y 0} with lim(y n ) = y 0, we have
$$\displaystyle{\lim \Big(\frac{f^{-1}(y_{n}) - f^{-1}(y_{0})}{y_{n} - y_{0}}\Big) = \frac{1}{f^{{\prime}}(x_{0})}.}$$
But, if \(x_{n}\!:= f^{-1}(y_{n})\), the injectivity and continuity of f −1 imply that (x n ) is a sequence in I ∖{x 0} with lim(x n ) = x 0. Since f is differentiable at x 0 and f ′(x 0) ≠ 0, we have
$$\displaystyle{\lim \Big(\frac{f^{-1}(y_{n}) - f^{-1}(y_{0})}{y_{n} - y_{0}}\Big) =\lim \Big(\frac{x_{n} - x_{0}}{f(x_{n}) - f(x_{0})}\Big) = \frac{1}{f^{{\prime}}(x_{0})}.}$$
The last statement now follows from the fact that, if f is differentiable on I, then (by Corollary 6.1.13) it is continuous on I. □
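Theorem 6.3.8 is easy to check on a concrete pair (the choice f(x) = x³ at x₀ = 2, with the cube root as inverse, is illustrative; note x₀ = 0 is excluded since f′(0) = 0 there):

```python
# Inverse-derivative check for f(x) = x**3 at x0 = 2, y0 = 8:
# (f**-1)'(y0) = 1/f'(x0) = 1/12, compared against a difference
# quotient of the cube root at y0.
x0 = 2.0
y0 = x0 ** 3                       # 8.0
fp = 3 * x0 ** 2                   # f'(x0) = 12
inv_deriv = 1.0 / fp               # 1/12

cbrt = lambda y: y ** (1.0 / 3.0)  # valid for y > 0
h = 1e-6
numeric = (cbrt(y0 + h) - cbrt(y0)) / h
```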
Corollary 6.3.9 (Derivative of the Natural Logarithm).
The natural logarithm x ↦ log x is differentiable on (0,∞) and we have
-
(i)
\((\log x)^{{\prime}} = \frac{1} {x}\qquad (\forall x > 0).\)
In fact, the function x ↦ log |x| is differentiable on \(\mathbb{R}\setminus \{0\}\) and we have
-
(ii)
\((\log \vert x\vert )^{{\prime}} = \frac{1} {x}\qquad (\forall x\neq 0).\)
More generally, if \(u: I \rightarrow \mathbb{R}\) is differentiable on I and \(u(x)\neq 0\ \forall x \in I,\) then the function x → log |u(x)| is differentiable on I and we have
-
(iii)
\((\log \vert u(x)\vert )^{{\prime}} = \frac{u^{{\prime}}(x)} {u(x)} \qquad (\forall x \in I).\)
Proof.
To prove (i), note that \(x\mapsto \log x\ \forall x > 0\) is the inverse of the natural exponential x ↦ exp(x). Since \((e^{x})^{{\prime}} = e^{x}\ \forall x \in \mathbb{R}\) and \(e^{x} > 0\ \forall x \in \mathbb{R},\) Theorem 6.3.8 implies that the inverse function x ↦ logx is differentiable on its domain (0, ∞) and we have
$$\displaystyle{(\log x)^{{\prime}} = \frac{1}{\exp ^{{\prime}}(\log x)} = \frac{1}{\exp (\log x)} = \frac{1}{x}\qquad (\forall x > 0).}$$
Next, we have \(\log \vert x\vert =\log x\ \forall x > 0\) so that we must only check (ii) for the case x < 0. But then, \(\vert x\vert = -x\) and (i) together with the Chain Rule implies
$$\displaystyle{(\log \vert x\vert )^{{\prime}} = (\log (-x))^{{\prime}} = \frac{-1}{-x} = \frac{1}{x}\qquad (\forall x < 0).}$$
Finally, (iii) is an immediate consequence of (ii) and the Chain Rule. □
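Part (ii) can be spot-checked on both sides of the origin (the sample points ±2 and the step size are arbitrary):

```python
# (log|x|)' = 1/x on both sides of 0: difference quotients of log|x|
# at x = 2 and x = -2 land near 1/2 and -1/2, respectively.
import math

h = 1e-7
at_pos = (math.log(abs(2.0 + h)) - math.log(abs(2.0))) / h    # ~ 1/2
at_neg = (math.log(abs(-2.0 + h)) - math.log(abs(-2.0))) / h  # ~ -1/2
```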
Exercise 6.3.10.
-
(a)
Consider the function \(f(x)\!:= x^{n}\ \forall x \in \mathbb{R},\) where n is an odd integer (cf. Example 4.5.24). Using Theorem 6.3.8, prove the Power Rule
$$\displaystyle{(x^{1/n})^{{\prime}} = \frac{1} {n}x^{1/n-1}\qquad (\forall x\neq 0).}$$
(b)
Prove the same rule for even integers n, using the function \(x\mapsto x^{n}\ \forall x > 0\).
-
(c)
Combining (a) and (b) (and the Chain Rule), give another proof of the Power Rule for rational exponents: \((x^{r})^{{\prime}} = rx^{r-1},\ r \in \mathbb{Q}\).
Exercise 6.3.11.
Let I be an open interval. Assume that \(u,\;v: I \rightarrow \mathbb{R}\) are both differentiable on I and \(u(x) > 0\ \forall x \in I\). Using the Chain Rule, find the derivative of the function
$$\displaystyle{x\mapsto u(x)^{v(x)}\!:= e^{v(x)\log u(x)}\qquad (x \in I).}$$
6.4 Mean Value Theorems
Recall that the derivative of a function f at a point x 0 is defined to be the instantaneous rate of change of the values f(x) with respect to x, as x approaches x 0. In other words, it is the limit (as x → x 0) of the average rate of change \((f(x) - f(x_{0}))/(x - x_{0})\) on the interval with endpoints x 0 and x. The main result of this section will be that, for a function that is continuous on a closed, bounded interval [a, b] and differentiable inside, the average rate of change on the interval is in fact equal to the instantaneous rate of change at an interior point. As we shall see, this result turns out to play a fundamental role in the study of the behavior of real-valued functions of a real variable. We begin with a definition:
Definition 6.4.1 (Local Extrema).
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I ∘. We say that f has a local maximum (resp., local minimum) at x 0 if there exists δ > 0 such that f(x) ≤ f(x 0) (resp., f(x) ≥ f(x 0)) for all \(x \in B_{\delta }(x_{0}) \cap I\!:= (x_{0}-\delta,x_{0}+\delta ) \cap I\). We say that f has a local extremum at x 0 if it has a local maximum or a local minimum at x 0.
Remark 6.4.2.
-
(a)
Recall that \(f: I \rightarrow \mathbb{R}\) is said to have a (global or absolute) maximum [resp., (global or absolute) minimum] at x 0 if f(x) ≤ f(x 0) (resp., f(x) ≥ f(x 0)) for all x ∈ I.
-
(b)
The plurals for maximum, minimum, and extremum are maxima, minima, and extrema, respectively.
Proposition 6.4.3 (Fermat’s Theorem).
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I ∘ be an interior point. If f has a local extremum at x 0 and is differentiable at x 0 , then f ′ (x 0 ) = 0. In other words, the tangent line to the graph of f at the point (x 0 ,f(x 0 )) is horizontal.
Proof.
Let us assume that f has a local maximum at x 0. For the local minimum, the proof is similar or one may use the function − f. Pick δ > 0 so small that B δ (x 0) ⊂ I and \(f(x) \leq f(x_{0})\ \forall x \in B_{\delta }(x_{0})\). Then we have
$$\displaystyle{\frac{f(x) - f(x_{0})}{x - x_{0}} \leq 0\qquad \forall x \in (x_{0},x_{0}+\delta ).\qquad (*)}$$
Letting x → x 0 in (*), we get f ′(x 0) ≤ 0. Similarly, we have
$$\displaystyle{\frac{f(x) - f(x_{0})}{x - x_{0}} \geq 0\qquad \forall x \in (x_{0}-\delta,x_{0}).\qquad ({*}{*})}$$
Letting x → x 0 in (**), we get f ′(x 0) ≥ 0. Therefore, we must have f ′(x 0) = 0 as claimed. □
The following consequence of the proposition shows that derivatives have one fundamental property in common with continuous functions, namely, the Intermediate Value Property:
Theorem 6.4.4 (Darboux’s Theorem).
Let \(f: I \rightarrow \mathbb{R}\) be differentiable on I ∘ and let a < b in I ∘ be such that \(f^{{\prime}}(a) <\eta < f^{{\prime}}(b)\) . Then there exists ξ ∈ (a,b) such that f ′ (ξ) = η. A similar result holds, of course, if \(f^{{\prime}}(a) > f^{{\prime}}(b)\) .
Proof.
The function \(g(x)\!:= f(x) -\eta x\) on I is differentiable on I ∘ and hence continuous there. In particular, Theorem 4.5.2 implies that g attains its minimum value on [a, b]. Now, we have g ′(a) < 0 and g ′(b) > 0. It follows from the definition of the derivative that, for δ > 0 small enough, we have g(x) < g(a) ∀x ∈ (a, a +δ) and \(g(x) < g(b)\ \forall x \in (b-\delta,b)\). Therefore, the minimum value of g on [a, b] occurs at some point ξ ∈ (a, b). (Why?) It now follows from Proposition 6.4.3 that we have g ′(ξ) = 0. □
Remark 6.4.5.
Recall that the function
$$\displaystyle{g(x)\!:= \left \{\begin{array}{ll} x^{2}\sin (1/x)&\mbox{ if }x\neq 0,\\ 0 &\mbox{ if }x = 0, \end{array} \right.}$$
defined in Example 6.3.7(b), is differentiable on \(\mathbb{R}\) and g ′ is in fact continuous at all x except x = 0. Darboux’s theorem implies that, despite the discontinuity at x = 0, the function g ′ has the Intermediate Value Property. In particular, the discontinuity at x = 0 is not of the first kind (i.e., jump discontinuity). The reader may refer to Sect. 4.4 for the definitions of various discontinuities. In general, we can make the following statement: If \(f: I \rightarrow \mathbb{R}\) is differentiable on I, then all the discontinuities of f ′ are of the second kind.
Our next result is a special form of the Mean Value Theorem but, in fact, is strong enough to be equivalent to it.
Theorem 6.4.6 (Rolle’s Theorem).
Let \(f: [a,b] \rightarrow \mathbb{R}\) be continuous on [a,b], be differentiable on (a,b), and satisfy f(a) = f(b). Then there exists a point c ∈ (a,b) such that f ′ (c) = 0.
Proof.
By Theorem 4.5.2, the continuous function f attains both its maximum and minimum values on [a, b]. If both of them are attained at the endpoints, then f is in fact constant (why?) and we have \(f^{{\prime}}(c) = 0\ \forall c \in (a,b)\). If not, at least one of the extrema is an interior one, i.e., attained at a point c ∈ (a, b). But then, by Proposition 6.4.3, we have f ′(c) = 0. □
Remark 6.4.7.
-
(a)
The point c ∈ (a, b) guaranteed by Rolle’s Theorem is not necessarily unique, as the following example shows: Consider the function \(f(x) = 3x^{4} - 6x^{2} + 1\). Then \(f(-2) = f(2) = 25\) and \(f^{{\prime}}(x) = 12x^{3} - 12x,\) so that \(f^{{\prime}}(-1) = f^{{\prime}}(0) = f^{{\prime}}(1) = 0\).
-
(b)
The assumption f(a) = f(b) implies that the chord joining the points (a, f(a)) and (b, f(b)) on the graph of f is horizontal. The theorem then implies that, if this is the case, then the tangent line to the graph of f is horizontal (i.e., parallel to the above chord) at some point (c, f(c)) with (a not necessarily unique) c ∈ (a, b).
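The numbers quoted in part (a) of the remark can be spot-checked mechanically. The following sketch (purely illustrative, not part of the text) confirms that the endpoints agree and that the derivative vanishes at three distinct interior points:

```python
# Remark 6.4.7(a): f(x) = 3x^4 - 6x^2 + 1, with f(-2) = f(2) = 25 and
# f'(x) = 12x^3 - 12x vanishing at x = -1, 0, 1.

f = lambda x: 3 * x**4 - 6 * x**2 + 1
fp = lambda x: 12 * x**3 - 12 * x   # derivative via the Power Rule

values = (f(-2), f(2))                  # both endpoints give 25
critical = [fp(c) for c in (-1, 0, 1)]  # all three values are 0
```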
If one rotates the graph of the function f in Rolle’s Theorem, then the condition f(a) = f(b) will no longer be satisfied and hence the chord joining the points (a, f(a)) and (b, f(b)) will not be horizontal. It is obvious, however, that the new graph will have the property that the tangent line will be parallel to the chord at least once between the endpoints (a, f(a)) and (b, f(b)). This suggests the following extension of Rolle’s Theorem:
Theorem 6.4.8 (Mean Value Theorem).
Let \(f: [a,b] \rightarrow \mathbb{R}\) be continuous on [a,b] and differentiable on (a,b). Then there exists a point c ∈ (a,b) such that
$$\displaystyle{f^{{\prime}}(c) = \frac{f(b) - f(a)} {b - a}.}$$
Proof.
Consider the function
$$\displaystyle{g(x)\!:= f(x) -\Big [f(a) + \frac{f(b) - f(a)} {b - a} (x - a)\Big].}$$Note that g is simply the difference between the function f and the affine function
$$\displaystyle{x\mapsto f(a) + \frac{f(b) - f(a)} {b - a} (x - a),}$$whose graph is the line segment joining the points (a, f(a)) and (b, f(b)). The hypotheses of the theorem imply that g is continuous on [a, b], is differentiable on (a, b), and satisfies g(a) = g(b). It then follows from Rolle’s Theorem that g ′(c) = 0 holds for at least one c ∈ (a, b). But this means precisely that
$$\displaystyle{f^{{\prime}}(c) = \frac{f(b) - f(a)} {b - a},}$$
and the proof is complete. □
Remark 6.4.9.
-
(a)
As the above proof shows, the Mean Value Theorem (henceforth abbreviated MVT) is a consequence of Rolle’s Theorem. Since the converse is obviously satisfied (why?), the two theorems are in fact equivalent.
-
(b)
The Mean Value Theorem (MVT) can also be interpreted in terms of motion as follows: If f(t) represents a car’s position at time t (i.e., its (signed) distance from an initial point), then f ′(t) represents the instantaneous velocity at that time, and \((f(b) - f(a))/(b - a)\) represents the average velocity over the time interval [a, b]. Thus the MVT implies that, at some time c ∈ (a, b), the instantaneous velocity is in fact equal to the average (i.e., mean) velocity.
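For a concrete function the MVT point c can sometimes be computed in closed form. A small sketch (the function f(x) = x³ on [0, 2] is an illustrative choice, not from the text): the mean slope is (f(2) − f(0))/2 = 4, and f ′(c) = 3c² = 4 forces c = 2/√3 ∈ (0, 2).

```python
import math

# MVT for f(x) = x^3 on [0, 2]: the instantaneous slope at c = 2/sqrt(3)
# equals the average slope over the interval.

f = lambda x: x**3
a, b = 0.0, 2.0
mean_slope = (f(b) - f(a)) / (b - a)   # average rate of change = 4
c = 2 / math.sqrt(3)                   # the point guaranteed by the MVT
inst_slope = 3 * c**2                  # f'(c)
```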
The Mean Value Theorem is a fundamental tool in the study of the behavior of functions defined and differentiable on intervals. For instance, it is obvious (e.g., geometrically) that the derivative of a constant function on an interval is the identically zero function on that interval (cf. Proposition 6.2.1). That the converse is also true is an immediate consequence of the MVT, as we shall see below.
Corollary 6.4.10.
Let h > 0 and suppose that \(f: [x,x + h] \rightarrow \mathbb{R}\) is continuous on [x,x + h] and differentiable on (x,x + h). Then there exists a number θ ∈ (0,1) such that
$$\displaystyle{f(x + h) - f(x) = hf^{{\prime}}(x +\theta h).}$$
Proof.
Simply note that any number in (x, x + h) can be written as x +θ h, for some θ ∈ (0, 1). □
Corollary 6.4.11.
Let \(f: [a,b] \rightarrow \mathbb{R}\) be continuous on [a,b] and differentiable on (a,b). If \(f^{{\prime}}(x) = 0\ \forall x \in (a,b),\) then f is constant on [a,b].
Proof.
We must show that, for any x 1, x 2 ∈ [a, b], we have f(x 1) = f(x 2). Assume (without loss of generality) that x 1 < x 2. Then, applying the MVT to the function f on the interval [x 1, x 2], we can find a point x 0 ∈ (x 1, x 2) such that
$$\displaystyle{f(x_{2}) - f(x_{1}) = (x_{2} - x_{1})f^{{\prime}}(x_{0}) = 0,}$$
and the corollary follows. □
Exercise 6.4.12.
Suppose that \(f,\;g:\mathbb{R}\rightarrow \mathbb{R}\) are both differentiable and satisfy f ′ = g and \(g^{{\prime}} = -f\). Show that f 2 + g 2 is a constant function. Give examples of f and g satisfying the given conditions.
Corollary 6.4.13.
Suppose that f and g are continuous on [a,b], differentiable on (a,b), and \(f^{{\prime}}(x) = g^{{\prime}}(x)\ \forall x \in (a,b)\) . Then there exists a constant C such that \(f = g + C\) .
Proof.
Apply Corollary 6.4.11 to the function f − g. □
Corollary 6.4.14.
Let \(f: I \rightarrow \mathbb{R}\) be continuous on I and differentiable on its interior I ∘ . Then f is increasing (resp., strictly increasing) on I if \(f^{{\prime}}(x) \geq 0\ \forall x \in I^{\circ }\) (resp., \(f^{{\prime}}(x) > 0\ \forall x \in I^{\circ }\) ). Similarly, f is decreasing (resp., strictly decreasing) on I if \(f^{{\prime}}(x) \leq 0\ \forall x \in I^{\circ }\) (resp., \(f^{{\prime}}(x) < 0\ \forall x \in I^{\circ }\) ).
Proof.
We simply note that, for any x 1 < x 2 in I, we may apply the MVT on [x 1, x 2] to find a point x 0 ∈ (x 1, x 2) with
$$\displaystyle{f(x_{2}) - f(x_{1}) = (x_{2} - x_{1})f^{{\prime}}(x_{0}),}$$
from which the corollary follows at once. □
Remark 6.4.15.
-
(a)
Note that, although the converses of the statements in Corollary 6.4.14 are true for the increasing (resp., decreasing) cases (why?), they are false for the strictly increasing (resp., strictly decreasing) cases. The function \(f(x)\!:= x^{3}\ \forall x \in \mathbb{R},\) e.g., is strictly increasing on \(\mathbb{R},\) but f ′(x) = 3x 2, so that f ′(0) = 0.
-
(b)
Using the above corollary, we can strengthen the last statement of Theorem 6.3.8 as follows.
Corollary 6.4.16 (Inverse Function Theorem).
If \(I \subset \mathbb{R}\) is an open interval and if \(f: I \rightarrow \mathbb{R}\) is a differentiable function such that f ′ (x) ≠ 0 for all x ∈ I, then f is a homeomorphism onto the interval J:= f(I) and its inverse f −1 : J → I is differentiable at every y = f(x) ∈ J with derivative
$$\displaystyle{(f^{-1})^{{\prime}}(y) = \frac{1} {f^{{\prime}}(f^{-1}(y))} = \frac{1} {f^{{\prime}}(x)}.}$$
Proof.
Since f ′ is never zero, Darboux’s theorem (Theorem 6.4.4) implies that we either have f ′(x) > 0 for all x ∈ I or f ′(x) < 0 for all x ∈ I. Thus f is either strictly increasing or strictly decreasing on I. The rest of the proof is identical to that of Theorem 6.3.8. □
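The inverse-derivative formula can be illustrated numerically (a sketch under illustrative assumptions: the function f(x) = x³ + x and the bisection-based inverse are chosen here, not taken from the text). Since f ′(x) = 3x² + 1 > 0, f is strictly increasing; at x = 1 we have y = 2 and (f⁻¹)′(2) = 1/f ′(1) = 1/4.

```python
# Check (f^{-1})'(y) = 1/f'(f^{-1}(y)) for f(x) = x^3 + x at y = 2.

def f(x):
    return x**3 + x

def f_inv(y, lo=-10.0, hi=10.0):
    """Invert the strictly increasing f on [lo, hi] by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 1e-5
deriv_inv = (f_inv(2 + h) - f_inv(2 - h)) / (2 * h)  # should be near 1/4
```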
Exercise 6.4.17.
-
(a)
Using the MVT and the fact that \((e^{x})^{{\prime}} = e^{x}\ \forall x \in \mathbb{R},\) prove the inequality
$$\displaystyle{e^{x} \geq 1 + x\quad \quad (\forall x \in \mathbb{R}).}$$ -
(b)
Using the MVT and the fact that \((\sin x)^{{\prime}} =\cos x,(\cos x)^{{\prime}} = -\sin x\ \forall x \in \mathbb{R},\) show that both sin and cos are Lipschitz functions with Lipschitz constant 1 (cf. Sect. 4.6, particularly Example 4.6.12(a)). In other words, show that we have
$$\displaystyle{\vert \sin x -\sin y\vert \leq \vert x - y\vert,\quad \vert \cos x -\cos y\vert \leq \vert x - y\vert \qquad (\forall x,\;y \in \mathbb{R}).}$$Deduce, in particular, that | sinx | ≤ x and \(\vert \cos x - 1\vert \leq x\ \forall x \geq 0\).
-
(c)
(Bernoulli’s Inequality) Using the MVT and the Power Rule, prove the following extension of Bernoulli’s inequality (cf. Proposition 2.1.23):
$$\displaystyle\begin{array}{rcl} (1 + x)^{r}& \geq & 1 + rx\quad \forall \ x > -1\quad \mbox{ if $r \leq 0$ or $r \geq 1,$} {}\\ (1 + x)^{r}& \leq & 1 + rx\quad \forall \ x > -1\quad \mbox{ if $0 \leq r \leq 1$.} {}\\ \end{array}$$Show that the above inequalities are strict if x ≠ 0 and r ≠ 0, 1. Also prove the following version:
$$\displaystyle{(1 - x)^{r} \geq 1 - rx\quad \forall \ x \in [0,1]\quad \mbox{ if $r \geq 1$.}}$$
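The inequalities in parts (b) and (c) lend themselves to numerical spot checks (evidence, not a proof; the sample points and the exponent r = 2.5 are illustrative choices):

```python
import math

# Exercise 6.4.17 spot checks:
# (b) |sin x - sin y| <= |x - y| on sample pairs;
# (c) Bernoulli: (1 + x)^r >= 1 + r*x for x > -1 when r >= 1 (here r = 2.5).

pairs = [(-3.2, 1.7), (0.0, 0.1), (5.0, -5.0), (2.0, 2.0001)]
lipschitz_ok = all(abs(math.sin(x) - math.sin(y)) <= abs(x - y)
                   for x, y in pairs)

r = 2.5
xs = [-0.9, -0.5, 0.0, 0.3, 2.0, 10.0]
bernoulli_ok = all((1 + x) ** r >= 1 + r * x for x in xs)
```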
Exercise 6.4.18.
Let \(f: I \rightarrow \mathbb{R}\). Recall that f is said to be Lipschitz of order α on I, 0 < α ≤ 1, if there is a constant A > 0 such that
$$\displaystyle{\vert f(x) - f(y)\vert \leq A\vert x - y\vert ^{\alpha }\qquad \forall x,\;y \in I.}$$
In this case, we write \(f \in \boldsymbol{ Lip}^{\alpha }(I)\). If α = 1, then \(f\) is said to be Lipschitz on I and we write \(f \in \boldsymbol{ Lip}(I) =\boldsymbol{ Lip}^{1}(I)\).
-
1.
Let a ∈ I ∘ and assume that f ′(a) exists. Show that there exists δ > 0 such that \(f \in \boldsymbol{ Lip}(B_{\delta }(a))\). Show, by an example, that the converse is false.
-
2.
Show that, if \(f \in \boldsymbol{ Lip}^{\alpha }(I)\) for some α > 1, then f is constant on I.
Exercise 6.4.19 (A Version of Gronwall’s Inequality).
Let \(f: [0,\infty ) \rightarrow \mathbb{R}\) be continuous on [0, ∞) and differentiable on (0, ∞). If f(0) = 0 and \(\vert f^{{\prime}}(x)\vert \leq \vert f(x)\vert \ \forall x \in (0,\infty ),\) show that \(f(x) = 0\ \forall x \geq 0\). Hint: Differentiate the function \(g(x)\!:= [f(x)]^{2}e^{-2x}\).
Corollary 6.4.20.
A differentiable function \(f: I \rightarrow \mathbb{R}\) is Lipschitz on I if and only if f ′ is bounded on I. In particular, if f ′ is continuous on I, then f is Lipschitz on every compact subset (e.g., closed, bounded subinterval) of I.
Proof.
Suppose first that f ′ is bounded on I and let \(A\!:=\sup \{ \vert f^{{\prime}}(x)\vert: x \in I\}\). Then, given any x < x ′ in I, the MVT implies that \(f(x^{{\prime}}) - f(x) = (x^{{\prime}}- x)f^{{\prime}}(\xi ),\) for some ξ ∈ (x, x ′). Therefore,
$$\displaystyle{\vert f(x^{{\prime}}) - f(x)\vert \leq A\vert x^{{\prime}}- x\vert,}$$
which shows that f is indeed Lipschitz and proves the last statement as well. (Why?) Conversely, if \(\vert f(x) - f(x^{{\prime}})\vert \leq A\vert x - x^{{\prime}}\vert \) for all \(x,\ x^{{\prime}}\in I,\) then for each \(x_{0} \in I,\ \vert (f(x) - f(x_{0}))/(x - x_{0})\vert \leq A\) for all x ≠ x 0. Since \(f^{{\prime}}(x_{0})\!:=\lim _{x\rightarrow x_{0}}(f(x) - f(x_{0}))/(x - x_{0}),\) we have \(\vert f^{{\prime}}(x_{0})\vert \leq A\). □
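Corollary 6.4.20 is easy to probe numerically (an illustrative sketch: the function x² on [0, 1] and the grid are my choices, not the text's). On [0, 1] we have |f ′(x)| = 2x ≤ 2, so f(x) = x² should be Lipschitz there with constant A = 2; indeed |x² − y²| = |x + y||x − y| ≤ 2|x − y|.

```python
# f(x) = x^2 has |f'| <= 2 on [0, 1], hence Lipschitz constant A = 2 there.

f = lambda x: x * x
A = 2.0  # sup of |f'| on [0, 1]

n = 200
grid = [i / n for i in range(n + 1)]
ok = all(abs(f(x) - f(y)) <= A * abs(x - y) + 1e-12  # tolerance for rounding
         for x in grid for y in grid)
```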
Our last version of the MVT extends all the previous ones but, as the proof shows, is in fact equivalent to them. This version, called Cauchy’s Mean Value Theorem (henceforth abbreviated Cauchy’s MVT), will be used in the proof of l’Hôpital’s Rule.
Theorem 6.4.21 (Cauchy’s Mean Value Theorem).
If two real-valued functions f and g are both continuous on [a,b] and differentiable on (a,b), then there exists c ∈ (a,b) such that
$$\displaystyle{[f(b) - f(a)]g^{{\prime}}(c) = [g(b) - g(a)]f^{{\prime}}(c).}$$
Proof.
Well, consider the function
$$\displaystyle{h(x)\!:= [f(b) - f(a)][g(x) - g(a)] - [g(b) - g(a)][f(x) - f(a)].}$$
Then h is continuous on [a, b] and differentiable on (a, b), and we have \(h(a) = 0 = h(b)\). Therefore, by Rolle’s Theorem, there exists c ∈ (a, b) such that h ′(c) = 0, and the theorem follows at once. □
Remark 6.4.22.
-
1.
If, in Theorem 6.4.21, we assume that \(g^{{\prime}}(x)\neq 0\ \forall x \in (a,b),\) then we must have g(a) ≠ g(b) (why?) and the conclusion of the theorem can also be written as
$$\displaystyle{\frac{f(b) - f(a)} {g(b) - g(a)} = \frac{f^{{\prime}}(c)} {g^{{\prime}}(c)}.}$$ -
2.
Under the assumptions of Theorem 6.4.21, it follows from Theorem 6.4.8 that, for some c 1, c 2 ∈ (a, b), we have \(f(b) - f(a) = (b - a)f^{{\prime}}(c_{1}),\) and \(g(b) - g(a) = (b - a)g^{{\prime}}(c_{2})\). In particular, if \(g^{{\prime}}(c_{2})\neq 0,\) we have
$$\displaystyle{\frac{f(b) - f(a)} {g(b) - g(a)} = \frac{f^{{\prime}}(c_{1})} {g^{{\prime}}(c_{2})}.}$$Note, however, that c 1 ≠ c 2, in general. For example, consider the functions \(f(x)\!:= x^{3} - 8x + 3\) and \(g(x)\!:= x^{2} - 2x + 2\) on [0, 4]. Then, a computation shows that \(c_{1} = 4/\sqrt{3}\) and c 2 = 2. In this case, the number c ∈ (0, 4) guaranteed by Theorem 6.4.21 is \(c = 8/3\). The reader is invited to check these simple facts.
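The numbers quoted in part (2) can be verified by direct computation; the following sketch (illustrative only) checks that c₁ = 4/√3 and c₂ = 2 satisfy the MVT for f and g separately, and that c = 8/3 satisfies Cauchy's MVT:

```python
import math

# Remark 6.4.22(2): f(x) = x^3 - 8x + 3, g(x) = x^2 - 2x + 2 on [0, 4].

f = lambda x: x**3 - 8 * x + 3
g = lambda x: x**2 - 2 * x + 2
fp = lambda x: 3 * x**2 - 8
gp = lambda x: 2 * x - 2
a, b = 0.0, 4.0

c1 = 4 / math.sqrt(3)
c2 = 2.0
c = 8 / 3

mvt_f = f(b) - f(a) - (b - a) * fp(c1)                   # should vanish
mvt_g = g(b) - g(a) - (b - a) * gp(c2)                   # should vanish
cauchy = (f(b) - f(a)) * gp(c) - (g(b) - g(a)) * fp(c)   # should vanish
```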
Exercise 6.4.23.
Consider the functions \(f(x)\!:= x - x^{2}\) and \(g(x)\!:= 2x^{3} - 3x^{4}\) on [0, 1]. Show that there is no number c ∈ (0, 1) such that
$$\displaystyle{\frac{f(1) - f(0)} {g(1) - g(0)} = \frac{f^{{\prime}}(c)} {g^{{\prime}}(c)}.}$$
Does this contradict Cauchy’s MVT?
Finally, as pointed out in Corollary 6.4.20, if a differentiable function \(f: I \rightarrow \mathbb{R}\) has bounded derivative, say \(m \leq f^{{\prime}}(x) \leq M\) for all x ∈ I, then for any a < b in I, the Mean Value Theorem gives \(m(b - a) \leq f(b) - f(a) \leq M(b - a)\). It turns out that this can be obtained with much weaker assumptions on f:
Proposition 6.4.24.
Let D be a countable subset of an interval I and let \(f: I \rightarrow \mathbb{R}\) be continuous. If f is right differentiable at every \(x \in I\setminus D\) and \(m \leq f_{+}^{{\prime}}(x) \leq M\) for all \(x \in I\setminus D\) , then for any a < b in I we have
$$\displaystyle{m(b - a) \leq f(b) - f(a) \leq M(b - a),}$$
and the inequalities are strict when f is not an affine function on [a,b].
Proof.
Let us first show that if f + ′(x) ≥ 0 for all x ∉ D, then f is increasing on I. Indeed, given any \(\varepsilon > 0\) and x ∉ D, the assumption \(f_{+}^{{\prime}}(x) \geq 0\) implies that for every small enough number h > 0 we must have
$$\displaystyle{\frac{f(x + h) - f(x)} {h} > -\varepsilon.}$$
It follows that the function \(g(x)\!:= f(x) +\varepsilon x\) satisfies the conditions of Proposition 4.4.13 (why?) and hence is increasing. Since \(\varepsilon > 0\) was arbitrary, the function f itself is also increasing. Now suppose that \(m \leq f_{+}^{{\prime}}(x) \leq M\) for all \(x \in I\setminus D\). Then the functions \(h(x)\!:= Mx - f(x)\) and \(k(x)\!:= f(x) - mx\) satisfy h + ′(x) ≥ 0 and k + ′(x) ≥ 0 for all x ∉ D and hence are increasing and the desired inequalities follow. Finally, if f is not an affine function with f ′ = M, then the function \(h(x) = Mx - f(x)\) is not constant on [a, b] and hence
$$\displaystyle{h(a) < h(b),\quad \mbox{ i.e.,}\quad f(b) - f(a) < M(b - a).}$$
A similar argument is used for \(k(x) = f(x) - mx\). □
6.5 L’Hôpital’s Rule
Indeterminate forms were discussed in Sect. 3.5. Of particular importance were limits having the indeterminate forms 0∕0 and ∞∕∞. In this section, we shall see how derivatives can be used to compute some such limits. The basic tool will be Cauchy’s MVT.
Theorem 6.5.1 (L’Hôpital’s Rule).
Let \(-\infty \leq a < b \leq +\infty,\) and let \(f,\;g: (a,b) \rightarrow \mathbb{R}\) be differentiable functions on (a,b), with \(g^{{\prime}}(x)\neq 0\ \forall x \in (a,b)\) . Suppose that either
-
(i)
\(\lim _{x\rightarrow a}f(x) = 0 =\lim _{x\rightarrow a}g(x)\)
or
-
(ii)
\(\lim _{x\rightarrow a}g(x) = \pm \infty.\)
If, for some \(L \in [-\infty,+\infty ],\) we have
-
(iii)
\(\lim _{x\rightarrow a}\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)} = L,\)
then we also have
-
(iv)
\(\lim _{x\rightarrow a}\frac{f(x)} {g(x)} = L.\)
The same conclusion holds if \(\lim _{x\rightarrow a}\) is replaced by \(\lim _{x\rightarrow b}\) throughout. Note that, for finite a, we obviously have \(\lim _{x\rightarrow a} =\lim _{x\rightarrow a+}\) .
Proof.
- (Case 1: \(\boldsymbol{a > -\infty }\) ).:
-
If (i) holds and if we define \(f(a) = g(a)\!:= 0,\) then both f and g become continuous on [a, b). Applying Cauchy’s MVT on [a, x] where x ∈ (a, b), we have
$$\displaystyle{ \frac{f(x)} {g(x)} = \frac{f(x) - f(a)} {g(x) - g(a)} = \frac{f^{{\prime}}(\xi )} {g^{{\prime}}(\xi )}, }$$(*)for some ξ ∈ (a, x). Since x → a implies ξ → a, (iv) follows at once from (iii) and (*). Assume next that (ii) holds with + ∞ (for the case \(\lim _{x\rightarrow a}g(x) = -\infty,\) replace g by − g) and that L is finite. In view of (iii), for each \(\varepsilon > 0\) we can find t > a such that g(u) > 0 and \(\vert f^{{\prime}}(u)/g^{{\prime}}(u) - L\vert <\varepsilon\) for all u ∈ (a, t]. Applying Cauchy’s MVT on [x, t] ⊂ (a, t], we can find η ∈ (x, t) with
$$\displaystyle{[f(t) - f(x)]g^{{\prime}}(\eta ) = [g(t) - g(x)]f^{{\prime}}(\eta ),}$$which can also be written as
$$\displaystyle{ \frac{f(x)} {g(x)} = \frac{f^{{\prime}}(\eta )} {g^{{\prime}}(\eta )} - \frac{g(t)} {g(x)} \cdot \frac{f^{{\prime}}(\eta )} {g^{{\prime}}(\eta )} + \frac{f(t)} {g(x)}. }$$(**)Let \(M =\sup \{ \vert f^{{\prime}}(u)/g^{{\prime}}(u)\vert: u \in (a,t]\}\). (Why is M finite?) Then (**) implies that
$$\displaystyle{\left \vert \frac{f(x)} {g(x)} - L\right \vert <\varepsilon +\left \vert \frac{g(t)} {g(x)}\right \vert M + \left \vert \frac{f(t)} {g(x)}\right \vert.}$$Letting x → a, we obtain
$$\displaystyle{\left \vert \frac{f(x)} {g(x)} - L\right \vert \leq \varepsilon,}$$which implies (iv). If \(L = +\infty \) and if B > 0 is arbitrary, we can pick t > a such that g(t) > 0 and \(f^{{\prime}}(u)/g^{{\prime}}(u) > B\) for all u ∈ (a, t). Keeping t fixed, we can pick t ′ ∈ (a, t) such that 0 < g(t) < g(x) for all x ∈ (a, t ′). (Why?) It then follows from (**) that
$$\displaystyle{\frac{f(x)} {g(x)} > B\left (1 - \frac{g(t)} {g(x)}\right ) + \frac{f(t)} {g(x)}\qquad \forall x \in (a,t^{{\prime}}).}$$Since the right side converges to B as x → a, (iv) follows. The case \(L = -\infty \) is treated similarly. The proof of case (1) is now complete.
- (Case 2: \(\boldsymbol{a = -\infty }\) ).:
-
Here we may assume that b < 0. Now observe that \(x\mapsto - 1/x\) is a homeomorphism of (−∞, b) onto \((0,-1/b)\) and that x → −∞ if and only if \(-1/x \rightarrow 0+\). Therefore,
$$\displaystyle{\lim _{x\rightarrow -\infty }\frac{f(x)} {g(x)} =\lim _{x\rightarrow 0+}\frac{f(-1/x)} {g(-1/x)},}$$and it suffices to show that the right side converges to L. This, however, follows from Case 1, because
$$\displaystyle{\lim _{x\rightarrow 0+}\frac{[f(-1/x)]^{{\prime}}} {[g(-1/x)]^{{\prime}}} =\lim _{x\rightarrow 0+}\frac{[f^{{\prime}}(-1/x)]/x^{2}} {[g^{{\prime}}(-1/x)]/x^{2}} =\lim _{x\rightarrow -\infty }\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)} = L.}$$
□
The above rule handles the cases in which x → a or x → b, where a and b are the left and right endpoints of an interval, respectively. To have a rule which can also be applied to the cases x → c, where c is an interior point, we have the following
Corollary 6.5.2.
Let f and g be differentiable on (a,c) and (c,b), with \(g^{{\prime}}(x)\neq 0\ \forall x \in (a,c) \cup (c,b)\) . Suppose that either
-
(i)
\(\lim _{x\rightarrow c}f(x) = 0 =\lim _{x\rightarrow c}g(x)\)
or
-
(ii)
\(\lim _{x\rightarrow c}g(x) = \pm \infty.\)
If, for some \(L \in [-\infty,+\infty ],\) we have
-
(iii)
\(\lim _{x\rightarrow c}\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)} = L,\)
then we also have
-
(iv)
\(\lim _{x\rightarrow c}\frac{f(x)} {g(x)} = L.\)
Proof.
This follows at once from Theorem 6.5.1 by applying it to f and g on the intervals (a, c) and (c, b) separately and using the fact that \(\lim _{x\rightarrow c}\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)} = L\) if and only if \(\lim _{x\rightarrow c-}\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)} = L =\lim _{x\rightarrow c+}\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)}\). □
Remark 6.5.3.
-
1.
As we saw in Sect. 3.5, one can always change an indeterminate form ∞∕∞ to an indeterminate form 0∕0, by observing that \(f(x)/g(x) = [1/g(x)]/[1/f(x)]\). If this is done, however, l’Hôpital’s Rule becomes
$$\displaystyle{\lim _{x\rightarrow a}f(x)/g(x) =\lim _{x\rightarrow a}[1/g(x)]^{{\prime}}/[1/f(x)]^{{\prime}},}$$which is not the same as the rule in Theorem 6.5.1. The case (ii) in l’Hôpital’s Rule is therefore important in general. In fact, changing ∞∕∞ to 0∕0 may actually complicate matters, as the following simple example shows. Let f(x): = x and g(x): = e x on (−∞, ∞). Then, \(\lim _{x\rightarrow +\infty }x = +\infty =\lim _{x\rightarrow +\infty }e^{x}\) and l’Hôpital’s Rule implies that
$$\displaystyle{\lim _{x\rightarrow +\infty } \frac{x} {e^{x}} =\lim _{x\rightarrow +\infty } \frac{(x)^{{\prime}}} {(e^{x})^{{\prime}}} =\lim _{x\rightarrow +\infty } \frac{1} {e^{x}} = 0.}$$On the other hand, if we write \(x/e^{x} = e^{-x}/(1/x),\) which has the indeterminate form 0∕0 as x → +∞, then the rule implies
$$\displaystyle{\lim _{x\rightarrow +\infty }\frac{e^{-x}} {1/x} =\lim _{x\rightarrow +\infty }\frac{(e^{-x})^{{\prime}}} {(1/x)^{{\prime}}} =\lim _{x\rightarrow +\infty } \frac{e^{-x}} {1/x^{2}},}$$which is more complicated than \(\lim _{x\rightarrow +\infty }e^{-x}/(1/x)\).
-
2.
Although a powerful tool for computing limits of indeterminate forms, l’Hôpital’s Rule is not necessarily the right one in all cases. The following simple example illustrates this point. Recall that \(x\mapsto x/\sqrt{1 + x^{2}}\) is a homeomorphism of \(\mathbb{R}\) onto the open unit interval (−1, 1). The inverse homeomorphism is \(x\mapsto x/\sqrt{1 - x^{2}}\). Now, \(\lim _{x\rightarrow \infty }x =\lim _{x\rightarrow \infty }\sqrt{1 + x^{2}} = +\infty,\) so that \(\lim _{x\rightarrow \infty }x/\sqrt{1 + x^{2}}\) has the indeterminate form ∞∕∞. If we use l’Hôpital’s Rule, we get
$$\displaystyle{\lim _{x\rightarrow \infty } \frac{x} {\sqrt{1 + x^{2}}} =\lim _{x\rightarrow \infty } \frac{1} {x/\sqrt{1 + x^{2}}} =\lim _{x\rightarrow \infty }\frac{\sqrt{1 + x^{2}}} {x}.}$$Therefore, the rule does not help at all. In fact, a second application of it will send us back to the original limit. On the other hand, we can find the limit easily as follows:
$$\displaystyle{\lim _{x\rightarrow \infty } \frac{x} {\sqrt{1 + x^{2}}} =\lim _{x\rightarrow \infty } \frac{x} {\vert x\vert \sqrt{1 + 1/x^{2}}} =\lim _{x\rightarrow \infty } \frac{1} {\sqrt{1 + 1/x^{2}}} = 1.}$$ -
3.
L’Hôpital’s Rule can be applied repeatedly as long as the required conditions are all satisfied. For example, we have
$$\displaystyle{\lim _{x\rightarrow 0}\frac{1 -\cos x} {x^{2}} =\lim _{x\rightarrow 0} \frac{\sin x} {2x} =\lim _{x\rightarrow 0}\frac{\cos x} {2} = \frac{\cos (0)} {2} = \frac{1} {2}.}$$ -
4.
It should be noted that the converse of Theorem 6.5.1 (or its corollary) is not true. In other words, \(\lim _{x\rightarrow a}\frac{f(x)} {g(x)}\) may very well exist even though \(\lim _{x\rightarrow a}\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)}\) does not. A simple example is the following. Consider the functions \(f(x)\!:= x -\sin x\) and g(x): = x on \(\mathbb{R}\). Then
$$\displaystyle{\lim _{x\rightarrow +\infty }\frac{f(x)} {g(x)} =\lim _{x\rightarrow +\infty }\frac{x -\sin x} {x} =\lim _{x\rightarrow +\infty }(1 -\frac{\sin x} {x}) = 1,}$$because \(\vert \sin x/x\vert \leq 1/\vert x\vert \rightarrow 0\) as x → +∞. On the other hand,
$$\displaystyle{\lim _{x\rightarrow +\infty }\frac{f^{{\prime}}(x)} {g^{{\prime}}(x)} =\lim _{x\rightarrow +\infty }(1 -\cos x)}$$does not exist.
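Two of the points in this remark can be seen numerically (an informal sketch; the sample values of x are illustrative): the repeated application in (3) gives (1 − cos x)/x² → 1/2, and in (4) the quotient (x − sin x)/x is close to 1 for large x even though its derivative quotient 1 − cos x keeps oscillating and has no limit.

```python
import math

# (3): lim_{x->0} (1 - cos x)/x^2 = 1/2, via a small sample point.
x = 1e-3
ratio = (1 - math.cos(x)) / x**2          # close to 0.5

# (4): (x - sin x)/x -> 1 as x -> +infinity, since |sin x / x| <= 1/x,
# although the derivative quotient 1 - cos x oscillates forever.
big = 1e6
converse = (big - math.sin(big)) / big    # close to 1
```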
Example 6.5.4.
-
1.
Let α and β be arbitrary positive numbers. Then we have x α = o(e β x)(x → ∞). We must show that \(\lim _{x\rightarrow \infty }x^{\alpha }/e^{\beta x} = 0\). Now \(x^{\alpha }/e^{\beta x} = (x/e^{\beta x/\alpha })^{\alpha },\) and \(\lim _{t\rightarrow 0+}t^{\alpha } = 0\) for all α > 0. The claim is therefore a consequence of l’Hôpital’s Rule:
$$\displaystyle{\lim _{x\rightarrow \infty } \frac{x} {e^{\beta x/\alpha }} =\lim _{x\rightarrow \infty } \frac{1} {\frac{\beta }{\alpha }e^{\beta x/\alpha }} = 0.}$$ -
2.
Given any α > 0, we have \(\lim _{x\rightarrow 0+}x^{\alpha }\log x = 0\). To see this, note that \(\lim _{x\rightarrow 0+}x^{\alpha } = 0\) and \(\lim _{x\rightarrow 0+}\log x = -\infty \). These facts will be proved later, when we define the logarithms and (general) power functions rigorously. Therefore, we are dealing with an indeterminate form 0 ⋅ ∞. To compute it, we use l’Hôpital’s Rule as follows:
$$\displaystyle{\lim _{x\rightarrow 0+}x^{\alpha }\log x =\lim _{x\rightarrow 0+} \frac{\log x} {x^{-\alpha }} =\lim _{x\rightarrow 0+} \frac{x^{-1}} {-\alpha x^{-\alpha -1}} = -\frac{1} {\alpha } \lim _{x\rightarrow 0+}x^{\alpha } = 0.}$$ -
3.
Let us show that \(\lim _{x\rightarrow 0+}x^{x} = 1\). Note first that this limit has the indeterminate form 00. Now, by definition, \(x^{x}\!:=\exp (x\log x)\ \forall x > 0\). Also, by Example (2) above, \(\lim _{x\rightarrow 0+}x\log x = 0\). Since exp is continuous, we obtain
$$\displaystyle{\lim _{x\rightarrow 0+}x^{x} =\exp (\lim _{ x\rightarrow 0+}x\log x) =\exp (0) = 1.}$$ -
4.
Show that \(\lim _{x\rightarrow \infty }(1 +\alpha /x)^{x} = e^{\alpha }\). Here the limit has the indeterminate form 1∞. By definition, we have \((1 +\alpha /x)^{x}\!:=\exp [x\log (1 +\alpha /x)],\) so we must find \(\lim _{x\rightarrow \infty }x\log (1 +\alpha /x),\) which has the indeterminate form ∞⋅ 0 (or 0 ⋅ ∞). Using l’Hôpital’s Rule, we have
$$\displaystyle{\lim _{x\rightarrow \infty }x\log (1 +\alpha /x) =\lim _{x\rightarrow \infty }\frac{\log (1 +\alpha /x)} {1/x} =\lim _{x\rightarrow \infty }\frac{ \frac{-\alpha /x^{2}} {1 +\alpha /x}} {-1/x^{2}} =\lim _{x\rightarrow \infty } \frac{\alpha } {1 +\alpha /x} =\alpha,}$$and the claim follows from the continuity of exp.
-
5.
Let us show that log(1 + x) ∼ x (x → 0). Recall (cf. Sect. 3.5) that this is equivalent to \(\lim _{x\rightarrow 0}\log (1 + x)/x = 1,\) which follows at once from l’Hôpital’s Rule:
$$\displaystyle{\lim _{x\rightarrow 0}\frac{\log (1 + x)} {x} =\lim _{x\rightarrow 0}\frac{1/(1 + x)} {1} = 1.}$$
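The limits computed in this example can be corroborated numerically at sample points (evidence only, not a substitute for the l'Hôpital computations; the particular values of x and α are my illustrative choices):

```python
import math

# Example 6.5.4 spot checks:
# (2) x^a * log x -> 0 as x -> 0+;  (3) x^x -> 1;
# (4) (1 + a/x)^x -> e^a;           (5) log(1 + x)/x -> 1.

small, big = 1e-8, 1e8
alpha = 2.0

ex2 = small ** 0.5 * math.log(small)   # -> 0
ex3 = small ** small                   # -> 1
ex4 = (1 + alpha / big) ** big         # -> e^2
ex5 = math.log(1 + small) / small      # -> 1
```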
Exercise 6.5.5.
Find the following limits, where a, b, α, and β are arbitrary positive constants.
Exercise 6.5.6.
Find each limit.
6.6 Higher Derivatives and Taylor’s Formula
If \(f: I \rightarrow \mathbb{R}\) is differentiable on I, then its derivative defines a new function \(f^{{\prime}}: I \rightarrow \mathbb{R}\) and it is quite legitimate to ask whether this new function f ′ is differentiable at a point x 0 ∈ I. Hence the following definition:
Definition 6.6.1 (Higher-Order Derivatives).
Let \(f: I \rightarrow \mathbb{R}\) and suppose that f is differentiable near x 0 ∈ I; i.e., that f ′(x) exists for all \(x \in B_{\delta }(x_{0}) \cap I\) and some δ > 0. If the derivative of f ′ exists at x 0, then we say that f is twice differentiable at x 0 and we write \(f^{{\prime\prime}}(x_{0})\!:= (f^{{\prime}})^{{\prime}}(x_{0})\). The number f ′ ′(x 0) is called the second derivative of f at x 0. Inductively, we define \(f^{(0)}(x_{0})\!:= f(x_{0}),\ f^{(1)}(x_{0})\!:= f^{{\prime}}(x_{0}),\) and, for each positive integer \(n \in \mathbb{N},\) we define \(f^{(n)}(x_{0})\!:= (f^{(n-1)})^{{\prime}}(x_{0})\). If f (n)(x 0) exists, we call it the nth derivative (or nth-order derivative) of f at x 0 and say that f is n-times differentiable at x 0. If f (n)(x) exists for all x ∈ I, we say that f is n-times differentiable on I.
Remark 6.6.2.
-
1.
If \(f: I \rightarrow \mathbb{R}\) and f ′ ′(x 0) exists for some x 0 ∈ I, then f ′ must be defined near x 0. In other words, we can find δ > 0 such that f is differentiable on \(B_{\delta }(x_{0}) \cap I\). More generally, if f (n)(x 0) exists, then f is (n − 1)-times differentiable on \(B_{\delta }(x_{0}) \cap I\) for some δ > 0.
-
2.
If \(f: I \rightarrow \mathbb{R}\) is n-times differentiable on I for some \(n \in \mathbb{N},\) then the derivatives f (k), 0 ≤ k ≤ n − 1 are all defined and continuous (why?) on I.
Notation 6.6.3.
As in the case of the (first) derivative, there are several ways to denote higher derivatives of a function, each having its own merits. We shall use all these forms in this text. If \(f: I \rightarrow \mathbb{R}\) and if f (n)(x 0) exists for some x 0 ∈ I and \(n \in \mathbb{N},\) then we write
$$\displaystyle{f^{(n)}(x_{0}) = D^{n}f(x_{0}) = \frac{d^{n}f} {dx^{n}}(x_{0}).}$$
We even abuse the notation, occasionally, and write (f(x))(n) instead of f (n)(x).
Exercise 6.6.4.
Let \(f: I \rightarrow \mathbb{R}\) and assume that \(f^{{\prime\prime}}(x_{0})\) exists for some x 0 ∈ I ∘. Show that
$$\displaystyle{\lim _{h\rightarrow 0}\frac{f(x_{0} + h) - 2f(x_{0}) + f(x_{0} - h)} {h^{2}} = f^{{\prime\prime}}(x_{0}),}$$
and give an example where this limit exists even though f ′ ′(x 0) does not. Hint: Use l’Hôpital’s Rule and, for example, consider an odd function.
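The symmetric second difference in this exercise is also a practical way to approximate f ′ ′ numerically. A small sketch (with f = exp and x₀ = 0 as illustrative choices, so the limit is exp(0) = 1):

```python
import math

# Symmetric second difference [f(x0+h) - 2 f(x0) + f(x0-h)] / h^2,
# which converges to f''(x0) when the second derivative exists.

def second_diff(f, x0, h):
    return (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2

approx = second_diff(math.exp, 0.0, 1e-4)  # close to exp(0) = 1
```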
Definition 6.6.5 (The Classes C n).
Let \(f: I \rightarrow \mathbb{R}\) and \(n \in \mathbb{N}\). We say that f is of class C n on I, and write f ∈ C n(I), if f (n) is defined and continuous on all of I. We say that f is of class C ∞ on I, and write f ∈ C ∞(I), if \(f \in C^{n}(I)\ \forall n \in \mathbb{N}\). The class of continuous functions on I will be denoted by C(I) instead of C 0(I). We call C n(I) the class of n-times continuously differentiable functions on I. For n = 1, it is called the class of continuously differentiable functions on I. Finally, C ∞(I) is called the class of infinitely differentiable functions on I.
Remark 6.6.6.
Note that, as was pointed out above, the existence of f (n) on I automatically guarantees the existence and continuity of f, f ′ , …, f (n−1) on I. Also, it is obvious that we have the inclusions
We should keep in mind that all the above inclusions are proper, as the following exercise demonstrates.
Exercise 6.6.7.
For n = 0, 1, 2, …, consider the functions \(f_{n}:\mathbb{R}\rightarrow \mathbb{R}\) defined by f 0(x): = | x | , and \(f_{n}(x)\!:= x^{n}\vert x\vert \ \forall n \geq 1\). Show that, for all n ≥ 0, we have \(f_{n} \in C^{n}(\mathbb{R})\) but \(f_{n}\not\in C^{n+1}(\mathbb{R})\). What are the successive derivatives of f n ? Hint: Note that \(f_{n} = xf_{n-1}\ \forall n \geq 1,\) and use induction.
The following proposition is an extension of the Product Rule to higher-order derivatives.
Proposition 6.6.8 (Leibniz Rule).
Let \(f,\;g: I \rightarrow \mathbb{R}\) be n-times differentiable functions on I for some \(n \in \mathbb{N}\) . Then the product fg is also n-times differentiable on I and we have
$$\displaystyle{ D^{n}(fg) =\sum _{ k=0}^{n}\binom{n}{k}D^{n-k}f \cdot D^{k}g, }$$(†)
where, for any k-times differentiable function h, D k h:= h (k) and \(D^{0}h = h^{(0)}\!:= h\) .
Proof.
We use induction on n. For n = 1, the rule is reduced to the Product Rule: \(D(fg) = Df \cdot g + f \cdot Dg\). Thus, we must only show that if (†) is satisfied for any n and if f and g are (n + 1)-times differentiable, then so is fg and the rule holds for n + 1; i.e., we have
$$\displaystyle{ D^{n+1}(fg) =\sum _{ k=0}^{n+1}\binom{n + 1}{k}D^{n+1-k}f \cdot D^{k}g. }$$(‡)
Now, differentiating both sides of (†), we obtain
-
(i)
\(D^{n+1}(fg) =\sum _{ k=0}^{n}\binom{n}{k}D^{n+1-k}f \cdot D^{k}g +\sum _{ k=0}^{n}\binom{n}{k}D^{n-k}f \cdot D^{k+1}g.\)
If we set k = j in the first sum on the right side of (i) and \(k = j - 1\) in the second sum, we get
-
(ii)
\(D^{n+1}(fg) =\sum _{ j=0}^{n}\binom{n}{j}D^{n+1-j}f \cdot D^{j}g +\sum _{ j=1}^{n+1}\binom{n}{j - 1}D^{n+1-j}f \cdot D^{j}g.\)
If we isolate the first term of the first sum and the last term of the second sum and combine the remaining sums, then the right side of (ii) is
$$\displaystyle{D^{n+1}f \cdot g +\sum _{ j=1}^{n}\left [\binom{n}{j} + \binom{n}{j - 1}\right ]D^{n+1-j}f \cdot D^{j}g + f \cdot D^{n+1}g =\sum _{ j=0}^{n+1}\binom{n + 1}{j}D^{n+1-j}f \cdot D^{j}g,}$$
where we have used the identity \(\binom{n}{j} + \binom{n}{j - 1} = \binom{n + 1}{j}\) (cf. Exercise 1.3.29). This establishes (‡) and completes the proof. □
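The Leibniz Rule is easy to verify in a concrete case (an illustrative sketch; the monomials and evaluation point are my choices): for f(x) = x³ and g(x) = x⁴, the product is x⁷, whose derivatives are known in closed form, so the two sides of the rule can be compared exactly.

```python
from math import comb, factorial

# Leibniz Rule check on f(x) = x^3, g(x) = x^4 (product x^7):
# D^n(x^m) = m!/(m-n)! * x^(m-n) for n <= m, and 0 for n > m.

def dpow(m, n, x):
    """n-th derivative of x^m, evaluated at x (Power Rule)."""
    if n > m:
        return 0
    return factorial(m) // factorial(m - n) * x ** (m - n)

x, n = 2, 3
direct = dpow(7, n, x)  # D^3(x^7) at x = 2: 7*6*5 * 2^4 = 3360
leibniz = sum(comb(n, k) * dpow(3, n - k, x) * dpow(4, k, x)
              for k in range(n + 1))
```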
Corollary 6.6.9.
Suppose that \(f: I \rightarrow \mathbb{R},\ g: J \rightarrow \mathbb{R}\) and that f(I) ⊂ J. If f is n-times differentiable on I and g is n-times differentiable on J, then the composite function g ∘ f is n-times differentiable on I.
Proof.
We use induction on n, the case n = 1 being obviously true. (Why?) Now, it follows from the Chain Rule that \(D(g \circ f) = (g^{{\prime}}\circ f) \cdot f^{{\prime}}\). Since f ′ is (n − 1)-times differentiable on I, the corollary follows from the Leibniz Rule if we can show that g ′∘ f is also (n − 1)-times differentiable on I. This, however, follows from our inductive hypothesis. □
Definition 6.6.10 (\(\boldsymbol{C^{n}}\)-Diffeomorphism).
Let I and J be open intervals. A function f: I → J is said to be a C n -diffeomorphism if it is a bijection such that f ∈ C n(I) and f −1 ∈ C n(J).
The following extension of Corollary 6.4.16 is remarkable.
Corollary 6.6.11 (Smoothness of the Inverse Function).
Let I ≠ ∅ be an open interval. If f ∈ C n (I) satisfies \(f^{{\prime}}(x)\neq 0\ \forall x \in I,\) then it is a C n -diffeomorphism onto f(I); i.e., the inverse function f −1 is n-times continuously differentiable on the interval f(I).
Proof.
We proceed by induction again, the case n = 1 being Corollary 6.4.16 which also provides the formula \((f^{-1})^{{\prime}} = 1/f^{{\prime}}\circ f^{-1}\). Next, since f ′ is never zero on I, the same holds for f ′∘ f −1 on f(I). In view of the Quotient Rule, it is therefore sufficient to show that f ′∘ f −1 is (n − 1)-times differentiable on the interval f(I). (Why?) By Corollary 6.6.9, this will follow if f −1 is (n − 1)-times differentiable on the interval f(I). But this is precisely the inductive step and the proof is complete. □
Our last corollary will be an extension of the Leibniz Rule to products involving more than two functions:
Corollary 6.6.12.
If \(f_{j}: I \rightarrow \mathbb{R},\;j = 1,2,\ldots,k,\) are n-times differentiable on I, then so is their product, f 1 f 2 ⋯f k , and we have
$$\displaystyle{ (f_{1}f_{2}\cdots f_{k})^{(n)} =\sum _{n_{1}+n_{2}+\cdots +n_{k}=n} \frac{n!} {n_{1}!\,n_{2}!\cdots n_{k}!}\,f_{1}^{(n_{1})}f_{2}^{(n_{2})}\cdots f_{k}^{(n_{k})}. }$$(*)
Proof.
The case k = 2 is the Leibniz Rule. Inductively, we may assume that f 2 f 3⋯f k is n-times differentiable on I and apply the Leibniz Rule to obtain
$$\displaystyle{ (f_{1}f_{2}\cdots f_{k})^{(n)} =\sum _{n_{1}=0}^{n}\binom{n}{n_{1}}f_{1}^{(n_{1})}(f_{2}f_{3}\cdots f_{k})^{(n-n_{1})}. }$$(**)
Applying (*) (with n replaced by \(m = n - n_{1}\)) to the k − 1 functions f 2, f 3, …, f k , the right side of (**) is then
$$\displaystyle{\sum _{n_{1}=0}^{n}\binom{n}{n_{1}}f_{1}^{(n_{1})}\sum _{n_{2}+\cdots +n_{k}=n-n_{1}} \frac{(n - n_{1})!} {n_{2}!\cdots n_{k}!}\,f_{2}^{(n_{2})}\cdots f_{k}^{(n_{k})} =\sum _{n_{1}+\cdots +n_{k}=n} \frac{n!} {n_{1}!\cdots n_{k}!}\,f_{1}^{(n_{1})}\cdots f_{k}^{(n_{k})}.}$$
□
Our next goal is to prove Taylor’s formula, which is an extension of the MVT, and plays an important role in the study of local approximation of functions by polynomials. In Sect. 4.7, we proved the Weierstrass Approximation Theorem, which asserts that a continuous function on a closed bounded interval can be uniformly approximated by polynomials on that interval. In fact, in Theorem 4.7.9, we approximated any continuous function on [0, 1] by its Bernstein polynomials. Despite the importance of this uniform approximation, one drawback is that the nth Bernstein polynomial depends on the values of the function at n + 1 equally spaced points in the interval. By contrast, the Taylor polynomials (defined below) depend only on the values of the function and some of its derivatives at a single point in the interval. Therefore, they are more suitable for local approximation, i.e., approximation in a neighborhood of a given point. Let us begin with the following.
Exercise 6.6.13.
-
(a)
Using the Power Rule, show that, for each \(k \in \mathbb{N}\) and each j = 0, 1, ⋯ , k, we have
$$\displaystyle{[(x - c)^{k}]^{(j)} = k(k - 1)\cdots (k - j + 1)(x - c)^{k-j} = \frac{k!} {(k - j)!}(x - c)^{k-j}.}$$In particular, \([(x - c)^{k}]^{(k)} = k!\) and \([(x - c)^{k}]^{(\ell)} \equiv 0\ \forall \ell > k\). Deduce that the polynomial function
$$\displaystyle{ p(x)\!:=\sum _{ k=0}^{n}a_{ k}(x - c)^{k}, }$$(*)where the a k and c are real constants, has the property that
$$\displaystyle{ a_{k} = \frac{p^{(k)}(c)} {k!} \quad \;(0 \leq k \leq n), }$$(**)and p (k)(x) = 0 for all k > n and all \(x \in \mathbb{R}\).
-
(b)
Consider the function
$$\displaystyle{ f(x)\!:= \left \{\begin{array}{@{}l@{\quad }l@{}} e^{-1/x}\quad &\mbox{ if $x > 0$,} \\ 0 \quad &\mbox{ if $x \leq 0$.}\end{array} \right. }$$Show that \(f \in C^{\infty }(\mathbb{R})\) and that f (n)(0) = 0 for all \(n \in \mathbb{N}\). Hint: Concentrating at x = 0, use induction and l’Hôpital’s Rule.
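Part (a) of the above exercise can be spot-checked symbolically. The following sketch (using the third-party sympy library; the sample polynomial coefficients are arbitrary choices of ours) verifies the differentiation identity for small k, j and recovers the coefficients of a polynomial via (**).

```python
import sympy as sp

x, c = sp.symbols('x c')

# Check [(x - c)^k]^(j) = k!/(k - j)! * (x - c)^(k - j) for 0 <= j <= k.
for k in range(1, 5):
    for j in range(k + 1):
        lhs = sp.diff((x - c)**k, x, j)
        rhs = sp.factorial(k) / sp.factorial(k - j) * (x - c)**(k - j)
        assert sp.simplify(lhs - rhs) == 0

# Recover a_k = p^(k)(c)/k! for p(x) = 3 - (x - c) + 4(x - c)^2 + 2(x - c)^3.
coeffs = [3, -1, 4, 2]
p = sum(a * (x - c)**k for k, a in enumerate(coeffs))
recovered = [sp.diff(p, x, k).subs(x, c) / sp.factorial(k) for k in range(4)]
assert recovered == coeffs
```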
Remark 6.6.14.
If, in the above exercise, we replace x by ξ +η and c by ξ in (*) and use (**), then we obtain the identity
$$\displaystyle{p(\xi +\eta ) =\sum _{k=0}^{n}\frac{p^{(k)}(\xi )} {k!} \eta ^{k}.}$$
Definition 6.6.15 (Taylor Polynomials).
Let \(f: I \rightarrow \mathbb{R}\) be n-times differentiable at x 0 ∈ I; i.e., suppose that f (n)(x 0) exists. The nth Taylor polynomial of f at x 0 is then defined to be
$$\displaystyle{P_{n,x_{0}}(x)\!:=\sum _{j=0}^{n}\frac{f^{(j)}(x_{0})} {j!} (x - x_{0})^{j}.}$$
The coefficients \(f^{(j)}(x_{0})/j!,\ 0 \leq j \leq n,\) are called the Taylor coefficients of f at x 0. It follows at once from Exercise 6.6.13 that we have
$$\displaystyle{P_{n,x_{0}}^{(j)}(x_{0}) = f^{(j)}(x_{0})\qquad (0 \leq j \leq n).}$$
Exercise 6.6.16.
-
(a)
Let \(f(x)\!:= e^{x}\ \forall x \in \mathbb{R}\). Show that the nth Taylor polynomial of f at x 0 = 0 is
$$\displaystyle{P_{n,0}(x) = 1 + \frac{x} {1!} + \frac{x^{2}} {2!} + \frac{x^{3}} {3!} + \cdots + \frac{x^{n}} {n!}.}$$ -
(b)
Let \(g(x)\!:=\sin x\ \forall x \in \mathbb{R}\). Show that the (2n + 1)th Taylor polynomial of g at x 0 = 0 is
$$\displaystyle{P_{2n+1,0}(x) = x -\frac{x^{3}} {3!} + \frac{x^{5}} {5!} -\frac{x^{7}} {7!} + \cdots + (-1)^{n} \frac{x^{2n+1}} {(2n + 1)!}.}$$ -
(c)
Let \(h(x)\!:=\log x\ \forall x > 0\). Show that
$$\displaystyle{\log ^{(j)}(x) = \frac{(-1)^{j-1}(j - 1)!} {x^{j}} \quad \;\;(j = 1,\;2,\ldots ).}$$Deduce that the nth Taylor polynomial of h at x 0 = 1 is
$$\displaystyle{P_{n,1}(x) = (x - 1) -\frac{(x - 1)^{2}} {2} + \frac{(x - 1)^{3}} {3} -\frac{(x - 1)^{4}} {4} + \cdots + \frac{(-1)^{n-1}(x - 1)^{n}} {n}.}$$If, instead of h, we consider the function \(k(x)\!:=\log (x + 1)\ \forall x > -1,\) show that the nth Taylor polynomial of k at x 0 = 0 is
$$\displaystyle{P_{n,0}(x) = x -\frac{x^{2}} {2} + \frac{x^{3}} {3} -\frac{x^{4}} {4} + \cdots + \frac{(-1)^{n-1}x^{n}} {n}.}$$
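All three parts of the exercise can be checked symbolically by differentiating and evaluating at the base point, exactly as in the definition of the Taylor polynomial. The sketch below (using the third-party sympy library) is a verification aid, not a substitute for the proofs.

```python
import sympy as sp

x = sp.symbols('x')

def taylor_poly(f, n, x0=0):
    """nth Taylor polynomial: sum of f^(j)(x0)/j! * (x - x0)^j for j = 0..n."""
    return sum(sp.diff(f, x, j).subs(x, x0) / sp.factorial(j) * (x - x0)**j
               for j in range(n + 1))

# (a) exp at 0: 1 + x + x^2/2! + ... + x^n/n!
assert sp.expand(taylor_poly(sp.exp(x), 4)
                 - sum(x**k / sp.factorial(k) for k in range(5))) == 0

# (b) sin at 0: only odd powers, with alternating signs
assert sp.expand(taylor_poly(sp.sin(x), 5) - (x - x**3/6 + x**5/120)) == 0

# (c) log(1 + x) at 0: x - x^2/2 + x^3/3 - x^4/4
assert sp.expand(taylor_poly(sp.log(1 + x), 4)
                 - (x - x**2/2 + x**3/3 - x**4/4)) == 0
```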
The following proposition shows how the Taylor polynomials of a function at a given point approximate the function in a neighborhood of that point.
Proposition 6.6.17.
Let \(f: I \rightarrow \mathbb{R}\) be n-times differentiable at a point x 0 ∈ I and let \(P_{n,x_{0}}\) be its nth Taylor polynomial at x 0 . Then we have
$$\displaystyle{\lim _{x\rightarrow x_{0}}\frac{f(x) - P_{n,x_{0}}(x)} {(x - x_{0})^{n}} = 0.}$$
More precisely, there exists a function \(\zeta: I \rightarrow \mathbb{R},\) with \(\lim _{x\rightarrow x_{0}}\zeta (x) = 0,\) such that
$$\displaystyle{f(x) = P_{n,x_{0}}(x) +\zeta (x)(x - x_{0})^{n}\qquad (\forall x \in I).}$$
If, in addition, f (n+1) (x 0 ) exists, then we have
$$\displaystyle{ f(x) - P_{n,x_{0}}(x) = O\big((x - x_{0})^{n+1}\big)\qquad (x \rightarrow x_{0}). }$$(†)
Proof.
Consider the function
$$\displaystyle{ \zeta (x)\!:= \frac{f(x) - P_{n,x_{0}}(x)} {(x - x_{0})^{n}} \qquad (x \in I\setminus \{x_{0}\}). }$$(*)
As was pointed out above, the existence of f (n)(x 0) implies that there exists an interval J, with x 0 ∈ J ⊂ I, such that f is (n − 1)-times (continuously) differentiable on J. It is easily seen that all the derivatives of order ≤ n − 1 of the numerator and denominator of (*) are zero at x 0. Since f (n−1)(x) is defined for all x ∈ J, we can apply l’Hôpital’s Rule n − 1 times to (*) to obtain
if the limit on the right side exists. But, by hypothesis, f (n)(x 0) exists; i.e., we have
$$\displaystyle{\lim _{x\rightarrow x_{0}}\frac{f^{(n-1)}(x) - f^{(n-1)}(x_{0})} {x - x_{0}} = f^{(n)}(x_{0}).}$$
Therefore, the limit in (**) is indeed zero, as desired. If we define ζ(x 0): = 0, then the first part of the proposition is proved. To prove (†), note that the existence of f (n+1)(x 0) implies that f is n-times differentiable on an interval J, with x 0 ∈ J ⊂ I. We can therefore apply l’Hôpital’s Rule n times to the function \(\zeta (x)/(x - x_{0})\) to obtain
□
Note that the above proposition only gives the behavior of the remainder \(R_{n,x_{0}}(x)\!:= f(x) - P_{n,x_{0}}(x)\) as x → x 0. In particular, the remainder will be small in a sufficiently small neighborhood of x 0. If we impose more restrictions on the function f, we can find a more precise form of the remainder and give it an upper bound over the entire interval I.
Theorem 6.6.18 (Taylor’s Formula with Lagrange’s Remainder).
Let \(f: I \rightarrow \mathbb{R}\) be (n + 1)-times differentiable on I and let x 0 ∈ I be fixed. Then for each x ∈ I,x ≠ x 0 , there exists a point ξ between x 0 and x such that we have
$$\displaystyle{ f(x) =\sum _{k=0}^{n}\frac{f^{(k)}(x_{0})} {k!} (x - x_{0})^{k} + \frac{f^{(n+1)}(\xi )} {(n + 1)!} (x - x_{0})^{n+1}. }$$(†)
The term \(R_{n,x_{0}}(x)\!:= f^{(n+1)}(\xi )(x - x_{0})^{n+1}/(n + 1)!\) is called Lagrange’s remainder (or Lagrange’s form of the remainder). In particular, if
$$\displaystyle{\vert f^{(n+1)}(t)\vert \leq M\qquad (\forall t \in I),}$$
then we have
$$\displaystyle{\big\vert f(x) - P_{n,x_{0}}(x)\big\vert \leq \frac{M\vert x - x_{0}\vert ^{n+1}} {(n + 1)!} \qquad (\forall x \in I).}$$
Proof.
Assume x 0 < x; the other case is similar. On the interval [x 0, x], consider the function
-
(i)
\(F(t)\!:= f(x) - f(t) -\frac{f^{{\prime}}(t)} {1!} (x - t) -\frac{f^{{\prime\prime}}(t)} {2!} (x - t)^{2} -\cdots -\frac{f^{(n)}(t)} {n!} (x - t)^{n}.\)
Computing F ′(t), all but one of the terms cancel out and we obtain
$$\displaystyle{F^{{\prime}}(t) = -\frac{f^{(n+1)}(t)} {n!} (x - t)^{n}.}$$
Next, introduce the function
-
(ii)
\(G(t)\!:= \frac{(x - t)^{n+1}} {(n + 1)!} \qquad (\forall t \in [x_{0},x]),\)
so that \(G^{{\prime}}(t) = -(x - t)^{n}/n!\). Note, in particular, that we have \(F(x) = G(x) = 0,\) and
-
(iii)
\(\frac{F^{{\prime}}(t)} {G^{{\prime}}(t)} = f^{(n+1)}(t).\)
Applying Cauchy’s MVT to F and G on [x 0, x], and using (iii), we can find a point ξ between x 0 and x such that
$$\displaystyle{\frac{F(x_{0})} {G(x_{0})} = \frac{F(x) - F(x_{0})} {G(x) - G(x_{0})} = \frac{F^{{\prime}}(\xi )} {G^{{\prime}}(\xi )} = f^{(n+1)}(\xi ).}$$
In other words, we have \(F(x_{0}) = G(x_{0})f^{(n+1)}(\xi ),\) which, in view of the definitions (i) and (ii), completes the proof of (†). The last statement is an obvious consequence of (†). □
Remark 6.6.19 (Cauchy’s Form of the Remainder).
If, in the above proof, we use the function \(G(t)\!:= x - t\) instead of (ii), then the remainder takes the form
$$\displaystyle{R_{n,x_{0}}(x) = \frac{f^{(n+1)}(\xi )} {n!} (x-\xi )^{n}(x - x_{0}),}$$
which is called Cauchy’s form of the remainder. The reader is invited to supply the details. There is another important form of the remainder which requires integration and will be given in the next chapter.
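The practical content of Lagrange's remainder is the uniform bound \(M\vert x - x_{0}\vert ^{n+1}/(n+1)!\). A small numerical sketch (the sample points, degrees, and tolerance are our choices) checks it for f = sin at x 0 = 0, where every derivative is bounded by M = 1:

```python
import math

def taylor_sin(x, n):
    """Taylor polynomial of sin at 0 through degree n (only odd terms occur)."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n // 2 + 1) if 2*k + 1 <= n)

# Lagrange bound: |sin x - P_n(x)| <= |x|^(n+1)/(n+1)! since |sin^{(n+1)}| <= 1.
for xval in [0.3, 1.0, 2.0]:
    for n in [3, 5, 7]:
        err = abs(math.sin(xval) - taylor_sin(xval, n))
        bound = abs(xval)**(n + 1) / math.factorial(n + 1)
        assert err <= bound + 1e-15
```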
The following corollaries are immediate consequences of Taylor’s Formula.
Corollary 6.6.20.
Let \(f: I \rightarrow \mathbb{R}\) be (n + 1)-times differentiable on I. Then for each x ∈ I and each \(h \in \mathbb{R},\) with x + h ∈ I, there exists a θ ∈ (0,1) such that
$$\displaystyle{f(x + h) =\sum _{k=0}^{n}\frac{f^{(k)}(x)} {k!} h^{k} + \frac{f^{(n+1)}(x +\theta h)} {(n + 1)!} h^{n+1}.}$$
Corollary 6.6.21.
Let \(f\!:\! I \rightarrow \!\mathbb{R}\) be (n + 1)-times differentiable on I. If \(f^{(n+1)}(x) = 0\ \forall x \in I,\) then, on the interval I, f is a polynomial of degree at most n.
Let us also include the following uniqueness property of Taylor’s Formula:
Proposition 6.6.22.
Let \(f: I \rightarrow \mathbb{R}\) be n-times differentiable at a point x 0 ∈ I. Suppose that for each x ∈ I we have
$$\displaystyle{ f(x) =\sum _{k=0}^{n}a_{k}(x - x_{0})^{k} +\zeta (x)(x - x_{0})^{n}, }$$(*)
where a 0 , a 1 ,…, a n are real constants and \(\zeta: I \rightarrow \mathbb{R}\) satisfies \(\lim _{x\rightarrow x_{0}}\zeta (x) = 0\) . Then we have
$$\displaystyle{ a_{k} = \frac{f^{(k)}(x_{0})} {k!} \qquad (0 \leq k \leq n). }$$(**)
Proof.
Substituting x = x 0 in (*), we get a 0 = f(x 0). This implies that \([f(x) - f(x_{0})]/(x - x_{0}) = a_{1} + o(1),\) as x → x 0, and hence f ′(x 0) = a 1. Similarly, \([f(x) - f(x_{0}) - f^{{\prime}}(x_{0})(x - x_{0})]/(x - x_{0})^{2} = a_{2} + o(1),\) as x → x 0, which (applying l’Hôpital’s Rule twice) gives \(a_{2} = f^{{\prime\prime}}(x_{0})/2\). Continuing, we deduce that the a k are given by (**). □
As was pointed out before, Taylor’s Formula can be used for local approximation of differentiable functions by polynomials. Here is an example:
Example 6.6.23.
Let us approximate the function f(x): = e x by a polynomial on the interval [−1, 1], with error less than 10−10. Since f ′(x) = f(x), we have \(f^{(n)}(x) = e^{x}\ \forall n \in \mathbb{N}\). In particular, \(f^{(n)}(0) = 1\ \forall n \in \mathbb{N}\). By Taylor’s Formula, for each x ≠ 0 in [−1, 1], we can find a number ξ between 0 and x such that
$$\displaystyle{e^{x} =\sum _{k=0}^{n}\frac{x^{k}} {k!} + \frac{e^{\xi }} {(n + 1)!} x^{n+1}.}$$
Now | ξ | < 1 implies e ξ < e < 3. Therefore,
$$\displaystyle{\Big\vert e^{x} -\sum _{k=0}^{n}\frac{x^{k}} {k!} \Big\vert = \frac{e^{\xi }\vert x\vert ^{n+1}} {(n + 1)!} < \frac{3} {(n + 1)!} \qquad (\forall x \in [-1,1]).}$$
A calculation shows that \(13! = 0.62270208 \times 10^{10} < 3 \times 10^{10} < 14! = 8.71782912 \times 10^{10}\). Therefore, if n = 13, the error is indeed less than 10−10. In particular, for x = 1, we have
$$\displaystyle{e \approx \sum _{k=0}^{13} \frac{1} {k!} = 2.7182818284\ldots,}$$
which is correct to 10 decimal places. In fact, e ≈ 2. 718281828459045.
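For readers who wish to verify the arithmetic, the partial sum through n = 13 can be computed directly; the sketch below checks the claimed error bound (the comparison tolerance is ours).

```python
import math

# Partial sum of the Taylor series of e^x at x = 1 through n = 13,
# whose error is bounded by 3/14! < 10^{-10} as in the example.
approx = sum(1 / math.factorial(k) for k in range(14))

assert abs(approx - math.e) < 1e-10
assert math.factorial(13) < 3e10 < math.factorial(14)   # so n = 13 suffices
```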
Exercise 6.6.24.
Using Taylor’s Formula, find an approximate value of sin1 with error less than 10−5.
Taylor’s Formula can be used to prove a generalization of the Leibniz Rule. Before giving it, let us introduce some convenient terminology:
Definition 6.6.25.
-
(a)
(Differential Operator, Symbol) Given any polynomial with real coefficients
$$\displaystyle{ p(\xi ) =\sum _{ k=0}^{n}a_{ k}\xi ^{k} = a_{ 0} + a_{1}\xi + a_{2}\xi ^{2} + \cdots + a_{ n}\xi ^{n}, }$$(*)we can associate with it the corresponding differential polynomial
$$\displaystyle{p(D) =\sum _{ k=0}^{n}a_{ k}D^{k} = a_{ 0} + a_{1}D + a_{2}D^{2} + \cdots + a_{ n}D^{n},}$$where \(D = d/dx\). Given any n-times differentiable function \(u: I \rightarrow \mathbb{R},\) the differential polynomial p(D) can be applied to it in a natural way:
$$\displaystyle{p(D)u =\sum _{ k=0}^{n}a_{ k}D^{k}u =\sum _{ k=0}^{n}a_{ k}\frac{d^{k}u} {dx^{k}}.}$$In this case, p(D) is said to operate on u. When p(D) operates on functions, we call it a differential operator. The polynomial (*) is then called the symbol of p(D). If a n ≠ 0, then p(D) is said to be an nth-order (ordinary) differential operator with (constant) coefficients a 0 , a 1 ,…, a n .
-
(b)
(Differential Equation, Solution) An equation of the form
$$\displaystyle{p(D)u = f,}$$where \(f: I \rightarrow \mathbb{R}\) is a given function (with certain differentiability conditions) and \(u: I \rightarrow \mathbb{R}\) is an unknown (n-times differentiable function) to be determined, is called a differential equation. A solution to this equation is any function u that satisfies it.
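The action of a differential operator p(D) is easy to realize symbolically. The sketch below (using the third-party sympy library; the operator \(p(D) = D^{2} - 3D + 2\) and the sample functions are our choices) applies a differential polynomial to a function; note that \(u = e^{x}\) solves the differential equation p(D)u = 0 precisely because p(1) = 0.

```python
import sympy as sp

x = sp.symbols('x')

def apply_op(coeffs, u):
    """Apply p(D) = a_0 + a_1 D + ... + a_n D^n to u, where coeffs = [a_0, ..., a_n]."""
    return sum(a * sp.diff(u, x, k) for k, a in enumerate(coeffs))

# p(D) = D^2 - 3D + 2 annihilates u = e^x, since p(1) = 1 - 3 + 2 = 0:
assert sp.simplify(apply_op([2, -3, 1], sp.exp(x))) == 0

# The same operator applied to x^2 gives 2 - 6x + 2x^2:
assert sp.expand(apply_op([2, -3, 1], x**2) - (2*x**2 - 6*x + 2)) == 0
```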
Remark 6.6.26.
It is obvious that an nth-order differential operator p(D) defines a map p(D): C m(I) → C m−n(I) for each integer m ≥ n. It is also clear that a differential operator p(D) of any order defines a map \(p(D): C^{\infty }(I) \rightarrow C^{\infty }(I)\). The most natural setting for the study of differential operators is the theory of distributions (also known as generalized functions), because the differentiation is then always possible. Distribution theory is treated in more advanced courses on analysis and plays a fundamental role in the study of partial differential equations.
We are now ready to prove the extension of the Leibniz Rule mentioned above.
Theorem 6.6.27 (Hörmander’s Generalized Leibniz Rule).
Let \(u,\;v: I \rightarrow \mathbb{R}\) be n-times differentiable functions on I. Given any nth-order differential operator \(p(D) =\sum _{ k=0}^{n}a_{k}D^{k},\) we have
$$\displaystyle{p(D)(uv) =\sum _{k=0}^{n} \frac{1} {k!} \big(p^{(k)}(D)u\big)D^{k}v.}$$
Proof.
First note that the Leibniz Rule gives
$$\displaystyle{ p(D)(uv) =\sum _{k=0}^{n}a_{k}\sum _{j=0}^{k}\binom{k}{j}\big(D^{k-j}u\big)\big(D^{j}v\big). }$$(*)
If, for each k = 0, 1, …, n, we group all the terms on the right side of (*) containing D k v, then (∗) can be written as
$$\displaystyle{p(D)(uv) =\sum _{k=0}^{n}\big(q_{k}(D)u\big)D^{k}v,}$$
where the q k are polynomials to be determined. Next, note that, for each fixed ξ, we have \(D_{x}e^{\xi x} =\xi e^{\xi x}\). Repeated use of this fact implies that
$$\displaystyle{q(D)e^{\xi x} = q(\xi )e^{\xi x}}$$
for every polynomial q. Therefore, if we apply (*) with u(x): = e ξ x and v(x): = e η x for fixed \(\xi,\;\eta \in \mathbb{R},\) we have
$$\displaystyle{p(\xi +\eta )e^{(\xi +\eta )x} = p(D)\big(e^{\xi x}e^{\eta x}\big) =\sum _{k=0}^{n}q_{k}(\xi )\eta ^{k}e^{(\xi +\eta )x},}$$
which implies the identity
$$\displaystyle{ p(\xi +\eta ) =\sum _{k=0}^{n}q_{k}(\xi )\eta ^{k}. }$$(†)
On the other hand, by Taylor’s Formula (cf. Remark 6.6.14), we have
$$\displaystyle{ p(\xi +\eta ) =\sum _{k=0}^{n}\frac{p^{(k)}(\xi )} {k!} \eta ^{k}. }$$(‡)
Comparing the identities (†) and (‡), we conclude that \(q_{k} = p^{(k)}/k!\) and the proof is complete. □
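Since \(q_{k} = p^{(k)}/k!\), the generalized rule reads \(p(D)(uv) =\sum _{k}\big(p^{(k)}(D)u\big)D^{k}v/k!\), and this can be verified symbolically for concrete data. In the sketch below (using the third-party sympy library) the symbol p and the functions u, v are arbitrary samples of ours:

```python
import sympy as sp

x, xi = sp.symbols('x xi')

p = xi**3 - 2*xi + 5                 # a sample third-degree symbol
u = sp.sin(x)
v = x**2 * sp.exp(x)

def op(sym, f):
    """Apply the differential operator whose symbol is sym (a polynomial in xi) to f."""
    coeffs = list(reversed(sp.Poly(sym, xi).all_coeffs()))   # [a_0, a_1, ...]
    return sum(a * sp.diff(f, x, k) for k, a in enumerate(coeffs))

# Hörmander's rule: p(D)(uv) = sum over k of (p^(k)(D)u)(D^k v)/k!
lhs = op(p, u * v)
rhs = sum(op(sp.diff(p, xi, k), u) * sp.diff(v, x, k) / sp.factorial(k)
          for k in range(4))
assert sp.simplify(lhs - rhs) == 0
```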
Exercise 6.6.28.
Show that the Leibniz Rule is an immediate consequence of the above generalized version.
6.7 Convex Functions
The reader is certainly familiar with the notion of convexity introduced in calculus, where it is usually referred to as concavity, and where one also encounters the terms concave up and concave down. The definitions given in calculus textbooks are often geometric and assume the differentiability of the function. The goal is then to explain the connection to the sign of the second derivative and to the extrema (via the second derivative test). The definition of convexity given below is more general and we shall see that convexity on an interval implies differentiability at all but a countable set of points in that interval.
Definition 6.7.1 (Convex Function, Concave Function).
Let \(f: I \rightarrow \mathbb{R}\). We say that f is convex on I if for every s, t ∈ I and every λ ∈ [0, 1], we have
$$\displaystyle{ f(\lambda s + (1-\lambda )t) \leq \lambda f(s) + (1-\lambda )f(t). }$$(†)
We say that f is concave on I if − f is convex on I. Since \(\{(\lambda s + (1-\lambda )t,\lambda f(s) + (1-\lambda )f(t)):\lambda \in [0,1]\}\) is simply the chord joining the points (s, f(s)) and (t, f(t)) on the graph of f, the inequality (†) means, geometrically, that this chord is above the graph for every s, t ∈ I.
Proposition 6.7.2 (Jensen’s Inequality).
If \(f: I \rightarrow \mathbb{R}\) is convex on I, then it satisfies Jensen’s inequality :
$$\displaystyle{f\Big(\sum _{k=1}^{n}\lambda _{k}x_{k}\Big) \leq \sum _{k=1}^{n}\lambda _{k}f(x_{k})}$$
for any x 1 , …,x n ∈ I and any λ 1 ,…,λ n ∈ [0,1], with \(\sum _{k=1}^{n}\lambda _{k} = 1\) .
Proof.
We use (†) and induction, assuming the inequality for any x i ∈ I and λ i ∈ [0, 1], with \(\sum _{i=1}^{m}\lambda _{i} = 1\) and m < n. We then have, for λ < 1,
□
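Jensen's inequality is easy to check numerically for a concrete convex function. The sketch below (the convex function exp, the sample points, and the weights are our choices) compares the two sides for a weighted average:

```python
import math

# Spot-check of Jensen's inequality for the convex function exp.
xs = [0.2, 1.5, -0.7, 3.0]
lams = [0.1, 0.4, 0.3, 0.2]               # nonnegative weights summing to 1
assert abs(sum(lams) - 1) < 1e-12

lhs = math.exp(sum(l * xv for l, xv in zip(lams, xs)))
rhs = sum(l * math.exp(xv) for l, xv in zip(lams, xs))
assert lhs <= rhs
```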
Proposition 6.7.3 (Three Chords Lemma).
Let \(f: I \rightarrow \mathbb{R}\) . Then, f is convex on I if and only if, for any points a, b, c ∈ I with a < b < c, we have
$$\displaystyle{ \frac{f(b) - f(a)} {b - a} \leq \frac{f(c) - f(a)} {c - a} \leq \frac{f(c) - f(b)} {c - b}, }$$(‡)
which is equivalent to saying that for any fixed x 0 ∈ I the function
$$\displaystyle{\phi (x)\!:= \frac{f(x) - f(x_{0})} {x - x_{0}} \qquad (x \in I\setminus \{x_{0}\})}$$
[i.e., the slope of the chord joining \(\big(x_{0},f(x_{0})\big)\) and \(\big(x,f(x)\big)\) ] is increasing.
Proof.
Note that, with
$$\displaystyle{\lambda _{1}\!:= \frac{c - b} {c - a}\quad \text{and}\quad \lambda _{2}\!:= \frac{b - a} {c - a},}$$
we have \(\;b =\lambda _{1}a +\lambda _{2}c\). Applying Jensen’s inequality, we have
$$\displaystyle{ f(b) \leq \lambda _{1}f(a) +\lambda _{2}f(c). }$$(*)
Now we subtract f(a) from both sides of (*) to get the first inequality in (‡), and subtract f(c) from both sides of (*) to get the second one. For the converse, we must show that (‡) implies (†) of the above definition. But a simple computation transforms the first inequality in (‡) into (*), which is precisely (†) with \(a = s,\ c = t,\) and \(b =\lambda a + (1-\lambda )c\). The last statement is an immediate consequence. □
Exercise 6.7.4.
-
1.
Show that an affine function \(f(x) = ax + b,\ a,\ b \in \mathbb{R},\) is convex on \(\mathbb{R}\). In fact, \(f \in \mathbb{R}^{\mathbb{R}}\) is affine if and only if it is both convex and concave.
-
2.
Show that the quadratic function f(x) = x 2 is convex on \(\mathbb{R}\).
Remark 6.7.5.
It should be noted that a convex function need not be differentiable. For example, the function \(f(x)\!:= \vert x\vert \ \forall x \in \mathbb{R}\) is obviously convex (why?) but is not differentiable at x = 0. In fact, a convex function on a closed interval need not even be continuous. Indeed the function \(f: [0,1] \rightarrow \mathbb{R}\) defined by f(x): = 0 ∀x ∈ (0, 1] and f(0): = 1 is convex on [0, 1] but discontinuous at 0. It turns out, however, that a convex function on an open interval is automatically continuous there. In fact, such a function has finite left and right derivatives at every point of the interval and is differentiable except at a countable number of points:
Lemma 6.7.6.
Let \(f: (a,b) \rightarrow \mathbb{R}\) be a convex function. Then the left and right derivatives \(f_{-}^{{\prime}}(x)\) and \(f_{+}^{{\prime}}(x)\) are finite at every x ∈ (a,b)—hence f is continuous on (a,b)—and \(f_{-}^{{\prime}}(x) \leq f_{+}^{{\prime}}(x)\) . Moreover, the inequalities
$$\displaystyle{ f_{+}^{{\prime}}(s) \leq \frac{f(t) - f(s)} {t - s} \leq f_{-}^{{\prime}}(t) }$$(**)
are satisfied for any s < t in (a,b). In particular, \(f_{-}^{{\prime}}\) and \(f_{+}^{{\prime}}\) are both increasing on (a,b) and the set of x ∈ (a,b) at which f is not differentiable is countable.
Proof.
Let x 0 ∈ (a, b) be arbitrary and let ϕ be the (slope) function defined in Proposition 6.7.3. If A: = {ϕ(x): x ∈ (a, x 0)} and B: = {ϕ(x): x ∈ (x 0, b)}, then α ≤ β for all α ∈ A and β ∈ B because ϕ is increasing on the intervals (a, x 0) and (x 0, b). It follows (why?) that
Thus f has finite one-sided derivatives at x 0 with \(f_{-}^{{\prime}}(x_{0}) \leq f_{+}^{{\prime}}(x_{0})\). Since x 0 was arbitrary, the same is then true for every x ∈ (a, b). In particular (by Corollary 6.1.13), f is both right and left continuous at each x ∈ (a, b) and hence is continuous there. Next, let s < t in (a, b) and let x ∈ (s, t). Then, by what we just proved, we have
from which (**) follows. Moreover, (**) implies that if f is not differentiable at s and t, then the open intervals \(\big(f_{-}^{{\prime}}(s),f_{+}^{{\prime}}(s)\big)\) and \(\big(f_{-}^{{\prime}}(t),f_{+}^{{\prime}}(t)\big)\) are disjoint and hence we cannot have more than a countable number of them. (Why?) □
Corollary 6.7.7.
Let I be an open interval and let \(f: I \rightarrow \mathbb{R}\) be convex on I. Given any x 0 ∈ I and any number m with \(f_{-}^{{\prime}}(x_{0}) \leq m \leq f_{+}^{{\prime}}(x_{0}),\) we have \(f(x) \geq f(x_{0}) + m(x - x_{0})\ \forall x \in I\) .
Proof.
Since f is convex, the slope \(\phi (x)\!:= (f(x) - f(x_{0}))/(x - x_{0})\) is increasing. Now, if x > x 0, then the definition of the right derivative implies that we have \(f(x) - f(x_{0}) \geq (x - x_{0})f_{+}^{{\prime}}(x_{0}) \geq m(x - x_{0})\). If x < x 0, a similar argument shows that \(f(x_{0}) - f(x) \leq (x_{0} - x)f_{-}^{{\prime}}(x_{0}) \leq m(x_{0} - x),\) which is the desired inequality if we multiply the two sides by − 1. □
We can now characterize convex functions on open intervals completely:
Theorem 6.7.8.
Let \(I \subset \mathbb{R}\) be an open interval and \(f: I \rightarrow \mathbb{R}\) . Then, f is convex on I if and only if there is a countable subset D ⊂ I such that f is continuous on I, has a finite right derivative f + ′ (x) at every x ∈ I∖D, and f + ′ is increasing on I∖D.
Proof.
If f is convex, then (by Lemma 6.7.6) the conditions of the theorem are all satisfied. To prove the converse, we show that the slope function ϕ in Proposition 6.7.3 is increasing. So let a, b, c be points of I with a < b < c and define
It then follows from Proposition 6.4.24 that
Since m ≤ M (why?), we get
and the proof is complete. □
Corollary 6.7.9.
Let I be an open interval and \(f: I \rightarrow \mathbb{R}\) .
-
1.
Suppose f is differentiable on I. Then f is convex on I if and only if f ′ is increasing on I.
-
2.
Suppose f is 2-times differentiable on I. Then f is convex on I if and only if \(f^{{\prime\prime}}(x) \geq 0\ \forall x \in I\) .
Proof.
This is an immediate consequence of Theorem 6.7.8 and Corollary 6.4.14. □
Exercise 6.7.10.
Prove Corollary 6.7.9 directly by using the Mean Value Theorem to show that the (slope) function ϕ in Proposition 6.7.3 is increasing.
Remark 6.7.11.
-
1.
Note that, as was pointed out in Remark 6.4.15(a), the word “increasing” in the above corollary cannot be replaced by “strictly increasing.”
-
2.
(Support Line) Corollary 6.7.7 can be interpreted, geometrically, as follows: Given a convex function f on an open interval I, through each point P 0: = (x 0, f(x 0)) of the graph of f, we can draw a straight line lying entirely below the graph of f. Such a line is called a support line of f.
Exercise 6.7.12.
Let I be an open interval, x 0 ∈ I, and let \(f: I \rightarrow \mathbb{R}\) be convex on I. Show that, if f ′(x 0) exists, then there is a unique support line through P 0 = (x 0, f(x 0)), namely, the line tangent to the graph of f at P 0 whose equation is obviously \(y = f(x_{0}) + f^{{\prime}}(x_{0})(x - x_{0})\).
Example 6.7.13.
Let p > 1 and let q be the positive number (necessarily > 1) such that \(1/p + 1/q = 1\). The following inequalities are then satisfied for any a ≥ 0, b ≥ 0:
$$\displaystyle{ \mathrm{(i)}\;\; a^{1/p}b^{1/q} \leq \frac{a} {p} + \frac{b} {q},\qquad \mathrm{(ii)}\;\;\Big(\frac{a + b} {2} \Big)^{p} \leq \frac{a^{p} + b^{p}} {2}. }$$
First note that the inequalities are obvious if ab = 0. Now, to prove (i), note that \((-\log x)^{{{\prime\prime}} } = (-1/x)^{{\prime}} = 1/x^{2} > 0\) for all x > 0. Thus, − log is convex on (0, ∞). By Jensen’s inequality, we have
$$\displaystyle{ -\log \Big(\frac{a} {p} + \frac{b} {q} \Big) \leq \frac{1} {p}(-\log a) + \frac{1} {q}(-\log b). }$$(*)
Since
$$\displaystyle{\frac{1} {p}\log a + \frac{1} {q}\log b =\log \big(a^{1/p}b^{1/q}\big),}$$
the inequality (i) follows from (*) and the fact that exp is increasing. The inequality (ii) is an immediate consequence of Jensen’s inequality (with \(\lambda _{1} =\lambda _{2} = 1/2\)) applied to the function \(f(x)\!:= x^{p}\ \forall x \in (0,\infty ),\) which is convex in view of the fact that \(f^{{\prime\prime}}(x) = p(p - 1)x^{p-2} > 0\ \forall x > 0\).
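Both inequalities of the example can be spot-checked numerically. In the sketch below we assume the standard forms \(a^{1/p}b^{1/q} \leq a/p + b/q\) (Young's inequality) for (i) and \(((a+b)/2)^{p} \leq (a^{p}+b^{p})/2\) for (ii); the sample exponents, points, and tolerance are our choices:

```python
# Spot-check of (i) (Young's inequality) and (ii) (the power-mean inequality)
# over a small grid of conjugate exponents and nonnegative points.
for p in [1.5, 2.0, 3.0]:
    q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1
    for a in [0.0, 0.5, 2.0]:
        for b in [0.0, 1.0, 4.0]:
            assert a**(1/p) * b**(1/q) <= a/p + b/q + 1e-12          # (i)
            assert ((a + b) / 2)**p <= (a**p + b**p) / 2 + 1e-12     # (ii)
```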
Exercise 6.7.14 (Hölder and Minkowski Inequalities).
Given any finite sequences (a k ) k = 1 n and (b k ) k = 1 n in \(\mathbb{R},\) prove the following inequalities:
-
(a)
For any p > 1, q > 1 with \(1/p + 1/q = 1,\) we have
$$\displaystyle{ \left \vert \sum _{k=1}^{n}a_{ k}b_{k}\right \vert \leq \left (\sum _{k=1}^{n}\vert a_{ k}\vert ^{p}\right )^{1/p}\left (\sum _{ k=1}^{n}\vert b_{ k}\vert ^{q}\right )^{1/q}. }$$(Hölder)Hint: Show that we may assume a k ≥ 0, b k ≥ 0, for all k, and \(\sum _{k=1}^{n}\vert a_{k}\vert ^{p} =\sum _{ k=1}^{n}\vert b_{k}\vert ^{q} = 1\). Now use (i) of the above example with a = a k p and b = b k q.
-
(b)
For any p ≥ 1, we have
$$\displaystyle{ \left (\sum _{k=1}^{n}\vert a_{ k} + b_{k}\vert ^{p}\right )^{1/p} \leq \left (\sum _{ k=1}^{n}\vert a_{ k}\vert ^{p}\right )^{1/p} + \left (\sum _{ k=1}^{n}\vert b_{ k}\vert ^{p}\right )^{1/p}. }$$(Minkowski)Hint: Assume a k ≥ 0, b k ≥ 0 for all k. Now, for p > 1, let \(q\!:= p/(p - 1)\) and apply Hölder’s inequality to the two sums on the right side of the identity
$$\displaystyle{\sum _{k=1}^{n}(a_{ k} + b_{k})^{p} =\sum _{ k=1}^{n}a_{ k}(a_{k} + b_{k})^{p-1} +\sum _{ k=1}^{n}b_{ k}(a_{k} + b_{k})^{p-1}.}$$ -
(c)
Extend both inequalities in (a) and (b) to the case of infinite sequences (a n ) n = 1 ∞ and (b n ) n = 1 ∞. Hint: Look at partial sums.
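Hölder's and Minkowski's inequalities can likewise be spot-checked on concrete sequences. The sketch below (the sample sequences, exponents, and tolerance are our choices) compares both sides directly:

```python
a = [1.0, -2.0, 3.0, 0.5]
b = [2.0, 1.0, -1.0, 4.0]

def norm(v, p):
    """The p-norm (sum of |v_k|^p)^(1/p) of a finite sequence."""
    return sum(abs(t)**p for t in v)**(1/p)

for p in [1.5, 2.0, 3.0]:
    q = p / (p - 1)                      # conjugate exponent
    # Hölder: |sum a_k b_k| <= ||a||_p ||b||_q
    assert abs(sum(ai*bi for ai, bi in zip(a, b))) <= norm(a, p)*norm(b, q) + 1e-12
    # Minkowski: ||a + b||_p <= ||a||_p + ||b||_p
    s = [ai + bi for ai, bi in zip(a, b)]
    assert norm(s, p) <= norm(a, p) + norm(b, p) + 1e-12
```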
Here is a more general definition of convexity that does not imply continuity:
Exercise 6.7.15.
Suppose that \(f: I \rightarrow \mathbb{R}\) satisfies the condition
$$\displaystyle{ f\Big(\frac{s + t} {2} \Big) \leq \frac{f(s) + f(t)} {2} \qquad (\forall s,\ t \in I). }$$(†)
-
(a)
Show that \(f(\lambda s + (1-\lambda )t) \leq \lambda f(s) + (1-\lambda )f(t)\) holds for all s, t ∈ I and all λ ∈ [0, 1] of the form \(\lambda = m/2^{n},\) with integers m ≥ 0 and n ≥ 1. Hint: Use induction and the identity
$$\displaystyle{ \frac{m} {2^{n}}s + \left (1 - \frac{m} {2^{n}}\right )t = \frac{1} {2}\left [ \frac{m} {2^{n-1}}s + \left (1 - \frac{m} {2^{n-1}}\right )t + t\right ].}$$ -
(b)
Show that if f satisfies (†) and is continuous, then f is convex. Hint: Show that \(\{m/2^{n}: m/2^{n} \leq 1,\;m \in \mathbb{N}_{0},\;n \in \mathbb{N}\}\) is dense in [0, 1] and use part (a).
6.8 Problems
-
1.
Let f(x): = | x | 3. Find f ′(x) and f ′ ′(x). Show that f ′ ′ ′(0) does not exist.
-
2.
Give an example of a function \(f:\mathbb{R}\rightarrow \mathbb{R}\) such that f ′ ′ ′(x) exists for all \(x \in \mathbb{R}\) but is discontinuous at x = 0.
-
3.
Show that the function
$$\displaystyle{ f(x):= \left \{\begin{array}{@{}l@{\quad }l@{}} x \quad &\mbox{ if $x \in \mathbb{Q},$}\\ -x\quad &\mbox{ if $x \in \mathbb{Q}^{c}$} \end{array} \right. }$$is nowhere differentiable. Show, however, that (f ∘ f)(x) = x for all \(x \in \mathbb{R}\).
-
4.
Suppose that f(x) = xg(x) where g is continuous at x = 0. Show that f is differentiable at x = 0 and find f ′(0).
-
5.
Consider the function
$$\displaystyle{ f(x) = \left \{\begin{array}{@{}l@{\quad }l@{}} x^{2}\quad &\mbox{ if $x \in \mathbb{Q}$}, \\ 0 \quad &\mbox{ if $x\not\in \mathbb{Q}$.} \end{array} \right. }$$Show that f is differentiable at x = 0 and find f ′(0).
-
6.
Let α ∈ (0, 1) and δ > 0 be constants and assume that f(0) = 0 and | f(x) | ≥ | x | α for x ∈ (−δ, δ). Show that f ′(0) does not exist.
-
7 (Differentiable Periodic Function).
Let \(f:\mathbb{R}\rightarrow \mathbb{R}\) be a differentiable, periodic function with period a, i.e., \(f(x + a) = f(x)\) for all \(x \in \mathbb{R}\). Show that f ′ is also periodic. What is its period?
-
8.
Let \(f:\mathbb{R}\rightarrow \mathbb{R}\) be differentiable. Show directly (i.e., without using the Chain Rule) that \([f(cx)]^{{\prime}} = cf^{{\prime}}(cx)\) for all \(c \in \mathbb{R}\).
-
9 (Euler’s Theorem).
A function \(f \in \mathbb{R}^{\mathbb{R}}\) is said to be homogeneous of order \(n \in \mathbb{R},\) if f(tx) = t n f(x) for all t > 0. If such a function is differentiable, show that xf ′(x) = nf(x), for all \(x \in \mathbb{R}\).
-
10.
Given a polynomial function \(p(x) = a_{0} + a_{1}x + \cdots + a_{n}x^{n},\) find a polynomial q(x) with q ′(x) = p(x) for all \(x \in \mathbb{R}\).
-
11 (Diffeomorphism).
Let I and J be open intervals. A map f: I → J is called a diffeomorphism if it is bijective and if f and f −1 are both differentiable. Show that \(f(x):= x^{3} + x\) is a diffeomorphism of \(\mathbb{R}\) (onto \(\mathbb{R}\)) and find (f −1)′(2).
-
12.
Let arcsinx and arctanx denote the inverses of sinx (restricted to \([-\pi /2,\pi /2]\)) and tanx (restricted to \((-\pi /2,\pi /2)\)), respectively. Find the derivatives \((\arcsin )^{{\prime}}(x)\) (for x ∈ (−1, 1)) and (arctan)′(x) (for \(x \in \mathbb{R}\)).
-
13.
-
(a)
Let \(f,\ g: (a,b) \rightarrow \mathbb{R}\) be differentiable. Show that, between any pair of consecutive zeros of f, there is always a zero of f ′ + fg ′. Hint: Look at the function fe g.
-
(b)
Show that, between any pair of consecutive zeros of \(f(x):= 1 - e^{x}\sin x,\) there is at least one zero of \(g(x):= 1 + e^{x}\cos x\).
-
14.
-
(a)
Show that a polynomial of even degree attains its absolute minimum.
-
(b)
Show that the polynomial
$$\displaystyle{p(x):= 1 + x + \frac{x^{2}} {2!} + \frac{x^{3}} {3!} + \cdots + \frac{x^{n}} {n!} }$$has a unique real root if n is odd and no real roots if n is even. Hint: Note that, when p ′(x) = 0, we have \(p(x) = x^{n}/n!\).
-
15.
Let \(p(x) = a_{0} + a_{1}x + \cdots + a_{n}x^{n}\) and assume that \(a_{0} + a_{1}/2 + \cdots + a_{n}/(n + 1) = 0\). Show that p(ζ) = 0 for some ζ ∈ (0, 1). Hint: Find a polynomial q(x) with q ′ = p.
-
16.
Show that, if a polynomial p(x) with real coefficients has m distinct real roots, then p ′(x) has at least m − 1 distinct real roots.
-
17.
Prove the inequalities
$$\displaystyle{ \frac{x} {1 + x} \leq \log (1 + x) \leq x\qquad (\forall x > -1).}$$ -
18.
Prove the following inequalities.
$$\displaystyle{\frac{m(x - 1)} {x^{1-m}} < x^{m} - 1 < m(x - 1)\qquad (0 < m < 1,\ x > 1).}$$ -
19.
Show that, if \(f: (a,b) \rightarrow \mathbb{R}\) is differentiable and f ′ is bounded on (a, b), then f(a + 0) and f(b − 0) exist.
-
20.
Show that, if f is differentiable on I and f ′ = kf, then f(x) = Ce kx for some constant C and all x ∈ I.
-
21.
Let \(f:\mathbb{R}\rightarrow \mathbb{R}\) satisfy the functional equation
$$\displaystyle{f(x + y) = f(x)f(y)\qquad (\forall x,\ y \in \mathbb{R}).}$$-
(a)
Show that f is differentiable (on \(\mathbb{R}\)) if and only if f ′(0) exists.
-
(b)
Show that, if f is differentiable and is not identically zero, then f(x) = e cx for a constant \(c \in \mathbb{R}\).
-
(c)
Show that the statement in (b) is true if f is merely continuous (instead of differentiable).
-
22.
Find the following limit.
$$\displaystyle{\lim _{x\rightarrow 0} \frac{\sin x -\tan x} {\tan ^{-1}x -\sin ^{-1}x}.}$$ -
23 (“Sublinear” Function).
Let us define a function \(f:\mathbb{R}\rightarrow \mathbb{R}\) to be sublinear if f(x) = o(x) as | x | → ∞. Let \(f \in \mathbb{R}^{\mathbb{R}}\) be a differentiable function. Show that if \(\lim _{\vert x\vert \rightarrow \infty }f^{{\prime}}(x) = 0,\) then f is sublinear and we have \(\lim _{\vert x\vert \rightarrow \infty }[f(x + y) - f(x)] = 0\) for each \(y \in \mathbb{R}\).
-
24.
Suppose that f is continuous on [0, ∞), differentiable on (0, ∞), f(0) = 0, and f ′ is increasing. Show that the function \(g(x):= f(x)/x\) is increasing on (0, ∞).
-
25.
Let \(f \in \mathbb{R}^{\mathbb{R}}\) be differentiable.
-
(a)
Show that, if | f ′(x) | < 1 for all \(x \in \mathbb{R},\) then f has at most one fixed point.
-
(b)
Show that the function \(f(x):= x + 1/(1 + e^{x})\) satisfies \(\vert f^{{\prime}}(x)\vert < 1\) for all \(x \in \mathbb{R},\) but has no fixed point.
-
26.
Let \(f: (0, 1) \rightarrow \mathbb{R}\) be differentiable and | f ′(x) | ≤ 1 for all x ∈ (0, 1). Show that the sequence \((f(1/n))_{n\in \mathbb{N}}\) is convergent.
-
27.
Show that, if \(f,\ g: [0,\infty ) \rightarrow \mathbb{R}\) are differentiable, f(0) = g(0), and \(f^{{\prime}}(x) \leq g^{{\prime}}(x)\) for all x > 0, then f(x) ≤ g(x) for all x ≥ 0.
-
28.
Let \(f: [0, 1] \rightarrow \mathbb{R}\) be a differentiable function such that there is no point x ∈ [0, 1] with \(f(x) = f^{{\prime}}(x) = 0\). Show that the set \(Z:=\{ x \in [0, 1]: f(x) = 0\}\) of zeros of f is finite.
-
29.
Let \(f: [1, 3] \rightarrow \mathbb{R}\) be continuous on [1, 3] and differentiable on (1, 3), and assume that \(f^{{\prime}}(x) = [f(x)]^{2} + 4\). Explain whether \(f(3) - f(1) = 5\) is possible.
-
30.
Can the Dirichlet function \(\chi _{\mathbb{Q}}\) be the derivative of any function?
-
31 (Symmetric Derivative).
Let \(f \in \mathbb{R}^{\mathbb{R}}\). For each \(x \in \mathbb{R},\) define the symmetric derivative of f at x by
$$\displaystyle{f^{s}(x):=\lim _{ h\rightarrow 0+}\frac{f(x + h) - f(x - h)} {2h},}$$if the limit exists. Show that, if f ′(x) exists, then f s(x) = f ′(x). Let \(g(x):= 2\vert x\vert + x\). Show that g s(x) exists for all \(x \in \mathbb{R}\) even though g ′(0) does not exist. Also show that g attains its absolute minimum (i.e., \(\min \{g(x): x \in \mathbb{R}\}\)) at x = 0, but g s(0) ≠ 0.
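The symmetric difference quotient in this problem is in fact constant for g(x) = 2|x| + x, which makes the claim easy to check numerically (the sample step sizes and tolerance below are our choices):

```python
# Symmetric derivative of g(x) = 2|x| + x at 0: the quotient
# (g(h) - g(-h)) / (2h) = ((2h + h) - (2h - h)) / (2h) = 1 for every h > 0,
# even though g'(0) does not exist (slopes -1 from the left, 3 from the right).
def g(x):
    return 2 * abs(x) + x

for h in [1.0, 0.1, 1e-3, 1e-6]:
    quotient = (g(h) - g(-h)) / (2 * h)
    assert abs(quotient - 1.0) < 1e-9

# g attains its absolute minimum at 0 (g(x) >= |x|), yet g^s(0) = 1 != 0.
assert g(0) == 0 and g(0.5) > 0 and g(-0.5) > 0
```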
-
32 (Uniform Differentiability).
Let f be differentiable on [a, b]. Show that f ′ is continuous on [a, b] if and only if f is uniformly differentiable on [a, b]; i.e., given any \(\varepsilon > 0,\) there is a \(\delta =\delta (\varepsilon ) > 0\) such that for any x 0 ∈ [a, b], we have
$$\displaystyle{0 < \vert x - x_{0}\vert <\delta \Longrightarrow\left \vert \frac{f(x) - f(x_{0})} {x - x_{0}} - f^{{\prime}}(x_{ 0})\right \vert <\varepsilon.}$$ -
33.
Let \(f \in \mathbb{R}^{\mathbb{R}}\) be differentiable with bounded derivative, i.e., | f ′(x) | ≤ M for all \(x \in \mathbb{R}\) and some M > 0. Show that the function \(g(x):= x +\varepsilon f(x)\) is injective for small enough \(\varepsilon > 0\).
-
34.
Suppose that \(f: [a,\infty ) \rightarrow \mathbb{R}\) satisfies \(\lim _{x\rightarrow \infty }[f^{{\prime}}(x) +\alpha f(x)] = 0\) for some α > 0. Show that \(\lim _{x\rightarrow \infty }f(x) = 0\). Hint: Apply Cauchy’s MVT to f(x)e α x.
-
35.
Show that if \(f:\mathbb{R}\rightarrow [0,\infty )\) is twice differentiable and f ′ ′ ≤ 0 on \(\mathbb{R},\) then f is constant.
-
36.
Let \(f: (0, 1) \rightarrow \mathbb{R}\) be a differentiable function such that \(\lim _{x\rightarrow 0+}f(x)\) and \(\lim _{x\rightarrow 0+}xf^{{\prime}}(x)\) both exist. Show that \(\lim _{x\rightarrow 0+}xf^{{\prime}}(x) = 0\).
-
37 (Subexponential Function).
Let us define a function \(f:\mathbb{R}\rightarrow \mathbb{R}\) to be subexponential if
$$\displaystyle{f(x) = o(e^{\varepsilon \vert x\vert })\quad \forall \ \varepsilon > 0,\quad \text{as}\ \vert x\vert \rightarrow \infty.}$$-
(a)
Show that, if \(f:\mathbb{R}\rightarrow \mathbb{R}\) satisfies | f(x) | > 0 and f ′(x) = o(f(x)) (as x → ∞), then f is subexponential. Hint: Show that (assuming f > 0) \(f(x)e^{-\varepsilon x}\) is decreasing (hence bounded) for all large x > 0 and use l’Hôpital’s Rule.
-
(b)
Let \(\langle x\rangle:= \sqrt{1 + x^{2}}\). Show that \(\exp (\langle x\rangle ^{\alpha })\) is subexponential for α < 1.
-
(c)
Give an example of a (nontrivial) bounded function \(f \in C^{\infty }(\mathbb{R})\) that satisfies \(f^{{\prime}}(x) = o(f(x)),\) as | x | → ∞.
-
38 (Schwarzian Derivative).
Let \(f: I \rightarrow \mathbb{R}\) and assume that f ′ ′ ′(x) exists and f ′(x) ≠ 0 for all x ∈ I. Define the Schwarzian derivative of f at x by
$$\displaystyle{\mathcal{D}f(x):= \frac{f^{{\prime\prime\prime}}(x)} {f^{{\prime}}(x)} -\frac{3} {2}\left [\frac{f^{{\prime\prime}}(x)} {f^{{\prime}}(x)} \right ]^{2} =\Big [\frac{f^{{\prime\prime}}(x)} {f^{{\prime}}(x)} \Big]^{{\prime}}-\frac{1} {2}\left [\frac{f^{{\prime\prime}}(x)} {f^{{\prime}}(x)} \right ]^{2}.}$$-
(a)
Show that \(\mathcal{D}(f \circ g) = (\mathcal{D}f \circ g) \cdot (g^{{\prime}})^{2} + \mathcal{D}g\).
-
(b)
Show that, if \(f(x) = (ax + b)/(cx + d),\) then \(\mathcal{D}f = 0\).
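A quick numeric check of part (b), using the closed-form derivatives of a Möbius map (the values a, b, c, d below are arbitrary illustrative choices with ad − bc ≠ 0):

```python
# Check of part (b): the Schwarzian derivative of f(x) = (ax+b)/(cx+d)
# vanishes. The derivatives are coded in closed form:
#   f'(x)   = (ad-bc)/(cx+d)^2
#   f''(x)  = -2c(ad-bc)/(cx+d)^3
#   f'''(x) = 6c^2(ad-bc)/(cx+d)^4
def schwarzian(f1, f2, f3, x):
    """D f(x) computed from the first three derivatives f1, f2, f3."""
    return f3(x) / f1(x) - 1.5 * (f2(x) / f1(x)) ** 2

a, b, c, d = 2.0, 3.0, 1.0, 4.0        # arbitrary; ad - bc = 5 != 0
det = a * d - b * c
f1 = lambda x: det / (c * x + d) ** 2
f2 = lambda x: -2 * c * det / (c * x + d) ** 3
f3 = lambda x: 6 * c ** 2 * det / (c * x + d) ** 4

for x in [0.0, 1.0, 2.5]:
    assert abs(schwarzian(f1, f2, f3, x)) < 1e-12
```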
-
(c)
Show that \(\mathcal{D}g = \mathcal{D}h\) if and only if \(h = (ag + b)/(cg + d),\) where ad − bc ≠ 0.
-
(d)
Show that, if fg = 1, then \(\mathcal{D}f = \mathcal{D}g\).
-
(e)
Deduce the “if” part of (c) from (d). Hint: Note that, if c ≠ 0, then \((ag + b)/(cg + d) = a/c + (bc - ad)/[c(cg + d)]\).
-
39.
Let f be continuous on [a, b] and differentiable on (a, b) except possibly at a point x 0 ∈ (a, b). Show that, if \(\lim _{x\rightarrow x_{0}}f^{{\prime}}(x) =\ell\in \mathbb{R},\) then f is differentiable at x 0 and f ′(x 0) = ℓ so that f ′ is actually continuous at x 0. Hint: Apply the MVT on [x 0, x 0 + h] (resp., [x 0 + h, x 0]) for small h > 0 (resp., h < 0) or use l’Hôpital’s Rule.
-
40.
Consider the function
$$\displaystyle{ f(x):= \left \{\begin{array}{@{}l@{\quad }l@{}} e^{-1/x^{2}}\quad &\mbox{ if $x\neq 0$}, \\ 0 \quad &\mbox{ if $x = 0$.} \end{array} \right. }$$Show that \(f \in C^{\infty }(\mathbb{R})\) and f (n)(0) = 0 for all \(n \in \mathbb{N}\).
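A numeric illustration of why all derivatives of this function vanish at the origin: exp(−1/h²) decays faster than any power of h, so every difference quotient f(h)/hⁿ tends to 0.

```python
import math

# Sketch: f(x) = exp(-1/x^2) (with f(0) := 0) is "flat" at the origin.
def f(x):
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# f(h)/h^n -> 0 as h -> 0+ for every n, since exp(-1/h^2) beats any
# power of h; this is the mechanism behind f^(n)(0) = 0.
h = 0.05
for n in range(1, 6):
    assert f(h) / h**n < 1e-100
```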
-
41 (Legendre’s Polynomials).
Define the polynomials
$$\displaystyle{P_{n}(x):= \frac{1} {2^{n}(n!)} \frac{d^{n}} {dx^{n}}(x^{2} - 1)^{n}\qquad (\forall n \in \mathbb{N}).}$$-
(a)
Show that P n (x) has degree n and has n distinct (hence simple) real roots all of which are in [−1, 1]. Hint: Let \(u:= (x^{2} - 1)^{n}\). Note that u (k) is even (resp., odd) for k even (resp., odd). Also, for k ≤ n − 1, we have u (k)(±1) = 0 if k is even, and \(u^{(k)}(\pm 1) = u^{(k)}(0) = 0\) if k is odd. Now use Rolle’s Theorem repeatedly.
-
(b)
Let \(u:= (x^{2} - 1)^{n}\) as above. Show that
$$\displaystyle{(x^{2} - 1)\frac{du} {dx} = 2nxu}$$
and, taking the (n + 1)th derivatives of both sides, that \(y:= P_{n} = u^{(n)}/2^{n}(n!)\) satisfies Legendre’s differential equation:
$$\displaystyle{(x^{2} - 1)\frac{d^{2}y} {dx^{2}} + 2x\frac{dy} {dx} - n(n + 1)y = 0\qquad (\forall x \in \mathbb{R}).}$$ -
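The Rodrigues-type formula defining P_n can be carried out with exact integer arithmetic, as a sketch: expand u = (x² − 1)ⁿ by the binomial theorem, differentiate n times termwise, and divide by 2ⁿ·n!. The spot checks below use the classical facts P₂(x) = (3x² − 1)/2 and Pₙ(1) = 1.

```python
from math import comb, factorial

# Build P_n directly from the formula above, with exact coefficients.
def legendre_coeffs(n):
    """Return ({power: integer coefficient}, denominator) for P_n(x)."""
    # Coefficients of u = (x^2 - 1)^n via the binomial theorem.
    u = {2 * j: comb(n, j) * (-1) ** (n - j) for j in range(n + 1)}
    # n-th derivative, term by term: x^p -> p!/(p-n)! x^(p-n) for p >= n.
    d = {p - n: a * factorial(p) // factorial(p - n)
         for p, a in u.items() if p >= n}
    return d, 2 ** n * factorial(n)

def P(n, x):
    d, den = legendre_coeffs(n)
    return sum(a * x ** p for p, a in d.items()) / den

# Spot checks: P_2(x) = (3x^2 - 1)/2, and P_n(1) = 1 for small n.
assert abs(P(2, 0.5) - (3 * 0.25 - 1) / 2) < 1e-12
assert all(abs(P(n, 1.0) - 1.0) < 1e-9 for n in range(6))
```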
42.
Show that if \(f \in \mathbb{R}^{\mathbb{R}}\) is (n + 1)-times differentiable and \(f^{(n+1)}(x) = 0\) for all \(x \in \mathbb{R},\) then f(x) is a polynomial of degree ≤ n.
-
43.
Let \((x_{n}) \in [a,b]^{\mathbb{N}},\ x_{n}\neq x_{m},\) for n ≠ m, and lim(x n ) = ξ. Also, let \(f: [a,b] \rightarrow \mathbb{R}\) be such that f(x n ) = 0 for all \(n \in \mathbb{N}\).
-
(a)
Show that, if f is twice differentiable, then \(f(\xi ) = f^{{\prime}}(\xi ) = f^{{\prime\prime}}(\xi ) = 0\).
-
(b)
Show that, if f ∈ C ∞([a, b]), then f (k)(ξ) = 0 for all \(k \in \mathbb{N}\cup \{0\}\).
-
44.
Let \(f: I \rightarrow \mathbb{R}\) and assume that f (n)(x) = 0 for all x ∈ I and f (k)(x 0) = 0 for 1 ≤ k ≤ n − 1 (recall that f (0): = f) and some x 0 ∈ I. Show that f is constant on I.
-
45 (The Newton–Raphson Process).
Let \(f \in \mathbb{R}^{\mathbb{R}}\) be a strictly increasing, convex, differentiable function with f(ζ) = 0. Given a fixed x 1 > ζ, define \(x_{n+1}:= x_{n} - f(x_{n})/f^{{\prime}}(x_{n})\) for all \(n \in \mathbb{N}\). Show that lim(x n ) = ζ. Hint: Use Corollary 6.7.7 and Exercise 6.7.12.
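As a concrete illustration (f(x) = x² − 2, which is strictly increasing and convex on (0, ∞) with root ζ = √2, is a stand-in choice, not part of the exercise), the iterates starting from x₁ > ζ decrease monotonically to ζ, exactly as the convexity argument predicts:

```python
# Newton-Raphson iterates x_{n+1} = x_n - f(x_n)/f'(x_n).
def newton(f, fprime, x, steps):
    xs = [x]
    for _ in range(steps):
        x = x - f(x) / fprime(x)
        xs.append(x)
    return xs

# f(x) = x^2 - 2 on (0, oo): strictly increasing, convex, zeta = sqrt(2).
xs = newton(lambda x: x * x - 2, lambda x: 2 * x, 3.0, 6)

# Monotone decrease toward sqrt(2) from above, with quadratic convergence.
assert all(a >= b for a, b in zip(xs, xs[1:]))
assert abs(xs[-1] - 2 ** 0.5) < 1e-12
```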
-
46.
Let f ∈ C n(I) and x 0 ∈ I. Suppose that, for some polynomial p(x) of degree n, we have \(\vert f(x) - p(x)\vert \leq c\vert x - x_{0}\vert ^{n+1},\) for all x ∈ I and some constant c. Show that \(p(x) = P_{n,x_{0}}(x);\) i.e., p is the nth Taylor polynomial of f at x 0.
-
47.
Let \(\alpha \in \mathbb{R}\) and consider the function \(f(x):= (1 + x)^{\alpha }\) on \(I:= (-1,\infty )\). Find the nth Taylor polynomial of f at x 0 ∈ I.
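For the special case x₀ = 0 (the general x₀ reduces to it by writing 1 + x = (1 + x₀)(1 + t)), the answer involves the generalized binomial coefficients C(α, k) = α(α − 1)⋯(α − k + 1)/k!, which the following sketch accumulates iteratively and checks against (1 + x)^α for α = 1/2:

```python
from math import isclose

# n-th Taylor polynomial of (1+x)^alpha at x0 = 0, built term by term:
# term_k = C(alpha, k) x^k, where C(alpha, k) = alpha(alpha-1)...(alpha-k+1)/k!
def taylor_binomial(alpha, n, x):
    term, total = 1.0, 1.0          # k = 0 term is 1
    for k in range(1, n + 1):
        term *= (alpha - k + 1) * x / k
        total += term
    return total

# For alpha = 1/2 and small x the polynomial tracks sqrt(1+x) closely;
# the error is O(x^{n+1}), as Taylor's Formula predicts.
alpha, x = 0.5, 0.1
assert isclose(taylor_binomial(alpha, 8, x), (1 + x) ** alpha, abs_tol=1e-9)
```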
-
48 (Landau’s Inequality).
Let \(f: (0,\infty ) \rightarrow \mathbb{R}\) be twice differentiable and define \(M_{j}:=\sup \{\vert f^{(j)}(x)\vert: x > 0\},\) for j = 0, 1, 2. Show that \(M_{1}^{2} \leq 4M_{0}M_{2}\). Hint: Note that, by Taylor’s Formula, \(f^{{\prime}}(x) = [f(x + 2h) - f(x)]/(2h) - hf^{{\prime\prime}}(\xi ),\) for some ξ between x and x + 2h. Deduce that \(\vert f^{{\prime}}(x)\vert \leq hM_{2} + M_{0}/h\) for all h > 0 and minimize the right side.
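A numeric companion to the hint: the bound h·M₂ + M₀/h is minimized at h = √(M₀/M₂), where it equals 2√(M₀M₂), which is exactly where the factor 4 comes from. The values below use f = sin on (0, ∞), for which M₀ = M₁ = M₂ = 1.

```python
import math

# Minimize the hint's bound |f'(x)| <= h*M2 + M0/h over h > 0.
# Calculus gives h* = sqrt(M0/M2) and the minimum value 2*sqrt(M0*M2),
# hence M1 <= 2*sqrt(M0*M2), i.e. M1^2 <= 4*M0*M2.
M0, M1, M2 = 1.0, 1.0, 1.0          # f = sin on (0, oo)
h_star = math.sqrt(M0 / M2)
bound = h_star * M2 + M0 / h_star   # equals 2*sqrt(M0*M2)

assert math.isclose(bound, 2 * math.sqrt(M0 * M2))
assert M1 ** 2 <= 4 * M0 * M2
```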
-
49.
Let f be twice differentiable on (0, ∞) and assume that f ′ ′(x) = O(1) and f(x) = o(1) as x → ∞. Show that f ′(x) = o(1) as x → ∞. Show that the statement need not be true if f ′ ′ is not bounded on (0, ∞).
-
50 (Difference Operators).
Given any \(f:\mathbb{R}\rightarrow \mathbb{R}\) and any \(h \in \mathbb{R},\) define the difference operators: \(\Delta _{h}f(x):= f(x + h) - f(x),\) and \(\Delta _{h}^{n+1}f(x):= \Delta _{h}(\Delta _{h}^{n}f(x))\) for all \(n \in \mathbb{N}\).
-
(a)
Using the binomial coefficients, find an explicit formula for \(\Delta _{h}^{n}f(x)\).
-
(b)
Show that, if \(f \in C^{n}(\mathbb{R}),\) then
$$\displaystyle{\Delta _{h}^{n}f(x) = h^{n}f^{(n)}(x + n\theta h),}$$for some θ ∈ [0, 1]. Use this to give a definition of f (n)(x) that is independent of the preceding derivatives \(f^{{\prime}},\ f^{{\prime\prime}},\ldots,\ f^{(n-1)}\).
-
(c)
Let \(f \in C(\mathbb{R})\). Show that f is a polynomial of degree ≤ n if and only if \(\Delta _{h}^{n+1}f(x) = 0\) for all \(x,\ h \in \mathbb{R}\).
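A sketch of parts (a) and (c): the explicit formula is the alternating binomial sum Δₕⁿf(x) = Σₖ (−1)ⁿ⁻ᵏ C(n,k) f(x + kh), and Δₕⁿ⁺¹ annihilates polynomials of degree ≤ n. For the cubic below, Δₕ³p = h³·p′′′ exactly, since p′′′ is constant.

```python
from math import comb

# Part (a): Delta_h^n f(x) = sum_{k=0}^n (-1)^(n-k) C(n,k) f(x + k*h).
def delta_n(f, n, h, x):
    return sum((-1) ** (n - k) * comb(n, k) * f(x + k * h)
               for k in range(n + 1))

p = lambda x: 2 * x ** 3 - x + 5    # degree 3, p''' = 12 identically

# Part (c): Delta_h^4 p = 0; and part (b) gives Delta_h^3 p = h^3 * 12
# here, since the mean-value point theta is irrelevant for constant p'''.
assert abs(delta_n(p, 4, 0.5, 1.0)) < 1e-9
assert abs(delta_n(p, 3, 0.5, 1.0) - 12 * 0.5 ** 3) < 1e-9
```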
-
51 (Littlewood).
Let \(f:\mathbb{R} ^{+} \rightarrow \mathbb{R}\) be (n + 1)-times differentiable, \(\lim _{x\rightarrow \infty }f(x) = L \in \mathbb{R},\) and \(f^{(n+1)} = O(1),\) as x → ∞. Show that f (n)(x) = o(1) as x → ∞. Hint: Use Problem 50.
-
52 (Local Extrema).
Let \(f: I \rightarrow \mathbb{R}\) and let x 0 ∈ I be an interior point. Suppose that f ∈ C n(J) for some open interval J with x 0 ∈ J ⊂ I, and \(f^{{\prime}}(x_{0}) = f^{{\prime\prime}}(x_{0}) = \cdots = f^{(n-1)}(x_{0}) = 0,\) but f (n)(x 0) ≠ 0.
-
(a)
If n is even and f (n)(x 0) > 0, then f has a local minimum at x 0.
-
(b)
If n is even and f (n)(x 0) < 0, then f has a local maximum at x 0.
-
(c)
If n is odd, then f has neither a local maximum nor a local minimum at x 0. Hint: Use Taylor’s Formula.
-
53 (The Maximum Principle).
Let \(f: [a,b] \rightarrow \mathbb{R}\) be continuous on [a, b] and twice differentiable on (a, b). Show that, if for some constant α > 0 we have \(f^{{\prime\prime}}(x) =\alpha f(x)\) for all x ∈ (a, b), then
$$\displaystyle{\vert f(x)\vert \leq \max \{\vert f(a)\vert,\vert f(b)\vert \}\quad \forall \ x \in [a,b].}$$ -
54 (Convex \(\boldsymbol{\Rightarrow }\) Locally Lipschitz).
Show that any convex function \(f: (a,b) \rightarrow \mathbb{R}\) is locally Lipschitz.
-
55.
-
(a)
Let \(\varnothing \neq I \subset \mathbb{R}\) be an interval and f and g be convex functions on I. Show that if g is increasing, then g ∘ f is convex.
-
(b)
Show that if f: I → (0, ∞) is a positive function on an interval I ≠ ∅ and if log(f) is convex, then so is f. Show by an example that the converse is false.
-
56.
Prove the following inequality.
$$\displaystyle{(\sin x)^{\sin x} < (\cos x)^{\cos x}\quad \forall \ x \in (0,\pi /4).}$$ -
57.
Show that, if \(f \in \mathbb{R}^{\mathbb{R}}\) is differentiable, convex, and bounded, then it must be constant. Hint: Use Corollary 6.7.7.
-
58.
Show that, if \(f: I \rightarrow \mathbb{R}\) is continuous and satisfies \(f((x + y)/2) \leq [f(x) + f(y)]/2\) for all x, y ∈ I, then for every x 1, …, x n ∈ I, we have
$$\displaystyle{ f\left (\frac{x_{1} + x_{2} + \cdots + x_{n}} {n} \right ) \leq \frac{1} {n}[f(x_{1}) + f(x_{2}) + \cdots + f(x_{n})]. }$$ (†) Deduce the Arithmetic–Geometric Means Inequality:
$$\displaystyle{\root{n}\of{x_{1}x_{2}\cdots x_{n}} \leq \frac{x_{1} + \cdots + x_{n}} {n} \qquad (\forall x_{1} \geq 0,\ldots,x_{n} \geq 0).}$$ -
59 (Corollaries of Jensen’s Inequality).
-
For \(k = 1, 2,\ldots,n\), let x k > 0, y k > 0, and λ k > 0 with \(\sum _{k=1}^{n}\lambda _{k} = 1\). Define the following means:
$$\displaystyle\begin{array}{rcl} M_{-\infty } = M_{-\infty }(x_{1},x_{2},\ldots,x_{n})\!&:=& \min \{x_{1},x_{2},\ldots,x_{n}\}, {}\\ M_{\infty } = M_{\infty }(x_{1},x_{2},\ldots,x_{n})\!&:=& \max \{x_{1},x_{2},\ldots,x_{n}\}, {}\\ M_{0} = M_{0}(x_{1},x_{2},\ldots,x_{n})\!&:=& x_{1}^{\lambda _{1}}x_{ 2}^{\lambda _{2}}\cdots x_{n}^{\lambda _{n}}, {}\\ M_{t} = M_{t}(x_{1},x_{2},\ldots,x_{n})\!&:=& (\lambda _{1}x_{1}^{t} +\lambda _{ 2}x_{2}^{t} + \cdots +\lambda _{n}x_{n}^{t})^{1/t}, {}\\ \end{array}$$where t ≠ 0. Using Jensen’s inequality (Proposition 6.7.2), prove the following inequalities.
- (Power Mean Inequality). :
-
If s ≤ t, then we have
$$\displaystyle{M_{-\infty }\leq M_{s} \leq M_{t} \leq M_{\infty }.}$$ - (Weighted Arithmetic–Geometric Means Inequality). :
-
We have M 0 ≤ M 1, i.e.,
$$\displaystyle{x_{1}^{\lambda _{1}}x_{ 2}^{\lambda _{2}}\cdots x_{n}^{\lambda _{n}} \leq \lambda _{ 1}x_{1} +\lambda _{2}x_{2} + \cdots +\lambda _{n}x_{n}.}$$In particular, with \(\lambda _{k} = 1/n\) for \(k = 1,\ldots,n,\) we obtain the Arithmetic–Geometric Means Inequality:
$$\displaystyle{ \frac{1} {n}\sum _{k=1}^{n}x_{ k} \geq (x_{1}x_{2}\cdots x_{n})^{1/n}.}$$ - (Weighted Arithmetic–Harmonic Means Inequality). :
-
We have M −1 ≤ M 1, i.e.,
$$\displaystyle{\sum _{k=1}^{n}\lambda _{ k}x_{k} \geq \frac{1} {\sum _{k=1}^{n}\lambda _{k}/x_{k}}.}$$In particular, with \(\lambda _{k} = 1/n\) for \(k = 1,\ldots,n,\) we obtain the Arithmetic–Harmonic Means Inequality:
$$\displaystyle{ \frac{1} {n}\sum _{k=1}^{n}x_{ k} \geq \frac{1} { \frac{1} {n}\sum _{k=1}^{n}1/x_{k}}.}$$ - (Hölder’s Inequality). :
-
For any p > 1 and q > 1 with \(1/p + 1/q = 1\), we have
$$\displaystyle{\sum _{k=1}^{n}x_{ k}y_{k} \leq \Big (\sum _{k=1}^{n}x_{ k}^{p}\Big)^{1/p}\Big(\sum _{ k=1}^{n}y_{ k}^{q}\Big)^{1/q}.}$$ - (Minkowski’s Inequality). :
-
For any p ≥ 1, we have
$$\displaystyle{\Big(\sum _{k=1}^{n}(x_{ k} + y_{k})^{p}\Big)^{1/p} \leq \Big (\sum _{ k=1}^{n}x_{ k}^{p}\Big)^{1/p} +\Big (\sum _{ k=1}^{n}y_{ k}^{p}\Big)^{1/p}.}$$
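A numeric spot check of the means above (with equal weights λₖ = 1/n) and of two of the corollaries: the Power Mean chain M₋∞ ≤ M_s ≤ M_t ≤ M∞ for s ≤ t, and Hölder's inequality. The random data and the choice p = 3, q = 3/2 are arbitrary.

```python
import math
import random

# M_t with equal weights 1/n; t = 0 is the geometric mean M_0.
def M(t, xs):
    n = len(xs)
    if t == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** t for x in xs) / n) ** (1.0 / t)

random.seed(0)
xs = [random.uniform(0.5, 5.0) for _ in range(6)]

# Power Mean Inequality: M_t is nondecreasing in t, squeezed by min/max.
ts = [-2, -1, 0, 1, 2, 3]
vals = [M(t, xs) for t in ts]
assert min(xs) <= vals[0] and vals[-1] <= max(xs)
assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))

# Hoelder's Inequality with p = 3, q = 3/2 (so 1/p + 1/q = 1).
ys = [random.uniform(0.5, 5.0) for _ in range(6)]
p, q = 3.0, 1.5
lhs = sum(x * y for x, y in zip(xs, ys))
rhs = (sum(x ** p for x in xs) ** (1 / p)
       * sum(y ** q for y in ys) ** (1 / q))
assert lhs <= rhs + 1e-9
```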
-
60.
Find two smooth, convex functions \(f,\ g:\mathbb{R}\rightarrow \mathbb{R}\) such that f(x) = g(x) if and only if \(x \in \mathbb{Z}\).
© 2014 Springer Science+Business Media New York
Sohrab, H.H. (2014). The Derivative. In: Basic Real Analysis. Birkhäuser, New York, NY. https://doi.org/10.1007/978-1-4939-1841-6_6
Print ISBN: 978-1-4939-1840-9; Online ISBN: 978-1-4939-1841-6