Kittttttan*s Web

Calculus

Applications of the Derivative

3.3 Second Derivatives: Minimum vs. Maximum

When ƒ'(x) is positive, ƒ(x) is increasing. When dy/dx is negative, y(x) is decreasing. That is clear, but what about the second derivative? From looking at the curve, can you decide the sign of ƒ"(x) or d²y/dx²? The answer is yes and the key is in the bending.

A straight line doesn't bend. The slope of y = mx + b is m (a constant). The second derivative is zero. We have to go to curves to see a changing slope. Changes in the derivative show up in ƒ"(x):
ƒ = x² has ƒ' = 2x and ƒ" = 2 (this parabola bends up)
y = sin x has dy/dx = cos x and d²y/dx² = -sin x (the sine bends down)

The slope 2x gets larger even when the parabola is falling. The sign of ƒ or ƒ' is not revealed by ƒ". The second derivative tells about change in slope.

A function with ƒ"(x) > 0 is concave up. It bends upward as the slope increases. It is also called convex. A function with decreasing slope—this means ƒ"(x) < 0 —is concave down. Note how cos x and 1 + cos x and even 1+ ½x + cos x change from concave down to concave up at x = π/2. At that point ƒ" = -cos x changes from negative to positive. The extra 1 + ½x tilts the graph but the bending is the same.

Increasing slope = concave up (ƒ" > 0). Concave down is ƒ" < 0. Inflection point ƒ" = 0.
[Fig. 3.7]

Here is another way to see the sign of ƒ". Watch the tangent lines. When the curve is concave up, the tangent stays below it. A linear approximation is too low. This section computes a quadratic approximation—which includes the term with ƒ" > 0. When the curve bends down (ƒ" < 0), the opposite happens—the tangent lines are above the curve. The linear approximation is too high, and ƒ" lowers it.

In physical motion, ƒ"(t) is the acceleration—in units of distance/(time)². Acceleration is rate of change of velocity. The oscillation sin 2t has v = 2 cos 2t (maximum speed 2) and a = -4 sin 2t (maximum acceleration 4).

An increasing population means ƒ' > 0. An increasing growth rate means ƒ" > 0. Those are different. The rate can slow down while the growth continues.

MAXIMUM VS. MINIMUM

Remember that ƒ'(x) = 0 locates a stationary point. That may be a minimum or a maximum. The second derivative decides! Instead of computing ƒ(x) at many points, we compute ƒ"(x) at one point—the stationary point. It is a minimum if ƒ"(x) > 0.

3D When ƒ'(x) = 0 and ƒ"(x) > 0, there is a local minimum at x.
When ƒ'(x) = 0 and ƒ"(x) < 0, there is a local maximum at x.

To the left of a minimum, the curve is falling. After the minimum, the curve rises. The slope has changed from negative to positive. The graph bends upward and ƒ"(x) > 0.

At a maximum the slope drops from positive to negative. In the exceptional case, when ƒ'(x) = 0 and also ƒ"(x) = 0, anything can happen. An example is x³, which pauses at x = 0 and continues up (its slope is 3x² ≥ 0). However −x⁴ pauses and goes down (with a very flat graph).

We emphasize that the information from ƒ'(x) and ƒ"(x) is only "local." To be certain of an absolute minimum or maximum, we need information over the whole domain.

EXAMPLE 1   ƒ(x) = x³ − x² has ƒ'(x) = 3x² − 2x and ƒ"(x) = 6x − 2.

To find the maximum and/or minimum, solve 3x² − 2x = 0. The stationary points are x = 0 and x = 2/3. At those points we need the second derivative. It is ƒ"(0) = -2 (local maximum) and ƒ"(2/3) = +2 (local minimum).

Between the maximum and minimum is the inflection point. That is where ƒ"(x) = 0. The curve changes from concave down to concave up. This example has ƒ"(x) = 6x − 2, so the inflection point is at x = 1/3.
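
The second derivative test in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function names f_prime, f_second and classify are illustrative choices, and exact fractions are used so that ƒ'(x) = 0 can be tested without round-off.

```python
from fractions import Fraction

def f_prime(x):
    """First derivative of f(x) = x^3 - x^2."""
    return 3 * x**2 - 2 * x

def f_second(x):
    """Second derivative of f(x) = x^3 - x^2."""
    return 6 * x - 2

def classify(x):
    """Apply the second derivative test at a stationary point x."""
    assert f_prime(x) == 0, "x must be a stationary point"
    bending = f_second(x)
    if bending > 0:
        return "local minimum"
    if bending < 0:
        return "local maximum"
    return "test is inconclusive"

# Roots of 3x^2 - 2x = x(3x - 2)
stationary_points = [Fraction(0), Fraction(2, 3)]
for x in stationary_points:
    print(x, f_second(x), classify(x))

# Root of 6x - 2, where the bending changes direction
inflection = Fraction(1, 3)
print("inflection point at x =", inflection)
```

The test needs ƒ" at the stationary points only, which is why classify never evaluates ƒ itself.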

INFLECTION POINTS

In mathematics it is a special event when a function passes through zero. When the function is ƒ, its graph crosses the axis. When the function is ƒ', the tangent line is horizontal. When ƒ" goes through zero, we have an inflection point.

The direction of bending changes at an inflection point. Your eye picks that out in a graph. For an instant the graph is straight (straight lines have ƒ" = 0). It is easy to see crossing points and stationary points and inflection points. Very few people can recognize where ƒ"' = 0 or ƒ"" = 0. I am not sure if those points have names.

There is a genuine maximum or minimum when ƒ'(x) changes sign. Similarly, there is a genuine inflection point when ƒ"(x) changes sign. The graph is concave down on one side of an inflection point and concave up on the other side. The tangents are above the curve on one side and below it on the other side. At an inflection point, the tangent line crosses the curve (Figure 3.7b).

Notice that a parabola y = ax² + bx + c has no inflection points: y" is constant. A cubic curve has one inflection point, because ƒ" is linear. A fourth-degree curve might or might not have inflection points—the quadratic ƒ"(x) might or might not cross the axis.

EXAMPLE 2   x⁴ − 2x² is W-shaped, 4x³ − 4x has two bumps, 12x² − 4 is U-shaped. The table shows the signs at the important values of x:

x        -√2    -1    -1/√3    0    1/√3    1    √2
ƒ(x)      0     -      -       0     -      -    0
ƒ'(x)     -     0      +       0     -      0    +
ƒ"(x)     +     +      0       -     0      +    +

Between zeros of ƒ(x) come zeros of ƒ'(x) (stationary points). Between zeros of ƒ'(x) come zeros of ƒ"(x) (inflection points). In this example ƒ(x) has a double zero at the origin, so a single zero of ƒ' is caught there. It is a local maximum, since ƒ"(0) < 0.
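
The signs in Example 2 can be recomputed directly from the three formulas. The sketch below is an illustration, not part of the original text: the helper sign and its tolerance 1e-12 are assumptions made so that floating-point round-off at ±√2 and ±1/√3 is reported as zero.

```python
import math

def f(x):
    return x**4 - 2 * x**2

def df(x):
    return 4 * x**3 - 4 * x

def d2f(x):
    return 12 * x**2 - 4

def sign(value, tol=1e-12):
    """Return '+', '-' or '0', treating tiny round-off as zero."""
    if abs(value) < tol:
        return "0"
    return "+" if value > 0 else "-"

# The important values of x: zeros of f, f' and f''
points = [-math.sqrt(2), -1, -1 / math.sqrt(3), 0,
          1 / math.sqrt(3), 1, math.sqrt(2)]

for name, func in [("f", f), ("f'", df), ("f''", d2f)]:
    row = "  ".join(sign(func(x)) for x in points)
    print(f"{name:5s}{row}")
```

Each printed row lists the sign at -√2, -1, -1/√3, 0, 1/√3, 1, √2 in that order.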

Inflection points are important—not just for mathematics. We know the world population will keep rising. We don't know if the rate of growth will slow down. Remember: The rate of growth stops growing at the inflection point. Here is the 1990 report of the UN Population Fund.

The next ten years will decide whether the world population trebles or merely doubles before it finally stops growing. This may decide the future of the earth as a habitation for humans. The population, now 5.3 billion, is increasing by a quarter of a million every day. Between 90 and 100 million people will be added every year during the 1990s; a billion people—a whole China—over the decade. The fastest growth will come in the poorest countries.

A few years ago it seemed as if the rate of population growth was slowing everywhere except in Africa and parts of South Asia. The world's population seemed set to stabilize around 10.2 billion towards the end of the next century.

Today, the situation looks less promising. The world has overshot the marker points of the 1984 "most likely" medium projection. It is now on course for an eventual total that will be closer to 11 billion than to 10 billion.

If fertility reductions continue to be slower than projected, the mark could be missed again. In that case the world could be headed towards a total of up to 14 billion people.

Starting with a census, the UN follows each age group in each country. They estimate the death rate and fertility rate—the medium estimates are published. This report is saying that we are not on track with the estimate.

Section 6.5 will come back to population, with an equation that predicts 10 billion. It assumes we are now at the inflection point. But China's second census just started on July 1, 1990. When it's finished we will know if the inflection point is still ahead.

You now understand the meaning of ƒ"(x). Its sign gives the direction of bending, which is the change in the slope. The rest of this section computes how much the curve bends, using the size of ƒ" and not just its sign. We find quadratic approximations based on ƒ"(x). In some courses they are optional; the main points are highlighted.

CENTERED DIFFERENCES AND SECOND DIFFERENCES

Calculus begins with average velocities, computed on either side of x:

[ƒ(x + Δx) − ƒ(x)] / Δx   and   [ƒ(x) − ƒ(x − Δx)] / Δx   are close to ƒ'(x).   (1)

We never mentioned it, but a better approximation to ƒ'(x) comes from averaging those two averages. This produces a centered difference, which is based on x + Δx and x − Δx. It divides by 2Δx:

ƒ'(x) ≈ ½ [ (ƒ(x + Δx) − ƒ(x))/Δx + (ƒ(x) − ƒ(x − Δx))/Δx ] = [ƒ(x + Δx) − ƒ(x − Δx)] / (2Δx).   (2)

We claim this is better. The test is to try it on powers of x.

For ƒ(x) = x these ratios all give ƒ' = 1 (exactly). For ƒ(x) = x², only the centered difference correctly gives ƒ' = 2x. The one-sided ratio gave 2x + Δx (in Chapter 1 it was 2t + h). It is only "first-order accurate." But centering leaves no error. We are averaging 2x + Δx with 2x − Δx. Thus the centered difference is "second-order accurate."
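
The test on powers of x is easy to run. This sketch uses exact fractions so the comparison is free of round-off; the names forward, backward and centered, and the sample values x = 3 and Δx = 1/10, are arbitrary choices for illustration.

```python
from fractions import Fraction

def forward(f, x, dx):
    """One-sided difference using x and x + dx."""
    return (f(x + dx) - f(x)) / dx

def backward(f, x, dx):
    """One-sided difference using x - dx and x."""
    return (f(x) - f(x - dx)) / dx

def centered(f, x, dx):
    """Centered difference: the average of forward and backward."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

def square(x):
    return x**2

def cube(x):
    return x**3

x, dx = Fraction(3), Fraction(1, 10)

print(forward(square, x, dx))    # 2x + dx
print(backward(square, x, dx))   # 2x - dx
print(centered(square, x, dx))   # 2x, with no error
print(centered(cube, x, dx))     # 3x^2 + dx^2, so the error is of order dx^2
```

For x³ the centered difference is no longer exact, but its error is (Δx)² rather than a multiple of Δx.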

I ask now: What ratio converges to the second derivative? One answer is to take differences of the first derivative. Certainly Δƒ'/Δx approaches ƒ". But we want a ratio involving ƒ itself. A natural idea is to take differences of differences, which brings us to "second differences":

(1/Δx) [ (ƒ(x + Δx) − ƒ(x))/Δx − (ƒ(x) − ƒ(x − Δx))/Δx ] = [ƒ(x + Δx) − 2ƒ(x) + ƒ(x − Δx)] / (Δx)² → d²ƒ/dx².   (3)

On the top, the difference of the difference is Δ(Δƒ)= Δ²ƒ. It corresponds to d²ƒ. On the bottom, (Δx)² corresponds to dx². This explains the way we place the 2's in d²ƒ/dx². To say it differently: dx is squared, dƒ is not squared—as in distance/(time)².

Note that (Δx)² becomes much smaller than Δx. If we divide Δƒ by (Δx)², the ratio blows up. It is the extra cancellation in the second difference Δ²ƒ that allows the limit to exist. That limit is ƒ"(x).

Application The great majority of differential equations can't be solved exactly. A typical case is ƒ"(x) = - sin ƒ(x) (the pendulum equation). To compute a solution, I would replace ƒ"(x) by the second difference in equation (3). Approximations at points spaced by Δx are a very large part of scientific computing.
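
Here is a rough sketch of that idea for the pendulum equation. Several things are assumptions made for illustration and are not in the text: the starting values ƒ(0) = 0.1 and ƒ'(0) = 0, the step Δx = 0.01, the function name pendulum, and the way the first step is taken. Replacing ƒ" by the second difference gives ƒ(x + Δx) ≈ 2ƒ(x) − ƒ(x − Δx) − (Δx)² sin ƒ(x), which is marched forward one step at a time.

```python
import math

def pendulum(f0, dx, steps):
    """March f'' = -sin f forward from f(0) = f0, f'(0) = 0 (assumed start)."""
    # First step from rest: f(dx) ≈ f0 + ½ f''(0) dx², with f''(0) = -sin f0.
    values = [f0, f0 - 0.5 * math.sin(f0) * dx**2]
    for _ in range(steps - 1):
        prev, cur = values[-2], values[-1]
        # Second difference: (next - 2 cur + prev) / dx² = -sin(cur)
        values.append(2 * cur - prev - dx**2 * math.sin(cur))
    return values

dx = 0.01
values = pendulum(f0=0.1, dx=dx, steps=100)   # values[n] approximates f(n * dx)

# For small swings sin f is close to f, so f should stay close to 0.1 cos x.
print(values[-1], 0.1 * math.cos(1.0))
```

The comparison with 0.1 cos x is only a sanity check for small swings; for large swings the two curves separate.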

To test the accuracy of these differences, here is an experiment on ƒ(x) = sin x + cos x. The table shows the errors at x = 0 from formulas (1), (2), (3):

step length Δx   one-sided errors   centered errors   second difference errors
1/4              .1347              .0104             -.0052
1/8              .0650              .0026             -.0013
1/16             .0319              .0007             -.0003
1/32             .0158              .0002             -.0001

The one-sided errors are cut in half when Δx is cut in half. The other columns decrease like (Δx)². Each reduction divides those errors by 4. The errors from one-sided differences are O(Δx) and the errors from centered differences are O(Δx)².
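
The experiment can be repeated in a few lines. In this sketch the error is taken as the exact derivative minus the approximation, a sign convention inferred from the tabulated values, and the function name errors is an illustrative choice.

```python
import math

def f(x):
    return math.sin(x) + math.cos(x)

def errors(dx, x=0.0):
    """Return (one-sided, centered, second-difference) errors at x."""
    exact_d1 = math.cos(x) - math.sin(x)     # f'(x)
    exact_d2 = -math.sin(x) - math.cos(x)    # f''(x)
    one_sided = (f(x + dx) - f(x)) / dx                   # formula (1)
    centered = (f(x + dx) - f(x - dx)) / (2 * dx)         # formula (2)
    second = (f(x + dx) - 2 * f(x) + f(x - dx)) / dx**2   # formula (3)
    return exact_d1 - one_sided, exact_d1 - centered, exact_d2 - second

for k in (4, 8, 16, 32):
    e1, e2, e3 = errors(1 / k)
    print(f"1/{k:<3d} {e1:8.4f} {e2:8.4f} {e3:8.4f}")
```

Halving Δx should roughly halve the first column and divide the other two by 4.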

The "big O" notation   When the errors are of order Δx, we write E = O(Δx). This means that |E| ≤ CΔx for some constant C. We don't compute C; in fact we don't want to deal with it. The statement "one-sided errors are Oh of delta x" captures what is important. The main point of the other columns is E = O(Δx)².

LINEAR APPROXIMATION VS. QUADRATIC APPROXIMATION

The second derivative gives a tremendous improvement over linear approximation ƒ(a) + ƒ'(a)(x − a). A tangent line starts out close to the curve, but the line has no way to bend. After a while it overshoots or undershoots the true function (see Figure 3.8). That is especially clear for the model ƒ(x) = x², when the tangent is the x axis and the parabola curves upward.

You can almost guess the term with bending. It should involve ƒ", and also (Δx)². It might be exactly ƒ"(x) times (Δx)² but it is not. The model function x² has ƒ" = 2. There must be a factor ½ to cancel that 2:

3E   The quadratic approximation to a smooth function ƒ(x) near x = a is
ƒ(x) ≈ ƒ(a) + ƒ'(a)(x − a) + ½ƒ"(a)(x − a)².   (4)

At the basepoint this is ƒ(a) = ƒ(a). The derivatives also agree at x = a. Furthermore the second derivatives agree. On both sides of (4), the second derivative at x = a is ƒ"(a).

The quadratic approximation bends with the function. It is not the absolutely final word, because there is a cubic term (1/6)ƒ"'(a)(x − a)³ and a fourth-degree term (1/24)ƒ""(a)(x − a)⁴ and so on. The whole infinite sum is a "Taylor series." Equation (4) carries that series through the quadratic term, which for practical purposes gives a terrific approximation. You will see that in numerical experiments.

Two things to mention. First, equation (4) shows why ƒ" > 0 brings the curve above the tangent line. The linear part gives the line, while the quadratic part is positive and bends upward. Second, equation (4) comes from (2) and (3). Where one-sided differences give ƒ(x + Δx) ≈ ƒ(x) + ƒ'(x)Δx, centered differences give the quadratic:

from (2): ƒ(x + Δx) ≈ ƒ(x − Δx) + 2ƒ'(x)Δx
from (3): ƒ(x + Δx) ≈ 2ƒ(x) − ƒ(x − Δx) + ƒ"(x)(Δx)².

Add and divide by 2. The result is ƒ(x + Δx) ≈ ƒ(x) + ƒ'(x)Δx + ½ƒ"(x)(Δx)². This is correct through (Δx)² and misses by (Δx)³, as examples show:

1 + x + x² near 1/(1 − x)
[Fig. 3.8]

EXAMPLE 3   (x + Δx)³ ≈ (x³) + (3x²)(Δx) + ½(6x)(Δx)² + error(Δx)³.

EXAMPLE 4   (1 + x)ⁿ ≈ 1 + nx + ½n(n − 1)x².

The first derivative at x = 0 is n. The second derivative is n(n-1). The cubic term would be (1/6)n(n-1)(n-2)x³. We are just producing the binomial expansion!

EXAMPLE 5   1/(1 − x) ≈ 1 + x + x² = start of a geometric series.

1/(1-x) has derivative 1/(1-x)². Its second derivative is 2/(1-x)³. At x = 0 those equal 1,1,2. The factor ½ cancels the 2, which leaves 1,1,1. This explains 1 + x + x².

The next terms are x³ and x⁴. The whole series is 1/(1-x) = 1 + x + x² + x³ + …

Numerical experiment   1/√(1 + x) ≈ 1 − ½x + (3/8)x² is tested for accuracy. Dividing x by 2 almost divides the error by 8. If we only keep the linear part 1 − ½x, the error is only divided by 4. Here are the errors at x = 1/4, 1/8, and 1/16:

linear approximation (error ≈ (3/8)x²): .0194 .0053 .0014
quadratic approximation (error ≈ (-5/16)x³): -.00401 -.00055 -.00007
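
These errors can be reproduced with a short script. As a sketch, the error here means the exact value minus the approximation, which matches the signs listed; the function names exact, linear and quadratic are illustrative.

```python
import math

def exact(x):
    return 1 / math.sqrt(1 + x)

def linear(x):
    """Tangent line at x = 0."""
    return 1 - x / 2

def quadratic(x):
    """Linear part plus the term ½ f''(0) x², where f''(0) = 3/4."""
    return 1 - x / 2 + 3 * x**2 / 8

for x in (1 / 4, 1 / 8, 1 / 16):
    lin_err = exact(x) - linear(x)
    quad_err = exact(x) - quadratic(x)
    print(f"x = {x:<7} linear error {lin_err:.4f}   quadratic error {quad_err:.5f}")
```

Comparing successive rows shows the linear error shrinking by a factor near 4 and the quadratic error by a factor approaching 8.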