Calculus

Applications of the Derivative

3.7 Newton's Method and Chaos

The equation to be solved is ƒ(x) = 0. Its solution x* is the point where the graph crosses the x axis. Figure 3.22 shows x* and a starting guess x₀. Our goal is to come as close as possible to x*, based on the information ƒ(x₀) and ƒ'(x₀).

Section 3.6 reached Newton's formula for x₁ (the next guess). We now do that directly.

What do we see at x₀? The graph has height ƒ(x₀) and slope ƒ'(x₀). We know where we are, and which direction the curve is going. We don't know if the curve bends (we don't have ƒ"). The best plan is to follow the tangent line, which uses all the information we have.

Newton replaces ƒ(x) by its linear approximation (= tangent approximation):
ƒ(x) ≈ ƒ(x₀) + ƒ'(x₀)(x − x₀). (1)

We want the left side to be zero. The best we can do is to make the right side zero! The tangent line crosses the axis at x₁, while the curve crosses at x*. The new guess x₁ comes from ƒ(x₀) + ƒ'(x₀)(x₁ − x₀) = 0. Dividing by ƒ'(x₀) and solving for x₁, this is step 1 of Newton's method:

x₁ = x₀ − ƒ(x₀) / ƒ'(x₀). (2)

At this new point, compute ƒ(x₁) and ƒ'(x₁) —the height and slope at x₁. They give a new tangent line, which crosses at x₂. At every step we want ƒ(x_n+1) = 0 and we settle for ƒ(x_n) + ƒ'(x_n)(x_n+1 − x_n) = 0. After dividing by ƒ'(x_n), the formula for x_n+1 is Newton's method.

3L The tangent line from x_n crosses the axis at x_n+1:
Newton's method x_n+1 = x_n − ƒ(x_n) / ƒ'(x_n) (3)
Usually this iteration x_n+1 = F(x_n) converges quickly to x*.
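
Formula (3) can be sketched in a few lines of Python. This is a minimal sketch, assuming the caller supplies both ƒ and ƒ'. The names newton, tol and max_steps are illustrative choices and do not come from the text.

```python
import math


def newton(f, fprime, x0, tol=1e-12, max_steps=50):
    """Follow tangent lines: x_{n+1} = x_n - f(x_n) / f'(x_n).

    Returns the list of guesses [x0, x1, ...]. Stops when a step is
    smaller than tol, or after max_steps steps (illustrative defaults).
    """
    xs = [x0]
    for _ in range(max_steps):
        slope = fprime(xs[-1])
        if slope == 0:
            # Horizontal tangent: the line never crosses the axis.
            raise ZeroDivisionError("f'(x_n) = 0, tangent line is horizontal")
        xs.append(xs[-1] - f(xs[-1]) / slope)
        if abs(xs[-1] - xs[-2]) < tol:
            break
    return xs


# The equation 2x = cos x, written as f(x) = 2x - cos x = 0.
guesses = newton(lambda x: 2 * x - math.cos(x),
                 lambda x: 2 + math.sin(x), x0=0.5)
for n, x in enumerate(guesses):
    print(n, x)
```

The function raises an error when the slope is exactly zero, since the tangent line then has no crossing point.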

Newton's method along tangent lines from x₀ to x₁ to x₂.
[Fig. 3.22]

Linear approximation involves three numbers. They are Δx (across) and Δƒ (up) and the slope ƒ'(x). If we know two of those numbers, we can estimate the third. It is remarkable to realize that calculus has now used all three calculations—they are the key to this subject:

Estimate the slope ƒ'(x) from Δƒ/Δx (Section 2.1)
Estimate the change Δƒ from ƒ'(x)Δx (Section 3.1)
Estimate the change Δx from Δƒ/ƒ'(x) (Newton's method)

The desired Δƒ is -ƒ(x_n). Formula (3) is exactly Δx = -ƒ(x_n) / ƒ'(x_n).

EXAMPLE 1 (Square roots) ƒ(x) = x² − b is zero at x* = √b and also at -√b. Newton's method is a quick way to find square roots, probably built into your calculator. The slope is ƒ'(x_n) = 2x_n, and formula (3) for the new guess becomes

x_n+1 = x_n − (x_n² − b) / (2x_n) = x_n − ½x_n + b / (2x_n). (4)

This simplifies to x_n+1 = ½(x_n + b/x_n). Guess the square root, divide into b, and average the two numbers. The ancient Babylonians had this same idea, without knowing functions or slopes. They iterated x_n+1 = F(x_n):

F(x) = ½(x + b / x) and F'(x) = ½(1 − b / x²). (5)

The Babylonians did exactly the right thing. The slope F' is zero at the solution, when x² = b. That makes Newton's method converge at high speed. The convergence test is |F'(x*)| < 1. Newton achieves F'(x*)= 0 —which is superconvergence.

To find √4, start the iteration x_n+1 = ½(x_n + 4/x_n) at x₀ = 1. Then x₁ = ½(1 + 4):
x₁ = 2.5 x₂ = 2.05 x₃ = 2.0006 x₄ = 2.00000009.

The wrong decimal is twice as far out at each step. The error is squared. Subtracting x* = 2 from both sides of x_n+1 = F(x_n) gives an error equation which displays that square:
x_n+1 − 2 = ½(x_n + 4 / x_n) − 2 = (1 / (2x_n))(x_n − 2)². (6)

This is (error)_n+1 ≈ (1/4)(error)_n². It explains the speed of Newton's method.
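
The squaring of the error can be watched numerically. The sketch below iterates x_n+1 = ½(x_n + b/x_n) for b = 4 and compares each new error with the prediction (x_n − 2)² / (2x_n) from equation (6). The function name babylonian_sqrt and its defaults are illustrative assumptions.

```python
def babylonian_sqrt(b, x0=1.0, steps=4):
    """Iterate x_{n+1} = (x_n + b/x_n)/2 and return [x0, x1, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.5 * (x + b / x))
    return xs


xs = babylonian_sqrt(4.0)
for n in range(1, len(xs)):
    err_old = xs[n - 1] - 2.0
    err_new = xs[n] - 2.0
    # Equation (6): new error = (old error)^2 divided by 2 * x_{n-1}.
    predicted = err_old ** 2 / (2.0 * xs[n - 1])
    print(f"x_{n} = {xs[n]:.10f}  error = {err_new:.3e}  predicted = {predicted:.3e}")
```

Near x* = 2 the factor 1/(2x_n) is close to 1/4, which is the constant in the error rule above.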

Remark 1 You can't start this iteration at x₀ = 0. The first step computes 4/0 and blows up. Figure 3.22a shows why: the tangent line at zero is horizontal. It will never cross the axis.

Remark 2 Starting at x₀ = -1, Newton converges to -2 instead of +2. That is the other x*. Often it is difficult to predict which x* Newton's method will choose. Around every solution is a "basin of attraction," but other parts of the basin may be far away. Numerical experiments are needed, with many starts x₀. Finding basins of attraction was one of the problems that led to fractals.

EXAMPLE 2 Solve 1/x − a = 0 to find x* = 1/a without dividing by a. Here ƒ(x) = (1/x) − a. Newton uses ƒ'(x) = -1/x². Surprisingly, we don't divide:

x_n+1 = x_n − ((1/x_n) − a) / (-1/x_n²) = x_n + x_n − ax_n². (7)

Do these iterations converge? I will take a = 2 and aim for x* = ½. Subtracting ½ from both sides of (7) changes the iteration into the error equation:
x_n+1 = 2x_n − 2x_n² becomes x_n+1 − ½ = -2(x_n − ½)². (8)

At each step the error is squared. This is terrific if (and only if) you are close to x* = ½. Otherwise squaring a large error and multiplying by -2 is not good:

x₀ = .70	x₁ = .42	x₂ = .487	x₃ = .4997	x₄ = .4999998
x₀ = 1.21	x₁ = -.5	x₂ = -1.5	x₃ = -7.5	x₄ = -127.5

The algebra in Problem 18 confirms those experiments. There is fast convergence if 0 < x₀ < 1. There is divergence if x₀ is negative or x₀ > 1. The tangent line goes to a negative x₁. After that Figure 3.22 shows a long trip backwards.
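
Both experiments are easy to repeat. The sketch below iterates equation (7) with a = 2 from the two starting guesses in the table. The function name reciprocal_newton is an illustrative assumption. Note that the loop uses only multiplication and subtraction.

```python
def reciprocal_newton(a, x0, steps=4):
    """Iterate x_{n+1} = 2*x_n - a*x_n**2 (equation (7)); no division is used."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(2 * x - a * x * x)
    return xs


for start in (0.70, 1.21):
    print(start, [round(x, 7) for x in reciprocal_newton(2.0, start)])
```

Run at full precision, the second start gives x₁ close to -0.508 rather than exactly -.5, so its later values drift from the tabulated row (which appears to continue from the rounded -.5). The divergence is the same.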

In the previous section we drew F(x). The iteration x_n+1 = F(x_n) converged to the 45° line, where x* = F(x*). In this section we are drawing ƒ(x). Now x* is the point on the axis where ƒ(x*) = 0.

To repeat: It is ƒ(x*) = 0 that we aim for. But it is the slope F'(x*) that decides whether we get there. Example 2 has F(x) = 2x − 2x². The fixed points are x* = ½ (our solution) and x* = 0 (not attractive). The slopes F'(x*) are zero (typical Newton) and 2 (typical repeller). The key to Newton's method is F' = 0 at the solution:

The slope of F(x)= x − ƒ(x) / ƒ'(x) is ƒ(x)ƒ"(x) / (ƒ'(x))². Then F'(x) = 0 when ƒ(x)= 0.
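
That slope comes from the quotient rule. A short derivation, assuming ƒ'(x) ≠ 0, with the square root example as a check:

```latex
\begin{aligned}
F(x)  &= x - \frac{f(x)}{f'(x)}, \\
F'(x) &= 1 - \frac{f'(x)\,f'(x) - f(x)\,f''(x)}{\bigl(f'(x)\bigr)^{2}}
       = \frac{f(x)\,f''(x)}{\bigl(f'(x)\bigr)^{2}}, \\
f(x) = x^{2} - b:\quad
F'(x) &= \frac{(x^{2} - b)\cdot 2}{(2x)^{2}}
       = \tfrac12\Bigl(1 - \frac{b}{x^{2}}\Bigr).
\end{aligned}
```

The last line agrees with F' in equation (5).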

The examples x² = b and 1/x = a show fast convergence or failure. In Chapter 13, and in reality, Newton's method solves much harder equations. Here I am going to choose a third example that came from pure curiosity about what might happen. The results are absolutely amazing. The equation is x² = -1.

EXAMPLE 3 What happens to Newton's method if you ask it to solve ƒ(x) = x² + 1 = 0?

The only solutions are the imaginary numbers x* = i and x* = -i. There is no real square root of -1. Newton's method might as well give up. But it has no way to know that! The tangent line still crosses the axis at a new point x_n+1, even if the curve y = x² + 1 never crosses. Equation (5) still gives the iteration for b = -1:
x_n+1 = ½(x_n − 1/x_n) = F(x_n). (9)

The x's cannot approach i or -i (nothing is imaginary). So what do they do?

The starting guess x₀ = 1 is interesting. It is followed by x₁ = 0. Then x₂ divides by zero and blows up. I expected other sequences to go to infinity. But the experiments showed something different (and mystifying). When x_n is large, x_n+1 is less than half as large. After x_n = 10 comes x_n+1 = ½(10 − 1/10) = 4.95. After much indecision and a long wait, a number near zero eventually appears. Then the next guess divides by that small number and goes far out again. This reminded me of "chaos."

It is tempting to retreat to ordinary examples, where Newton's method is a big success. By trying exercises from the book or equations of your own, you will see that the fast convergence to √4 is very typical. The function can be much more complicated than x² − 4 (in practice it certainly is). The iteration for 2x = cos x was in the previous section, and the error was squared at every step. If Newton's method starts close to x*, its convergence is overwhelming. That has to be the main point of this section: Follow the tangent line.

Instead of those good functions, may I stay with this strange example x² + 1 = 0? It is not so predictable, and maybe not so important, but somehow it is more interesting. There is no real solution x*, and Newton's method x_n+1 = ½(x_n − 1/x_n) bounces around. We will now discover x_n.

A FORMULA FOR x_n

The key is an exercise from trigonometry books. Most of those problems just give practice with sines and cosines, but this one exactly fits ½(x_n − 1/x_n):

½(cos θ / sin θ − sin θ / cos θ) = cos 2θ / sin 2θ     and     ½(cot θ − 1 / cot θ) = cot 2θ

In the left equation, the common denominator is 2 sin θ cos θ (which is sin 2θ). The numerator is cos²θ − sin²θ (which is cos 2θ). Replace cosine/sine by cotangent, and the identity says this:
If x₀ = cot θ then x₁ = cot 2θ. Then x₂ = cot 4θ. Then x_n = cot 2ⁿθ.

This is the formula. Our points are on the cotangent curve. Figure 3.23 starts from x₀ = 2 = cot θ, and every iteration doubles the angle.
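
The formula x_n = cot 2ⁿθ can be checked against the iteration itself. The sketch below starts from x₀ = 2, so θ is the angle whose cotangent is 2, and prints the iterates beside the cotangents. The helper names newton_x2_plus_1 and cot are illustrative assumptions.

```python
import math


def newton_x2_plus_1(x0, steps):
    """Iterate x_{n+1} = (x_n - 1/x_n)/2, Newton's method for x**2 + 1 = 0."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.5 * (x - 1.0 / x))
    return xs


def cot(t):
    """Cotangent as cosine over sine."""
    return math.cos(t) / math.sin(t)


theta = math.atan(1.0 / 2.0)      # cot(theta) = 2
xs = newton_x2_plus_1(2.0, steps=8)
for n, x in enumerate(xs):
    print(n, x, cot(2 ** n * theta))
```

The two columns agree only for a modest number of steps. Rounding errors are amplified at every doubling of the angle, so a long run in floating point drifts away from the exact cotangents.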

Example A The sequence x₀ = 1, x₁ = 0, x₂ = ∞ matches the cotangents of π/4, π/2, and π. This sequence blows up because x₂ has a division by x₁ = 0.

Newton's method for x² + 1 = 0. Iteration gives x_n = cot 2ⁿθ.
[Fig. 3.23]

Example B The sequence 1/√3, -1/√3, 1/√3 matches the cotangents of π/3, 2π/3, and 4π/3. This sequence cycles forever because x₀ = x₂ = x₄ = ….

Example C Start with a large x₀ (a small θ). Then x₁ is about half as large (at 2θ). Eventually one of the angles 4θ, 8θ, … hits on a large cotangent, and the x's go far out again. This is typical. Examples A and B were special, when θ/π was 1/4 or 1/3.

What we have here is chaos. The x's can't converge. They are strongly repelled by all points. They are also extremely sensitive to the value of θ. After ten steps θ is multiplied by 2¹⁰ = 1024. The starting angles 60° and 61° look close, but now they are different by 1024°. If that were a multiple of 180°, the cotangents would still be close. In fact the x₁₀'s are 0.6 and 14.
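
The comparison of 60° and 61° takes one line each with the cotangent formula. A small sketch, with x_after as an illustrative helper name:

```python
import math


def x_after(theta_degrees, n):
    """Return x_n = cot(2**n * theta) for a starting angle given in degrees."""
    angle = math.radians(theta_degrees) * 2 ** n
    return math.cos(angle) / math.sin(angle)


for degrees in (60.0, 61.0):
    print(degrees, round(x_after(degrees, 10), 3))
```

After ten doublings 60° returns to an angle equivalent to 60°, while 61° lands on an angle equivalent to 4°, where the cotangent is large.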

This chaos in mathematics is also seen in nature. The most familiar example is the weather, which is much more delicate than you might think. The headline "Forecasting Pushed Too Far" appeared in Science (1989). The article said that the snow-balling of small errors destroys the forecast after six days. We can't follow the weather equations for a month—the flight of a plane can change everything. This is a revolutionary idea, that a simple rule can lead to answers that are too sensitive to compute.

We are accustomed to complicated formulas (or no formulas). We are not accustomed to innocent-looking formulas like cot 2ⁿθ, which are absolutely hopeless after 100 steps.

CHAOS FROM A PARABOLA

Now I get to tell you about new mathematics. First I will change the iteration x_n+1 = ½(x_n − 1/x_n) into one that is even simpler. By switching from x to z = 1/(1 + x²), each new z turns out to involve only the old z and z²:
z_n+1 = 4z_n − 4z_n². (10)

This is the most famous quadratic iteration in the world. There are books about it, and Problem 28 shows where it comes from. Our formula for x_n leads to z_n:
z_n = 1 / (1 + x_n²) = 1 / (1 + (cot 2ⁿθ)²) = (sin 2ⁿθ)². (11)
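
For readers who want the algebra behind equation (10) now, here is one way to carry out the substitution z = 1/(1 + x²). It is a sketch of the computation, not necessarily the route taken in Problem 28:

```latex
\begin{aligned}
x_{n+1} &= \tfrac12\Bigl(x_n - \frac{1}{x_n}\Bigr) = \frac{x_n^{2} - 1}{2x_n}, \\
1 + x_{n+1}^{2} &= \frac{4x_n^{2} + (x_n^{2} - 1)^{2}}{4x_n^{2}}
                 = \frac{(x_n^{2} + 1)^{2}}{4x_n^{2}}, \\
z_{n+1} &= \frac{1}{1 + x_{n+1}^{2}}
         = 4\cdot\frac{1}{1 + x_n^{2}}\cdot\frac{x_n^{2}}{1 + x_n^{2}}
         = 4z_n(1 - z_n) = 4z_n - 4z_n^{2}.
\end{aligned}
```

The step to 4z_n(1 − z_n) uses 1 − z_n = x_n² / (1 + x_n²).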

The sine is just as unpredictable as the cotangent, when 2ⁿθ gets large. The new thing is to locate this quadratic as the last member (when a = 4) of the family
z_n+1 = az_n − az_n², 0≤a≤4. (12)

Example 2 happened to be the middle member a = 2, converging to ½. I would like to give a brief and very optional report on this iteration, for different a's.

The general principle is to start with a number z₀ between 0 and 1, and compute z₁, z₂, z₃, …. It is fascinating to watch the behavior change as a increases. You can see it on your own computer. Here we describe some things to look for. All numbers stay between 0 and 1 and they may approach a limit. That happens when a is small:
for 0 ≤ a ≤ 1 the z_n approach z* = 0
for 1 ≤ a ≤ 3 the z_n approach z* = (a − 1)/a

Those limit points are the solutions of z = F(z). They are the fixed points where z* = az* − a(z*)². But remember the test for approaching a limit: The slope at z* cannot be larger than one. Here F = az − az² has F' = a − 2az. It is easy to check |F'| ≤ 1 at the limits predicted above. The hard problem—sometimes impossible—is to predict what happens above a = 3. Our case is a = 4.
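
The predicted limits and the slope test can be checked by direct iteration. In this sketch the starting value z₀ = 0.3, the step count, and the name logistic_limit are illustrative assumptions. The values of a are chosen away from 1 and 3, where |F'| = 1 and convergence is very slow.

```python
def logistic_limit(a, z0=0.3, steps=2000):
    """Iterate z_{n+1} = a*z_n - a*z_n**2 and return the final z."""
    z = z0
    for _ in range(steps):
        z = a * z - a * z * z
    return z


for a in (0.5, 1.5, 2.0, 2.8):
    predicted = 0.0 if a <= 1 else (a - 1) / a
    slope = a - 2 * a * predicted      # F'(z*) = a - 2a z*
    print(a, logistic_limit(a), predicted, slope)
```

For a between 1 and 3 the slope at z* = (a − 1)/a works out to 2 − a, which stays between -1 and 1.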

The z's cannot approach a limit when |F'(z*)| > 1. Something has to happen, and there are at least three possibilities:
The z_n's can cycle or fill the whole interval (0,1) or approach a Cantor set.

I start with a random number z₀, take 100 steps, and write down steps 101 to 105:

	a=3.4	a=3.5	a=3.8	a=4.0
z₁₀₁	.842	.875	.336	.169
z₁₀₂	.452	.383	.848	.562
z₁₀₃	.842	.827	.491	.985
z₁₀₄	.452	.501	.950	.060
z₁₀₅	.842	.875	.182	.225

The first column is converging to a "2-cycle." It alternates between x = .842 and y = .452. Those satisfy y = F(x) and x = F(y) = F(F(x)). If we look at a double step when a = 3.4, x and y are fixed points of the double iteration z_n+2 = F(F(z_n)). When a increases past 3.45, this cycle becomes unstable.
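
An experiment of the same kind as the table can be run with the sketch below. It uses a fixed start z₀ = 0.3 in place of a random one (an assumption made so that runs are repeatable), and the name logistic_tail is illustrative. The columns for a = 3.8 and a = 4.0 depend on the start and will not match the table. The cycles for a = 3.4 and a = 3.5 should show the same values, possibly in a shifted order.

```python
def logistic_tail(a, z0=0.3, skip=100, keep=5):
    """Iterate z_{n+1} = a*z_n - a*z_n**2, discard `skip` steps, return the next `keep`."""
    z = z0
    for _ in range(skip):
        z = a * z - a * z * z
    tail = []
    for _ in range(keep):
        z = a * z - a * z * z
        tail.append(z)
    return tail


for a in (3.4, 3.5, 3.8, 4.0):
    print(a, [round(z, 3) for z in logistic_tail(a)])
```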

At that point the period doubles from 2 to 4. With a = 3.5 you see a "4-cycle" in the table: it repeats after four steps. The sequence bounces from .875 to .383 to .827 to .501 and back to .875. This cycle must be attractive or we would not see it. But it also becomes unstable as a increases. Next comes an 8-cycle, which is stable in a little window (you could compute it) around a = 3.55. The cycles are stable for shorter and shorter intervals of a's. Those stability windows are reduced by the Feigenbaum shrinking factor 4.6692…. Cycles of length 16 and 32 and 64 can be seen in physical experiments, but they are all unstable before a = 3.57. What happens then?

The new and unexpected behavior is between 3.57 and 4. Down each line of Figure 3.24, the computer has plotted the values of z₁₀₀₁ to z₂₀₀₀, omitting the first thousand points to let a stable period (or chaos) become established. No points appeared in the big white wedge. I don't know why. In the window for period 3, you see only three z's. Period 3 is followed by 6, 12, 24, …. There is period doubling at the end of every window (including all the windows that are too small to see). You can reproduce this figure by iterating z_n+1 = az_n − az_n² from any z₀ and plotting the results.

Period doubling and chaos from iterating F(z) (stolen by special permission from Introduction to Applied Mathematics by Gilbert Strang, Wellesley-Cambridge Press).
[Fig. 3.24]

CANTOR SETS AND FRACTALS

I can't tell what happens at a = 3.8. There may be a stable cycle of some long period. The z's may come close to every point between 0 and 1. A third possibility is to approach a very thin limit set, which looks like the famous Cantor set:

To construct the Cantor set, divide [0,1] into three pieces and remove the open interval (1/3, 2/3). Then remove (1/9, 2/9) and (7/9, 8/9) from what remains. At each step take out the middle thirds. The points that are left form the Cantor set.

All the endpoints 1/3, 2/3, 1/9, 2/9, … are in the set. So is 1/4 (Problem 42). Nevertheless the lengths of the removed intervals add to 1 and the Cantor set has "measure zero." What is especially striking is its self-similarity: Between 0 and 1/3 you see the same Cantor set three times smaller. From 0 to 1/9 the Cantor set is there again, scaled down by 9. Every section, when blown up, copies the larger picture.
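
The construction can be tried with exact fractions. The sketch below tests whether a point survives a finite number of middle-third removals, using the self-similarity to zoom in after each round, and adds up the removed lengths. The names in_cantor_set and removed_length, and the default depth of 40 rounds, are illustrative assumptions. A finite depth can only approximate membership.

```python
from fractions import Fraction


def in_cantor_set(x, depth=40):
    """Check that x in [0, 1] survives `depth` rounds of middle-third removal."""
    third = Fraction(1, 3)
    for _ in range(depth):
        if third < x < 2 * third:
            return False                 # x fell in a removed open interval
        # Zoom in on the surviving third; self-similarity maps it back to [0, 1].
        x = 3 * x if x <= third else 3 * x - 2
    return True


def removed_length(steps):
    """Total length removed after `steps` rounds: 1/3 + 2/9 + 4/27 + ..."""
    return sum(Fraction(2 ** k, 3 ** (k + 1)) for k in range(steps))


print(in_cantor_set(Fraction(1, 4)))     # 1/4 maps to 3/4, then back to 1/4
print(in_cantor_set(Fraction(1, 2)))     # 1/2 lies in the first removed interval
print(float(removed_length(50)))
```

The point 1/4 cycles between 1/4 and 3/4 under the zoom, so it is never removed.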

Fractals That self-similarity is typical of a fractal. There is an infinite sequence of scales. A mathematical snowflake starts with a triangle and adds a bump in the middle of each side. At every step the bumps lengthen the sides by 4/3. The final boundary is self-similar, like an infinitely long coastline.

The word "fractal" comes from fractional dimension. The snowflake boundary has dimension larger than 1 and smaller than 2. The Cantor set has dimension larger than 0 and smaller than 1. Covering an ordinary line segment with circles of radius r would take c/r circles. For fractals it takes c/r^D circles—and D is the dimension.
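
As a worked instance of the covering count, here are the standard dimensions of these two sets (they are not stated in the text above). Take r = 3⁻ᵏ, count the pieces at step k of each construction, and assume a starting triangle with sides of length 1 for the snowflake:

```latex
\begin{aligned}
\text{Cantor set:}\quad & N(r) = 2^{k} = \Bigl(\frac{1}{r}\Bigr)^{D},
  & D &= \frac{\log 2}{\log 3} \approx 0.631, \\
\text{Snowflake boundary:}\quad & N(r) = 3\cdot 4^{k} = 3\Bigl(\frac{1}{r}\Bigr)^{D},
  & D &= \frac{\log 4}{\log 3} \approx 1.262 .
\end{aligned}
```

Both values fall in the ranges given above: between 0 and 1 for the Cantor set, between 1 and 2 for the snowflake.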

Cantor set (middle thirds removed). Fractal snowflake (infinite boundary).
[Fig. 3.25]

Our iteration z_n+1 = 4z_n − 4z_n² has a = 4, at the end of Figure 3.24. The sequence z₀, z₁, … goes everywhere and nowhere. Its behavior is chaotic, and statistical tests find no pattern. For all practical purposes the numbers are random.

Think what this means in an experiment (or the stock market). If simple rules produce chaos, there is absolutely no way to predict the results. No measurement can ever be sufficiently accurate. The newspapers report that Pluto's orbit is chaotic—even though it obeys the law of gravity. The motion is totally unpredictable over long times. I don't know what that does for astronomy (or astrology).

The most readable book on this subject is Gleick's best-seller Chaos: Making a New Science. The most dazzling books are The Beauty of Fractals and The Science of Fractal Images, in which Peitgen and Richter and Saupe show photographs that have been in art museums around the world. The most original books are Mandelbrot's Fractals and Fractal Geometry. Our cover has a fractal from Figure 13.11.

We return to friendlier problems in which calculus is not helpless.

NEWTON'S METHOD VS. SECANT METHOD: CALCULATOR PROGRAMS

The hard part of Newton's method is to find dƒ/dx. We need it for the slope of the tangent line. But calculus can approximate by Δƒ/Δx —using the values of ƒ(x) already computed at x_n and x_n-1.

The secant method follows the secant line instead of the tangent line:
Secant: x_n+1 = x_n − ƒ(x_n)/(Δƒ/Δx)_n where (Δƒ/Δx)_n = (ƒ(x_n) − ƒ(x_n-1)) / (x_n − x_n-1). (13)

The secant line connects the two latest points on the graph of ƒ(x). Its equation is y − ƒ(x_n) = (Δƒ/Δx)(x − x_n). Set y = 0 to find equation (13) for the new x = x_n+1, where the line crosses the axis.

Prediction: Three secant steps are about as good as two Newton steps. Both should give four times as many correct decimals: (error) → (error)⁴. Probably the secant method is also chaotic for x² + 1 = 0.
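
The prediction can be tested on ƒ(x) = x² − 4. The sketch below implements equation (13) in Python. The name secant, the tolerance, and the two starting guesses are illustrative assumptions.

```python
def secant(f, x0, x1, tol=1e-12, max_steps=50):
    """Secant method: replace f'(x_n) by the slope through the two latest points.

    Returns the list of guesses [x0, x1, x2, ...].
    """
    xs = [x0, x1]
    f_prev, f_curr = f(x0), f(x1)
    for _ in range(max_steps):
        if f_curr == 0 or abs(xs[-1] - xs[-2]) < tol:
            break
        if f_curr == f_prev:
            break  # secant slope is zero; the line never crosses the axis
        slope = (f_curr - f_prev) / (xs[-1] - xs[-2])
        xs.append(xs[-1] - f_curr / slope)      # equation (13)
        f_prev, f_curr = f_curr, f(xs[-1])
    return xs


for n, x in enumerate(secant(lambda x: x * x - 4, 1.0, 2.5)):
    print(n, x, abs(x - 2.0))
```

Each step costs one new evaluation of ƒ and no evaluation of ƒ', which is the point of the method.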

These Newton and secant programs are for the TI-81. Place the formula for ƒ(x) in slot Y1 and the formula for ƒ'(x) in slot Y2 on the Y = function edit screen. Answer the prompt with the initial x₀ = X0. The programs pause to display each approximation x_n, the value ƒ(x_n), and the difference x_n − x_n-1. Press ENTER to continue or press ON and select item 2: Quit to break. If ƒ(x_n) = 0, the programs display ROOT AT and the root x_n.

PrgmN:NEWTON
:Disp "X0="
:Input X
:X→S
:Y1→Y
:Lbl 1
:X-Y/Y2→X
:X-S→D
:X→S
:Y1→Y
:Disp "ENTER FOR MORE"
:Disp "ON 2 TO BREAK"
:Disp " "
:Disp "XN FXN XN-XNM1"
:Disp X
:Disp Y
:Disp D
:Pause
:If Y≠0
:Goto 1
:Disp "ROOT AT"
:Disp X

PrgmS:SECANT
:Disp "X0="
:Input X
:X→S
:Y1→T
:Disp "X1="
:Input X
:Y1→Y
:Lbl 1
:X-S→D
:X→S
:X-YD/(Y-T)→X
:Y→T
:Y1→Y
:Disp "ENTER FOR MORE"
:Disp "XN FXN XN-XNM1"
:Disp X
:Disp Y
:Disp D
:Pause
:If Y≠0
:Goto 1
:Disp "ROOT AT"
:Disp X
