The equation to be solved is ƒ(x) = 0. Its solution x* is the point where the graph crosses the x axis. Figure 3.22 shows x* and a starting guess x0. Our goal is to come as close as possible to x*, based on the information ƒ(x0) and ƒ'(x0).
Section 3.6 reached Newton's formula for x1 (the next guess). We now do that directly.
What do we see at x0? The graph has height ƒ(x0) and slope ƒ'(x0). We know where we are, and which direction the curve is going. We don't know if the curve bends (we don't have ƒ"). The best plan is to follow the tangent line, which uses all the information we have.
Newton replaces ƒ(x) by its linear approximation (= tangent approximation):
ƒ(x) ≈ ƒ(x0) + ƒ'(x0)(x − x0). (1)
We want the left side to be zero. The best we can do is to make the right side zero! The tangent line crosses the axis at x,, while the curve crosses at x*. The new guess x, comes from f(x0) + ƒ'(x0)(x1 − x0) = 0. Dividing by ƒ'(x0) and solving for x1,this is step 1 of Newton's method:
x1 = x0 − | ƒ(x0) | . (2) |
ƒ'(x0) |
At this new point, compute ƒ(x1) and ƒ'(x1) —the height and slope at x1. They give a new tangent line, which crosses at x2. At every step we want ƒ(xn+1) = 0 and we settle for ƒ(xn) + ƒ'(xn)(xn+1 − xn) = 0. After dividing by ƒ'(xn), the formula for xn+1 is Newton's method.
3L The tangent line from x, crosses the axis at xn + 1:
Newton's method xn+1 = xn − ƒ(xn) / ƒ'(xn) (3)
Usually this iteration xn+1 = F(xn) converges quickly to x*.
[Fig. 3.22]
Linear approximation involves three numbers. They are Δx (across) and Δƒ (up) and the slope ƒ'(x). If we know two of those numbers, we can estimate the third. It is remarkable to realize that calculus has now used all three calculations—they are the key to this subject:
The desired Δƒ is -ƒ(xn). Formula (3) is exactly Δx = -ƒ(xn) / ƒ'(xn).
EXAMPLE 1 (Square roots) ƒ(x)= x² − b is zero at x* = b and also at -b. Newton's method is a quick way to find square roots—probably built into your calculator. The slope is ƒ'(xn) = 2xn, and formula (3) for the new guess becomes
xn+1 = xn − | xn² − b | = xn − | 1 | xn + | b | . (4) |
2xn | 2 | 2xn |
This simplifies to xn+1 = ½(xn + b/xn). Guess the square root, divide into b, and average the two numbers. The ancient Babylonians had this same idea, without knowing functions or slopes. They iterated xn+1 = F(xn):
F(x) = ½(x + b / x) and F'(x) = ½(1 − b / x²). (5)
The Babylonians did exactly the right thing. The slope F' is zero at the solution, when x² = b. That makes Newton's method converge at high speed. The convergence test is |F'(x*)| < 1. Newton achieves F'(x*)= 0 —which is superconvergence.
To find a, start the iteration xn+ ,= f(xn+ 4/xn) at xo = 1. Then x, = f(1 + 4):
x1 = 2.5 x2 = 2.05 x3 = 2.0006 x4 = 2.000000009.
The wrong decimal is twice as far out at each step. The error is squared. Subtracting x* = 2 from both sides of x,+~= F(xn) gives an error equation which displays that square:
xn+1 − = ½(xn + 4 / xn) − 2 = (1/2xn)(xn − 2)². (6)
This is (error)n+1 ≈ (1/4)(error)n². It explains the speed of Newton's method.
Remark 1 You can't start this iteration at x0 = 0. The first step computes 410 and blows up. Figure 3.22a shows why—the tangent line at zero is horizontal. It will never cross the axis.
Remark 2 Starting at x0 = -1, Newton converges to -√2 instead of +√2. That is the other x*. Often it is difficult to predict which x* Newton's method will choose. Around every solution is a "basin of attraction," but other parts of the basin may be far away. Numerical experiments are needed, with many starts x0. Finding basins of attraction was one of the problems that led to fractals.
EXAMPLE 2 Solve 1/x − a = 0 to find x* = 1/a without dividing by a. Here ƒ(x) = (1/x) − a. Newton uses ƒ'(x) = -1/x². Surprisingly, we don't divide:
xn+1 = xn − | (1/xn) − a | = xn + xn − axn². (7) |
-1/xn² |
Do these iterations converge? I will take a = 2 and aim for x* = ½. Subtracting 4from both sides of (7) changes the iteration into the error equation:
xn+1 = 2xn − 2xn² becomes xn+1 − ½ = -2(xn − ½)². (8)
At each step the error is squared. This is terrific if (and only if) you are close to x* = ½. Otherwise squaring a large error and multiplying by -2 is not good:
x0 = .70 | x1 = .42 | x2 = .487 | x3 = .4997 | x4 = .4999998 |
x0 = 1.21 | x1 = -.5 | x2 = -1.5 | x3 = -7.5 | x4 = -127.5 |
The algebra in Problem 18 confirrhs those experiments. There is fast convergence if 0 < x0 < 1. There is divergence if x, is negative or x0 > 1. The tangent line goes to a negative x1. After that Figure 3.22 shows a long trip backwards.
In the previous section we drew F(x). The iteration xn+1 = F(xn) converged to the 45° line, where x* = F(x*). In this section we are drawing ƒ(x). Now x* is the point on the axis where ƒ(x*) = 0.
To repeat: It is ƒ(x*) = 0 that we aim for. But it is the slope F'(x*) that decides whether we get there. Example 2 has F(x) = 2x − 2x². The fixed points are x* = ƒ(our solution) and x* = 0 (not attractive). The slopes F'(x*) are zero (typical Newton) and 2 (typical repeller). The key to Newton's method is F'= 0 at the solution:
The slope of F(x)= x − ƒ(x) / ƒ'(x) is ƒ(x)ƒ"(x) / (ƒ'(x))². Then F'(x) = 0 when ƒ(x)= 0.
The examples x² = b and 1/x = a show fast convergence or failure. In Chapter 13, and in reality, Newton's method solves much harder equations. Here I am going to choose a third example that came from pure curiosity about what might happen. The results are absolutely amazing. The equation is x² = -1.
EXAMPLE 3 What happens to Newton's method ifyou ask it to solve ƒ(x) = x² + 1 = 0?
The only solutions are the imaginary numbers x* = i and x* = -i. There is no real square root of -1. Newton's method might as well give up. But it has no way to know that! The tangent line still crosses the axis at a new point xn+1, even if the curve y = x² + 1 never crosses. Equation (5) still gives the iteration for b = -1:
xn+1 = ½(xn − 1/xn) = F(xn). (9)
The x's cannot approach i or -i (nothing is imaginary). So what do they do?
The starting guess x0 = 1 is interesting. It is followed by x1 = 0. Then x² divides by zero and blows up. I expected other sequences to go to infinity. But the experiments showed something different (and mystifying). When xn is large, xn+1 is less than half as large. After xn = 10 comes xn+1 = ½(10 − 1/10) = 4.95. After much indecision and a long wait, a number near zero eventually appears. Then the next guess divides by that small number and goes far out again. This reminded me of "chaos."
It is tempting to retreat to ordinary examples, where Newton's method is a big success. By trying exercises from the book or equations of your own, you will see that the fast convergence to √4 is very typical. The function can be much more complicated than x² − 4 (in practice it certainly is). The iteration for 2x = cos x was in the previous section, and the error was squared at every step. If Newton's method starts close to x*, its convergence is overwhelming. That has to be the main point of this section: Follow the tangent line.
Instead of those good functions, may I stay with this strange example x² + 1 = 0? It is not so predictable, and maybe not so important, but somehow it is more interesting. There is no real solution x*, and Newton's method xn+1 = ½(xn − 1/xn) bounces around. We will now discover xn.
The key is an exercise from trigonometry books. Most of those problems just give practice with sines and cosines, but this one exactly fits ½(xn − 1/xn):
1 | cos θ | − | sin θ | = | cos 2θ | or | 1 | cot θ − | 1 | = cot 2θ | ||||
2 | sin θ | cos θ | sin 2θ | 2 | cot θ |
In the left equation, the common denominator is 2 sin θ cos θ (which is sin 2θ). The numerator is cos2θ − sin2θ (which is cos 2θ). Replace cosinelsine by cotangent, and the identity says this:
If x0 = cot θ then x1 = cot 2θ. Then x2 = cot 4θ. Then xn = cot 2ⁿθ.
This is the formula. Our points are on the cotangent curve. Figure 3.23 starts from x0 = 2 = cot θ, and every iteration doubles the angle.
Example A The sequence x0 = 1, x1 = 0, x2 = ∞ matches the cotangents of π/4, π/2, and π. This sequence blows up because x2 has a division by x1 = 0.
[Fig. 3.23]
Example B The sequence 1/√3, -1/√3, 1/√3 matches the cotangents of π/3, 2π/3, and 4π/3. This sequence cycles forever because x0 = x2 = x4 = ….
Example C Start with a large x0 (a small θ). Then x1 is about half as large (at 2θ). Eventually one of the angles 4θ, 8θ, … hits on a large cotangent, and the x's go far out again. This is typical. Examples A and B were special, when θ/π was 1/4 or 1/3.
What we have here is chaos. The x's can't converge. They are strongly repelled by all points. They are also extremely sensitive to the value of θ. After ten steps θ is multiplied by 210 = 1024. The starting angles 60° and 61° look close, but now they are different by 1024°. If that were a multiple of 180°, the cotangents would still be close. In fact the x10's are 0.6 and 14.
This chaos in mathematics is also seen in nature. The most familiar example is the weather, which is much more delicate than you might think. The headline "Forecasting Pushed Too Far" appeared in Science (1989). The article said that the snow-balling of small errors destroys the forecast after six days. We can't follow the weather equations for a month—the flight of a plane can change everything. This is a revolutionary idea, that a simple rule can lead to answers that are too sensitive to compute.
We are accustomed to complicated formulas (or no formulas). We are not accustomed to innocent-looking formulas like cot 2ⁿθ, which are absolutely hopeless after 100 steps
Now I get to tell you about new mathematics. First I will change the iteration xn+1 = ½(xn − 1/xn) into one that is even simpler. By switching from x to z = 1/(1 + x²), each new z turns out to involve only the old z and z²:
zn+1 = 4zn − 4zn². (10)
This is the most famous quadratic iteration in the world. There are books about it, and Problem 28 shows where it comes from. Our formula for x, leads to z,:
zn = 1 / (1 + xn) = 1 / (1 + (cot 2ⁿθ)²) = (sin 2ⁿθ)². (11)
The sine is just as unpredictable as the cotangent, when 2"8gets large. The new thing
is to locate this quadratic as the last member (when a = 4) of the family
zn+1 = azn − azn², 0≤a≤4. (12)
Example 2 happened to be the middle member a = 2, converging to ½. I would like to give a brief and very optional report on this iteration, for different a's.
The general principle is to start with a number zo between 0 and 1, and compute z1, z2, z3, …. It is fascinating to watch the behavior change as a increases. You can see it on your own computer. Here we describe some things to look for. All numbers stay between 0 and 1 and they may approach a limit. That happens when a is small:
for 0 ≤ a ≤ 1 the zn approach z* = 0
for 1 ≤ a ≤ 3 the zn approach z* = (a − 1)/a
Those limit points are the solutions of z = F(z). They are the fixed points where z* = az* − a(z*)². But remember the test for approaching a limit: The slope at z* cannot be larger than one. Here F = az − az² has F' = a − 2az. It is easy to check |F'| ≤ 1 at the limits predicted above. The hard problem—sometimes impossible—is to predict what happens above a = 3. Our case is a = 4.
The z's cannot approach a limit when |Ft(z*)l > 1. Something has to happen, and there are at least three possibilities:
The z0's can cycle or Jill the whole interval (0,1) or approach a Cantor set.
I start with a random number zo, take 100 steps, and write down steps 101 to 105:
a=3.4 | a=3.5 | a=3.8 | a=4.0 | |
z101 | .842 | .875 | .336 | .169 |
z102 | .452 | .383 | .848 | .562 |
z103 | .842 | .827 | .491 | .985 |
z104 | .452 | .501 | .950 | .060 |
z105 | .842 | .875 | .182 | .225 |
The first column is converging to a "2-cycle." It alternates between x = 342 and y = .452. Those satisfy y = F(x) and x = F(y) = F(F(x)). If we look at a double step when a = 3.4, x and y are fixed points of the double iteration zn+2 = F(F(zn)). When a increases past 3.45, this cycle becomes unstable.
At that point the period doublesfrom 2 to 4. With a = 3.5 you see a "4-cycle" in the table—it repeats after four steps. The sequence bounces from 375 to .383 to .827 to .501 and back to .875. This cycle must be attractive or we would not see it. But it also becomes unstable as a increases. Next comes an 8-cycle, which is stable in a little window (you could compute it) around a = 3.55. The cycles are stable for shorter and shorter intervals of a's. Those stability windows are reduced by the Feigenbaum shrinking factor 4.6692…. Cycles of length 16 and 32 and 64 can be seen in physical experiments, but they are all unstable before a = 3.57. What happens then?
The new and unexpected behavior is between 3.57 and 4. Down each line of Figure 3.24, the computer has plotted the values of z1001 to z2000—omitting the first thousand points to let a stable period (or chaos) become established. No points appeared in the big white wedge. I don't know why. In the window for period 3, you see only three 2's. Period 3 is followed by 6, 12, 24, …. There is period doubling at the end of every window (including all the windows that are too small to see). You can reproduce this figure by iterating zn+1 = azn − azn² from any zo and plotting the results.
[Fig. 3.24]
I can't tell what happens at a = 3.8. There may be a stable cycle of some long period. The z's may come close to every point between 0 and 1. A third possibility is to approach a very thin limit set, which looks like the famous Cantor set:
To construct the Cantor set, divide [0,1] into three pieces and remove the open interval (1/3, 2/3). Then remove (1/9, 2/9) and (7/9, 8/9) from what remains. At each step take out the middle thirds. The points that are left form the Cantor set.
All the endpoints 1/3, 2/3, 1/9, 2/9, … are in the set. So is 1/4 (Problem 42). Nevertheless the lengths of the removed intervals add to 1 and the Cantor set has "measure zero." What is especially striking is its self-similarity: Between 0 and 1/3 you see the same Cantor set three times smaller. From 0 to 6 the Cantor set is there again, scaled down by 9. Every section, when blown up, copies the larger picture.
Fractals That self-similarity is typical of a fractal. There is an infinite sequence of scales. A mathematical snowflake starts with a triangle and adds a bump in the middle of each side. At every step the bumps lengthen the sides by 4/3. The final boundary is self-similar, like an infinitely long coastline.
The word "fractal" comes from fractional dimension. The snowflake boundary has dimension larger than 1 and smaller than 2. The Cantor set has dimension larger than 0 and smaller than 1. Covering an ordinary line segment with circles of radius r would take c/r circles. For fractals it takes c/rD circles—and D is the dimension.
[Fig. 3.25]
Our iteration zn+1 = 4zn − 4zn² has a = 4, at the end of Figure 3.24. The sequence z0, z1, … goes everywhere and nowhere. Its behavior is chaotic, and statistical tests find no pattern. For all practical purposes the numbers are random.
Think what this means in an experiment (or the stock market). If simple rules produce chaos, there is absolutely no way to predict the results. No measurement can ever be sufficiently accurate. The newspapers report that Pluto's orbit is chaotic—even though it obeys the law of gravity. The motion is totally unpredictable over long times. I don't know what that does for astronomy (or astrology).
The most readable book on this subject is Gleick's best-seller Chaos: Making a New Science. The most dazzling books are The Beauty of Fractals and The Science of Fractal Images, in which Peitgen and Richter and Saupe show photographs that have been in art museums around the world. The most original books are Mandelbrot's Fractals and Fractal Geometry. Our cover has a fractal from Figure 13.11.
We return to friendlier problems in which calculus is not helpless.
The hard part of Newton's method is to find dƒ/dx. We need it for the slope of the tangent line. But calculus can approximate by Δƒ/Δx —using the values of ƒ(x) already computed at xn and xn-1.
The secant method follows the secant line instead of the tangent line:
Secant: xn+1 = xn − ƒ(xn)/(Δƒ/Δx)n where (Δƒ/Δx)n = ƒ(xn) − ƒ(xn-1) / (xn − xn-1). (13)
The secant line connects the two latest points on the graph of ƒ(x). Its equation is y − ƒ(xn) = (Δƒ/Δx)(x − xn). Set y = 0 to find equation (13) for the new x = xn+1, where the line crosses the axis.
Prediction: Three secant steps are about as good as two Newton steps. Both should give four times as many correct decimals: (error) → (error)⁴. Probably the secant method is also chaotic for x² + 1 = 0.
These Newton and secant programs are for the TI-81. Place the formula for ƒ(x) in slot Y1
and the formula for ƒ'(x) in slot Y2
on the Y
= function edit screen. Answer the prompt with the initial x0 = X0
. The programs pause to display each approximation xn, the value ƒ(xn), and the difference xn − xn-1. Press ENTER
to continue or press ON
and select item 2: Quit
to break. If ƒ(xn) = 0, the programs display ROOT AT
and the root xn.
PrgmN:NEWTON :Disp "ENTER FORMORE" PrgmS: SECANT :Y+T
:Disp "X0=" :Disp "ON2TOBREAK" :Disp "X0=" :Y1→Y
:Input X :Disp " " :Input X :Disp "ENTER FORMORE"
:X→S :Disp "XN FXN XN-XNM1" :X→S :Disp "XN FXN XN-XNM1"
:Y1&rarrY :Disp X :Y1→T :Disp X
:LbL 1 :Disp Y :Disp "X1=" :Disp Y
:X-Y/Y2→X :Disp D :Input X :Disp D
:X-S→D :Pause :Y1→Y :Pause
:X→S :If Y≠0 :LbL 1 :If Y≠O
:Y1→Y :Goto 1 :X-S→D :Goto 1
:Disp "ROOT AT" :X→S :Disp "ROOT AT"
:Disp X :X-YD/(Y-T)→X :Disp X