
**Iteration means repeating the same function.** Suppose the function is F(x) = cos x. Choose any starting value, say x₀ = 1. Take its cosine: x₁ = cos x₀ = .54. **Then take the cosine of x₁.** That produces x₂ = cos .54 = .86. The iteration is x_{n+1} = cos x_{n}. I am in radian mode on a calculator, pressing "cos" each time. The early numbers are not important; what is important is the output after 12 or 30 or 100 steps:

*EXAMPLE 1* x₁₂ = .75, x₁₃ = .73, x₁₄ = .74, …, x₂₉ = .7391, x₃₀ = .7391.

The goal is to explain why the x's approach x* = .739085 …. Every starting value x₀ leads to this same number x*. **What is special about .7391?**
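This convergence is easy to check numerically. A minimal sketch in Python (not part of the text; the helper name `iterate` is mine):

```python
import math

def iterate(F, x, steps):
    """Apply x_{n+1} = F(x_n) repeatedly and return the final x."""
    for _ in range(steps):
        x = F(x)
    return x

# Every starting value is pulled to the same fixed point x* = .739085...
for x0 in (1.0, 0.0, -3.0, 10.0):
    print(x0, "->", round(iterate(math.cos, x0, 100), 6))
```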

Note on iterations Do x₁ = cos x₀ and x₂ = cos x₁ mean that x₂ = cos² x₀? Absolutely not! Iteration creates a new and different function cos (cos x). It uses the cos button, not the squaring button. The third step creates F(F(F(x))). As soon as you can, iterate with x_{n+1} = ½cos x_{n}. What limit do the x's approach? Is it ½(.7391)?

Let me slow down to understand these questions. **The central idea is expressed by the equation x _{n+1} = F(x_{n}).** Substituting x₀ into F gives x₁. This output x₁ is the input that leads to x₂. In its turn, x₂ is the input and out comes x₃ = F(x₂). This is iteration, and it produces the sequence x₀, x₁, x₂, ….

The x's may approach a limit x*, depending on the function F. Sometimes x* also depends on the starting value x₀. Sometimes there is no limit. Look at a second example, which does not need a calculator.

*EXAMPLE 2* x_{n+1} = F(x_{n}) = ½x_{n} + 4. Starting from x₀ = 0 the sequence is x₁ = ½⋅0 + 4 = 4, x₂ = ½⋅4 + 4 = 6 , x₃ = ½⋅6 + 4 = 7, x₄ = ½⋅7 + 4 = 7½, ….

Those numbers 0, 4, 6, 7, 7½, … seem to be approaching x* = 8. A computer would convince us. So will mathematics, when we see what is special about 8:

When the x's approach x*, the limit of x_{n+1} = ½x_{n} + 4 is x* = ½x* + 4. This limiting equation yields x* = 8.

8 is the "steady state" where input equals output: 8 = F(8). It is the fixed point.

If we start at x₀ = 8, the sequence is 8, 8, 8, …. When we start at x₀ = 12, the sequence goes back toward 8:

x₁ = ½⋅12 + 4 = 10, x₂ = ½⋅10 + 4 = 9 , x₃ = ½⋅9 + 4 = 8.5, …
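In code the pattern is identical for every example; only F changes. A small sketch (Python; `orbit` is my own helper name):

```python
def F(x):
    return 0.5 * x + 4        # Example 2: F(x) = x/2 + 4, fixed point x* = 8

def orbit(x, steps):
    """Return the list x0, x1, ..., x_steps under x_{n+1} = F(x_n)."""
    seq = [x]
    for _ in range(steps):
        x = F(x)
        seq.append(x)
    return seq

print(orbit(0, 4))    # [0, 4.0, 6.0, 7.0, 7.5] -- climbing toward 8
print(orbit(12, 3))   # [12, 10.0, 9.0, 8.5] -- coming back down to 8
```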

**Equation for limit: If the iterations x _{n+1} = F(x_{n}) converge to x*, then x* = F(x*).** To repeat: 8 is special because it equals ½⋅8 + 4. The number .7391 … is special because it equals cos .7391 …. The graphs of y = x and y = F(x) intersect at x*. To explain why the x's converge (or why they don't) is the job of calculus.

*EXAMPLE 3* x_{n+1} = x_{n}² has two fixed points: 0 = 0² and 1 = 1². Here F(x) = x². Starting from x₀ = 1/2 the sequence 1/4, 1/16, 1/256, … goes quickly to x* = 0. The only approaches to x* = 1 are from x₀ = 1 (of course) and from x₀ = -1. Starting from x₀ = 2 we get 4, 16, 256, … and the sequence diverges to +∞.

Each limit x* has a "**basin of attraction**." The basin contains all starting points x₀ that lead to x*. For Examples 1 and 2, every x₀ led to .7391 and 8. The basins were the whole line (that is still to be proved). Example 3 had three basins—the interval -1 < x₀ < 1, the two points x₀ = ±1, and all the rest. The outer basin |x₀| > 1 led to +∞. I challenge you to find the limits and the basins of attraction (by calculator) for F(x) = x − tan x.
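The three basins of Example 3 can be confirmed by brute force. A quick sketch (Python; the cutoffs 1e-9 and 1e100 are my arbitrary thresholds, not from the text):

```python
def basin(x0, steps=60):
    """Classify where x_{n+1} = x_n**2 sends x0: toward 0, 1, or infinity."""
    x = x0
    for _ in range(steps):
        x = x * x
        if abs(x) > 1e100:        # far beyond any return: diverging
            return "inf"
    return 0 if abs(x) < 1e-9 else 1

print(basin(0.5), basin(-0.999))   # inside |x0| < 1: both go to 0
print(basin(1.0), basin(-1.0))     # only x0 = +1 and -1 reach the repelling point
print(basin(2.0))                  # |x0| > 1 escapes to infinity
```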

**In Example 3, x* = 0 is attracting.** Points near x* move toward x*. The fixed point x* = 1 is **repelling**. Points near 1 move away. We now find the rule that decides whether x* is attracting or repelling. **The key is the slope dF/dx at x*.**

*3J* Start from any x₀ near a fixed point x* = F(x*):

x* is **attracting** if |dF/dx| is below 1 at x*

x* is **repelling** if |dF/dx| is above 1 at x*.

First I will give a calculus proof. Then comes a picture of convergence, by "cobwebs." Both methods throw light on this crucial test for attraction: |dF/dx| < 1.

First proof: Subtract x* = F(x*) from x_{n+1} = F(x_{n}). The difference x_{n+1} − x* is the same as F(x_{n}) − F(x*). This is ΔF. The basic idea of calculus is that ΔF is close to F'Δx:

x_{n+1} − x* = F(x_{n}) − F(x*) ≈ F'(x*)(x_{n} − x*). (1)

The "error" x_{n} − x* is multiplied by the slope dF/dx. The next error x_{n+1} − x* is smaller or larger, depending on whether |F'| < 1 or |F'| > 1 at x*. Every step multiplies approximately by F'(x*). **Its size controls the speed of convergence.**

In Example 1, F(x) is cos x and F'(x) is -sin x. There is attraction to .7391 because |sin x*| < 1. In Example 2, F is ½x + 4 and F' is ½. There is attraction to 8. In Example 3, F is x² and F' is 2x. There is superattraction to x* = 0 (where F' = 0). There is repulsion from x* = 1 (where F' = 2).
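Equation (1) can be watched in action. For Example 1 the successive error ratios settle down to F'(x*) = -sin x* ≈ -.6736 (a Python sketch; x* is found by iterating, not assumed):

```python
import math

# Find x* to machine precision by iterating cos many times
xstar = 1.0
for _ in range(200):
    xstar = math.cos(xstar)

x = 1.0
ratios = []
for _ in range(8):
    x_next = math.cos(x)
    ratios.append((x_next - xstar) / (x - xstar))   # (error)_{n+1}/(error)_n
    x = x_next

print(ratios[-1], -math.sin(xstar))   # both near -0.6736
```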

I admit one major difficulty. The approximation in equation (1) only holds near x*. If x₀ is far away, does the sequence still approach x*? When there are several attracting points, which x* do we reach? This section starts with good iterations, which solve the equation x* = F(x*) or ƒ(x) = 0. At the end we discover **Newton's method**. The next section produces crazy but wonderful iterations, not converging and not blowing up. They lead to "fractals" and "Cantor sets" and "chaos."

The mathematics of iterations is not finished. It may never be finished, but we are converging on the answers. Please choose a function and join in.

The iteration x_{n+1} = F(x_{n}) involves two graphs at the same time. One is the graph of y = F(x). The other is the graph of y = x (the 45° line). The iteration jumps back and forth between these graphs. It is a very convenient way to see the whole process.

Example 1 was x_{n+1} = cos x_{n}. Figure 3.19 shows the graph of cos x and the "cobweb." Starting at (x₀, x₀) on the 45° line, the rule is based on x₁ = F(x₀):

From (x₀, x₀) go up or down to (x₀, x₁) **on the curve**.

From (x₀, x₁) go across to (x₁, x₁) **on the 45° line**.

These steps are repeated forever. From x₁ go up to the curve at F(x₁). That height is x₂. Now cross to the 45° line at (x₂, x₂). The iterations are aiming for (x*, x*) = (.7391, .7391). This is the crossing point of the two graphs y = F(x) and y = x.

[Fig. 3.19]

Example 2 was x_{n+1} = ½x_{n} + 4. Both graphs are straight lines. The cobweb is one-sided, from (0,0) to (0,4) to (4,4) to (4,6) to (6,6). Notice how y changes (vertical line) and then x changes (horizontal line). The slope of F(x) is ½, so the distance to 8 is multiplied by ½ at every step.

Example 3 was x_{n+1} = x_{n}². The graph of y = x² crosses the 45° line at two fixed points: 0² = 0 and 1² = 1. Figure 3.20a starts the iteration close to 1, but it quickly goes away. This fixed point is repelling because F'(1) = 2. Distance from x* = 1 is doubled (at the start). One path moves down to x* = 0 —which is superattractive because F' = 0. The path from x₀ > 1 diverges to infinity.

*EXAMPLE 4* F(x) has two attracting points x* (a repelling x* is always between).

Figure 3.20b shows two crossings with slope zero. The iterations and cobwebs converge quickly. In between, the graph of F(x) must cross the 45° line from below. That requires a slope greater than one. Cobwebs diverge from this unstable point, which separates the basins of attraction. The fixed point x = π is in a basin by itself!

Note 1 To draw cobwebs on a calculator, graph y = F(x) on top of y = x. On a Casio, one way is to plot (x₀, x₀) and give the command `LINE: PLOT X,Y` followed by `EXE`. Now move the cursor vertically to y = F(x) and press `EXE`. Then move horizontally to y = x and press `EXE`. Continue. Each step draws a line.

[Fig. 3.20]

For the TI-81 (and also the Casio) a short program produces a cobweb. Store F(x) in the Y = function slot Y1. Set the range (square window or autoscaling). Run the program and answer the prompt with x₀:

```
PrgmC:COBWEB
:Disp "INITIAL X0"
:Input X
:All-Off
:Y1-On
:"X"→Y4
:Lbl 1
:X→S
:Y1→T
:Line(S,S,S,T)
:Line(S,T,T,T)
:T→X
:Pause
:Goto 1
```
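The same cobweb can be generated in Python: each iteration contributes one vertical segment to the curve and one horizontal segment to the 45° line. A sketch (the name `cobweb_segments` is mine):

```python
def cobweb_segments(F, x0, steps):
    """List the cobweb's line segments for x_{n+1} = F(x_n)."""
    segs = []
    x = x0
    for _ in range(steps):
        y = F(x)
        segs.append(((x, x), (x, y)))   # up or down to the curve y = F(x)
        segs.append(((x, y), (y, y)))   # across to the 45-degree line y = x
        x = y
    return segs

# Example 2's one-sided cobweb: (0,0) -> (0,4) -> (4,4) -> (4,6) -> (6,6) ...
for seg in cobweb_segments(lambda x: 0.5 * x + 4, 0, 2):
    print(seg)
```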

Note 2 The x's approach x* from one side when 0 < dF/dx < 1.

Note 3 A basin of attraction can include faraway x₀'s (basins can come in infinitely many pieces). This makes the problem interesting. If no fixed points are attracting, see Section 3.7 for "cycles" and "chaos."

At this point we offer the reader a choice. One possibility is to jump ahead to the next section on "Newton's Method." That method is an iteration to solve ƒ(x) = 0. The function F(x) combines x_{n} and ƒ(x_{n}) and ƒ'(x_{n}) into an optimal formula for x_{n+1}. We will see how quickly Newton's method works (when it works). It is the outstanding algorithm to solve equations, and it is totally built on tangent approximations.

The other possibility is to understand (through calculus) a whole family of iterations. This family depends on a number c, which is at our disposal. **The best choice of c produces Newton's method.** I emphasize that iteration is by no means a new and peculiar idea. It is a fundamental technique in scientific computing.

We start by recognizing that there are many ways to reach ƒ(x*) = 0. (I write x* for the solution.) A good algorithm may switch to Newton as it gets close. The iterations use ƒ(x_{n}) to decide on the next point x_{n+1}:

x_{n+1} = F(x_{n}) = x_{n} − cƒ(x_{n}). (2)

Notice how F(x) is constructed from ƒ(x)—they are different! We move ƒ to the right side and multiply by a "preconditioner" c. The choice of c (or c_{n}, if it changes from step to step) is absolutely critical. The starting guess x₀ is also important—but its accuracy is not always under our control.

Suppose the x_{n} converge to x*. Then the limit of equation (2) is

x* = x* − cƒ(x*). (3)

That gives ƒ(x*) = 0. If the x_{n}'s have a limit, it solves the right equation. It is a fixed point of F (we can assume c_{n} → c ≠ 0 and ƒ(x_{n}) → ƒ(x*)). There are two key questions, and both of them are answered by the slope F'(x*):

- How quickly does x_{n} approach x* (or do the x_{n} diverge)?
- What is a good choice of c (or c_{n})?
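To see the family in action, here is a sketch with a hypothetical ƒ of my own choosing (not the text's): ƒ(x) = x² − 4, with root x* = 2 and slope A = ƒ'(2) = 4.

```python
def solve(f, c, x0, steps=50):
    """Iterate x_{n+1} = x_n - c*f(x_n), equation (2), with c held fixed."""
    x = x0
    for _ in range(steps):
        x = x - c * f(x)
    return x

f = lambda x: x * x - 4          # my example: root x* = 2, slope A = f'(2) = 4

print(solve(f, 0.25, 3.0))       # c = 1/A: multiplier 1 - cA = 0, fastest
print(solve(f, 0.10, 3.0))       # multiplier 1 - 0.4 = 0.6: slower convergence
print(solve(f, 0.60, 3.0, 8))    # |1 - cA| = 1.4 > 1: repelled from x* = 2
```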

*EXAMPLE 5* ƒ(x) = ax − b is zero at x* = b/a. The iteration x_{n+1} = x_{n} − c(ax_{n} − b) intends to find b/a without actually dividing. (Early computers could not divide; they used iteration.) Subtracting x* from both sides leaves an equation for the error:

x_{n+1} − x* = x_{n} − x* − c(ax_{n} − b).

Replace b by ax*. The right side is (1 − ca)(x_{n} − x*). This "error equation" is

(error)_{n+1} = (1 − ca)(error)_{n}. (4)

**At every step the error is multiplied by (1 − ca)**, which is F'. The error goes to zero if |F'| is less than 1. The absolute value |1 − ca| decides everything:

**x _{n} converges to x* if and only if -1 < 1 − ca < 1.** (5)

The perfect choice (if we knew it) is c = 1/a, which turns the multiplier 1 − ca into zero. Then one iteration gives the exact answer: x₁ = x₀ − (1/a)(ax₀ − b) = b/a. That is the horizontal line in Figure 3.21a, converging in one step. But look at the other lines.
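The error equation (4) is worth seeing numerically. A sketch of division-free division (the numbers a = 7, b = 3, c = .12 are my own choices; any c with |1 − ca| < 1 works):

```python
a, b, c = 7.0, 3.0, 0.12      # multiplier 1 - c*a = 0.16, safely inside (-1, 1)
x = 0.0
for _ in range(40):
    x = x - c * (a * x - b)   # equation (2) for f(x) = ax - b: no division anywhere

print(x)                      # matches b/a = 0.42857... to machine precision
```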

This example did not need calculus. Linear equations never do. The key idea is that close to x* the nonlinear equation ƒ(x) = 0 is nearly linear. We apply the tangent approximation. You are seeing how calculus is used, in a problem that doesn't start by asking for a derivative.

The immediate goal is to study the errors x_{n} − x*. They go quickly to zero, if the multiplier is small. To understand x_{n+1} = x_{n} − cƒ(x_{n}), subtract the equation x* = x* − cƒ(x*):

x_{n+1} − x* = x_{n} − x* − c( ƒ(x_{n}) − ƒ(x*)). (6)

Now calculus enters. **When you see a difference of ƒ's think of dƒ/dx.** Replace ƒ(x_{n}) − ƒ(x*) by A(x_{n} − x*), where A stands for the slope dƒ/dx at x*:

x_{n+1} − x* ≈ (1 − cA)(x_{n} − x*). (7)

This is the **error equation**. The new error at step n + 1 is approximately the old error
multiplied by m = 1 − cA. This corresponds to m = 1 − ca in the linear example. We
keep returning to the basic test |m| = |F'(x*)| < 1:

*3K* Starting near x*, the errors x_{n} − x* go to zero if the multiplier |m| < 1. The perfect choice is c = 1/A = 1/ƒ'(x*). Then m = 1 − cA = 0.

There is only one difficulty: We don't know x*. Therefore we don't know the perfect c. It depends on the slope A = ƒ'(x*) at the unknown solution. However we can come close, by using the slope at x_{n}:

Choose c_{n} = 1/ƒ'(x_{n}). Then x_{n+1} = x_{n} − ƒ(x_{n})/ƒ'(x_{n}) = F(x_{n}).

**This is Newton's method.** The multiplier m = 1 − cA is as near to zero as we can make it. By building dƒ/dx into F(x), Newton sped up the convergence of the iteration.
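As a sketch (Python; the sample ƒ is 2x − cos x, which reappears as Example 6 below):

```python
import math

def newton(f, fprime, x, steps):
    """Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n), i.e. c_n = 1/f'(x_n)."""
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

f  = lambda x: 2 * x - math.cos(x)   # root near x* = .45018
fp = lambda x: 2 + math.sin(x)
print(newton(f, fp, 0.5, 3))         # about .45018361 after only three steps
```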

[Fig. 3.21]

*EXAMPLE 6* Solve ƒ(x) = 2x − cos x = 0 with different iterations (different c's).

The line y = 2x crosses the cosine curve somewhere near x = 1/2. The intersection point where 2x* = cos x* has no simple formula. We start from x₀ = 1/2 and iterate x_{n+1} = x_{n} − c(2x_{n} − cos x_{n}) with three different choices of c.

Take c = 1 or c = 1/ƒ'(x₀) or update c by Newton's rule c_{n} = 1/ ƒ'(x_{n}):

| x₀ = .50 | c = 1 | c = 1/ƒ'(x₀) | c_{n} = 1/ƒ'(x_{n}) |
|---|---|---|---|
| x₁ = | .38 | .45063 | .45062669 |
| x₂ = | .55 | .45019 | .45018365 |
| x₃ = | .30 | .45018 | .45018361… |

The column with c = 1 is diverging (repelled from x*). The second column shows convergence (attracted to x*). The third column (Newton's method) approaches x* so quickly that .4501836 and seven more digits are exact for x₃.

How does this convergence match the prediction? Note that ƒ'(x) = 2 + sin x so A = 2.435. Look to see whether the actual errors x_{n} − x*, going down each column, are multiplied by the predicted m below that column:

| | c = 1 | c = 1/(2 + sin ½) | c_{n} = 1/(2 + sin x_{n}) |
|---|---|---|---|
| x₀ − x* = | 0.05 | 4.98⋅10⁻² | 4.98⋅10⁻² |
| x₁ − x* = | -0.07 | 4.43⋅10⁻⁴ | 4.43⋅10⁻⁴ |
| x₂ − x* = | 0.10 | 7.88⋅10⁻⁶ | 3.63⋅10⁻⁸ |
| x₃ − x* = | -0.15 | 1.41⋅10⁻⁷ | 2.78⋅10⁻¹⁶ |
| multiplier | m = -1.4 | m = .018 | m → 0 (Newton) |

The first column shows a multiplier below -1. The errors grow at every step. Because m is negative the errors change sign—the cobweb goes outward.

The second column shows convergence with m = .018. It takes one genuine Newton step, then c is fixed. After n steps the error is closely proportional to m^{n} = (.018)^{n} —that is "**linear convergence**" with a good multiplier.

The third column shows the "**quadratic convergence**" of Newton's method. Multiplying the error by m is more attractive than ever, because m → 0. In fact m itself is proportional to the error, so **at each step the error is squared**. Problem 3.8.31 will show that (error)_{n+1} ≤ (error)_{n}². This squaring carries us from 10⁻² to 10⁻⁴ to 10⁻⁸ to "machine ε" in three steps. The number of correct digits is doubled at every step as Newton converges.
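The doubling of digits is visible in a few lines (Python; x* is obtained by running Newton to machine precision first, not assumed):

```python
import math

f  = lambda x: 2 * x - math.cos(x)
fp = lambda x: 2 + math.sin(x)

xstar = 0.5
for _ in range(20):                  # Newton to machine precision: this is x*
    xstar = xstar - f(xstar) / fp(xstar)

x = 0.5
for n in range(4):
    print(n, abs(x - xstar))         # the error exponent roughly doubles each step
    x = x - f(x) / fp(x)
```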

Note 1 The choice c = 1 produces x_{n+1} = x_{n} − ƒ(x_{n}). This is "successive substitution." The equation ƒ(x) = 0 is rewritten as x = x − ƒ(x), and each x_{n} is substituted back to produce x_{n+1}. Iteration with c = 1 does not always fail!

Note 3 Edwards and Penney happened to choose the same example 2x = cos x. But they cleverly wrote it as x_{n+1} = ½ cos x_{n}, which has |F'| = |½ sin x| < 1. This iteration fits into our family with c = ½, and it succeeds. We asked earlier if its limit is ½(.7391). No, it is x* = .450….

Note 4 The choice c = 1/ƒ'(x₀) is "modified Newton." After one step of Newton's method, c is fixed. The steps are quicker, because they don't require a new ƒ'(x_{n}). But we need more steps. Millions of dollars are spent on Newton's method, so speed is important. In all its forms, ƒ(x) = 0 is the central problem of computing.