Kittttttan*s Web


Calculus

The Chain Rule

4.1 Derivatives by the Chain Rule

You remember that the derivative of ƒ(x)g(x) is not (dƒ/dx)(dg/dx). The derivative of sin x times x² is not cos x times 2x. The product rule gave two terms, not one term. But there is another way of combining the sine function ƒ and the squaring function g into a single function. The derivative of that new function does involve the cosine times 2x (but with a certain twist). We will first explain the new function, and then find the "chain rule" for its derivative.

May I say here that the chain rule is important. It is easy to learn, and you will use it often. I see it as the third basic way to find derivatives of new functions from derivatives of old functions. (So far the old functions are xⁿ, sin x, and cos x. Still ahead are eˣ and log x.) When ƒ and g are added and multiplied, derivatives come from the sum rule and product rule. This section combines ƒ and g in a third way.

The new function is sin(x²)—the sine of x². It is created out of the two original functions: if x = 3 then x² = 9 and sin(x²) = sin 9. There is a "chain" of functions, combining sin x and x² into the composite function sin(x²). You start with x, then find g(x), then find ƒ(g(x)):

The squaring function gives y = x². This is g(x).
The sine function produces z = sin y = sin(x²). This is ƒ(g(x)).

The "inside function" g(x) gives y. This is the input to the "outside function" ƒ(y). That is called composition. It starts with x and ends with z. The composite function is sometimes written ƒ∘g (the circle shows the difference from an ordinary product ƒg). More often you will see ƒ(g(x)):

z(x) = ƒ∘g(x) = ƒ(g(x)).   (1)

Other examples are cos 2x and (2x)³, with g = 2x. On a calculator you input x, then push the "g" button, then push the "ƒ" button:
From x compute y = g(x)   From y compute z = ƒ(y).
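
The two calculator steps can be mimicked in a few lines of Python. This is only a minimal sketch: the helper names `compose` and `square` are illustrative choices, not notation from the text.

```python
import math

def compose(f, g):
    """Return the composite function x -> f(g(x)): apply g first, then f."""
    return lambda x: f(g(x))

def square(x):
    """The inside function g(x) = x^2."""
    return x * x

# z(x) = sin(x^2): the inside function squares, the outside function takes the sine.
z = compose(math.sin, square)

# For x = 3 the chain gives y = 9 and then z = sin 9.
print(z(3.0), math.sin(9.0))
```

Both printed numbers agree, because `z(3.0)` squares first and only then takes the sine.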

There is not a button for every function! But the squaring function and sine function are on most calculators, and they are used in that order. Figure 4.1a shows how squaring will stretch and squeeze the sine function.

That graph of sin x² is a crazy FM signal (the Frequency is Modulated). The wave goes up and down like sin x, but not at the same places. Changing to sin g(x) moves the peaks left and right. Compare with a product g(x) sin x, which is an AM signal (the Amplitude is Modulated).

Remark   ƒ(g(x)) is usually different from g(ƒ(x)). The order of ƒ and g is usually important. For ƒ(x) = sin x and g(x) = x², the chain in the opposite order g(ƒ(x)) gives something different:

First apply the sine function: y = sin x
Then apply the squaring function: z = (sin x)².

That result is often written sin²x, to save on parentheses. It is never written sin x², which is totally different. Compare them in Figure 4.1.
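
A quick numerical comparison makes the difference concrete. The sketch below assumes the test point x = 3, which is chosen only for illustration.

```python
import math

def f(y):
    """The sine function."""
    return math.sin(y)

def g(x):
    """The squaring function."""
    return x ** 2

x = 3.0
f_of_g = f(g(x))   # sin(x^2): square first, then take the sine
g_of_f = g(f(x))   # (sin x)^2: take the sine first, then square

print("sin(x^2)  =", f_of_g)
print("(sin x)^2 =", g_of_f)
```

The two outputs are visibly different numbers, so the order of the chain matters.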

Fig.4.1 ƒ(g(x)) is different from g(ƒ(x)). Apply g then ƒ, or ƒ then g.

EXAMPLE 1   The composite function ƒ∘g can be deceptive. If g(x) = x³ and ƒ(y) = y⁴, how does ƒ(g(x)) differ from the ordinary product ƒ(x)g(x)? The ordinary product is x⁷. The chain starts with y = x³, and then z = y⁴ = x¹². The composition of x³ and y⁴ gives ƒ(g(x)) = x¹².

EXAMPLE 2   In Newton's method, F(x) is composed with itself. This is iteration. Every output xₙ is fed back as input, to find xₙ₊₁ = F(xₙ). The example F(x) = ½x + 4 has F(F(x)) = ½(½x + 4) + 4. That produces z = ¼x + 6.

The derivative of F(x) is ½. The derivative of z = F(F(x)) is ¼, which is ½ times ½. We multiply derivatives. This is a special case of the chain rule.
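
The multiplication of slopes in Example 2 can be checked numerically. This is a rough sketch: the helper `slope` is a hypothetical central-difference estimator introduced here, and the step size `h` is an assumed value.

```python
def F(x):
    """The function from Example 2: F(x) = x/2 + 4."""
    return 0.5 * x + 4

def FF(x):
    """F composed with itself: F(F(x)) = x/4 + 6."""
    return F(F(x))

def slope(func, x, h=1e-6):
    """Central-difference estimate of the derivative (an approximation only)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 2.0
print(FF(x), 0.25 * x + 6)         # the two expressions for z agree
print(slope(F, x), slope(FF, x))   # close to 1/2 and to 1/4 = (1/2)(1/2)
```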

An extremely special case is ƒ(x) = x and g(x) = x. The ordinary product is x². The chain ƒ(g(x)) produces only x! The output from the "identity function" is g(x) = x. When the second identity function operates on x it produces x again. The derivative is 1 times 1. I can give more composite functions in a table:

y = g(x)     z = ƒ(y)     z = ƒ(g(x))
x² − 1       √y           √(x² − 1)
cos x        y³           (cos x)³
2ˣ           2ʸ           2^(2ˣ)
x + 5        y − 5        x

The last one adds 5 to get y. Then it subtracts 5 to reach z. So z = x. Here output equals input: ƒ(g(x)) = x. These "inverse functions" are in Section 4.3. The other examples create new functions z(x) and we want their derivatives.

THE DERIVATIVE OF ƒ(g(x))

What is the derivative of z = sin x²? It is the limit of Δz/Δx. Therefore we look at a nearby point x + Δx. That change in x produces a change in y = x² —which moves to y + Δy = (x + Δx)². From this change in y, there is a change in z = ƒ(y). It is a "domino effect," in which each changed input yields a changed output: Δx produces Δy produces Δz. We have to connect the final Δz to the original Δx.

The key is to write Δz/Δx as Δz/Δy times Δy/Δx. Then let Δx approach zero. In the limit, dz/dx is given by the "chain rule":

Δz/Δx = (Δz/Δy)(Δy/Δx)   becomes the chain rule   dz/dx = (dz/dy)(dy/dx).   (2)

As Δx goes to zero, the ratio Δy/Δx approaches dy/dx. Therefore Δy must be going to zero, and Δz/Δy approaches dz/dy. The limit of a product is the product of the separate limits (end of quick proof). We multiply derivatives:

4A Chain Rule   Suppose g(x) has a derivative at x and ƒ(y) has a derivative at y = g(x). Then the derivative of z = ƒ(g(x)) is
dz/dx = (dz/dy)(dy/dx) = ƒ'(g(x))g'(x).   (3)
The slope at x is dƒ/dy (at y) times dg/dx (at x).

Caution   The chain rule does not say that the derivative of sin x² is (cos x)(2x). True, cos y is the derivative of sin y. The point is that cos y must be evaluated at y (not at x). We do not want dƒ/dx at x, we want dƒ/dy at y = x²:
The derivative of sin x² is (cos x²) times (2x).   (4)
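
The caution can be tested numerically. The sketch below compares a finite-difference slope of sin x² with both candidate formulas. The test point x = 1.5 and the helper `slope` are assumptions made for illustration.

```python
import math

def z(x):
    """The composite function sin(x^2)."""
    return math.sin(x ** 2)

def slope(func, x, h=1e-6):
    """Central-difference estimate of the derivative (an approximation only)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5
numeric = slope(z, x)
right = math.cos(x ** 2) * 2 * x   # cos evaluated at y = x^2, times 2x
wrong = math.cos(x) * 2 * x        # cos evaluated at x: not the chain rule

print("numerical slope:", numeric)
print("(cos x^2)(2x)  :", right)
print("(cos x)(2x)    :", wrong)
```

Only the formula with the cosine evaluated at x² matches the numerical slope.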

EXAMPLE 3   If z =(sin x)² then dz/dx =(2 sin x)(cos x). Here y = sin x is inside. In this order, z = y² leads to dz/dy = 2y. It does not lead to 2x. The inside function sin x produces dy/dx = cos x. The answer is 2y cos x. We have not yet found the function whose derivative is 2x cos x.

EXAMPLE 4   The derivative of z = sin 3x is dz/dx = (dz/dy)(dy/dx) = 3 cos 3x.

Fig.4.2 The chain rule: Δz/Δx = (Δz/Δy)(Δy/Δx) approaches dz/dx = (dz/dy)(dy/dx).

The outside function is z = sin y. The inside function is y = 3x. Then dz/dy = cos y, and this is cos 3x, not cos x. Remember the other factor dy/dx = 3.

I can explain that factor 3, especially if x is switched to t. The distance is z = sin 3t. That oscillates like sin t except three times as fast. The speeded-up function sin 3t completes a wave at time 2π/3 (instead of 2π). Naturally the velocity contains the extra factor 3 from the chain rule.

EXAMPLE 5   Let z = ƒ(y) = yⁿ. Find the derivative of ƒ(g(x)) = [g(x)]ⁿ. In this case dz/dy is nyⁿ⁻¹. The chain rule multiplies by dy/dx:
dz/dx = nyⁿ⁻¹(dy/dx)   or   d/dx[g(x)]ⁿ = n[g(x)]ⁿ⁻¹(dg/dx).   (5)

This is the power rule! It was already discovered in Section 2.5. Square roots (when n = 1/2) are frequent and important. Suppose y = x² − 1:
d/dx √(x² − 1) = ½(x² − 1)^(−1/2)(2x) = x / √(x² − 1).   (6)
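
For readers who want every step of the square-root case, the power rule with n = 1/2 and inside function y = x² − 1 can be written out as follows. This is a worked restatement of the same computation, not a new result.

```latex
\frac{d}{dx}\sqrt{x^{2}-1}
  = \frac{d}{dx}\,y^{1/2}
  = \tfrac{1}{2}\,y^{-1/2}\,\frac{dy}{dx}
  = \tfrac{1}{2}\,(x^{2}-1)^{-1/2}\,(2x)
  = \frac{x}{\sqrt{x^{2}-1}}
```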

Question   A Buick uses 1/20 of a gallon of gas per mile. You drive at 60 miles per hour. How many gallons per hour?
Answer   (Gallons/hour) = (gallons/mile)(miles/hour). The chain rule is (dy/dt) = (dy/dx)(dx/dt). The answer is (1/20)(60) = 3 gallons/hour.

Proof of the chain rule The discussion above was correctly based on
Δz/Δx = (Δz/Δy)(Δy/Δx)   and   dz/dx = (dz/dy)(dy/dx).   (7)

It was here, over the chain rule, that the "battle of notation" was won by Leibniz. His notation practically tells you what to do: Take the limit of each term. (I have to mention that when Δx is approaching zero, it is theoretically possible that Δy might hit zero. If that happens, Δz/Δy becomes 0/0. We have to assign it the correct meaning, which is dz/dy.) As Δx → 0,
Δy/Δx → g'(x)   and   Δz/Δy → ƒ'(y) = ƒ'(g(x)).

Then Δz/Δx approaches ƒ'(y) times g'(x), which is the chain rule (dz/dy)(dy/dx). In the table below, the derivative of (sin x)³ is 3(sin x)² cos x. That extra factor cos x is easy to forget. It is even easier to forget the -1 in the last example.

z = (x³ + 1)⁵     dz/dx = 5(x³ + 1)⁴   times 3x²
z = (sin x)³      dz/dx = 3 sin² x     times cos x
z = (1 − x)²      dz/dx = 2(1 − x)     times -1
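
The three table entries can be spot-checked against a numerical slope. This is a sketch under stated assumptions: the test point x = 0.7 is arbitrary, and `slope` is a hypothetical central-difference helper, not part of the text.

```python
import math

def slope(func, x, h=1e-6):
    """Central-difference estimate of the derivative (an approximation only)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Each pair is (z, dz/dx from the chain rule), including the "times ..." factor.
cases = [
    (lambda x: (x**3 + 1) ** 5, lambda x: 5 * (x**3 + 1) ** 4 * 3 * x**2),
    (lambda x: math.sin(x) ** 3, lambda x: 3 * math.sin(x) ** 2 * math.cos(x)),
    (lambda x: (1 - x) ** 2,     lambda x: 2 * (1 - x) * (-1)),
]

x = 0.7
errors = [abs(slope(z, x) - dz(x)) for z, dz in cases]
print(errors)   # each entry should be tiny
```

Dropping the last factor in any row (3x², cos x, or -1) makes the corresponding error large, which is a convenient way to catch a forgotten inside derivative.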

Important   All kinds of letters are used for the chain rule. We named the output z. Very often it is called y, and the inside function is called u:

The derivative of y = sin u(x) is dy/dx = cos u(du/dx).

Examples with du/dx are extremely common. I have to ask you to accept whatever letters may come. What never changes is the key idea—derivative of outside function times derivative of inside function.

EXAMPLE 6   The chain rule is barely needed for sin(x − 1). Strictly speaking the inside function is u = x − 1. Then du/dx is just 1 (not -1). If y = sin(x − 1) then dy/dx = cos(x − 1). The graph is shifted and the slope shifts too.

Notice especially: The cosine is computed at x − 1 and not at the unshifted x.

RECOGNIZING ƒ(y) AND g(x)

A big part of the chain rule is recognizing the chain. The table started with (x³ + 1)⁵. You look at it for a second. Then you see it as u⁵. The inside function is u = x³ + 1. With practice this decomposition (the opposite of composition) gets easy:

cos (2x + 1) is cos u   √(1 + sin t) is √u   x sin x is … (product rule!)

In calculations, the careful way is to write down all the functions:

z = cos u   u = 2x + 1   dz/dx = (-sin u)(2) = -2 sin (2x + 1).

The quick way is to keep in your mind "the derivative of what's inside." The slope of cos(2x + 1) is -sin(2x + 1), times 2 from the chain rule. The derivative of 2x + 1 is remembered—without z or u or ƒ or g.

EXAMPLE 7   sin √(1 − x) is a chain of z = sin y, y = √u, u = 1 − x (three functions). With that triple chain you will have the hang of the chain rule:

The derivative of sin √(1 − x) is (cos √(1 − x)) (1 / (2√(1 − x))) (-1).

This is (dz/dy)(dy/du)(du/dx). Evaluate them at the right places y, u, x.
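
The triple chain can be coded one link at a time, so that each factor is evaluated at its own place. This is a minimal sketch: the test point x = 0.5 and the helper `slope` are assumptions made for illustration.

```python
import math

def z(x):
    """The triple chain sin(sqrt(1 - x)), defined for x < 1."""
    return math.sin(math.sqrt(1 - x))

def dz_dx(x):
    """Chain rule with three factors, each evaluated at the right place."""
    u = 1 - x                      # innermost function
    y = math.sqrt(u)               # middle function
    dz_dy = math.cos(y)            # derivative of sin y, at y
    dy_du = 1 / (2 * math.sqrt(u)) # derivative of sqrt(u), at u
    du_dx = -1                     # derivative of 1 - x
    return dz_dy * dy_du * du_dx

def slope(func, x, h=1e-6):
    """Central-difference estimate of the derivative (an approximation only)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.5
print(dz_dx(x), slope(z, x))   # the two values should agree closely
```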

Finally there is the question of second derivatives. The chain rule gives dz/dx as a product, so d²z/dx² needs the product rule:

dz/dx = (dz/dy)(dy/dx)   leads to   d²z/dx² = (dz/dy)(d²y/dx²) + [d/dx(dz/dy)](dy/dx).   (8)

That last term needs the chain rule again. It becomes d²z/dy² times (dy/dx)².

EXAMPLE 8   The derivative of sin x² is 2x cos x². Then the product rule gives d²z/dx² = 2 cos x² − 4x² sin x². In this case y'' = 2 and (y')² = 4x².
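
The second derivative in Example 8 can be compared with a second-difference estimate. This is a rough numerical sketch: the step size `h = 1e-4` and the test point x = 1 are assumed values, and `second_difference` is a hypothetical helper.

```python
import math

def z(x):
    """The composite function sin(x^2)."""
    return math.sin(x ** 2)

def second_derivative(x):
    """Formula from Example 8: (dz/dy) y'' + (d2z/dy2) (y')^2 with y = x^2."""
    return 2 * math.cos(x ** 2) - 4 * x ** 2 * math.sin(x ** 2)

def second_difference(func, x, h=1e-4):
    """Centered second-difference estimate of the second derivative."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / (h * h)

x = 1.0
print(second_derivative(x), second_difference(z, x))
```

The first term uses y'' = 2 and the second uses (y')² = 4x², matching the two pieces of equation (8).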