Kittttttan*s Web

Calculus

The Chain Rule

4.3 Inverse Functions and Their Derivatives

There is a remarkable special case of the chain rule. It occurs when ƒ(y) and g(x) are "inverse functions." That idea is expressed by a very short and powerful equation: ƒ(g(x)) = x. Here is what that means.

Inverse functions: Start with any input, say x = 5. Compute y = g(x), say y = 3. Then compute ƒ(y), and the answer must be 5. What one function does, the inverse function undoes. If g(5) = 3 then ƒ(3) = 5. The inverse function ƒ takes the output y back to the input x.

EXAMPLE 1   g(x) = x − 2 and ƒ(y) = y + 2 are inverse functions. Starting with x = 5, the function g subtracts 2. That produces y = 3. Then the function ƒ adds 2. That brings back x = 5. To say it directly: The inverse of y = x − 2 is x = y + 2.

EXAMPLE 2   y = g(x) = (5/9)(x − 32) and x = ƒ(y) = (9/5)y + 32 are inverse functions (for temperature). Here x is degrees Fahrenheit and y is degrees Celsius. From x = 32 (freezing in Fahrenheit) you find y = 0 (freezing in Celsius). The inverse function takes y = 0 back to x = 32. Figure 4.4 shows how x = 50°F matches y = 10°C.

Notice that (5/9)(x − 32) subtracts 32 first. The inverse (9/5)y + 32 adds 32 last. In the same way g multiplies last by 5/9 while ƒ multiplies first by 9/5.
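The round trip in Example 2 can be checked numerically. This is only a minimal sketch; the names `to_celsius` and `to_fahrenheit` are illustrative choices and do not come from the text.

```python
def to_celsius(x):
    """g: degrees Fahrenheit to degrees Celsius (subtract 32 first, scale by 5/9 last)."""
    return 5 * (x - 32) / 9


def to_fahrenheit(y):
    """f = g inverse: degrees Celsius to degrees Fahrenheit (scale by 9/5 first, add 32 last)."""
    return 9 * y / 5 + 32


if __name__ == "__main__":
    # Each Fahrenheit input should come back unchanged after the loop x -> y -> x.
    for x in (32, 50, 212):
        y = to_celsius(x)
        print(x, "F ->", y, "C ->", to_fahrenheit(y), "F")
```

The order of operations is reversed in the inverse, which is the same pattern that reappears later for chains of functions.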

Fig.4.4 °F to °C to °F. Always g⁻¹(g(x)) = x and g(g⁻¹(y)) = y. If ƒ = g⁻¹ then g = ƒ⁻¹.

The inverse function is written ƒ = g⁻¹ and pronounced "g inverse." It is not 1/g(x).

If the demand y is a function of the price x, then the price is a function of the demand. Those are inverse functions. Their derivatives obey a fundamental rule: dy/dx times dx/dy equals 1. In Example 2, dy/dx is 5/9 and dx/dy is 9/5.

There is another important point. When ƒ and g are applied in the opposite order, they still come back to the start. First ƒ adds 2, then g subtracts 2. The chain g(ƒ(y)) = (y + 2) − 2 brings back y. If ƒ is the inverse of g then g is the inverse of ƒ. The relation is completely symmetric, and so is the definition:
Inverse function: If y = g(x) then x = g⁻¹(y). If x = g⁻¹(y) then y = g(x).

The loop in the figure goes from x to y to x. The composition g⁻¹(g(x)) is the "identity function." Instead of a new point z it returns to the original x. This will make the chain rule particularly easy—leading to (dy/dx)(dx/dy) = 1.

EXAMPLE 3   y = g(x) = √x and x = ƒ(y) = y² are inverse functions.

Starting from x = 9 we find y = 3. The inverse gives 3² = 9. The square of √x is ƒ(g(x)) = x. In the opposite direction, the square root of y² is g(ƒ(y)) = y.

Caution   That example does not allow x to be negative. The domain of g—the set of numbers with square roots—is restricted to x ≥ 0. This matches the range of g⁻¹. The outputs y² are nonnegative. With domain of g = range of g⁻¹, the equation x = (√x)² is possible and true. The nonnegative x goes into g and comes out of g⁻¹.

In this example y is also nonnegative. You might think we could square anything, but y must come back as the square root of y². So y ≥ 0.

To summarize: The domain of a function matches the range of its inverse. The inputs to g⁻¹ are the outputs from g. The inputs to g are the outputs from g⁻¹.

If g(x) = y then solving that equation for x gives x = g⁻¹(y):
if y = 3x − 6   then   x = (1/3)(y + 6)   (this is g⁻¹(y))
if y = x³ + 1   then   x = ∛(y − 1)   (this is g⁻¹(y))

In practice that is how g⁻¹ is computed: Solve g(x) = y. This is the reason inverses are important. Every time we solve an equation we are computing a value of g⁻¹.
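When no formula for the solution is available, the equation g(x) = y can still be solved numerically. The sketch below uses bisection, which is a technique chosen here for illustration and not one the text prescribes; it assumes g is increasing on the interval [lo, hi]. The helper name `invert_by_bisection` is hypothetical.

```python
def invert_by_bisection(g, y, lo, hi, tol=1e-12):
    """Solve g(x) = y for x in [lo, hi], assuming g is increasing on that interval."""
    if not (g(lo) <= y <= g(hi)):
        raise ValueError("y is outside the range of g on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < y:
            lo = mid  # the solution lies to the right of mid
        else:
            hi = mid  # the solution lies at mid or to its left
    return (lo + hi) / 2


def g(x):
    return x**3 + 1


def g_inverse(y):
    """Closed form from the text: the cube root of (y - 1). As coded, valid for y >= 1."""
    return (y - 1) ** (1 / 3)


if __name__ == "__main__":
    # Both routes should agree on the x that gives g(x) = 9.
    print(invert_by_bisection(g, 9.0, 0.0, 10.0), g_inverse(9.0))
```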

Not all equations have one solution. Not all functions have inverses. For each y, the equation g(x) = y is only allowed to produce one x. That solution is x = g⁻¹(y). If there is a second solution, then g⁻¹ will not be a function—because a function cannot produce two x's from the same y.

EXAMPLE 4   There is more than one solution to sin x = ½. Many angles have the same sine. On the interval 0 ≤ x ≤ π, the inverse of y = sin x is not a function. Figure 4.5 shows how two x's give the same y.

Prevent x from passing π/2 and the sine has an inverse. Write x = sin⁻¹ y.

The function g has no inverse if two points x1 and x2 give g(x1) = g(x2). Its inverse would have to bring the same y back to x1 and x2. No function can do that; g⁻¹(y) cannot equal both x1 and x2. There must be only one x for each y.

To be invertible over an interval, g must be steadily increasing or steadily decreasing.
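The sine example can be tested on a computer. The sketch below samples the function on an interval and reports whether the samples go steadily up or steadily down; this is a numerical spot check, not a proof, and the helper name `is_steadily_monotonic` is an assumption made for illustration.

```python
import math


def is_steadily_monotonic(g, a, b, samples=1000):
    """Return True if g is strictly increasing or strictly decreasing at evenly spaced samples on [a, b]."""
    xs = [a + (b - a) * k / samples for k in range(samples + 1)]
    ys = [g(x) for x in xs]
    increasing = all(y0 < y1 for y0, y1 in zip(ys, ys[1:]))
    decreasing = all(y0 > y1 for y0, y1 in zip(ys, ys[1:]))
    return increasing or decreasing


if __name__ == "__main__":
    # Two different angles in [0, pi] share (up to rounding) the same sine.
    print(math.sin(math.pi / 6), math.sin(5 * math.pi / 6))
    # The sampled sine is not monotonic on [0, pi] but is on [0, pi/2].
    print(is_steadily_monotonic(math.sin, 0.0, math.pi))
    print(is_steadily_monotonic(math.sin, 0.0, math.pi / 2))
    # math.asin returns the single angle between -pi/2 and pi/2.
    print(math.asin(0.5), math.pi / 6)
```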

Fig.4.5 Inverse exists (one x for each y). No inverse function (two x's for one y).

THE DERIVATIVE OF g⁻¹

It is time for calculus. Forgive me for this very humble example.

EXAMPLE 5 (ordinary multiplication) The inverse of y = g(x) = 3x is x = ƒ(y) = (1/3)y. This shows with special clarity the rule for derivatives: The slopes dy/dx = 3 and dx/dy = 1/3 multiply to give 1. This rule holds for all inverse functions, even if their slopes are not constant. It is a crucial application of the chain rule to the derivative of ƒ(g(x)) = x.

4C (Derivative of inverse function)   From ƒ(g(x)) = x the chain rule gives ƒ'(g(x))g'(x) = 1. Writing y = g(x) and x = ƒ(y), this rule looks better:
(dx/dy)(dy/dx) = 1   or   dx/dy = 1 / (dy/dx).   (1)
The slope of x = g⁻¹(y) times the slope of y = g(x) equals one.

This is the chain rule with a special feature. Since ƒ(g(x)) = x, the derivative of both sides is 1. If we know g' we now know ƒ'. That rule will be tested on a familiar example. In the next section it leads to totally new derivatives.

EXAMPLE 6   The inverse of y = x³ is x = y^(1/3). We can find dx/dy two ways:

directly: dx/dy = (1/3)y^(−2/3)   indirectly: dx/dy = 1/(dy/dx) = 1/(3x²) = 1/(3y^(2/3))

The equation (dx/dy)(dy/dx) = 1 is not ordinary algebra, but it is true. Those derivatives are limits of fractions. The fractions are (Δx/Δy)(Δy/Δx) = 1 and we let Δx → 0.
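The rule (dx/dy)(dy/dx) = 1 can be checked with difference quotients for y = x³. This is a rough numerical sketch; the central-difference helper `slope` and the step size h are choices made here, not part of the text.

```python
def slope(fn, t, h=1e-6):
    """Central-difference estimate of the derivative of fn at t."""
    return (fn(t + h) - fn(t - h)) / (2 * h)


def g(x):
    return x**3


def f(y):
    """Inverse of g for y > 0: the cube root."""
    return y ** (1 / 3)


x = 2.0
y = g(x)               # the matching point, y = 8
dy_dx = slope(g, x)    # should be near 3x^2 = 12
dx_dy = slope(f, y)    # should be near (1/3) y^(-2/3) = 1/12
product = dy_dx * dx_dy

if __name__ == "__main__":
    print(dy_dx, dx_dy, product)
```

The two slopes must be taken at matching points: dy/dx at x = 2 and dx/dy at y = 8.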

Fig.4.6 Graphs of inverse functions: x = (1/3)y is the mirror image of y = 3x.

Before going to new functions, I want to draw graphs. Figure 4.6 shows y = √x and y = 3x. What is special is that the same graphs also show the inverse functions. The inverse of y = √x is x = y². The pair x = 4, y = 2 is the same for both. That is the whole point of inverse functions: if 2 = g(4) then 4 = g⁻¹(2). Notice that the graphs go steadily up.

The only problem is, the graph of x = g⁻¹(y) is on its side. To change the slope from 3 to 1/3, you would have to turn the figure. After that turn there is another problem—the axes don't point to the right and up. You also have to look in a mirror! (The typesetter refused to print the letters backward. He thinks it's crazy but it's not.) To keep the book in position, and the typesetter in position, we need a better idea.

The graph of x = (1/3)y comes from turning the picture across the 45° line. The y axis becomes horizontal and x goes upward. The point (2,6) on the line y = 3x goes into the point (6,2) on the line x = (1/3)y. The eyes see a reflection across the 45° line (Figure 4.6c). The mathematics sees the same pairs x and y. The special properties of g and g⁻¹ allow us to know two functions—and draw two graphs—at the same time.* The graph of x = g⁻¹(y) is the mirror image of the graph of y = g(x).

EXPONENTIALS AND LOGARITHMS

I would like to add two more examples of inverse functions, because they are so important. Both examples involve the exponential and the logarithm. One is made up of linear pieces that imitate 2^x; it appeared in Chapter 1. The other is the true function 2^x, which is not yet defined, and it is not going to be defined here. The functions b^x and log_b y are so overwhelmingly important that they deserve and will get a whole chapter of the book (at least). But you have to see the graphs.

The slopes in the linear model are powers of 2. So are the heights y at the start of each piece. The slopes 1, 2, 4, … equal the heights 1, 2, 4, … at those special points.

The inverse is a discrete model for the logarithm (to base 2). The logarithm of 1 is 0, because 2⁰ = 1. The logarithm of 2 is 1, because 2¹ = 2. The logarithm of 2^j is the exponent j. Thus the model gives the correct x = log_2 y at the breakpoints y = 1, 2, 4, 8, …. The slopes are 1, ½, ¼, 1/8, … because dx/dy = 1/(dy/dx).
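The piecewise linear model and its inverse are short enough to write out. The sketch below is one possible reading of the model: on each interval from n to n + 1 the line starts at height 2^n with slope 2^n. The function names `model_exp` and `model_log` are assumptions for illustration.

```python
import math


def model_exp(x):
    """Piecewise linear imitation of 2**x for x >= 0: height 2**n and slope 2**n on [n, n + 1]."""
    n = math.floor(x)
    return 2**n * (1 + (x - n))


def model_log(y):
    """Inverse of model_exp for y >= 1: on [2**n, 2**(n + 1)] the slope is 1 / 2**n."""
    n = math.floor(math.log2(y))
    return n + (y - 2**n) / 2**n


if __name__ == "__main__":
    # At the breakpoints the model agrees with the true powers of 2 and their logarithms.
    for n in range(4):
        print(n, model_exp(n), model_log(2**n))
    # Between breakpoints the model is a straight line, and the inverse undoes it.
    print(model_exp(1.5), model_log(model_exp(1.5)))
```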

The model is good, but the real thing is better. The figure on the right shows the true exponential y = 2^x. At x = 0, 1, 2, … the heights y are the same as before. But now the height at x = ½ is the number 2^(1/2), which is √2. The height at x = .10 is the tenth root 2^(1/10) = 1.07…. The slope at x = 0 is no longer 1; it is closer to Δy/Δx = .07/.10. The exact slope is a number c (near .7) that we are not yet prepared to reveal.

The special property of y = 2^x is that the slope at all points is cy. The slope is proportional to the function. The exponential solves dy/dx = cy.

Now look at the inverse function—the logarithm. Its graph is the mirror image:
If y = 2^x then x = log_2 y. If 2^(1/10) ≈ 1.07 then log_2 1.07 ≈ 1/10.

What the exponential does, the logarithm undoes, and vice versa. The logarithm of 2^x is the exponent x. Since the exponential starts with slope c, the logarithm must start with slope 1/c. Check that numerically. The logarithm of 1.07 is near 1/10. The slope is near .10/.07. The beautiful property is that dx/dy = 1/(cy).
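The numerical check suggested above can be carried a little further. This sketch estimates the slope c of 2^x at x = 0, the slope of log_2 y at y = 1, and the slope of 2^x at another point, all by central differences; the helper `slope` and the step size are assumptions made here.

```python
import math


def slope(fn, t, h=1e-6):
    """Central-difference estimate of the derivative of fn at t."""
    return (fn(t + h) - fn(t - h)) / (2 * h)


def exp2(x):
    return 2.0**x


rough_c = (2**0.10 - 1) / 0.10       # the coarse estimate from a step of .10
c = slope(exp2, 0.0)                 # a much finer estimate of the starting slope
log_slope = slope(math.log2, 1.0)    # starting slope of the logarithm, expected near 1/c
slope_at_3 = slope(exp2, 3.0)        # expected near c * y with y = 2**3 = 8

if __name__ == "__main__":
    print(rough_c, c, log_slope, c * log_slope)
    print(slope_at_3, c * 8)
```

The product of the two starting slopes should come out near 1, and the slope at x = 3 should be near 8c, in line with dy/dx = cy.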

Fig.4.7 Piecewise linear models and smooth curves: y = 2^x and x = log_2 y. Base b = 2.

I have to mention that calculus avoids logarithms to base 2. The reason lies in that mysterious number c. It is the "natural logarithm" of 2, which is .693147…, and who wants that? Also 1/.693147… enters the slope of log_2 y. Then (dx/dy)(dy/dx) = 1. The right choice is to use "natural logarithms" throughout. In place of 2, they are based on the special number e:

y = e^x is the inverse of x = ln y.   (2)

The derivatives of those functions are sensational; they are saved for Chapter 6. Together with x^n and sin x and cos x, they are the backbone of calculus.

Note It is almost possible to go directly to Chapter 6. The inverse functions x = sin⁻¹ y and x = tan⁻¹ y can be done quickly. The reason for including integrals first (Chapter 5) is that they solve differential equations with no guesswork:

dy/dx = y or dx/dy = 1/y leads to ∫dx = ∫dy/y or x = ln y + C.

Integrals have applications of all kinds, spread through the rest of the book. But do not lose sight of 2^x and e^x. They solve dy/dx = cy, the key to applied calculus.

THE INVERSE OF A CHAIN h(g(x))

The functions g(x) = x − 2 and h(y) = 3y were easy to invert. For g⁻¹ we added 2, and for h⁻¹ we divided by 3. Now the question is: If we create the composite function z = h(g(x)), or z = 3(x − 2), what is its inverse?

Virtually all known functions are created in this way, from chains of simpler functions. The problem is to invert a chain using the inverse of each piece. The answer is one of the fundamental rules of mathematics:

4D   The inverse of z = h(g(x)) is a chain of inverses in the opposite order:
x = g⁻¹(h⁻¹(z)).   (3)
h⁻¹ is applied first because h was applied last: g⁻¹(h⁻¹(h(g(x)))) = x.

That last equation looks like a mess, but it holds the key. In the middle you see h⁻¹ and h. That part of the chain does nothing! The inverse functions cancel, to leave g⁻¹(g(x)). But that is x. The whole chain collapses, when g⁻¹ and h⁻¹ are in the correct order—which is opposite to the order of h(g(x)).

EXAMPLE 7   z = h(g(x)) = 3(x − 2) and x = g⁻¹(h⁻¹(z)) = (1/3)z + 2.

First h⁻¹ divides by 3. Then g⁻¹ adds 2. The inverse of h⋅g is g⁻¹⋅h⁻¹. It can be found directly by solving z = 3(x − 2). A chain of inverses is like writing in prose—we do it without knowing it.
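Example 7 can be written out as function composition. This is a small sketch in which the names `g_inv`, `h_inv`, `chain`, `chain_inv`, and `wrong_order` are illustrative; it also shows what goes wrong when the inverses are applied in the same order as the original functions.

```python
def g(x):
    return x - 2


def h(y):
    return 3 * y


def g_inv(y):
    return y + 2


def h_inv(z):
    return z / 3


def chain(x):
    """z = h(g(x)) = 3(x - 2)."""
    return h(g(x))


def chain_inv(z):
    """x = g_inv(h_inv(z)) = z/3 + 2: undo h first because h was applied last."""
    return g_inv(h_inv(z))


def wrong_order(z):
    """h_inv(g_inv(z)) = (z + 2)/3, which does not invert the chain."""
    return h_inv(g_inv(z))


if __name__ == "__main__":
    z = chain(5)
    print(z, chain_inv(z), wrong_order(z))
```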

EXAMPLE 8   Invert z = √(x − 2) by writing z² = x − 2 and then x = z² + 2.

The inverse adds 2 and takes the square—but not in that order. That would give (z + 2)², which is wrong. The correct order is z² + 2.

The domains and ranges are explained by Figure 4.8. We start with x ≥ 2. Subtracting 2 gives y ≥ 0. Taking the square root gives z ≥ 0. Taking the square brings back y ≥ 0. Adding 2 brings back x ≥ 2 —which is in the original domain of g.

Fig.4.8 The chain g⁻¹(h⁻¹(h(g(x)))) = x is one-to-one at every step.

EXAMPLE 9   Inverse matrices (AB)⁻¹ = B⁻¹A⁻¹ (this linear algebra is optional).

Suppose a vector x is multiplied by a square matrix B: y = g(x) = Bx. The inverse function multiplies by the inverse matrix: x = g⁻¹(y) = B⁻¹y. It is like multiplication by B = 3 and B⁻¹ = 1/3, except that x and y are vectors.

Now suppose a second function multiplies by another matrix A: z = h(g(x)) = ABx. The problem is to recover x from z. The first step is to invert A, because that came last: Bx = A⁻¹z. Then the second step multiplies by B⁻¹ and brings back x = B⁻¹A⁻¹z. The product B⁻¹A⁻¹ inverts the product AB. The rule for matrix inverses is like the rule for function inverses—in fact it is a special case.
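The matrix rule can be verified on a small case. The sketch below works with 2 by 2 matrices stored as nested lists and uses exact fractions so the comparison is not blurred by rounding; the matrices A and B and the helper names `matmul` and `inverse` are made up for illustration.

```python
from fractions import Fraction


def matmul(P, Q):
    """Product PQ of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(M):
    """Inverse of a 2x2 matrix with nonzero determinant, in exact fractions."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [1, 1]]
B = [[1, 2], [3, 4]]

inverse_of_product = inverse(matmul(A, B))            # (AB) inverse
reverse_order = matmul(inverse(B), inverse(A))        # B inverse times A inverse
same_order = matmul(inverse(A), inverse(B))           # A inverse times B inverse

if __name__ == "__main__":
    print(inverse_of_product == reverse_order)  # the rule from Example 9
    print(inverse_of_product == same_order)     # the inverses in the wrong order
```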

I had better not wander too far from calculus. The next section introduces the inverses of the sine and cosine and tangent, and finds their derivatives. Remember that the ultimate source is the chain rule.