This section has two main goals. One is to find the derivatives of ƒ(x) = x³ and x⁴ and x⁵ (and more generally ƒ(x) = xⁿ). The power or exponent n is at first a positive integer. Later we allow x^π and x^2.2 and every xⁿ.
The other goal is different. While computing these derivatives, we look ahead to their applications. In using calculus, we meet equations with derivatives in them: "differential equations." It is too early to solve those equations. But it is not too early to see the purpose of what we are doing. Our examples come from economics and biology.
With n = 2, the derivative of x² is 2x. With n = −1, the slope of x⁻¹ is −1x⁻². Those are two pieces in a beautiful pattern, which it will be a pleasure to discover. We begin with x³ and its derivative 3x², before jumping to xⁿ.
EXAMPLE 1 If ƒ(x) = x³ then Δƒ = (x + h)³ − x³ = (x³ + 3x²h + 3xh² + h³) − x³.
Step 1: Cancel x³. Step 2: Divide by h. Step 3: h goes to zero.
Δƒ ⁄ h = 3x² + 3xh + h² approaches dƒ ⁄ dx = 3x².
That is straightforward, and you see the crucial step. The power (x + h)³ yields four separate terms x³ + 3x²h + 3xh² + h³. (Notice 1, 3, 3, 1.) After x³ is subtracted, we can divide by h. At the limit (h = 0) we have 3x².
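The three steps of Example 1 can be watched numerically. The sketch below is only an illustration: the names `difference_quotient` and `cube` and the sample point x = 2 are assumptions made here, not part of the text. It compares the computed Δƒ ⁄ h with the algebraic result 3x² + 3xh + h² as h shrinks.

```python
def difference_quotient(f, x, h):
    """Average slope of f between x and x + h (illustrative helper)."""
    return (f(x + h) - f(x)) / h


def cube(x):
    return x ** 3


x = 2.0  # sample point chosen for illustration
for h in (1.0, 0.1, 0.01, 0.001):
    # Algebra from Example 1: after cancelling x^3 and dividing by h.
    predicted = 3 * x**2 + 3 * x * h + h**2
    print(h, difference_quotient(cube, x, h), predicted)
```

Both columns should drift toward 3x² = 12 as h gets small, which is the limiting step h → 0.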
For ƒ(x) = xⁿ the plan is the same. A step of size h leads to ƒ(x + h) = (x + h)ⁿ. One reason for algebra is to calculate powers like (x + h)ⁿ, and if you have forgotten the binomial formula we can recapture its main point. Start with n = 4:
(x + h)(x + h)(x + h)(x + h) = x⁴ + ??? + h⁴ (1)
Multiplying the four x's gives x⁴. Multiplying the four h's gives h⁴. These are the easy terms, but not the crucial ones. The subtraction (x + h)⁴ − x⁴ will remove x⁴, and the limiting step h → 0 will wipe out h⁴ (even after division by h). The products that matter are those with exactly one h. In Example 1 with (x + h)³, this key term was 3x²h. Division by h left 3x².
With only one h, there are n places it can come from. Equation (1) has four h's in parentheses, and four ways to produce x³h. Therefore the key term is 4x³h. (Division by h leaves 4x³.) In general there are n parentheses and n ways to produce xⁿ⁻¹h, so the binomial formula contains nxⁿ⁻¹h:
(x + h)ⁿ = xⁿ + nxⁿ⁻¹h + … + hⁿ. (2)
2B For n = 1, 2, 3, 4, …, the derivative of xⁿ is nxⁿ⁻¹.
Subtract xⁿ from (2). Divide by h. The key term is nxⁿ⁻¹. The rest disappears as h → 0:
Δƒ ⁄ Δx = ((x + h)ⁿ − xⁿ) ⁄ h = (nxⁿ⁻¹h + … + hⁿ) ⁄ h = nxⁿ⁻¹ + … + hⁿ⁻¹ so dƒ ⁄ dx = nxⁿ⁻¹.
The terms replaced by the dots involve h² and h³ and higher powers. After dividing by h, they still have at least one factor h. All those terms vanish as h approaches zero.
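A short check of this argument, using exact fractions so that no rounding hides the leftover terms. The function name `power_quotient` and the choices n = 5, x = 2 are assumptions for illustration only. The printed gap is what the dots contribute after division by h, and it should shrink in proportion to h.

```python
from fractions import Fraction


def power_quotient(x, h, n):
    """Exact value of ((x + h)^n - x^n) / h, using rational arithmetic."""
    x, h = Fraction(x), Fraction(h)
    return ((x + h) ** n - x ** n) / h


n, x = 5, 2  # illustrative choices
for k in (1, 3, 6):
    h = Fraction(1, 10 ** k)
    # Everything left over after the key term n*x^(n-1) still carries a factor h.
    gap = power_quotient(x, h, n) - n * x ** (n - 1)
    print(f"h = 10^-{k}: quotient minus n*x^(n-1) = {float(gap):.3e}")
```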
EXAMPLE 2 (x + h)⁴ = x⁴ + 4x³h + 6x²h² + 4xh³ + h⁴. This is n = 4 in detail.
Subtract x⁴, divide by h, let h → 0. The derivative is 4x³. The coefficients 1, 4, 6, 4, 1 are in Pascal's triangle below. For (x + h)⁵ the next row is 1, 5, 10, ?.
Remark The missing terms in the binomial formula (replaced by the dots) contain all the products xⁿ⁻ʲhʲ. An x or an h comes from each parenthesis. The binomial coefficient "n choose j" is the number of ways to choose j h's out of n parentheses. It involves n factorial, which is n(n − 1) ... (1). Thus 5! = 5·4·3·2·1 = 120.
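These are numbers that gamblers know and love:
"n choose j" = n! ⁄ (j!(n − j)!)
Pascal's triangle:
n = 0: 1
n = 1: 1 1
n = 2: 1 2 1
n = 3: 1 3 3 1
n = 4: 1 4 6 4 1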
In the last row, the coefficient of x³h is 4! ⁄ (1!3!) = (4·3·2·1) ⁄ (1·3·2·1) = 4. For the x²h² term, with j = 2, there are (4·3·2·1) ⁄ (2·1·2·1) = 6 ways to choose two h's. Notice that 1 + 4 + 6 + 4 + 1 equals 16, which is 2⁴. Each row of Pascal's triangle adds to a power of 2.
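The factorial formula and the row sums can be checked directly. This is a minimal sketch; the helper name `choose` is an assumption made here. It builds rows n = 0 through 4 from n! ⁄ (j!(n − j)!) and prints each row with its sum.

```python
from math import factorial


def choose(n, j):
    """Binomial coefficient n! / (j! (n - j)!): ways to pick j h's from n parentheses."""
    return factorial(n) // (factorial(j) * factorial(n - j))


for n in range(5):
    row = [choose(n, j) for j in range(n + 1)]
    # Each row should add up to 2^n.
    print(n, row, "sum =", sum(row), "2^n =", 2 ** n)
```

Changing `range(5)` to `range(6)` gives the row for (x + h)⁵ and answers the question mark in Example 2.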
Choosing 6 numbers out of 49 in a lottery, the odds are 49·48·47·46·45·44 ⁄ 6! to 1. That number is N = "49 choose 6" = 13,983,816. It is the coefficient of x⁴³h⁶ in (x + h)⁴⁹. If λ times N tickets are bought, the expected number of winners is λ. The chance of no winner is e^(−λ). The chance of one winner is λe^(−λ). See Section 8.4.
Florida's lottery in September 1990 (these rules) had six winners out of 109,163,978 tickets.
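The lottery numbers are easy to reproduce. The sketch below uses the standard library's `math.comb` for "49 choose 6" and the ticket count quoted above; the variable names are illustrative assumptions. It applies the formulas e^(−λ) and λe^(−λ) from the text without claiming anything more about the actual drawing.

```python
from math import comb, exp

N = comb(49, 6)            # number of possible tickets: "49 choose 6"
tickets = 109_163_978      # Florida, September 1990 (figure quoted in the text)
lam = tickets / N          # expected number of winners, lambda

p_none = exp(-lam)         # chance of no winner
p_one = lam * exp(-lam)    # chance of exactly one winner
print(N, lam, p_none, p_one)
```

With λ between 7 and 8, six winners is not a surprising outcome.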
Now we have an infinite list of functions and their derivatives:
ƒ(x): x x² x³ x⁴ x⁵ …
ƒ'(x): 1 2x 3x² 4x³ 5x⁴ …
The derivative of xⁿ is n times the next lower power xⁿ⁻¹. That rule extends beyond these integers 1, 2, 3, 4, 5 to all powers:
ƒ = 1/x has ƒ' = -1/x²: Example 3 of Section 2.1 (n = -1)
ƒ = 1/x² has ƒ' = -2/x³: Example 6 of Section 2.1 (n = -2)
ƒ = √x has ƒ' = ½x^(−½): true but not yet checked (n = ½)
Remember that x⁻² means 1/x² and x^(−½) means 1/√x. Negative powers lead to decreasing functions, approaching zero as x gets large. Their slopes have minus signs.
Question What are the derivatives of x¹⁰ and x^2.2 and x^(−½)?
Answer 10x⁹ and 2.2x^1.2 and −½x^(−3/2). Maybe (x + h)^2.2 is a little unusual. Pascal's triangle can't deal with this fractional power, but the formula stays firm: After x^2.2 comes 2.2x^1.2 h. The complete binomial formula is in Section 10.5.
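These three answers have not been proved yet, but they can be tested numerically. The sketch below is an assumption-laden check, not a proof: `central_slope` is a hypothetical helper using a symmetric difference quotient, and x = 3 is an arbitrary sample point.

```python
def central_slope(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 3.0  # arbitrary positive sample point
for n in (10, 2.2, -0.5):
    numeric = central_slope(lambda t: t ** n, x)
    rule = n * x ** (n - 1)   # the power rule n * x^(n-1)
    print(n, numeric, rule)
```

The two columns should agree to several digits for each power, integer or not.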
That list is a good start, but plenty of functions are left. What comes next is really simple. A tremendous number of new functions are "linear combinations" like
ƒ(x) = 6x³ or 6x³ + ½x² or 6x³ − ½x².
What are their derivatives? The answers are known for x³ and x², and we want to multiply by 6 or divide by 2 or add or subtract. Do the same to the derivatives:
ƒ'(x) = 18x² or 18x² + x or 18x² − x
2C The derivative of c times ƒ(x) is c times ƒ'(x).
2D The derivative of ƒ(x) + g(x) is ƒ'(x) + g'(x).
The number c can be any constant. We can add (or subtract) any functions. The rules allow any combination of ƒ and g: The derivative of 9ƒ(x) − 7g(x) is 9ƒ'(x) − 7g'(x).
The reasoning is direct. When ƒ(x) is multiplied by c, so is ƒ(x + h). The difference Δƒ is also multiplied by c. All averages Δƒ/h contain c, so their limit is cƒ'. The only incomplete step is the last one (the limit). We still have to say what "limit" means.
Rule 2D is similar. Adding ƒ + g means adding Δƒ + Δg. Now divide by h. In the limit as h → 0 we reach ƒ' + g' —because a limit of sums is a sum of limits. Any example is easy and so is the proof—it is the definition of limit that needs care (Section 2.6).
You can now find the derivative of every polynomial. A "polynomial" is a combination of 1, x, x², …, xⁿ, for example 9 + 2x − x⁵. That particular polynomial has slope 2 − 5x⁴. Note that the derivative of 9 is zero! A constant just raises or lowers the graph, without changing its slope. It alters the mileage before starting the car.
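Rules 2B, 2C and 2D together turn polynomial differentiation into bookkeeping on coefficients. A minimal sketch, assuming a polynomial is stored as a list in which `coeffs[k]` multiplies x^k (a representation chosen here for illustration; `poly_derivative` is a hypothetical name):

```python
def poly_derivative(coeffs):
    """coeffs[k] is the coefficient of x^k; return the derivative's coefficients.

    Each term c*x^k becomes k*c*x^(k-1); the constant term drops out.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


# 9 + 2x - x^5 has slope 2 - 5x^4
print(poly_derivative([9, 2, 0, 0, 0, -1]))
# 6x^3 + (1/2)x^2 has slope 18x^2 + x
print(poly_derivative([0, 0, 0.5, 6]))
```

The constant 9 disappears in the slicing step, exactly as the derivative of a constant is zero.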
The disappearance of constants is one of the nice things in differential calculus. The reappearance of those constants is one of the headaches in integral calculus. When you find v from ƒ, the starting mileage doesn't matter. The constant in ƒ has no effect on v. (Δƒ is measured by a trip meter; Δt comes from a stopwatch.) To find distance from velocity, you need to know the mileage at the start.
We know that y = x³ has the derivative dy/dx = 3x². Starting with the function, we found its slope. Now reverse that process. Start with the slope and find the function. This is what science does all the time, and it seems only reasonable to say so.
Begin with dy/dx = 3x². The slope is given, the function y is not given.
Question Can you go backward to reach y = x³?
Answer Almost but not quite. You are only entitled to say that y = x³ + C. The constant C is the starting value of y (when x = 0). Then the differential equation dy/dx = 3x² is solved.
Every time you find a derivative, you can go backward to solve a differential equation. The function y = x² + x has the slope dy/dx = 2x + 1. In reverse, the slope 2x + 1 produces x² + x —and all the other functions x² + x + C, shifted up and down. After going from distance ƒ to velocity v, we return to ƒ + C. But there is a lot more to differential equations. Here are two crucial points:
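For polynomial slopes, going backward is also bookkeeping: each term cx^k in the slope comes from cx^(k+1) ⁄ (k + 1) in the function. The sketch below assumes the same coefficient-list representation (`coeffs[k]` multiplies x^k) and a hypothetical helper `antiderivative`; the constant C must be supplied separately, since the slope cannot determine it.

```python
def antiderivative(coeffs, C=0):
    """Given slope coefficients (coeffs[k] multiplies x^k), return y's coefficients.

    Each term c*x^k comes from c/(k+1) * x^(k+1); C is the free constant y(0).
    """
    return [C] + [c / (k + 1) for k, c in enumerate(coeffs)]


print(antiderivative([0, 0, 3]))      # slope 3x^2   gives y = x^3 (with C = 0)
print(antiderivative([1, 2], C=7))    # slope 2x + 1 gives y = x^2 + x + 7
```

Different values of C give the whole family of shifted graphs, all with the same slope.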
To summarize: Chapters 2-4 compute and use derivatives. Chapter 5 goes in reverse. Integral calculus discovers the function from its slope. Given dy/dx we find y(x). Then Chapter 6 solves the differential equation dy/dt = y, function mixed with slope. Calculus moves from derivatives to integrals to differential equations.
This discussion of the purpose of calculus should mention a specific example. Differential equations are applied to an epidemic (like AIDS). In most epidemics the number of cases grows exponentially. The peak is quickly reached by e^t, and the epidemic dies down. Amazingly, exponential growth is not happening with AIDS: the best fit to the data through 1988 is a cubic polynomial (Los Alamos Science, 1989):
The number of cases fits a cubic within 2%: y = 174.6(t − 1981.2)³ + 340.
This is dramatically different from other epidemics. Instead of dy/dt = y we have dy/dt = 3y/t. Before this book is printed, we may know what has been preventing e^t (fortunately). Eventually the curve will turn away from a cubic, and I hope that mathematical models will lead to knowledge that saves lives.
Added in proof: In 1989 the curve for the U.S. dropped from t³ to t².
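The relation dy/dt = 3y/t is exact for a pure cubic y = ct³, with t measured from the start of the curve. The quoted fit also has a shift of 1981.2 and an added 340, so in the sketch below the comparison uses the cubic part alone and time τ = t − 1981.2. The function names and the sample year 1988 are assumptions for illustration.

```python
def cases(t, a=174.6, t0=1981.2, b=340.0):
    """Cubic fit quoted in the text: y = a (t - t0)^3 + b."""
    return a * (t - t0) ** 3 + b


def slope(f, t, h=1e-4):
    """Symmetric difference quotient for dy/dt."""
    return (f(t + h) - f(t - h)) / (2 * h)


t = 1988.0
tau = t - 1981.2                  # time measured from the start of the cubic
pure_cubic = 174.6 * tau ** 3     # the cubic part, without the added 340
print(slope(cases, t), 3 * pure_cubic / tau)
```

The two printed numbers should agree closely: the growth rate is three times the cubic part divided by the elapsed time.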
First point about economics: The marginal cost and marginal income are crucially important. The average cost of making automobiles may be $10,000. But it is the $8000 cost of the next car that decides whether Ford makes it. "The average describes the past, the marginal predicts the future." For bank deposits or work hours or wheat, which come in smaller units, the amounts are continuous variables. Then the word "marginal" says one thing: Take the derivative.
The average pay over all the hours we ever worked may be low. We wouldn't work another hour for that! This average is rising, but the pay for each additional hour rises faster; possibly it jumps. When $10/hour increases to $15/hour after a 40-hour week, a 50-hour week pays $550. The average income is $11/hour. The marginal income is $15/hour, the overtime rate.
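The overtime arithmetic can be written out in a few lines. This sketch assumes a hypothetical function `weekly_pay` with the rates from the paragraph above; the marginal income is taken as the pay for one more hour.

```python
def weekly_pay(hours, base=10.0, overtime=15.0, threshold=40):
    """Pay at the base rate up to the threshold, overtime rate beyond it."""
    regular = min(hours, threshold)
    extra = max(hours - threshold, 0)
    return base * regular + overtime * extra


hours = 50
average = weekly_pay(hours) / hours                    # total pay per hour worked
marginal = weekly_pay(hours + 1) - weekly_pay(hours)   # pay for the next hour
print(weekly_pay(hours), average, marginal)
```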
Concentrate next on cost. Let y(x) be the cost of producing x tons of steel. The cost of x + Δx tons is y(x + Δx). The extra cost is the difference Δy. Divide by Δx, the number of extra tons. The ratio Δy/Δx is the average cost per extra ton. When Δx is an ounce instead of a ton, we are near the marginal cost dy/dx.
Example: When the cost is x², the average cost is x²/x = x. The marginal cost is 2x. Figure 2.4 has increasing slope—an example of "diminishing returns to scale."
[Fig. 2.4]
This raises another point about economics. The units are arbitrary. In yen per kilogram the numbers look different. The way to correct for arbitrary units is to work with percentage change or relative change. An increase of Δx tons is a relative increase of Δx/x. A cost increase Δy is a relative increase of Δy/y. Those are dimensionless, the same in tons/tons or dollars/dollars or yen/yen.
A third example is the demand y at price x. Now dy/dx is negative. But again the units are arbitrary. The demand is in liters or gallons, the price is in dollars or pesos. Relative changes are better. When the price goes up by 10%, the demand may drop by 5%. If that ratio stays the same for small increases, the elasticity of demand is ½.
Actually this number should be -½. The price rose, the demand dropped. In our definition, the elasticity will be -½. In conversation between economists the minus sign is left out (I hope not forgotten).
DEFINITION The elasticity of the demand function y(x) is
E(x) = lim_{Δx→0} (Δy/y) ⁄ (Δx/x) = (dy/dx) ⁄ (y/x). (3)
Elasticity is "marginal" divided by "average." E(x) is also relative change in y divided by relative change in x. Sometimes E(x) is the same at all prices; this important case is discussed below.
EXAMPLE 1 Suppose the demand is y = c/x when the price is x. The derivative dy/dx = -c/x² comes from calculus. The division y/x = c/x² is only algebra. The ratio is E = -1:
For the demand y = c/x, the elasticity is (-c/x²) ⁄ (c/x²) = -1.
All demand curves are compared with this one. The demand is inelastic when |E| < 1. It is elastic when |E| > 1. The demand 20/√x is inelastic (E = -½), while x⁻³ is elastic (E = -3). The power y = cxⁿ, whose derivative we know, is the function with constant elasticity n:
if y = cxⁿ then dy/dx = cnxⁿ⁻¹ and E = cnxⁿ⁻¹ ⁄ (cxⁿ/x) = n.
It is because y = cxⁿ sets the standard that we could come so early to economics.
In the special case when y = c/x, consumers spend the same at all prices. Price x times quantity y remains constant at xy = c.
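Definition (3) can be tried on the demand curves just mentioned. The sketch below estimates dy/dx with a symmetric difference quotient; the helper `elasticity`, the constant 5 in c/x, and the sample prices are illustrative assumptions. For a power y = cxⁿ the printed elasticity should be close to n at every price.

```python
def elasticity(y, x, h=1e-6):
    """E(x) = (dy/dx) / (y/x), with dy/dx from a symmetric difference quotient."""
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    return dydx / (y(x) / x)


demands = {
    "c/x": lambda x: 5.0 / x,            # c = 5 is an arbitrary choice
    "20/sqrt(x)": lambda x: 20.0 / x ** 0.5,
    "x^-3": lambda x: x ** -3,
}
for name, y in demands.items():
    print(name, [round(elasticity(y, x), 4) for x in (1.0, 2.0, 10.0)])
```

The constant c never appears in the result, since it cancels between the marginal dy/dx and the average y/x.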
EXAMPLE 2 The supply curve has E > 0 —supply increases with price. Now the baseline case is y = cx. The slope is c and the average is y/x = c. The elasticity is E = c/c = 1.
Compare E = 1 with E = 0 and E = ∞. A constant supply is "perfectly inelastic." The power n is zero and the slope is zero: y = c. No more is available when the harvest is over. Whatever the price, the farmer cannot suddenly grow more wheat. Lack of elasticity makes farm economics difficult.
The other extreme E = ∞ is "perfectly elastic." The supply is unlimited at a fixed price x. Once this seemed true of water and timber. In reality the steep curve x = constant is leveling off to a flat curve y = constant. Fixed price is changing to fixed supply, E = ∞ is becoming E = 0, and the supply of water follows a "gamma curve" shaped like Γ.
EXAMPLE 3 Demand is an increasing function of income: more income, more demand. The income elasticity is E(I) = (dy/dI) ⁄ (y/I). A luxury has E > 1 (elastic). Doubling your income more than doubles the demand for caviar. A necessity has E < 1 (inelastic). The demand for bread does not double. Please recognize how the central ideas of calculus provide a language for the central ideas of economics.
Important note on supply = demand This is the basic equation of microeconomics. Where the supply curve meets the demand curve, the economy finds the equilibrium price. Supply = demand assumes perfect competition. With many suppliers, no one can raise the price. If someone tries, the customers go elsewhere.
The opposite case is a monopoly—no competition. Instead of many small producers of wheat, there is one producer of electricity. An airport is a monopolist (and maybe the National Football League). If the price is raised, some demand remains.
Price fixing occurs when several producers act like a monopoly—which antitrust laws try to prevent. The price is not set by supply = demand. The calculus problem is different—to maximize profit. Section 3.2 locates the maximum where the marginal profit (the slope!) is zero.
Question on income elasticity From an income of $10,000 you save $500. The income elasticity of savings is E = 2. Out of the next dollar what fraction do you save?
Answer The savings is y = cx² because E = 2. The number c must give 500 = c(10,000)², so c is 5·10⁻⁶. Then the slope dy/dx is 2cx = 10·10⁻⁶·10⁴ = 1/10. This is the marginal savings, ten cents on the dollar. Average savings is 5%, marginal savings is 10%, and E = 2.
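The same answer in a few lines of arithmetic, as a check on the powers of ten. The variable names are assumptions made here; the model y = cx² with E = 2 is the one stated in the answer.

```python
income, savings, E = 10_000.0, 500.0, 2

c = savings / income ** E               # from 500 = c * 10,000^2
marginal = E * c * income ** (E - 1)    # slope dy/dx = 2cx
average = savings / income              # y/x

print(c, marginal, average, marginal / average)
```

The last number printed is marginal divided by average, which recovers the elasticity.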