Chain rule for differentiation
{{perspectives}}
{{differentiation rule}}

==Statement for two functions==

The chain rule is stated in many versions:

{| class="sortable" border="1"
! Version type !! Statement
|-
| specific point, named functions || Suppose <math>f</math> and <math>g</math> are functions such that <math>g</math> is differentiable at a point <math>x_0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>. Then the composite <math>f \circ g</math> is differentiable at <math>x_0</math>, and we have: <math>(f \circ g)'(x_0) = f'(g(x_0))g'(x_0)</math>
|-
| generic point, named functions, point notation || Suppose <math>f</math> and <math>g</math> are functions of one variable. Then, we have <math>(f \circ g)'(x) = f'(g(x))g'(x)</math> wherever the right side expression makes sense.
|-
| generic point, named functions, point-free notation || Suppose <math>f</math> and <math>g</math> are functions of one variable. Then, <math>(f \circ g)' = (f' \circ g) \cdot g'</math> where the right side expression makes sense, where <math>\cdot</math> denotes the pointwise product of functions.
|-
| pure Leibniz notation || Suppose <math>y</math> is a function of <math>u</math> and <math>u</math> is a function of <math>x</math>. Then, <math>\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}</math>
|}

==One-sided version==
A one-sided version of sorts holds, but we need to be careful, since we want the direction of differentiability of <math>f</math> to be the same as the direction of approach of <math>g(x)</math> to <math>g(x_0)</math>. The following are true:
{| class="sortable" border="1"
! Condition on <math>g</math> at <math>x_0</math> !! Condition on <math>f</math> at <math>g(x_0)</math> !! Conclusion
|-
| left differentiable at <math>x_0</math> || differentiable at <math>g(x_0)</math> || The left hand derivative of <math>f \circ g</math> at <math>x_0</math> is <math>f'(g(x_0))</math> times the left hand derivative of <math>g</math> at <math>x_0</math>.
|-
| right differentiable at <math>x_0</math> || differentiable at <math>g(x_0)</math> || The right hand derivative of <math>f \circ g</math> at <math>x_0</math> is <math>f'(g(x_0))</math> times the right hand derivative of <math>g</math> at <math>x_0</math>.
|-
| left differentiable at <math>x_0</math>, and increasing for <math>x</math> on the immediate left of <math>x_0</math> || left differentiable at <math>g(x_0)</math> || The left hand derivative of <math>f \circ g</math> at <math>x_0</math> is the left hand derivative of <math>f</math> at <math>g(x_0)</math> times the left hand derivative of <math>g</math> at <math>x_0</math>.
|-
| right differentiable at <math>x_0</math>, and increasing for <math>x</math> on the immediate right of <math>x_0</math> || right differentiable at <math>g(x_0)</math> || The right hand derivative of <math>f \circ g</math> at <math>x_0</math> is the right hand derivative of <math>f</math> at <math>g(x_0)</math> times the right hand derivative of <math>g</math> at <math>x_0</math>.
|-
| left differentiable at <math>x_0</math>, and decreasing for <math>x</math> on the immediate left of <math>x_0</math> || right differentiable at <math>g(x_0)</math> || The left hand derivative of <math>f \circ g</math> at <math>x_0</math> is the right hand derivative of <math>f</math> at <math>g(x_0)</math> times the left hand derivative of <math>g</math> at <math>x_0</math>.
|-
| right differentiable at <math>x_0</math>, and decreasing for <math>x</math> on the immediate right of <math>x_0</math> || left differentiable at <math>g(x_0)</math> || The right hand derivative of <math>f \circ g</math> at <math>x_0</math> is the left hand derivative of <math>f</math> at <math>g(x_0)</math> times the right hand derivative of <math>g</math> at <math>x_0</math>.
|}
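
For instance, consider <math>f = \sin</math> and <math>g(x) = |x|</math> at <math>x_0 = 0</math>. Here <math>g</math> is left differentiable at 0 with left hand derivative <math>-1</math>, right differentiable at 0 with right hand derivative <math>1</math>, and <math>f</math> is differentiable at <math>g(0) = 0</math> with <math>f'(0) = 1</math>. The first two rows give a left hand derivative of <math>1 \cdot (-1) = -1</math> and a right hand derivative of <math>1 \cdot 1 = 1</math> for <math>f \circ g</math> at 0, confirming that <math>x \mapsto \sin |x|</math> is not differentiable at 0.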
==Statement for multiple functions==

Suppose <math>f_1, f_2, \dots, f_n</math> are functions. Then, the following is true wherever the right side makes sense:

<math>(f_1 \circ f_2 \circ \dots \circ f_n)' = (f_1' \circ (f_2 \circ f_3 \circ \dots \circ f_n)) \cdot (f_2' \circ (f_3 \circ \dots \circ f_n)) \cdot \dots \cdot (f_{n-1}' \circ f_n) \cdot f_n'</math>

For instance, in the case <math>n = 3</math>, we get:

<math>(f_1 \circ f_2 \circ f_3)' = (f_1' \circ (f_2 \circ f_3)) \cdot (f_2' \circ f_3) \cdot f_3'</math>

In point notation, this is:

<math>(f_1 \circ f_2 \circ f_3)'(x) = f_1'(f_2(f_3(x)))f_2'(f_3(x))f_3'(x)</math>
==Related rules==
+ | |||
+ | ===Similar facts in single variable calculus=== | ||
* [[Chain rule for higher derivatives]]
* [[Product rule for differentiation]]
* [[Product rule for higher derivatives]]
* [[Differentiation is linear]]
* [[Inverse function theorem]] (gives formula for derivative of inverse function).
* [[Chain rule for differentiation of formal power series]]

===Similar facts in multivariable calculus===

* [[Chain rule for partial differentiation]]

==Reversal for integration==

If a function is differentiated using the chain rule, then retrieving the original function from the derivative typically requires a method of integration called [[integration by substitution]]. Specifically, that method of integration targets expressions of the form:

<math>\int h(g(x))g'(x) \, dx</math>

The <math>u</math>-substitution idea is to set <math>u = g(x)</math> and obtain:

<math>\int h(u) \, du</math>

We now need to find a function <math>f</math> such that <math>f' = h</math>. The integral is <math>f(u) + C</math>. Plugging back <math>u = g(x)</math>, we obtain that the indefinite integral is <math>f(g(x)) + C</math>.
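
For instance, to compute <math>\int 2x \cos(x^2) \, dx</math>, we recognize it as having the form above with <math>g(x) = x^2</math>, <math>h = \cos</math>, and <math>g'(x) = 2x</math>. Setting <math>u = x^2</math> gives <math>\int \cos u \, du = \sin u + C</math>, so the original integral is <math>\sin(x^2) + C</math>. Differentiating <math>\sin(x^2)</math> using the chain rule recovers the integrand <math>\cos(x^2) \cdot 2x</math>, confirming the answer.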
==Significance==
+ | |||
+ | ===Why more naive chain rules don't make sense=== | ||
+ | |||
+ | There are two naive versions of the chain rule one might come up with, neither of which holds: | ||
+ | |||
+ | <math>(f \circ g)'(x) = f'(g'(x))</math> | ||
+ | |||
+ | and | ||
+ | |||
+ | <math>(f \circ g)'(x) = f'(x)g'(x)</math> | ||
+ | |||
+ | Even without doing any mathematics, we can deduce that neither of these rules can be correct. How? Any rule that holds generically ''must'' involve evaluating <math>f</math> or <math>f'</math> only at points that we ''know'' to be in the domain of <math>f</math>. The only such point in this context is <math>g(x)</math>. Therefore, the chain rule ''cannot'' involve evaluating <math>f</math> or <math>f'</math> at any point other than <math>g(x)</math>. | ||
+ | |||
+ | Note that our actual chain rule: | ||
+ | |||
+ | <math>(f \circ g)'(x) = f'(g(x))g'(x)</math> | ||
+ | |||
+ | is quite similar to the naive but false rule <math>\! (f \circ g)'(x) = f'(x)g'(x)</math>, and can be viewed as the corrected version of the rule once we account for the fact that <math>f'</math> can only be calculated after transforming <math>x</math> to <math>g(x)</math>. | ||
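
A quick computation also shows that the naive rules give wrong answers. Take <math>f(x) := x^2</math> and <math>g(x) := x^3</math>, so that <math>(f \circ g)(x) = x^6</math> and <math>(f \circ g)'(x) = 6x^5</math>. The first naive rule gives <math>f'(g'(x)) = 2(3x^2) = 6x^2</math> and the second gives <math>f'(x)g'(x) = (2x)(3x^2) = 6x^3</math>, both wrong, whereas the correct rule gives <math>f'(g(x))g'(x) = 2x^3 \cdot 3x^2 = 6x^5</math>.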
===Qualitative and existential significance===

Each of the versions has its own qualitative significance:

{| class="sortable" border="1"
! Version type !! Significance
|-
| specific point, named functions || This tells us that if <math>g</math> is differentiable at a point <math>x_0</math> and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>f \circ g</math> is differentiable at <math>x_0</math>.
|-
| generic point, named functions, point notation || If <math>g</math> is a differentiable function and <math>f</math> is a differentiable function on the intersection of its domain with the range of <math>g</math>, then <math>f \circ g</math> is a differentiable function.
|-
| generic point, named functions, point-free notation || We can deduce properties of <math>(f \circ g)'</math> based on properties of <math>f'</math> and <math>g'</math>. In particular, if <math>f'</math> and <math>g'</math> are both continuous functions, so is <math>(f \circ g)'</math>. Another way of putting this is that if <math>f</math> and <math>g</math> are both continuously differentiable functions, so is <math>f \circ g</math>.
|}

===Computational feasibility significance===

Each of the versions has its own computational feasibility significance:

{| class="sortable" border="1"
! Version type !! Significance
|-
| specific point, named functions || If we know the values (''in the sense of numerical values'') <math>g'(x_0)</math> and <math>f'(g(x_0))</math>, we can use these to compute <math>(f \circ g)'(x_0)</math>.
|-
| generic point, named functions || This tells us that knowledge of the ''general expressions'' for the derivatives of <math>f</math> and <math>g</math> (along with expressions for the functions themselves) allows us to compute the general expression for the derivative of <math>f \circ g</math>.<br>Note that we do not need to know <math>f</math> itself (it suffices to know <math>f'</math>, which tells us what <math>f</math> is up to additive constants), but we ''do'' need to know what <math>g</math> is. It does not suffice to know <math>g</math> merely up to additive constants.
|}

===Computational results significance===

{| class="sortable" border="1"
! Shorthand !! Significance
|-
| significance of derivative being zero || If <math>\! g'(x_0) = 0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>\! (f \circ g)'(x_0) = 0</math>. Note that the conclusion need ''not'' follow if <math>f</math> is not differentiable at <math>g(x_0)</math>.<br>Also, if <math>\! f'(g(x_0)) = 0</math> and <math>g</math> is differentiable at <math>x_0</math>, then <math>(f \circ g)'(x_0) = 0</math>.<br>Note that ''it is essential in both cases that the other function be differentiable at the appropriate point.'' Here are some counterexamples when it's not: <toggledisplay>Take <math>f(x) := x^{1/3}, g(x) := x^3, x_0 = 0</math>. Then, <math>g'(x_0) = 0</math> but <math>(f \circ g)'(x_0) = 1 \ne 0</math>. The reason is that <math>f</math> is not differentiable at <math>g(x_0)</math>. Similarly, setting <math>f(x) := x^3, g(x) := x^{1/3}, x_0 = 0</math>, we get <math>f'(g(x_0)) = 0</math> but <math>(f \circ g)'(x_0) \ne 0</math> because <math>g'(x_0)</math> does not exist.</toggledisplay>
|-
| significance of sign of derivative || The product of the signs of <math>\! f'(g(x_0))</math> and <math>\! g'(x_0)</math> gives the sign of <math>(f \circ g)'(x_0)</math>. In particular, if both have the same sign, then <math>(f \circ g)'(x_0)</math> is positive. If both have opposite signs, then <math>(f \circ g)'(x_0)</math> is negative. This is related to the idea that a [[composite of increasing functions is increasing]], and similar ideas.
|-
| significance of uniform bounds on derivatives || If <math>\! f'</math> and <math>\! g'</math> are uniformly bounded, then so is <math>(f \circ g)'</math>, with a possible uniform bound being the product of the uniform bounds for <math>\! f'</math> and <math>\! g'</math>.
|}
+ | |||
+ | ==Compatibility checks== | ||
+ | |||
+ | ===Associative symmetry=== | ||
+ | |||
+ | This is a compatibility check for showing that for a composite of three functions <math>f_1 \circ f_2 \circ f_3</math>, the formula for the derivative obtained using the chain rule is the same whether we associate it as <math>f_1 \circ (f_2 \circ f_3)</math> or as <math>(f_1 \circ f_2) \circ f_3</math>. | ||
+ | |||
+ | * Derivative as <math>f_1 \circ (f_2 \circ f_3)</math>. We first apply the chain rule for the pair of functions <math>(f_1, f_2 \circ f_3)</math> and then for the pair of functions <math>(f_2, f_3)</math>: | ||
+ | |||
+ | In point-free notation: | ||
+ | |||
+ | <math>(f_1 \circ (f_2 \circ f_3))' = (f_1' \circ (f_2 \circ f_3)) \cdot (f_2 \circ f_3)' = (f_1' \circ (f_2 \circ f_3)) \cdot (f_2' \circ f_3) \cdot f_3'</math> | ||
+ | |||
+ | In point notation (i.e., including a symbol for the point where the function is applied): | ||
+ | |||
+ | <math>(f_1 \circ (f_2 \circ f_3))'(x) = f_1'(f_2 \circ f_3(x))(f_2 \circ f_3)'(x) = f_1'(f_2(f_3(x)))(f_2 \circ f_3)'(x) = f_1'(f_2(f_3(x)))f_2'(f_3(x))f_3'(x)</math> | ||
+ | |||
+ | * Derivative as <math>(f_1 \circ f_2) \circ f_3</math>. We first apply the chain rule for the pair of functions <math>(f_1 \circ f_2, f_3)</math> and then for the pair of functions <math>(f_1, f_2)</math>: | ||
+ | |||
+ | In point-free notation: | ||
+ | |||
+ | <math>((f_1 \circ f_2) \circ f_3)' = ((f_1 \circ f_2)' \circ f_3) \cdot f_3' = ((f_1' \circ f_2) \cdot f_2') \circ f_3) \cdot f_3' = ((f_1' \circ f_2) \circ f_3) \cdot (f_2' \circ f_3) \cdot f_3'</math> | ||
+ | |||
+ | In point notation (i.e., including a symbol for the point where the function is applied): | ||
+ | |||
+ | <math>((f_1 \circ f_2) \circ f_3)'(x) = ((f_1 \circ f_2)' \circ f_3)(x)f_3'(x) = (f_1 \circ f_2)'(f_3(x))f_3'(x) = f_1'(f_2(f_3(x)))f_2'(f_3(x))f_3'(x)</math> | ||
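
Both associations yield the same expression. As a concrete instance, take <math>f_1 = \sin</math>, <math>f_2 = \cos</math>, and <math>f_3: x \mapsto x^2</math>. Either association gives the same derivative for <math>x \mapsto \sin(\cos(x^2))</math>, namely <math>\cos(\cos(x^2)) \cdot (-\sin(x^2)) \cdot 2x</math>.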
+ | |||
+ | ===Compatibility with linearity=== | ||
+ | |||
+ | Consider functions <math>f_1,f_2,g</math>. We have that: | ||
+ | |||
+ | <math>(f_1 + f_2) \circ g = (f_1 \circ g) + (f_2 \circ g)</math> | ||
+ | |||
+ | The function <math>(f_1 + f_2) \circ g</math> can be differentiated either by differentiating the left side or by differentiating the right side. The compatibility check is to ensure that we get the same result from both methods: | ||
+ | |||
+ | * Left side: In point-free notation: | ||
+ | |||
+ | <math>\! ((f_1 + f_2) \circ g)' = ((f_1 + f_2)' \circ g) \cdot g' = ((f_1' + f_2') \circ g) \cdot g' = ((f_1' \circ g) + (f_2' \circ g)) \cdot g' = ((f_1' \circ g) \cdot g') + ((f_2' \circ g) \cdot g')</math> | ||
+ | |||
+ | In point notation (i.e., including a symbol for the point of application): | ||
+ | |||
+ | <math>\! ((f_1 + f_2) \circ g)'(x) = (f_1 + f_2)'(g(x))g'(x) = (f_1'(g(x)) + f_2'(g(x)))g'(x) = f_1'(g(x))g'(x) + f_2'(g(x))g'(x)</math> | ||
+ | |||
+ | * Right side: In point-free notation: | ||
+ | |||
+ | We get <math>\! (f_1 \circ g + f_2 \circ g)' = (f_1 \circ g)' + (f_2 \circ g)' = ((f_1' \circ g) \cdot g') + ((f_2' \circ g) \cdot g')</math>. | ||
+ | |||
+ | In point notation: | ||
+ | |||
+ | <math>(f_1 \circ g + f_2 \circ g)'(x) = (f_1 \circ g)'(x) + (f_2 \circ g)'(x) = f_1'(g(x))g'(x) + f_2'(g(x))g'(x)</math> | ||
+ | |||
+ | Thus, we get the same result on both sides, indicating compatibility. | ||
+ | |||
+ | Note that it is ''not'' in general true that <math>f \circ (g_1 + g_2) = (f \circ g_1) + (f \circ g_2)</math>, so there is no compatibility check to be made there. | ||
+ | |||
+ | ===Compatibility with product rule=== | ||
+ | |||
+ | Consider functions <math>f_1,f_2,g</math>. We have that: | ||
+ | |||
+ | <math>(f_1 \cdot f_2) \circ g = (f_1 \circ g) \cdot (f_2 \circ g)</math> | ||
+ | |||
+ | The function <math>(f_1 \cdot f_2) \circ g</math> can be differentiated either by differentiating the left side or by differentiating the right side. The two processes use the [[product rule for differentiation]] in different ways. The compatibility check is to ensure that we get the same result from both methods: | ||
+ | |||
+ | * Left side: In point-free notation: | ||
+ | |||
+ | <math>\! ((f_1 \cdot f_2) \circ g)' = ((f_1 \cdot f_2)' \circ g) \cdot g' = ((f_1' \cdot f_2 + f_1 \cdot f_2') \circ g) \cdot g' = ((f_1' \cdot f_2) \circ g) \cdot g' + ((f_1 \cdot f_2') \circ g) \cdot g'</math> | ||
+ | |||
+ | In point notation: | ||
+ | |||
+ | <math>\! ((f_1 \cdot f_2) \circ g)' = ((f_1 \cdot f_2)'(g(x)) g'(x) = (f_1'(g(x))f_2(g(x)) + f_1(g(x))f_2'(g(x))) g'(x)</math> | ||
+ | |||
+ | * Right side: In point-free notation: | ||
+ | |||
+ | <math>\! ((f_1 \circ g) \cdot (f_2 \circ g))' = (f_1 \circ g)' \cdot (f_2 \circ g) + (f_1 \circ g) \cdot (f_2 \circ g)' = (f_1' \circ g) \cdot g' \cdot (f_2 \circ g) + (f_1 \circ g) \cdot (f_2' \circ g) \cdot g'</math> <math>\! = [(f_1' \circ g) \cdot (f_2 \circ g)] \cdot g' + [(f_1 \circ g) \cdot (f_2' \circ g)] \cdot g' = ((f_1' \cdot f_2) \circ g) \cdot g' + ((f_1 \cdot f_2') \circ g) \cdot g'</math> | ||
+ | |||
+ | In point notation: | ||
+ | |||
+ | <math>\! ((f_1 \circ g) \cdot (f_2 \circ g))'(x) = (f_1 \circ g)'(x)(f_2 \circ g)(x) + (f_1 \circ g)(x)(f_2 \circ g)'(x) = (f_1'(g(x))g'(x)f_2(g(x)) + f_1(g(x))g'(x)f_2'(g(x))</math> | ||
+ | |||
+ | <math>\! = (f_1'(g(x))f_2(g(x)) + f_1(g(x))f_2'(g(x))) g'(x)</math> | ||
+ | |||
+ | Note that it is ''not'' in general true that <math>f \circ (g_1 \cdot g_2) = (f \circ g_1) \cdot (f \circ g_2)</math>, so no compatibility check needs to be made there. | ||
+ | |||
+ | ===Compatibility with notions of order=== | ||
+ | |||
+ | This section explains why the chain rule is compatible with notions of order <math>\operatorname{ord}</math> that satisfy: | ||
+ | |||
+ | * <math>\operatorname{ord}(f') = \operatorname{ord}(f) - 1</math> | ||
+ | * <math>\operatorname{ord}(f \circ g) = \operatorname{ord}(f)\operatorname{ord}(g)</math> | ||
+ | * <math>\operatorname{ord}(f \cdot g) = \operatorname{ord}(f) + \operatorname{ord}(g)</math> | ||
+ | |||
+ | Suppose <math>\operatorname{ord}(f) = m</math> and <math>\operatorname{ord}(g) = n</math>. Then we have the following: | ||
+ | |||
+ | * <math>(f \circ g)'</math> has order <math>mn -1</math>: First, note that <math>f \circ g</math> has order <math>mn</math> by the product relation for order. Next, note that differentiating pushes the order down by one. | ||
+ | * <math>(f' \circ g) \cdot g'</math> has order <math>mn - 1</math>: Note that <math>f' \circ g</math> has order <math>(m - 1)n</math> and <math>g'</math> has order <math>n - 1</math>. Adding, we get <math(m - 1)n + n - 1 = mn - 1</math>. | ||
+ | |||
+ | Note that this compatibility check ''fails'' on both the false chain rules discussed in [[#Significance|the significance section]]: <toggledisplay> | ||
+ | |||
+ | * The rule <math>(f \circ g)'(x) = f'(g'(x))</math> fails because the order of the right side computes to <math>(m - 1)(n - 1)</math>, which is not the same as <math>mn - 1</math>. | ||
+ | * The rule <math>(f \circ g)'(x) = f'(x)g'(x)</math> fails because the order of the right side computes to <math>m - 1 + n - 1 = m + n - 2</math>, which is not the same as <math>mn - 1</math>.</toggledisplay> | ||
+ | |||
+ | Some examples of the notion of order which illustrate this are: | ||
+ | |||
+ | * For nonzero polynomials, the ''order'' notion above can be taken as the degree of the polynomial. | ||
+ | * For functions that are zero at a particular point, the ''order'' notion above can be taken as the [[order of zero]] at the point. Note that in this case, the order of zero for <math>f</math> will be calculated at 0 rather than the original point at which <math>g</math> is evaluated. | ||
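
For instance, in the polynomial case, take <math>f(x) := x^2 + 1</math> (so <math>m = 2</math>) and <math>g(x) := x^3</math> (so <math>n = 3</math>). Then <math>(f \circ g)(x) = x^6 + 1</math> has degree <math>mn = 6</math>, and its derivative <math>6x^5</math> has degree <math>mn - 1 = 5</math>. On the other side, <math>f'(g(x))g'(x) = 2x^3 \cdot 3x^2 = 6x^5</math> also has degree <math>(m - 1)n + (n - 1) = 3 + 2 = 5</math>.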
==Examples==
===Sanity checks===
We first consider examples where the chain rule for differentiation confirms something we already knew by other means:
{| class="sortable" border="1"
! Case on <math>f</math> !! Case on <math>g</math> !! <math>(f \circ g)'</math> !! Direct justification, without using the chain rule !! Justification using the chain rule, i.e., by computing <math>(f' \circ g) \cdot g'</math>
|-
| a [[constant function]] || any differentiable function || [[zero function]] || <math>f \circ g</math> is a constant function, so its derivative is the [[zero function]]. || By the chain rule, <math>(f \circ g)'(x) = f'(g(x))g'(x)</math>. <math>f</math> being constant forces <math>f'(g(x))</math> to be zero everywhere, hence the product <math>f'(g(x))g'(x)</math> is also zero everywhere. Thus, <math>(f \circ g)'</math> is the zero function.
|-
| any differentiable function || a [[constant function]] with value <math>k</math> || [[zero function]] || <math>f \circ g</math> is a constant function with value <math>f(k)</math>, so its derivative is the [[zero function]]. || By the chain rule, <math>(f \circ g)'(x) = f'(g(x))g'(x)</math>. <math>g</math> being constant forces <math>g'(x) = 0</math> everywhere, hence the product <math>f'(g(x))g'(x)</math> is also zero everywhere. Thus, <math>(f \circ g)'</math> is the zero function.
|-
| the [[identity function]], i.e., the function <math>x \mapsto x</math> || any differentiable function || <math>\! g'</math> || <math>f \circ g = g</math>, so <math>(f \circ g)' = g'</math>. || <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. Since <math>f</math> is the function <math>x \mapsto x</math>, its derivative is the function <math>x \mapsto 1</math>. Plugging this in, we get that <math>f' \circ g</math> is also the constant function <math>x \mapsto 1</math>, so <math>(f \circ g)' = 1 \cdot g' = g'</math>.
|-
| any differentiable function || the [[identity function]] || <math>\! f'</math> || <math>f \circ g = f</math>, so <math>(f \circ g)' = f'</math>. || <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. Since <math>g</math> is the identity function, <math>g'</math> is the function <math>x \mapsto 1</math>. Also, <math>f' \circ g = f'</math>. Thus, <math>(f \circ g)' = f' \cdot 1 = f'</math>.
|-
| the [[square function]] || any differentiable function || <math>\! x \mapsto 2g(x)g'(x)</math> || <math>f(g(x)) = (g(x))^2</math>, and hence its derivative can be computed using the [[product rule for differentiation]]. It comes out as <math>2g(x)g'(x)</math>. || <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. <math>f'</math> is the derivative of the square function, and therefore is <math>x \mapsto 2x</math>. Thus, <math>\! f'(g(x)) = 2g(x)</math>. We thus get <math>(f \circ g)'(x) = 2g(x)g'(x)</math>.
|-
| a one-one differentiable function || the [[inverse function]] of <math>f</math> || the constant function 1 || <math>f(g(x)) = x</math> for all <math>x</math>, so the derivative is the constant function 1. || <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. By the [[inverse function theorem]], we know that <math>g' = 1/(f' \circ g)</math>, so plugging in, we get <math>(f \circ g)' = (f' \circ g) \cdot 1/(f' \circ g) = 1</math>.
|}
+ | |||
+ | ===Nontrivial examples=== | ||
+ | |||
+ | The chain rule is necessary for computing the derivatives of functions whose definition ''requires'' one to compose functions. The chain rule still isn't the only option: one can always compute the derivative as a limit of a [[difference quotient]]. But it does offer the only option if one restricts oneself to operating within the family of [[:Category:Differentiation rules|differentiation rules]]. | ||
+ | |||
+ | Some examples of functions for which the chain rule needs to be used include: | ||
+ | |||
+ | * A trigonometric function applied to a nonlinear algebraic function | ||
+ | * An exponential function applied to a nonlinear algebraic function | ||
+ | * A composite of two trigonometric functions, two exponential functions, or an exponential and a trigonometric function | ||
+ | |||
+ | A few examples are below. | ||
+ | |||
+ | ====Sine of square function==== | ||
+ | |||
+ | Consider the [[sine of square function]]: | ||
+ | |||
+ | <math>x \mapsto \sin(x^2)</math>. | ||
+ | |||
+ | {{#lst:sine of square function|differentiation}} | ||
+ | |||
+ | ====Sine of sine function==== | ||
+ | |||
+ | Consider the [[sine of sine function]]: | ||
+ | |||
+ | <math>x \mapsto \sin(\sin x)</math> | ||
+ | |||
+ | The derivative is: | ||
+ | |||
+ | <math>(\sin \circ \sin)'(x) = (\sin' \circ \sin)(x)\sin'(x) = \cos(\sin x)\cos(x)</math> |
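
In the same vein, for an exponential function applied to a nonlinear algebraic function, taking <math>f = \exp</math> and <math>g: x \mapsto x^2</math> gives:

<math>\frac{d}{dx}\left(e^{x^2}\right) = e^{x^2} \cdot 2x</math>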
Latest revision as of 19:58, 3 May 2015
Computational results significance
{| class="sortable" border="1"
! Shorthand !! Significance
|-
| significance of derivative being zero || If <math>g'(x_0) = 0</math>, and <math>f</math> is differentiable at <math>g(x_0)</math>, then <math>(f \circ g)'(x_0) = 0</math>. Note that the conclusion need not follow if <math>f</math> is not differentiable at <math>g(x_0)</math>. Also, if <math>f'(g(x_0)) = 0</math> and <math>g</math> is differentiable at <math>x_0</math>, then <math>(f \circ g)'(x_0) = 0</math>. Note that it is essential in both cases that the other function be differentiable at the appropriate point.
|-
| significance of sign of derivative || The product of the signs of <math>g'(x_0)</math> and <math>f'(g(x_0))</math> gives the sign of <math>(f \circ g)'(x_0)</math>. In particular, if both have the same sign, then <math>(f \circ g)'(x_0)</math> is positive. If both have opposite signs, then <math>(f \circ g)'(x_0)</math> is negative. This is related to the idea that a composite of increasing functions is increasing, and similar ideas.
|-
| significance of uniform bounds on derivatives || If <math>f'</math> and <math>g'</math> are uniformly bounded, then so is <math>(f \circ g)'</math>, with a possible uniform bound being the product of the uniform bounds for <math>f'</math> and <math>g'</math>.
|}
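One standard counterexample (supplied here as an illustration) shows why differentiability of <math>f</math> at <math>g(x_0)</math> is essential in the first case: take <math>g(x) = x^3</math>, so that <math>g'(0) = 0</math>, and <math>f(x) = x^{1/3}</math>, which is not differentiable at <math>g(0) = 0</math>.

```latex
% g(x) = x^3 has g'(0) = 0, but f(x) = x^{1/3} is not differentiable at 0.
% The composite is the identity function, so its derivative at 0 is 1, not 0:
(f \circ g)(x) = (x^3)^{1/3} = x, \qquad (f \circ g)'(0) = 1 \neq 0.
```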
==Compatibility checks==

===Associative symmetry===
This is a compatibility check for showing that for a composite of three functions <math>f \circ g \circ h</math>, the formula for the derivative obtained using the chain rule is the same whether we associate it as <math>(f \circ g) \circ h</math> or as <math>f \circ (g \circ h)</math>.
- Derivative as <math>((f \circ g) \circ h)'</math>. We first apply the chain rule for the pair of functions <math>f \circ g, h</math> and then for the pair of functions <math>f, g</math>:

In point-free notation:

<math>((f \circ g) \circ h)' = ((f \circ g)' \circ h) \cdot h' = (((f' \circ g) \cdot g') \circ h) \cdot h' = (f' \circ g \circ h) \cdot (g' \circ h) \cdot h'</math>

In point notation (i.e., including a symbol for the point where the function is applied):

<math>((f \circ g) \circ h)'(x) = (f \circ g)'(h(x)) \cdot h'(x) = f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x)</math>

- Derivative as <math>(f \circ (g \circ h))'</math>. We first apply the chain rule for the pair of functions <math>f, g \circ h</math> and then for the pair of functions <math>g, h</math>:

In point-free notation:

<math>(f \circ (g \circ h))' = (f' \circ (g \circ h)) \cdot (g \circ h)' = (f' \circ g \circ h) \cdot (g' \circ h) \cdot h'</math>

In point notation (i.e., including a symbol for the point where the function is applied):

<math>(f \circ (g \circ h))'(x) = f'(g(h(x))) \cdot (g \circ h)'(x) = f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x)</math>
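As a concrete sketch of the check (the functions are arbitrary illustrative choices), take <math>f = \sin</math>, <math>g = \exp</math>, and <math>h(x) = x^2</math>; both associations give the same derivative:

```latex
% Grouping as (f \circ g) \circ h: differentiate the outer function \sin \circ \exp
%   at h(x) = x^2, getting \cos(e^{x^2}) e^{x^2}, then multiply by h'(x) = 2x.
% Grouping as f \circ (g \circ h): differentiate \sin at e^{x^2}, getting
%   \cos(e^{x^2}), then multiply by (e^{x^2})' = e^{x^2} \cdot 2x.
% Both groupings give:
\frac{d}{dx}\,\sin\!\left(e^{x^2}\right) = \cos\!\left(e^{x^2}\right) \cdot e^{x^2} \cdot 2x
```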
===Compatibility with linearity===
Consider functions <math>f, g, h</math>. We have that:

<math>(f + g) \circ h = (f \circ h) + (g \circ h)</math>

The function <math>(f + g) \circ h</math> can be differentiated either by differentiating the left side or by differentiating the right side. The compatibility check is to ensure that we get the same result from both methods:
- Left side: In point-free notation:

<math>((f + g) \circ h)' = ((f + g)' \circ h) \cdot h' = ((f' + g') \circ h) \cdot h' = ((f' \circ h) + (g' \circ h)) \cdot h' = (f' \circ h) \cdot h' + (g' \circ h) \cdot h'</math>

In point notation (i.e., including a symbol for the point of application):

<math>((f + g) \circ h)'(x) = (f + g)'(h(x)) \cdot h'(x) = (f'(h(x)) + g'(h(x))) \cdot h'(x) = f'(h(x))h'(x) + g'(h(x))h'(x)</math>

- Right side: In point-free notation:

<math>((f \circ h) + (g \circ h))' = (f \circ h)' + (g \circ h)'</math>

We get <math>(f \circ h)' + (g \circ h)' = (f' \circ h) \cdot h' + (g' \circ h) \cdot h'</math>.

In point notation:

<math>(f \circ h)'(x) + (g \circ h)'(x) = f'(h(x))h'(x) + g'(h(x))h'(x)</math>
Thus, we get the same result on both sides, indicating compatibility.
Note that it is not in general true that <math>h \circ (f + g) = (h \circ f) + (h \circ g)</math>, so there is no compatibility check to be made there.
===Compatibility with product rule===
Consider functions <math>f, g, h</math>. We have that:

<math>(f \cdot g) \circ h = (f \circ h) \cdot (g \circ h)</math>

The function <math>(f \cdot g) \circ h</math> can be differentiated either by differentiating the left side or by differentiating the right side. The two processes use the product rule for differentiation in different ways. The compatibility check is to ensure that we get the same result from both methods:
- Left side: In point-free notation:

<math>((f \cdot g) \circ h)' = ((f \cdot g)' \circ h) \cdot h' = ((f' \cdot g + f \cdot g') \circ h) \cdot h' = ((f' \circ h) \cdot (g \circ h) + (f \circ h) \cdot (g' \circ h)) \cdot h'</math>

In point notation:

<math>((f \cdot g) \circ h)'(x) = (f \cdot g)'(h(x)) \cdot h'(x) = (f'(h(x))g(h(x)) + f(h(x))g'(h(x))) \cdot h'(x)</math>

- Right side: In point-free notation:

<math>((f \circ h) \cdot (g \circ h))' = (f \circ h)' \cdot (g \circ h) + (f \circ h) \cdot (g \circ h)' = ((f' \circ h) \cdot h') \cdot (g \circ h) + (f \circ h) \cdot ((g' \circ h) \cdot h')</math>

In point notation:

<math>(f \circ h)'(x)g(h(x)) + f(h(x))(g \circ h)'(x) = f'(h(x))h'(x)g(h(x)) + f(h(x))g'(h(x))h'(x)</math>

This agrees with the left side after rearranging factors.
Note that it is not in general true that <math>h \circ (f \cdot g) = (h \circ f) \cdot (h \circ g)</math>, so no compatibility check needs to be made there.
===Compatibility with notions of order===
This section explains why the chain rule is compatible with notions of order that satisfy the following: the order of a product of functions is the sum of the orders, the order of a composite is the product of the orders, and differentiation pushes the order down by one.

Suppose <math>f</math> has order <math>m</math> and <math>g</math> has order <math>n</math>. Then we have the following:
- <math>(f \circ g)'</math> has order <math>mn - 1</math>: First, note that <math>f \circ g</math> has order <math>mn</math> by the product relation for order. Next, note that differentiating pushes the order down by one.
- <math>(f' \circ g) \cdot g'</math> has order <math>mn - 1</math>: Note that <math>f' \circ g</math> has order <math>(m - 1)n</math> and <math>g'</math> has order <math>n - 1</math>. Adding, we get <math>(m - 1)n + n - 1 = mn - 1</math>.
Some examples of the notion of order which illustrate this are:
- For nonzero polynomials, the order notion above can be taken as the degree of the polynomial.
- For functions that are zero at a particular point, the order notion above can be taken as the order of zero at the point. Note that in this case, the order of zero for <math>f</math> will be calculated at <math>0 = g(x_0)</math> rather than the original point <math>x_0</math> at which <math>g</math> is evaluated.
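A quick polynomial sketch of the degree count (the specific polynomials are chosen here for illustration), with <math>m = 2</math> and <math>n = 3</math>:

```latex
% f(x) = x^2 (degree m = 2), g(x) = x^3 (degree n = 3).
% Composite: (f \circ g)(x) = x^6 has degree mn = 6;
% its derivative 6x^5 has degree mn - 1 = 5.
% Chain rule factors: f'(g(x)) = 2x^3 has degree (m-1)n = 3,
% and g'(x) = 3x^2 has degree n - 1 = 2; the degrees add to 5.
(f \circ g)'(x) = 6x^5, \qquad f'(g(x)) \cdot g'(x) = 2x^3 \cdot 3x^2 = 6x^5.
```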
==Examples==

===Sanity checks===
We first consider examples where the chain rule for differentiation confirms something we already knew by other means:
{| class="sortable" border="1"
! Case on <math>f</math> !! Case on <math>g</math> !! <math>(f \circ g)'</math> !! Direct justification, without using the chain rule !! Justification using the chain rule, i.e., by computing <math>(f' \circ g) \cdot g'</math>
|-
| a constant function || any differentiable function || zero function || <math>f \circ g</math> is a constant function, so its derivative is the zero function. || By the chain rule, <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. <math>f</math> being constant forces <math>f'</math> to be zero everywhere, hence the product <math>(f' \circ g) \cdot g'</math> is also zero everywhere. Thus, <math>(f \circ g)'</math> is also zero everywhere.
|-
| any differentiable function || a constant function with value <math>c</math> || zero function || <math>f \circ g</math> is a constant function with value <math>f(c)</math>, so its derivative is the zero function. || By the chain rule, <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. <math>g</math> being constant forces that <math>g' = 0</math> everywhere, hence the product <math>(f' \circ g) \cdot g'</math> is also zero everywhere. Thus, <math>(f \circ g)'</math> is also zero everywhere.
|-
| the identity function, i.e., the function <math>x \mapsto x</math> || any differentiable function || <math>g'</math> || <math>f \circ g = g</math>, so <math>(f \circ g)' = g'</math>. || <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. Since <math>f</math> is the function <math>x \mapsto x</math>, its derivative <math>f'</math> is the constant function <math>1</math>. Plugging this in, we get that <math>f' \circ g</math> is also the constant function <math>1</math>, so <math>(f \circ g)' = g'</math>.
|-
| any differentiable function || the identity function || <math>f'</math> || <math>f \circ g = f</math>, so <math>(f \circ g)' = f'</math>. || <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. Since <math>g</math> is the identity function, <math>f' \circ g</math> is the function <math>f'</math>. Also, <math>g' = 1</math> everywhere. Thus, <math>(f \circ g)' = f'</math>.
|-
| the square function <math>x \mapsto x^2</math> || any differentiable function || <math>2g \cdot g'</math> || <math>f \circ g = g \cdot g</math> and hence its derivative can be computed using the product rule for differentiation. It comes out as <math>g' \cdot g + g \cdot g' = 2g \cdot g'</math>. || <math>(f \circ g)' = (f' \circ g) \cdot g'</math>. <math>f'</math> is the derivative of the square function, and therefore is <math>x \mapsto 2x</math>. Thus, <math>f' \circ g = 2g</math>. We thus get <math>(f \circ g)' = 2g \cdot g'</math>.
|-
| a one-one differentiable function || the inverse function <math>f^{-1}</math> of <math>f</math> || 1 || <math>f(g(x)) = x</math> for all <math>x</math>, so the derivative is the function 1. || <math>(f \circ g)'(x) = f'(g(x))g'(x)</math>. By the inverse function theorem, we know that <math>g'(x) = 1/f'(g(x))</math>, so plugging in, we get <math>(f \circ g)'(x) = 1</math>.
|}
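For the last row, a familiar concrete pair (chosen here purely as an illustration) is <math>f = \exp</math> and <math>g = \ln</math> on the positive reals:

```latex
% f(g(x)) = e^{\ln x} = x for x > 0, so the derivative is the constant function 1.
% Chain rule check:
(f \circ g)'(x) = f'(g(x)) \cdot g'(x) = e^{\ln x} \cdot \frac{1}{x} = x \cdot \frac{1}{x} = 1.
```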
===Nontrivial examples===
The chain rule is necessary for computing the derivatives of functions whose definition requires one to compose functions. The chain rule still isn't the only option: one can always compute the derivative as a limit of a difference quotient. But it is the only option if one restricts oneself to operating within the family of differentiation rules.
Some examples of functions for which the chain rule needs to be used include:
- A trigonometric function applied to a nonlinear algebraic function
- An exponential function applied to a nonlinear algebraic function
- A composite of two trigonometric functions, two exponential functions, or an exponential and a trigonometric function
A few examples are below.
====Sine of square function====

Consider the sine of square function:

<math>x \mapsto \sin(x^2)</math>.

We use the chain rule for differentiation viewing the function as the composite of the square function on the inside and the sine function on the outside:

<math>\frac{d}{dx}[\sin(x^2)] = \cos(x^2) \cdot \frac{d}{dx}[x^2] = 2x\cos(x^2)</math>
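As a quick numeric sanity check (a sketch, not part of the article), the formula <math>2x\cos(x^2)</math> can be compared against a symmetric difference quotient at an arbitrarily chosen point:

```python
import math

def original(x):
    # The sine of square function: sin(x^2).
    return math.sin(x * x)

def chain_rule_derivative(x):
    # Derivative via the chain rule: 2x * cos(x^2).
    return 2 * x * math.cos(x * x)

x0 = 1.3
h = 1e-6
# Symmetric difference quotient as an independent estimate of the derivative.
numeric = (original(x0 + h) - original(x0 - h)) / (2 * h)

difference = abs(chain_rule_derivative(x0) - numeric)
print(difference < 1e-5)  # prints True
```

The same check works at any point, since both expressions define the same function.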
====Sine of sine function====

Consider the sine of sine function:

<math>x \mapsto \sin(\sin x)</math>

The derivative is:

<math>\frac{d}{dx}[\sin(\sin x)] = \cos(\sin x) \cdot \cos x</math>