Chain rule for differentiation: Difference between revisions

Revision as of 01:02, 28 November 2011

This article is about a differentiation rule, i.e., a rule for differentiating a function expressed in terms of other functions whose derivatives are known.

Statement for two functions

The chain rule is stated in many versions:

Version type	Statement
specific point, named functions	Suppose $f$ and $g$ are functions such that $g$ is differentiable at a point $x = x_{0}$ , and $f$ is differentiable at $g (x_{0})$ . Then the composite $f \circ g$ is differentiable at $x_{0}$ , and we have: $\left. \frac{d}{d x} [f (g (x))] \right|_{x = x_{0}} = f^{'} (g (x_{0})) g^{'} (x_{0})$
generic point, named functions, point notation	Suppose $f$ and $g$ are functions of one variable. Then, we have $\frac{d}{d x} [f (g (x))] = f^{'} (g (x)) g^{'} (x)$ wherever the right side expression makes sense.
generic point, named functions, point-free notation	Suppose $f$ and $g$ are functions of one variable. Then, $(f \circ g)^{'} = (f^{'} \circ g) \cdot g^{'}$ wherever the right side expression makes sense. Here, $\cdot$ denotes the pointwise product of functions.
pure Leibniz notation	Suppose $u = g (x)$ is a function of $x$ and $v = f (u)$ is a function of $u$ . Then, $\frac{d v}{d x} = \frac{d v}{d u} \frac{d u}{d x}$
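The specific point version can be checked numerically. The following is a minimal sketch, assuming the illustrative choices $f (u) = e^{u}$ , $g (x) = 3 x + 1$ , and $x_{0} = 0.5$ (none of which come from the statement itself). It compares a symmetric difference quotient for $f \circ g$ with the product $f^{'} (g (x_{0})) g^{'} (x_{0})$ . The helper name `central_difference` is hypothetical.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

def g(x):
    return 3 * x + 1          # inner function (assumed example), g'(x) = 3

def f(u):
    return math.exp(u)        # outer function (assumed example), f'(u) = exp(u)

def composite(x):
    return f(g(x))

x0 = 0.5
numerical = central_difference(composite, x0)

# Chain rule at the specific point: f'(g(x0)) * g'(x0)
chain_rule = math.exp(g(x0)) * 3.0

print(abs(numerical - chain_rule) < 1e-4)
```

The two values agree only up to the error of the difference quotient, so the comparison uses a tolerance rather than exact equality.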

MORE ON THE WAY THIS DEFINITION OR FACT IS PRESENTED: We first present the version that deals with a specific point (typically with a ${}_{0}$ subscript) in the domain of the relevant functions, and then discuss the version that deals with a point that is free to move in the domain, by dropping the subscript. Why do we do this?
The purpose of the specific point version is to emphasize that the point is fixed for the duration of the definition, i.e., it does not move around while we are defining the construct or applying the fact. However, the definition or fact applies not just for a single point but for all points satisfying certain criteria, and thus we can get further interesting perspectives on it by varying the point we are considering. This is the purpose of the second, generic point version.

One-sided version

A one-sided version of sorts holds, but we need to be careful, since we want the direction of differentiability of $f$ to be the same as the direction of approach of $g (x)$ to $g (x_{0})$ . The following are true:

Condition on $g$ at $x_{0}$	Condition on $f$ at $g (x_{0})$	Conclusion
left differentiable at $x_{0}$	differentiable at $g (x_{0})$	The left hand derivative of $f \circ g$ at $x_{0}$ is $f^{'} (g (x_{0}))$ times the left hand derivative of $g$ at $x_{0}$ .
right differentiable at $x_{0}$	differentiable at $g (x_{0})$	The right hand derivative of $f \circ g$ at $x_{0}$ is $f^{'} (g (x_{0}))$ times the right hand derivative of $g$ at $x_{0}$ .
left differentiable at $x_{0}$ , and increasing for $x$ on the immediate left of $x_{0}$	left differentiable at $g (x_{0})$	The left hand derivative of $f \circ g$ at $x_{0}$ is the left hand derivative of $f$ at $g (x_{0})$ times the left hand derivative of $g$ at $x_{0}$ .
right differentiable at $x_{0}$ , and increasing for $x$ on the immediate right of $x_{0}$	right differentiable at $g (x_{0})$	The right hand derivative of $f \circ g$ at $x_{0}$ is the right hand derivative of $f$ at $g (x_{0})$ times the right hand derivative of $g$ at $x_{0}$ .
left differentiable at $x_{0}$ , and decreasing for $x$ on the immediate left of $x_{0}$	right differentiable at $g (x_{0})$	The left hand derivative of $f \circ g$ at $x_{0}$ is the right hand derivative of $f$ at $g (x_{0})$ times the left hand derivative of $g$ at $x_{0}$ .
right differentiable at $x_{0}$ , and decreasing for $x$ on the immediate right of $x_{0}$	left differentiable at $g (x_{0})$	The right hand derivative of $f \circ g$ at $x_{0}$ is the left hand derivative of $f$ at $g (x_{0})$ times the right hand derivative of $g$ at $x_{0}$ .
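The cases with a decreasing $g$ can be illustrated numerically. The sketch below assumes the example functions $g (x) = - 2 x$ and $f (u) = | u |$ at $x_{0} = 0$ (chosen here for illustration). Since $g$ is decreasing, $g (x)$ approaches $g (0) = 0$ from the right as $x$ approaches $0$ from the left, so the left hand derivative of $f \circ g$ pairs with the right hand derivative of $f$ . The helper names `left_quotient` and `right_quotient` are hypothetical.

```python
def left_quotient(func, x0, h=1e-6):
    """Difference quotient using a point just to the left of x0."""
    return (func(x0) - func(x0 - h)) / h

def right_quotient(func, x0, h=1e-6):
    """Difference quotient using a point just to the right of x0."""
    return (func(x0 + h) - func(x0)) / h

def g(x):
    return -2 * x      # decreasing; both one-sided derivatives at 0 equal -2

def f(u):
    return abs(u)      # at 0: left hand derivative -1, right hand derivative +1

def composite(x):
    return f(g(x))     # equals 2|x|

left = left_quotient(composite, 0.0)
right = right_quotient(composite, 0.0)

# Left derivative of the composite: (right derivative of f) * (left derivative of g)
expected_left = (+1) * (-2)
# Right derivative of the composite: (left derivative of f) * (right derivative of g)
expected_right = (-1) * (-2)

print(abs(left - expected_left) < 1e-6, abs(right - expected_right) < 1e-6)
```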

Statement for multiple functions

Suppose $f_{1}, f_{2}, \dots, f_{n}$ are functions. Then, the following is true wherever the right side makes sense:

$(f_{1} \circ f_{2} \circ f_{3} \circ \dots \circ f_{n})^{'} = (f_{1}^{'} \circ f_{2} \circ \dots \circ f_{n}) \cdot (f_{2}^{'} \circ f_{3} \circ \dots \circ f_{n}) \cdot \dots \cdot (f_{n - 1}^{'} \circ f_{n}) \cdot f_{n}^{'}$

For instance, in the case $n = 3$ , we get:

$(f_{1} \circ f_{2} \circ f_{3})^{'} = (f_{1}^{'} \circ f_{2} \circ f_{3}) \cdot (f_{2}^{'} \circ f_{3}) \cdot f_{3}^{'}$

In point notation, this is:

$\frac{d}{d x} [f_{1} (f_{2} (f_{3} (x)))] = f_{1}^{'} (f_{2} (f_{3} (x))) f_{2}^{'} (f_{3} (x)) f_{3}^{'} (x)$
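The multi-function formula can be evaluated by walking from the innermost function outward, multiplying in one derivative factor at each step. The following is a minimal sketch under that reading. The function name `chain_derivative` and the example choice $f_{1} = \sin$ , $f_{2} = \exp$ , $f_{3} = $ square are assumptions made here for illustration.

```python
import math

def chain_derivative(funcs, derivs, x):
    """Derivative at x of funcs[0] o funcs[1] o ... o funcs[-1].

    funcs and derivs are parallel lists: derivs[i] is the derivative of funcs[i].
    """
    product = 1.0
    point = x
    # Walk from the innermost function outward.
    for func, deriv in zip(reversed(funcs), reversed(derivs)):
        product *= deriv(point)   # derivative factor evaluated at the inner value
        point = func(point)       # pass the value on to the next function out
    return product

# Assumed example with n = 3: f1 = sin, f2 = exp, f3 = square
funcs = [math.sin, math.exp, lambda x: x * x]
derivs = [math.cos, math.exp, lambda x: 2 * x]

x0 = 0.7
value = chain_derivative(funcs, derivs, x0)

# Point notation for n = 3: f1'(f2(f3(x))) * f2'(f3(x)) * f3'(x)
expected = math.cos(math.exp(x0 ** 2)) * math.exp(x0 ** 2) * 2 * x0

print(abs(value - expected) < 1e-12)
```

Each factor is evaluated at the output of all functions inside it, which matches the pattern of composites in the formula.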

Related rules

Similar facts in single variable calculus

Chain rule for higher derivatives
Product rule for differentiation
Product rule for higher derivatives
Differentiation is linear
Inverse function theorem (gives formula for derivative of inverse function).

Similar facts in multivariable calculus

Chain rule for partial derivatives

Reversal for integration

If a function is differentiated using the chain rule, then retrieving the original function from the derivative typically requires a method of integration called integration by u-substitution. Specifically, that method of integration targets expressions of the form:

$\int h (g (x)) g^{'} (x) d x$

The $u$ -substitution idea is to set $u = g (x)$ and obtain:

$\int h (u) d u$

We now need to find a function $f$ such that $f^{'} = h$ . The integral is $f (u) + C$ . Plugging back $u = g (x)$ , we obtain that the indefinite integral is $f (g (x)) + C$ .
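As a rough numerical check of this reversal, the sketch below assumes the example $h = \cos$ , $g (x) = x^{2}$ , so that $f = \sin$ is an antiderivative of $h$ . It compares a midpoint-rule approximation of the definite integral of $h (g (x)) g^{'} (x)$ over an interval with the difference of $f (g (x))$ at the endpoints. The helper name `midpoint_integral` and the interval $[0, 1.5]$ are illustrative choices.

```python
import math

def midpoint_integral(func, a, b, n=10000):
    """Approximate the definite integral of func over [a, b] by the midpoint rule."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

def g(x):
    return x * x            # the substitution u = g(x) (assumed example)

def g_prime(x):
    return 2 * x

h = math.cos                # integrand in terms of u
f = math.sin                # an antiderivative of h, i.e. f' = h

a, b = 0.0, 1.5
numerical = midpoint_integral(lambda x: h(g(x)) * g_prime(x), a, b)

# By u-substitution, an antiderivative is f(g(x)), so the definite integral is:
by_substitution = f(g(b)) - f(g(a))

print(abs(numerical - by_substitution) < 1e-6)
```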

Significance

Qualitative and existential significance

Each of the versions has its own qualitative significance:

Version type	Significance
specific point, named functions	This tells us that if $g$ is differentiable at a point $x_{0}$ and $f$ is differentiable at $g (x_{0})$ , then $f \circ g$ is differentiable at $x_{0}$ .
generic point, named functions, point notation	If $g$ is a differentiable function and $f$ is a differentiable function on the intersection of its domain with the range of $g$ , then $f \circ g$ is a differentiable function.
generic point, named functions, point-free notation	We can deduce properties of $(f \circ g)^{'}$ based on properties of $f^{'}, g^{'}, f, g$ . In particular, if $f^{'}$ and $g^{'}$ are both continuous functions, so is $(f \circ g)^{'}$ . Another way of putting this is that if $f$ and $g$ are both continuously differentiable functions, so is $f \circ g$ .

Computational feasibility significance

Each of the versions has its own computational feasibility significance:

Version type	Significance
specific point, named functions	If we know the values (in the sense of numerical values) $g^{'} (x_{0})$ and $f^{'} (g (x_{0}))$ , we can use these to compute $(f \circ g)^{'} (x_{0})$ .
generic point, named functions	This tells us that knowledge of the general expressions for the derivatives of $f$ and $g$ (along with expressions for the functions themselves) allows us to compute the general expression for the derivative of $f \circ g$ .

Computational results significance

Shorthand	Significance
significance of derivative being zero	If $g^{'} (x_{0}) = 0$ , and $f$ is differentiable at $g (x_{0})$ , then $(f \circ g)^{'} (x_{0}) = 0$ . Note that the conclusion need not follow if $f$ is not differentiable at $g (x_{0})$ . Also, if $f^{'} (g (x_{0})) = 0$ and $g$ is differentiable at $x_{0}$ , then $(f \circ g)^{'} (x_{0}) = 0$ .
significance of sign of derivative	The product of the signs of $f^{'} (g (x_{0}))$ and $g^{'} (x_{0})$ gives the sign of $(f \circ g)^{'} (x_{0})$ . In particular, if both are nonzero and have the same sign, then $(f \circ g)^{'} (x_{0})$ is positive. If they have opposite signs, then $(f \circ g)^{'} (x_{0})$ is negative. This is related to the idea that a composite of increasing functions is increasing, and similar ideas.
significance of uniform bounds on derivatives	If $f^{'}$ and $g^{'}$ are uniformly bounded, then so is $(f \circ g)^{'}$ , with a possible uniform bound being the product of the uniform bounds for $f^{'}$ and $g^{'}$ .
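
To illustrate the caveat about $f$ not being differentiable at $g (x_{0})$ , here is a worked example (the specific functions are chosen for illustration). Take $g (x) = x^{3}$ and let $f (u) = u^{1 / 3}$ be the real cube root, with $x_{0} = 0$ . Then $g^{'} (0) = 0$ , but $f$ is not differentiable at $g (0) = 0$ . The composite is:

$(f \circ g) (x) = (x^{3})^{1 / 3} = x$

Hence $(f \circ g)^{'} (0) = 1 \ne 0$ , even though $g^{'} (0) = 0$ .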

Examples

Sanity checks

We first consider examples where the chain rule for differentiation confirms something we already knew by other means:

Case on $f$	Case on $g$	$(f \circ g)^{'}$	Direct justification, without using the chain rule	Justification using the chain rule, i.e., by computing $(f^{'} \circ g) \cdot g^{'}$
a constant function	any differentiable function	zero function	$f \circ g$ is a constant function, so its derivative is the zero function.	By the chain rule, $(f \circ g)^{'} (x) = f^{'} (g (x)) g^{'} (x)$ . $f$ being constant forces $f^{'} (g (x))$ to be zero everywhere, hence the product $f^{'} (g (x)) g^{'} (x)$ is also zero everywhere. Thus, $(f \circ g)^{'}$ is also zero everywhere.
any differentiable function	a constant function with value $k$	zero function	$f \circ g$ is a constant function with value $f (k)$ , so its derivative is the zero function.	By the chain rule, $(f \circ g)^{'} (x) = f^{'} (g (x)) g^{'} (x)$ . $g$ being constant forces that $g^{'} (x) = 0$ everywhere, hence the product $f^{'} (g (x)) g^{'} (x)$ is also zero everywhere. Thus, $(f \circ g)^{'}$ is also zero everywhere.
the identity function, i.e., the function $x \mapsto x$	any differentiable function	$g^{'}$	$f \circ g = g$ , so $(f \circ g)^{'} = g^{'}$ .	$(f \circ g)^{'} = (f^{'} \circ g) \cdot g^{'}$ . Since $f$ is the function $x \mapsto x$ , its derivative is the function $x \mapsto 1$ . Plugging this in, we get that $f^{'} \circ g$ is also the constant function $x \mapsto 1$ , so $(f \circ g)^{'} = 1 g^{'} = g^{'}$ .
any differentiable function	the identity function	$f^{'}$	$f \circ g = f$ , so $(f \circ g)^{'} = f^{'}$ .	$(f \circ g)^{'} = (f^{'} \circ g) \cdot g^{'}$ . Since $g$ is the identity function, $g^{'}$ is the function $x \mapsto 1$ . Also, $f^{'} \circ g = f^{'}$ . Thus, $(f \circ g)^{'} = f^{'} \cdot 1 = f^{'}$ .
the square function	any differentiable function	$x \mapsto 2 g (x) g^{'} (x)$	$f (g (x)) = (g (x))^{2}$ and hence its derivative can be computed using the product rule for differentiation. It comes out as $2 g (x) g^{'} (x)$ .	$(f \circ g)^{'} = (f^{'} \circ g) \cdot g^{'}$ . $f^{'}$ is the derivative of the square function, and therefore is $x \mapsto 2 x$ . Thus, $f^{'} (g (x)) = 2 g (x)$ . We thus get $(f \circ g)^{'} = 2 g (x) g^{'} (x)$ .
a one-one differentiable function	the inverse function of $f$	1	$f (g (x)) = x$ for all $x$ , so the derivative is the function 1.	$(f \circ g)^{'} = (f^{'} \circ g) \cdot g^{'}$ . By the inverse function theorem, we know that $g^{'} = 1 / (f^{'} \circ g)$ , so plugging in, we get $(f \circ g)^{'} = (f^{'} \circ g) \cdot 1 / (f^{'} \circ g) = 1$ .

Nontrivial examples

Here are some examples that cannot readily be differentiated using the other basic differentiation rules alone, so the chain rule is needed:

Consider the sine of square function:

$x \mapsto \sin (x^{2})$ .

We use the chain rule for differentiation viewing the function as the composite of the square function on the inside and the sine function on the outside:

$\frac{d}{d x} [\sin (x^{2})] = \frac{d (\sin (x^{2}))}{d (x^{2})} \frac{d (x^{2})}{d x} = (\cos (x^{2})) (2 x) = 2 x \cos (x^{2})$
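The same computation can be carried out mechanically by propagating a value together with its derivative through each function in turn. This is the idea behind forward-mode automatic differentiation, a technique not discussed in this article and used here only as an illustration. The class name `Dual` and the functions `square` and `sin_dual` are hypothetical names for this sketch.

```python
import math

class Dual:
    """A value paired with its derivative with respect to x."""
    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

def square(d):
    # Inner function u = x^2, so du/dx = 2x * dx/dx
    return Dual(d.value ** 2, 2 * d.value * d.deriv)

def sin_dual(d):
    # Outer function v = sin(u), so dv/dx = cos(u) * du/dx by the chain rule
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

x0 = 1.5
# Seed with dx/dx = 1, then apply the inner and outer functions in order.
result = sin_dual(square(Dual(x0, 1.0)))

# Closed form from the computation above: 2x cos(x^2)
closed_form = 2 * x0 * math.cos(x0 ** 2)

print(abs(result.deriv - closed_form) < 1e-12)
```

Each function multiplies its own derivative by the derivative carried in from the inside, which is exactly the factor structure $\frac{d v}{d u} \frac{d u}{d x}$ .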
