Practical:Chain rule for differentiation

This article considers practical aspects of the chain rule for differentiation: how is this rule used in actual computations?

Statement to remember

The statement of the chain rule for differentiation that we will be using is:

$$\frac{d}{dx}[f(g(x))] = f'(g(x)) \, g'(x)$$

where $f'(v) = \frac{d}{dv}[f(v)]$ and $g'(x) = \frac{d}{dx}[g(x)]$.

NOTE: As a matter of convention, and to reduce confusion, we use a different variable ($v$ in this case) for the generic input to $f$ compared to the variable ($x$ in this case) that we use for the generic input to $g$.

Procedure to apply the chain rule for differentiation

The chain rule for differentiation is useful as a technique for differentiating functions that are expressed in the form of composites of simpler functions.

Most explicit procedure

The explicit procedure is outlined below:

  1. Identify the two functions whose composite is the given function. In other words, explicitly decompose the function as a composite of two functions. We will here call the functions $f$ (outer) and $g$ (inner), though you may choose to give them different names.
  2. Calculate the derivatives of $f$ and $g$ separately, on the side.
  3. Plug into the chain rule formula the expressions for the functions and their derivatives.
  4. Simplify the expression thus obtained (this is optional in general, though it may be required in some contexts).

For instance, consider the problem:

Differentiate the function $h(x) := \sin(x^2)$

The procedure is as follows:

  1. Identify the two functions: The two functions are $f(v) := \sin v$ and $g(x) := x^2$, so that $h = f \circ g$ (note: per the note included with the formulation of the chain rule, we use different variable names for the generic variable for the two functions, to reduce confusion regarding which one to apply on what).
  2. Calculate the derivatives: $f'(v) = \cos v$ and $g'(x) = 2x$.
  3. Plug into the chain rule formula: We get $h'(x) = f'(g(x)) \, g'(x) = \cos(x^2) \cdot 2x$.
  4. Simplify the expression thus obtained: There isn't really anything to simplify, but we can rearrange the terms to the more conventional order where the algebraic part is before the trigonometric part, obtaining the final answer $h'(x) = 2x \cos(x^2)$.
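
The four steps can also be mirrored in a few lines of code. The following is a minimal Python sketch, not part of the original procedure: the composite $(x^2 + 1)^5$ and the names `f`, `g`, `f_prime`, `g_prime`, `h_prime`, and `central_difference` are illustrative choices made here. It keeps the two derivatives separate (Step 2), plugs them into the formula (Step 3), and compares the result with a finite-difference estimate as a sanity check.

```python
def f(v):
    """Outer function: f(v) = v^5."""
    return v ** 5


def g(x):
    """Inner function: g(x) = x^2 + 1."""
    return x ** 2 + 1


def f_prime(v):
    """Derivative of the outer function, computed on the side."""
    return 5 * v ** 4


def g_prime(x):
    """Derivative of the inner function, computed on the side."""
    return 2 * x


def h_prime(x):
    """Chain rule: derivative of f(g(x)) is f'(g(x)) * g'(x)."""
    return f_prime(g(x)) * g_prime(x)


def central_difference(func, x, step=1e-6):
    """Numerical estimate of func'(x), used only as a sanity check."""
    return (func(x + step) - func(x - step)) / (2 * step)


x0 = 1.0
exact = h_prime(x0)  # 5 * (1 + 1)^4 * 2 = 160
estimate = central_difference(lambda x: f(g(x)), x0)
print(exact, estimate)
```

The two printed numbers should agree to several decimal places; a large gap would suggest a mistake in one of the hand-computed derivatives.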

More inline procedure using Leibniz notation

Although the explicit procedure above is fairly clear, doing the derivative calculations separately in Step (2) can be a waste of time. If you are experienced at differentiating quickly, you can combine Steps (2) and (3) by calculating the derivatives while plugging into the formula, rather than calculating them beforehand. Further, we do not need to explicitly name the functions if we use the Leibniz notation to compute the derivatives inline.

The shorter procedure is outlined below:

  1. Identify the two functions being composed (but you don't have to give them names).
  2. Plug into the formula for the chain rule, using the Leibniz notation for derivatives that have not yet been computed.
  3. Compute derivatives and simplify.

For instance, consider the problem:

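Differentiate the function $h(x) := \sin(x^2)$

  1. Identify the two functions being composed: The functions are $\sin$ (the outer/later function) and the squaring function $x \mapsto x^2$ (the inner/earlier function).
  2. Plug into the formula for the chain rule: We get $\frac{d}{dx}[\sin(x^2)] = \frac{d(\sin(x^2))}{d(x^2)} \cdot \frac{d(x^2)}{dx}$ (here basically $v = x^2$, though we don't have to say this explicitly).
  3. Compute derivatives and simplify: We get $\cos(x^2) \cdot 2x = 2x \cos(x^2)$.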
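
As a further illustration of the same three steps (the function $e^{x^3}$ is an example chosen here, not one taken from the text above), the Leibniz notation keeps the not-yet-computed derivatives visible until the last step:

```latex
\frac{d}{dx}\left[e^{x^3}\right]
  = \frac{d\left(e^{x^3}\right)}{d\left(x^3\right)} \cdot \frac{d\left(x^3\right)}{dx}
  = e^{x^3} \cdot 3x^2
  = 3x^2 e^{x^3}
```

The second expression is Step 2 (the formula with both derivatives still unevaluated), and the last two are Step 3.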

Shortest inline procedure

If you are really experienced with doing derivatives in your head, you can shorten the procedure even further by combining Steps (2) and (3) in the previous procedure. The procedure has two steps:

  1. Identify the two functions being composed (but you don't have to give them names).
  2. Use the formula for the chain rule, computing the derivatives of the functions while plugging them into the formula.

For instance, consider the problem:

Differentiate the function $h(x) := \sin(x^2)$

  1. Identify the two functions: The functions are $\sin$ (the outer/later function) and the squaring function $x \mapsto x^2$ (the inner/earlier function).
  2. Use the formula for the chain rule, computing the derivatives of the functions while plugging them into the formula: We get $\frac{d}{dx}[\sin(x^2)] = \cos(x^2) \cdot 2x = 2x \cos(x^2)$.
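
For a sense of how compact this becomes, here is one more example written in a single line. The function $\ln(\cos x)$ is an illustrative choice made here, with outer function $\ln$ and inner function $\cos$:

```latex
\frac{d}{dx}\left[\ln(\cos x)\right]
  = \frac{1}{\cos x} \cdot (-\sin x)
  = -\tan x
```

Both derivatives are computed mentally and written directly as the two factors of the product.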

Choosing between procedures

The procedures are not fundamentally different, but they differ in the degree of explicitness of the steps. Generally speaking, the following are recommended:

  • If the functions being composed are fairly easy to differentiate mentally, use the shortest inline procedure -- this is fast and reliable.
  • If the functions being composed are somewhat more difficult to differentiate, then choose between the other two more explicit procedures, based on whether you are more comfortable with writing large inline expressions or with doing separate work on the side.

Error types

Incorrect formula

A common mistake in differentiating a composite of functions is the use of an incorrect formula, such as $(f \circ g)' = f' \circ g'$ or $(f \circ g)' = f' g'$. See the section "Why more naive chain rules don't make sense" of Chain rule for differentiation for more background on why these formulas are incorrect.
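
A quick numerical experiment can make the difference concrete. The sketch below is illustrative only: it assumes the composite $\sin(x^2)$, takes $f'(x) g'(x)$ and $f'(g'(x))$ as two representative naive candidates, and uses hypothetical helper names. The point $x_0 = 1.5$ is chosen so that $x_0$, $g(x_0)$, and $g'(x_0)$ are all different numbers.

```python
import math


def g(x):
    """Inner function: g(x) = x^2."""
    return x ** 2


def f_prime(v):
    """Derivative of the outer function f(v) = sin(v)."""
    return math.cos(v)


def g_prime(x):
    """Derivative of the inner function."""
    return 2 * x


def central_difference(func, x, step=1e-6):
    """Numerical estimate of func'(x), used as the reference value."""
    return (func(x + step) - func(x - step)) / (2 * step)


x0 = 1.5
reference = central_difference(lambda x: math.sin(g(x)), x0)

correct = f_prime(g(x0)) * g_prime(x0)      # f'(g(x)) * g'(x)
naive_product = f_prime(x0) * g_prime(x0)   # f'(x) * g'(x): wrong input to f'
naive_composite = f_prime(g_prime(x0))      # f'(g'(x)): wrong input, no product

print(reference, correct, naive_product, naive_composite)
```

Only `correct` should match `reference`; the two naive values differ from it noticeably at this point.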

Writing only one piece of the chain rule

This is an error of the incomplete task form and is harder to avoid. What happens here is that you forget to write one of the two pieces being multiplied, so perhaps you end up writing only $f'(g(x))$ and omitting the factor $g'(x)$.

Why this error occurs: Usually, this error is common if you are trying to use the shortest inline procedure, i.e., differentiating the functions and applying the chain rule simultaneously, and one of the functions being differentiated is rather tricky to differentiate, requiring a product rule or chain rule for differentiation in and of itself.

How to avoid this error:

  • When the functions being differentiated are tricky to differentiate, use either the fully explicit procedure or the inline procedure with Leibniz notation. Do not try to simultaneously differentiate the pieces and use the chain rule.
  • After finishing a chain rule problem, ask the following sanity check question: did I get a product of two distinct terms as originally anticipated?
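
To see how the missing piece typically arises, consider an illustrative composite (chosen here, not taken from the original text) whose inner function itself needs the product rule, namely $\sin(x e^x)$. The first line below is the incomplete answer, and the second is the complete one:

```latex
\text{Incomplete: } \frac{d}{dx}\left[\sin(x e^x)\right] \overset{?}{=} \cos(x e^x)

\text{Complete: } \frac{d}{dx}\left[\sin(x e^x)\right]
  = \cos(x e^x) \cdot \frac{d}{dx}\left[x e^x\right]
  = \cos(x e^x) \cdot \left(e^x + x e^x\right)
  = (1 + x) e^x \cos(x e^x)
```

The sanity check catches the first line immediately: it is a single term, not a product of two.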

How to remember the formula

Different versions of the formula

Recall that we stated the chain rule as:

$$\frac{d}{dx}[f(g(x))] = f'(g(x)) \, g'(x)$$

This way of writing the formula is outside-in: we start with the derivative of the outer/later function $f$, then discard that outer layer and move to the derivative of the inner/earlier function $g$.

We could also write this in the other order (since multiplication is commutative):

$$\frac{d}{dx}[f(g(x))] = g'(x) \, f'(g(x))$$

Consistently following either order works; however, it is recommended to use the first version for a few reasons:

  • Starting at the outer layer and peeling it off makes it a bit easier to see progress and simplification as we proceed, which is also psychologically satisfying. This can be particularly important when working with composites involving three or more functions: for such composites, we peel the outermost layer, then the second outermost layer, and so on. The expressions get simpler as we proceed.
  • Since this is the form in which the chain rule is typically written, it makes work more legible to reviewers and graders. This is particularly important if we're skipping the explicit declaration of functions and executing on the chain rule through inline Leibniz notation or equivalent.
  • For various multivariable generalizations, order of multiplication becomes significant and in such cases, it does matter that we keep the standard order.

Quick rationalization of the formula

If you are a little shaky about the chain rule for differentiation, how do you do a reality check on the formula? The Leibniz notation version is easiest to remember for rationalization:

$$\frac{d[f(g(x))]}{dx} = \frac{d[f(g(x))]}{d[g(x)]} \cdot \frac{d[g(x)]}{dx} = f'(g(x)) \, g'(x)$$

Here, the first step can be justified with the crude logic that $d[g(x)]$ cancels between the denominator of one factor and the numerator of the next. Though this is not rigorous reasoning, it's close enough to the actual rigorous reasoning and is fine to use as a quick rationalization.

It's also worth keeping in mind that $f$ and its derivatives can only be evaluated after applying $g$, since $f$ is only known to be defined on the outputs of $g$. In particular, we can't have $f'(x)$ or $f'(g'(x))$ because neither $x$ nor $g'(x)$ is a legitimate input to $f'$.

We can also think of the formula as peeling layers of the composite. For $f(g(x))$ we peel from the outer/later function $f$, going to the inner/earlier function $g$. For each layer we peel, the price we pay to peel it is the derivative at that layer, applied to the composite of the layers inside. This way of thinking of the chain rule can also help justify the chain rule for differentiating a composite of three or more functions:

$$\frac{d}{dx}[f(g(h(x)))] = f'(g(h(x))) \, g'(h(x)) \, h'(x)$$
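
The layer-peeling view translates fairly directly into code. The following Python sketch is one possible way to express it, under assumptions made here: the helper name `derivative_of_composite`, the representation of each layer as a `(function, derivative)` pair listed outermost first, and the example composite $\sin(e^{x^2})$ are all illustrative.

```python
import math


def derivative_of_composite(layers, x):
    """Derivative at x of a composite given as (function, derivative)
    pairs, listed outermost first."""
    # Work inside-out to find the input that each layer receives.
    inputs = []
    value = x
    for func, _ in reversed(layers):
        inputs.append(value)
        value = func(value)
    inputs.reverse()  # inputs[i] is now the input to layers[i]

    # Peel outside-in: multiply each layer's derivative, evaluated at
    # the composite of the layers inside it.
    result = 1.0
    for (_, deriv), layer_input in zip(layers, inputs):
        result *= deriv(layer_input)
    return result


# sin(exp(x^2)), listed from the outermost layer to the innermost
layers = [
    (math.sin, math.cos),
    (math.exp, math.exp),
    (lambda x: x ** 2, lambda x: 2 * x),
]

x0 = 0.5
peeled = derivative_of_composite(layers, x0)
direct = math.cos(math.exp(x0 ** 2)) * math.exp(x0 ** 2) * 2 * x0
print(peeled, direct)
```

Each factor in the product is "the price paid" for one layer, which is why the same loop handles two, three, or more functions without change.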

Applying the chain rule for differentiation to compute values at a point

Knowledge of the values $g(x_0)$, $g'(x_0)$, and $f'(g(x_0))$ (in the sense of numerical values) at a specific point $x_0$ is sufficient to compute the value of $(f \circ g)'(x_0)$.

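For instance, suppose we are given the numerical values of $g'(x_0)$ and $f'(g(x_0))$; multiplying them, we obtain $(f \circ g)'(x_0) = f'(g(x_0)) \, g'(x_0)$.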
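
A small sketch with made-up numbers shows the bookkeeping involved. The values below ($g(1) = 2$, $g'(1) = 5$, $f'(2) = 3$) are hypothetical, chosen here only for illustration, as are the variable names.

```python
# Hypothetical tabulated values, chosen only for illustration.
x0 = 1.0
g_values = {1.0: 2.0}        # g(1) = 2
g_prime_values = {1.0: 5.0}  # g'(1) = 5
f_prime_values = {2.0: 3.0}  # f'(2) = 3

# f' must be looked up at g(x0), not at x0 itself.
inner_value = g_values[x0]
composite_derivative = f_prime_values[inner_value] * g_prime_values[x0]
print(composite_derivative)  # 15.0
```

Note that no formula for $f$ or $g$ is needed, and neither is the value of $f'$ at $x_0$: the value $g(x_0)$ serves only to tell us where to evaluate $f'$.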