Proof of chain rule for differentiation

From Calculus
Revision as of 03:24, 21 January 2013 by Vipul (talk | contribs) (→‎Fixing the bug in the proof)

This article describes a proof of the chain rule for differentiation.

Specific point, named functions version

Suppose $f$ and $g$ are functions such that $g$ is differentiable at a point $x = x_0$, and $f$ is differentiable at $g(x_0)$. Then the composite $f \circ g$ is differentiable at $x_0$, and we have:
$$\frac{d}{dx}[f(g(x))]\Big|_{x = x_0} = f'(g(x_0)) \, g'(x_0)$$

Pure Leibniz notation version

Suppose $u = g(x)$ is a function of $x$ and $v = f(u)$ is a function of $u$. Then,
$$\frac{dv}{dx} = \frac{dv}{du} \cdot \frac{du}{dx}$$
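
As a quick numerical sanity check of the statement (not a proof), the sketch below compares a symmetric difference quotient of the composite with the product $f'(g(x_0)) \, g'(x_0)$. The choices $f(u) = \sin u$, $g(x) = x^2$, $x_0 = 0.7$ and the helper name `central_difference` are illustrative assumptions, not part of the article.

```python
import math

# Illustrative choices (assumptions): f(u) = sin(u), g(x) = x^2.
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x * x

def g_prime(x):
    return 2 * x

def composite(x):
    return f(g(x))

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.7
numeric = central_difference(composite, x0)   # estimate of (f o g)'(x0)
chain_rule = f_prime(g(x0)) * g_prime(x0)     # f'(g(x0)) * g'(x0)
print(numeric, chain_rule)
```

The two printed values should agree to several decimal places; the small gap comes from the finite step size and floating-point rounding.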

Proof

Intuitive proof using the pure Leibniz notation version

The following intuitive proof is not rigorous, but captures the underlying idea:

  • Start with the expression $\frac{dv}{du} \cdot \frac{du}{dx}$.
  • Cancel the $du$ between the denominator and the numerator.
  • We are left with $\frac{dv}{dx}$.

First attempt at formalizing the intuition

This again is not a complete proof, but it gets closer:

Note the following product relation between difference quotients, measured between any pair of distinct points $x = x_1$ and $x = x_2$:

$$\frac{\Delta v}{\Delta x} = \frac{\Delta v}{\Delta u} \cdot \frac{\Delta u}{\Delta x}$$

In functional notation, this is the identity:

$$\frac{f(g(x_2)) - f(g(x_1))}{x_2 - x_1} = \frac{f(g(x_2)) - f(g(x_1))}{g(x_2) - g(x_1)} \cdot \frac{g(x_2) - g(x_1)}{x_2 - x_1}$$

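Now, set $x_1 = x_0$ and let $x_2 = x$ be some number close enough to $x_0$. We get:

$$\frac{f(g(x)) - f(g(x_0))}{x - x_0} = \frac{f(g(x)) - f(g(x_0))}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{x - x_0}$$

Now, take the limit on both sides as $x \to x_0$, and we get:

$$\lim_{x \to x_0} \frac{f(g(x)) - f(g(x_0))}{x - x_0} = \lim_{x \to x_0} \frac{f(g(x)) - f(g(x_0))}{g(x) - g(x_0)} \cdot \lim_{x \to x_0} \frac{g(x) - g(x_0)}{x - x_0}$$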

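Since $g$ is differentiable at $x_0$, $g$ is continuous at $x_0$ as well (using the fact that differentiable implies continuous), so $g(x) \to g(x_0)$ as $x \to x_0$. The first limit on the right side can therefore be rewritten as a limit in $g(x)$:

$$\lim_{x \to x_0} \frac{f(g(x)) - f(g(x_0))}{x - x_0} = \lim_{g(x) \to g(x_0)} \frac{f(g(x)) - f(g(x_0))}{g(x) - g(x_0)} \cdot \lim_{x \to x_0} \frac{g(x) - g(x_0)}{x - x_0}$$

Each of the three limits is now, by definition, a derivative, so we get:

$$(f \circ g)'(x_0) = f'(g(x_0)) \, g'(x_0)$$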

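In the $\Delta$-notation, we are simply taking the appropriate limits on

$$\frac{\Delta v}{\Delta x} = \frac{\Delta v}{\Delta u} \cdot \frac{\Delta u}{\Delta x}$$

and getting that:

$$\frac{dv}{dx} = \frac{dv}{du} \cdot \frac{du}{dx}$$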

Fixing the bug in the proof

The proof above is correct in essentials, but has one bug: $\Delta u = g(x) - g(x_0)$ may be equal to zero for some $x \ne x_0$, making the difference quotient $\Delta v / \Delta u$ undefined at these points. This is not an issue if it happens at only finitely many points, because we can restrict $x$ to a neighborhood of $x_0$ small enough to exclude them. It does become an issue, however, if $g(x) = g(x_0)$ at points $x$ arbitrarily close to $x_0$.
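
To make the bug concrete, here is a minimal sketch using the most extreme case: a constant $g$, for which $\Delta u = 0$ at every $x$. The specific functions and the helper name `naive_quotient` are illustrative assumptions, not part of the article.

```python
# Illustrative choices (assumptions): f(u) = u^2 and a constant g.
def f(u):
    return u * u

def g(x):
    # Constant function, so g(x) == g(x0) for every x.
    return 3.0

def naive_quotient(x, x0):
    """Delta v / Delta u, the first factor used in the unfixed proof."""
    delta_u = g(x) - g(x0)
    delta_v = f(g(x)) - f(g(x0))
    return delta_v / delta_u

x0 = 1.0
failures = 0
for k in range(1, 6):
    x = x0 + 10.0 ** (-k)          # points approaching x0
    try:
        naive_quotient(x, x0)
    except ZeroDivisionError:      # Delta u is exactly zero here
        failures += 1
print(failures)
```

Every sample point fails, however close it is to `x0`, so no neighborhood of `x0` avoids the undefined quotient. The chain rule still holds for this $g$ (both sides are zero), which is why the proof needs repair rather than the statement.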

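The solution to this problem is to replace the expression $\frac{f(g(x)) - f(g(x_0))}{g(x) - g(x_0)}$ by the expression:

$$H(x) := \begin{cases} f'(g(x_0)), & \text{if } g(x) = g(x_0) \\ \dfrac{f(g(x)) - f(g(x_0))}{g(x) - g(x_0)}, & \text{if } g(x) \ne g(x_0) \end{cases}$$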

Alternatively, $H(x) = h(g(x))$ where:

$$h(u) := \begin{cases} f'(g(x_0)), & \text{if } u = g(x_0) \\ \dfrac{f(u) - f(g(x_0))}{u - g(x_0)}, & \text{if } u \ne g(x_0) \end{cases}$$

In other words, we fill in the removable discontinuity before plugging it into the product expression. If we use this expression instead, the proof becomes rigorous.

Here is the full rigorous proof. We first verify that the following identity always holds for $x \ne x_0$:

$$\frac{f(g(x)) - f(g(x_0))}{x - x_0} = H(x) \cdot \frac{g(x) - g(x_0)}{x - x_0}$$

We prove this by making two cases:

  • $g(x) = g(x_0)$: In this case, the left side is zero, and the right side is $f'(g(x_0)) \cdot 0 = 0$.
  • $g(x) \ne g(x_0)$: In this case, cancel $g(x) - g(x_0)$ between the denominator of the first expression and the numerator of the second expression on the right side to get the left side. This is similar to the buggy proof.
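
The two cases can be checked in exact rational arithmetic. The sketch below assumes the illustrative choices $f(u) = u^3$, $g(x) = x^2$ and $x_0 = 1$ (so that $x = -1$ falls into the first case); the helper names `left_side` and `right_side` are hypothetical.

```python
from fractions import Fraction

# Illustrative choices (assumptions): f(u) = u^3, g(x) = x^2.
def f(u):
    return u ** 3

def f_prime(u):
    return 3 * u ** 2

def g(x):
    return x * x

def H(x, x0):
    """Delta v / Delta u, patched with f'(g(x0)) where Delta u is zero."""
    delta_u = g(x) - g(x0)
    if delta_u == 0:
        return f_prime(g(x0))
    return (f(g(x)) - f(g(x0))) / delta_u

def left_side(x, x0):
    return (f(g(x)) - f(g(x0))) / (x - x0)

def right_side(x, x0):
    return H(x, x0) * (g(x) - g(x0)) / (x - x0)

x0 = Fraction(1)
# x = -1 gives g(x) == g(x0); the other samples give g(x) != g(x0).
samples = [Fraction(-1), Fraction(1, 2), Fraction(3, 2), Fraction(2)]
identity_holds = all(left_side(x, x0) == right_side(x, x0) for x in samples)
print(identity_holds)
```

Using `Fraction` avoids rounding, so the equality test is exact rather than approximate. It is only a spot check at a few points, not a substitute for the case analysis.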

Since the equality now holds for all $x \ne x_0$, it makes sense to try to take the limit. We have:

$$\lim_{x \to x_0} \frac{f(g(x)) - f(g(x_0))}{x - x_0} = \lim_{x \to x_0} \left[ H(x) \cdot \frac{g(x) - g(x_0)}{x - x_0} \right]$$

We now split the limit of the product on the right side. Such a splitting is valid only if both of the resulting limits exist, which we verify below.

$$\lim_{x \to x_0} \frac{f(g(x)) - f(g(x_0))}{x - x_0} = \lim_{x \to x_0} H(x) \cdot \lim_{x \to x_0} \frac{g(x) - g(x_0)}{x - x_0}$$

The left side is $(f \circ g)'(x_0)$. The second expression on the right side is $g'(x_0)$. It remains to calculate $\lim_{x \to x_0} H(x)$. This is $\lim_{x \to x_0} h(g(x))$. We compute this as follows:

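  • The differentiability of $f$ at $g(x_0)$ guarantees that $h$ is continuous at $g(x_0)$, because $\lim_{u \to g(x_0)} h(u)$ is the limit defining $f'(g(x_0))$, which is the value assigned to $h$ at $g(x_0)$.
  • The differentiability of $g$ at $x_0$ guarantees that $g$ is continuous at $x_0$.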

With both these facts, we get that $\lim_{x \to x_0} h(g(x)) = h(g(x_0)) = f'(g(x_0))$, where the last equality holds by the definition of $h$. Plugging this back in completes the proof.
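
The limit of $H$ can also be observed numerically. This is a minimal sketch that assumes the illustrative choices $f(u) = u^2 + 3u$ and $g(x) = x^2 \sin(1/x)$ with $g(0) = 0$ and $x_0 = 0$. This $g$ is differentiable at $0$ and takes the value $g(0)$ at the points $x = 1/(n\pi)$, which lie arbitrarily close to $0$, so it is the kind of function the unfixed proof cannot handle.

```python
import math

# Illustrative choices (assumptions): f(u) = u^2 + 3u, so f'(0) = 3.
def f(u):
    return u * u + 3 * u

def f_prime(u):
    return 2 * u + 3

def g(x):
    # g(x) = x^2 sin(1/x), extended by g(0) = 0.
    return 0.0 if x == 0 else x * x * math.sin(1.0 / x)

def H(x, x0):
    """Delta v / Delta u, patched with f'(g(x0)) where Delta u is zero."""
    delta_u = g(x) - g(x0)
    if delta_u == 0:
        return f_prime(g(x0))
    return (f(g(x)) - f(g(x0))) / delta_u

x0 = 0.0
target = f_prime(g(x0))                      # f'(g(x0)) = 3
points = [10.0 ** (-k) for k in range(1, 7)]
errors = [abs(H(x, x0) - target) for x in points]
for x, err in zip(points, errors):
    print(x, err)
```

For this particular $f$, $H(x) - f'(g(x_0))$ equals $g(x)$ whenever $g(x) \ne g(x_0)$, so each error is bounded by $x^2$ up to rounding and shrinks as $x \to x_0$. In floating point the sample points are unlikely to hit an exact zero of $g$, so the patched branch is included for correctness rather than exercised here.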