Notational confusion of multivariable derivatives

From Calculus
Revision as of 22:50, 30 May 2020 by IssaRice

I think there are several different confusions that arise from multivariable derivative notation:

  • The thing where $\frac{\partial w}{\partial t}$ can mean two different things on the LHS and RHS when $t$ is used as both an initial and intermediate variable. (See Folland for details.)
  • The thing where if $f(x,y) = (x^2, y^2)$ then $\frac{\partial f}{\partial x}(x,x)$ feels like it might be $(2x, 2x)$ even though it's actually $(2x, 0)$. (Example from Tao.) See also [1]
  • The ambiguity of expressions like $f'(Ax)$
  • Dual basis stuff -- see Tao's explanation of this on p. 225 of [2]

Working off the example from Tao above, let $f(x,y) = (x^2, y^2)$. What does $\frac{d}{dx} f(x,x)$ mean? Here are four possibilities:

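  1. It's $f'$ (i.e., the total derivative of $f$) evaluated at the point $(x,x)$. Once we fix a point $(x_0,y_0)$, then $f'(x_0,y_0)$ is a linear map $\mathbb{R}^2 \to \mathbb{R}^2$ defined by the matrix $\begin{pmatrix} 2x_0 & 0 \\ 0 & 2y_0 \end{pmatrix}$. Applying this linear map to $(x,x)$, we get $\begin{pmatrix} 2x_0 & 0 \\ 0 & 2y_0 \end{pmatrix} \begin{pmatrix} x \\ x \end{pmatrix} = \begin{pmatrix} 2x_0 x \\ 2y_0 x \end{pmatrix} \in \mathbb{R}^2$.
  2. It's $\frac{df}{dx}$ (i.e., the total derivative of $f$ with respect to $x$, in which we treat the variable $y$ as a function of $x$, and use the chain rule $\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}$ to compute) evaluated at the point $(x,x)$. We have $(2x, 0) + (0, 2y) \frac{dy}{dx} = \left(2x, 2y \frac{dy}{dx}\right)$. We can't evaluate this further since we don't know how $x$ and $y$ are related. If we assume the relationship $y = x$ then this reduces to $(2x, 2x)$.
  3. We first compute $f(x,x)$ to get an expression involving only $x$, which implicitly defines a function $\mathbb{R} \to \mathbb{R}^2$. We now differentiate this function. The result is the function $x \mapsto (2x, 2x) : \mathbb{R} \to \mathbb{R}^2$.
  4. There's implicitly a function $\phi(x,y) = (x,x)$, so $\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))$. Using the chain rule, this is $(f \circ \phi)'(x,y) = f'(\phi(x,y))\,\phi'(x,y) = \begin{pmatrix} 2x_0 & 0 \\ 0 & 2y_0 \end{pmatrix}\Big|_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2x & 0 \\ 2x & 0 \end{pmatrix}$.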
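
The four readings can be checked numerically. The following is a minimal sketch in plain Python, not a definitive implementation: the helper names (jacobian_f, matvec, matmul, reading_1 through reading_4) are made up for illustration, the Jacobian is written out by hand, and reading 3 uses a central difference instead of symbolic differentiation.

```python
# f(x, y) = (x^2, y^2), the example from the text.
def f(x, y):
    return (x * x, y * y)

def jacobian_f(x0, y0):
    """Matrix of the total derivative f'(x0, y0), written out by hand."""
    return [[2 * x0, 0.0],
            [0.0, 2 * y0]]

def matvec(m, v):
    """Apply the matrix m (list of rows) to the vector v."""
    return tuple(sum(row[j] * v[j] for j in range(len(v))) for row in m)

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def reading_1(x0, y0, x):
    # Reading 1: the linear map f'(x0, y0) applied to the vector (x, x).
    return matvec(jacobian_f(x0, y0), (x, x))

def reading_2(x, y, dydx):
    # Reading 2: df/dx = ∂f/∂x + ∂f/∂y * dy/dx, with dy/dx supplied by the caller.
    return (2 * x, 2 * y * dydx)

def reading_3(x, h=1e-6):
    # Reading 3: differentiate g(x) = f(x, x) by a central difference.
    plus = f(x + h, x + h)
    minus = f(x - h, x - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

def reading_4(x, y):
    # Reading 4: (f∘φ)'(x, y) = f'(φ(x, y)) φ'(x, y) with φ(x, y) = (x, x).
    phi_prime = [[1.0, 0.0],
                 [1.0, 0.0]]
    return matmul(jacobian_f(x, x), phi_prime)

print(reading_1(1, 2, 3))   # (6.0, 12.0): depends on the base point (1, 2)
print(reading_2(3, 3, 1))   # (6, 6): the case y = x, so dy/dx = 1
print(reading_3(3.0))       # approximately (6, 6)
print(reading_4(3, 5))      # [[6.0, 0.0], [6.0, 0.0]]: a 2-by-2 matrix
```

Note that the four readings do not even have the same type: reading 1 needs a separate base point, readings 2 and 3 return a vector in $\mathbb{R}^2$, and reading 4 returns a 2-by-2 matrix.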

Big picture

Why is this notation so confusing? I think there are two (?) big reasons:

  • The notation violates the substitution axiom of equality. We write things like $z = z(x,y)$, where the same symbol $z$ now has two different types. Folland's example of $\frac{\partial w}{\partial t}$ meaning two different things is an instance of this.
  • The precedence of the differentiation operator is sometimes unclear. This is the case with, e.g., $f'(Ax)$ and $\frac{d}{dx} f(x,x)$.
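
Both reasons can be made concrete with a small worked illustration. The functions $g$, $h$, $W$ and the linear map $A$ below are assumptions chosen for illustration; this is not necessarily the setup Folland uses.

```latex
% Reason 1: one symbol, two meanings.
% Let w = g(x, t) with x = h(s, t), and define W(s, t) = g(h(s, t), t).
\frac{\partial W}{\partial t}(s,t)
  = \partial_1 g\bigl(h(s,t), t\bigr)\,\partial_2 h(s,t)
  + \partial_2 g\bigl(h(s,t), t\bigr)
% In the traditional notation both W and g are called w, so this reads
\frac{\partial w}{\partial t}
  = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t}
  + \frac{\partial w}{\partial t}
% where the left-hand side is \partial_2 W and the last term is \partial_2 g.

% Reason 2: precedence.
% For differentiable f : R^n -> R^m and a linear map A : R^k -> R^n,
(f \circ A)'(x) = f'(Ax)\,A
\qquad\text{whereas}\qquad
f'(Ax) = f'(y)\big|_{y = Ax}
```

In the first case, cancelling the two $\frac{\partial w}{\partial t}$ terms would give a false conclusion, which is exactly what a failure of substitution looks like. In the second, the two readings differ by the factor $A$.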

The derivative as a linear transformation in the several variable case and a number in the single-variable case

  • The thing where the total derivative for $n = m = 1$ "should" be a function but people treat it as a number. Refer to "Appendix A: Perorations of Dieudonné" (p. 337) in Pugh's Real Mathematical Analysis.
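
As a short worked statement of the point above (the symbols $x_0$, $h$ and $c$ are introduced here for illustration):

```latex
% For f : R -> R differentiable at x_0, the total derivative is the linear map
f'(x_0) : \mathbb{R} \to \mathbb{R}, \qquad h \mapsto c\,h,
\qquad c = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}
% Its 1-by-1 matrix is (c), and the usual convention writes f'(x_0) = c.
```

So the number taught in single-variable calculus is the single entry of the 1-by-1 derivative matrix, not the linear map itself.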

Total derivative versus derivative matrix

Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e., an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations $\mathbb{R}^n \to \mathbb{R}^m$ and $m$ by $n$ matrices (with respect to the standard bases), so many books call the total derivative a matrix or equate the two.
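
The correspondence can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names matrix_of, map_of and total_derivative_at_1_2 are hypothetical, and the linear map used is the total derivative of $f(x,y) = (x^2, y^2)$ at the point $(1, 2)$, written as a function instead of a matrix.

```python
def matrix_of(linear_map, n):
    """Return the m-by-n matrix (list of rows) whose j-th column is linear_map(e_j)."""
    columns = []
    for j in range(n):
        e_j = tuple(1.0 if i == j else 0.0 for i in range(n))  # j-th standard basis vector
        columns.append(linear_map(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def map_of(matrix):
    """Return the linear map v -> matrix * v as a Python function."""
    def apply(v):
        return tuple(sum(row[j] * v[j] for j in range(len(v))) for row in matrix)
    return apply

def total_derivative_at_1_2(v):
    """The linear map f'(1, 2) for f(x, y) = (x^2, y^2), given as a function."""
    h, k = v
    return (2 * 1.0 * h, 2 * 2.0 * k)

M = matrix_of(total_derivative_at_1_2, 2)
print(M)                        # [[2.0, 0.0], [0.0, 4.0]]
print(map_of(M)((3.0, 5.0)))    # (6.0, 20.0), same as total_derivative_at_1_2((3.0, 5.0))
```

Going from the function to the matrix and back recovers the same linear map, which is why the two are so often identified. The function and the list of rows are still different kinds of object, though.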

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

See also