Notational confusion of multivariable derivatives: Difference between revisions

Revision as of 23:07, 30 May 2020

I think there are several different confusions that arise from multivariable derivative notation:

The thing where ${\frac {\partial w}{\partial t}}$ can mean two different things on the LHS and RHS of the same equation when $t$ is used as both an initial and intermediate variable. (See Folland for details.)
The thing where if $f(x,y)=(x^{2},y^{2})$ then ${\frac {\partial f}{\partial x}}(x,x)$ feels like it might be $(2x,2x)$ even though it's actually $(2x,0)$. (Example from Tao.) See also [1].
The ambiguity of expressions like $\nabla f(Ax)$.
Dual basis stuff: see Tao's explanation of this on p. 225 of [2].
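The first item can be made concrete with a standard setup. This is an assumed illustration and not necessarily Folland's exact example: take $w=f(x,y,t)$ with $x=x(s,t)$ and $y=y(s,t)$, so that $t$ enters both directly and through $x$ and $y$. The names $f$, $s$, and $\partial _{3}f$ are introduced here only for illustration.

```latex
% Assumed setup: w = f(x, y, t), with x = x(s, t) and y = y(s, t).
% The chain rule as it is often written:
\frac{\partial w}{\partial t}
  = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t}
  + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t}
  + \frac{\partial w}{\partial t}
% LHS: derivative of (s, t) \mapsto f(x(s,t), y(s,t), t) with s held fixed.
% RHS, last term: derivative of f in its third slot with x and y held fixed.

% The same identity with the two meanings kept apart:
\frac{\partial}{\partial t}\Bigl[f\bigl(x(s,t),y(s,t),t\bigr)\Bigr]
  = \partial_{1}f\,\frac{\partial x}{\partial t}
  + \partial_{2}f\,\frac{\partial y}{\partial t}
  + \partial_{3}f
```

Read literally, the first form would let one cancel ${\frac {\partial w}{\partial t}}$ from both sides, which is the sign that the same symbol is standing for two different functions.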

Working off the example from Tao above, let $f(x,y)=(x^{2},y^{2})$. What does ${\frac {d}{dx}}f(x,x)$ mean? Here are four possibilities:

It's $f'$ (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point $(x_{0},y_{0})$, then $f'(x_{0},y_{0})$ is a linear map $\mathbf {R} ^{2}\to \mathbf {R} ^{2}$ defined by the matrix ${\begin{pmatrix}2x_{0}&0\\0&2y_{0}\end{pmatrix}}$. Since our point is $(x_{0},y_{0})=(x,x)$, we have ${\begin{pmatrix}2x&0\\0&2x\end{pmatrix}}$.
It's ${\frac {df}{dx}}$ (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule ${\frac {df}{dx}}={\frac {\partial f}{\partial x}}+{\frac {\partial f}{\partial y}}{\frac {dy}{dx}}$ to compute) evaluated at the point (x,x). We have $(2x,0)+(0,2y){\frac {dy}{dx}}=(2x,2y{\frac {dy}{dx}})$. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
We first compute f(x,x) to get an expression involving only x, which implicitly defines a function $\mathbf {R} \to \mathbf {R} ^{2}$. We now differentiate this function. The result is the function $x\mapsto (2x,2x):\mathbf {R} \to \mathbf {R} ^{2}$.
There's implicitly a function $\phi (x,y)=(x,x)$, so ${\frac {d}{dx}}f(x,x)={\frac {d}{dx}}f(\phi (x,y))$. Using the chain rule, this is $(f\circ \phi )'(x,y)=f'(\phi (x,y))\phi '(x,y)=\left.{\begin{pmatrix}2x_{0}&0\\0&2y_{0}\end{pmatrix}}\right\vert _{(x_{0},y_{0})=\phi (x,y)}{\begin{pmatrix}1&0\\1&0\end{pmatrix}}={\begin{pmatrix}2x&0\\2x&0\end{pmatrix}}$.
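The readings above can be checked numerically. What follows is a minimal sketch, not a definitive implementation: the helper `jacobian`, the sample point `x = 3.0`, and the names `reading_1`, `partial_x`, `reading_3`, `reading_4` are all assumptions introduced for illustration, and derivatives are approximated by central differences rather than computed symbolically.

```python
def f(x, y):
    """Tao's example map f(x, y) = (x^2, y^2)."""
    return (x * x, y * y)


def jacobian(func, point, h=1e-6):
    """Central-difference Jacobian of func at point, returned as a list of rows.

    Hypothetical helper for this sketch; func takes len(point) scalar
    arguments and returns a tuple of scalars.
    """
    n = len(point)
    columns = []
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        f_plus = func(*plus)
        f_minus = func(*minus)
        # Column j holds the partial derivatives with respect to input j.
        columns.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    m = len(columns[0])
    # Transpose so that entry [i][j] is d(output i) / d(input j).
    return [[columns[j][i] for j in range(n)] for i in range(m)]


x = 3.0  # arbitrary sample value

# Reading 1: the total derivative f' at the point (x, x), a 2-by-2 matrix.
reading_1 = jacobian(f, (x, x))

# The partial derivative in the first slot, evaluated at (x, x):
# this is the first column of reading 1.
partial_x = [row[0] for row in reading_1]

# Reading 3: substitute first, g(x) = f(x, x), then differentiate g.
reading_3 = jacobian(lambda t: f(t, t), (x,))

# Reading 4: compose with phi(x, y) = (x, x), then take the total derivative.
# The second coordinate (5.0) is arbitrary because phi ignores it.
reading_4 = jacobian(lambda s, t: f(s, s), (x, 5.0))

for name, value in [("reading 1", reading_1), ("partial_x", partial_x),
                    ("reading 3", reading_3), ("reading 4", reading_4)]:
    print(name, value)
```

The second reading is not computed separately: once the relationship y=x is assumed, it is the same computation as the third. The point of the sketch is that the four results have different shapes (2-by-2, a pair, 2-by-1, and another 2-by-2), so the disagreement is about which function is being differentiated, not about arithmetic.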

Big picture

Why is this notation so confusing? I think there are two (?) big reasons:

The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of ${\frac {\partial w}{\partial t}}$ meaning two different things is another instance of this.
The precedence of the differentiation operator is sometimes unclear, e.g., in $\nabla f(Ax)$ and ${\frac {d}{dx}}f(x,x)$.
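For the precedence problem, here is a short worked comparison of the two readings of $\nabla f(Ax)$. It assumes $f:\mathbf {R} ^{m}\to \mathbf {R}$ is differentiable and $A$ is an $m$ by $n$ matrix; the sample function $f(u)=\|u\|^{2}$ is chosen here only for illustration.

```latex
% Reading A: differentiate f first, then plug in Ax. The result lives in R^m.
(\nabla f)(Ax)

% Reading B: form the composite x \mapsto f(Ax) first, then differentiate.
% By the chain rule the result lives in R^n.
\nabla\bigl(x \mapsto f(Ax)\bigr)(x) = A^{T}\,(\nabla f)(Ax)

% Illustration with f(u) = \|u\|^2, so that (\nabla f)(u) = 2u:
(\nabla f)(Ax) = 2Ax
\qquad\text{versus}\qquad
\nabla\bigl(x \mapsto \|Ax\|^{2}\bigr)(x) = 2A^{T}Ax
```

When $m\neq n$ the two readings do not even have the same number of components, which makes the ambiguity easy to spot; when $m=n$ it can go unnoticed.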

The derivative as a linear transformation in the several variable case and a number in the single-variable case

The thing where the total derivative for $n=m=1$ "should" be a function but people treat it as a number. Refer to "Appendix A: Perorations of Dieudonne" (p. 337) in Pugh's Real Mathematical Analysis.
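The single-variable case can be written out explicitly. The symbols $Df(a)$ and $h$, and the sample function $f(x)=x^{2}$, are introduced here for illustration and are not taken from Pugh.

```latex
% For differentiable f : R -> R, the total derivative at a is the linear map
Df(a) : \mathbf{R} \to \mathbf{R}, \qquad h \mapsto f'(a)\,h

% Its matrix is the 1-by-1 matrix whose single entry is the number f'(a):
\begin{pmatrix} f'(a) \end{pmatrix}

% Illustration with f(x) = x^2 at a = 3:
Df(3)(h) = 6h, \qquad f'(3) = 6
```

The number $f'(a)$ and the map $h\mapsto f'(a)h$ determine each other, which is why treating the derivative as a number causes no trouble in one variable.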

Total derivative versus derivative matrix

Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e., an array of numbers arranged in a certain order. However, with respect to the standard bases there is a one-to-one correspondence between linear transformations $\mathbf {R} ^{n}\to \mathbf {R} ^{m}$ and $m$ by $n$ matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.
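The correspondence between linear transformations and matrices described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the helpers `matrix_of` and `map_of` and the example map `total_derivative_at_3_3` are hypothetical names, and the standard bases of $\mathbf {R} ^{n}$ and $\mathbf {R} ^{m}$ are assumed throughout.

```python
def matrix_of(linear_map, n):
    """Build the m-by-n matrix of a linear map R^n -> R^m.

    Column j is the image of the j-th standard basis vector.
    """
    columns = []
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        columns.append(linear_map(e_j))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def map_of(matrix):
    """Turn an m-by-n matrix back into the linear map v -> matrix * v."""
    def linear_map(v):
        return [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return linear_map


def total_derivative_at_3_3(v):
    """Total derivative of f(x, y) = (x^2, y^2) at the point (3, 3), as a map."""
    return [6.0 * v[0], 6.0 * v[1]]


# From the linear map to its matrix, and back again.
M = matrix_of(total_derivative_at_3_3, 2)
L = map_of(M)

print(M)
print(L([1.0, 2.0]), total_derivative_at_3_3([1.0, 2.0]))
```

The two objects have different types (a function versus a nested list), yet each can be recovered from the other, which is the sense in which books equate them.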

@@ Line 8: / Line 8: @@
 Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are four possibilities:
-# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
+# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Since our point is <math>(x_0, y_0) = (x,x)</math>, we have <math>\begin{pmatrix}2x & 0 \\ 0 & 2x\end{pmatrix}</math>.
 # It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
 # We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.
