Calculus - User contributions [en]

Summary table of multivariable derivatives

2020-06-17T20:28:46Z

IssaRice: /* External links */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}
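
The two readings of <math>f'(x_0)</math> described in the notes column (a number versus a linear map) can be sketched numerically. This is only an illustration: the helper names <code>derivative_at</code> and <code>derivative_as_map</code> are made up for this page, and a symmetric difference quotient is used in place of the one-sided limit in the table because it is more accurate at a fixed step size.

```python
def derivative_at(f, x0, h=1e-6):
    """Approximate the number f'(x0) with a symmetric difference quotient."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)


def derivative_as_map(f, x0, h=1e-6):
    """Return the linear map v -> f'(x0) * v (the 'function' reading of f'(x0))."""
    slope = derivative_at(f, x0, h)
    return lambda v: slope * v


def square(x):
    # Hypothetical example function f(x) = x^2, so f'(3) = 6.
    return x * x


slope = derivative_at(square, 3.0)       # the number reading
as_map = derivative_as_map(square, 3.0)  # the linear-map reading
recovered = as_map(1.0)                  # evaluating the map at 1 recovers the number
```

Evaluating the map at <math>1</math> returns the same value as the number reading, which is the point made in the notes column.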

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math> (more formally, <math>g : \mathbf R \to \mathbf R^n</math> is a function such that the <math>j</math>th component <math>g_j = \mathrm{id}</math> is the identity function, and we differentiate the composite <math>f \circ g</math>). From the chain rule this becomes <math>\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g'(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math>, where the partial derivatives are evaluated at <math>g(x_j)</math> ||
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.
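
As a rough numerical check of the claim that the directional derivative coincides with <math>\nabla f(x_0)\cdot v</math>, here is a sketch using symmetric difference quotients. The function <code>example</code> and all helper names are hypothetical choices for illustration, not taken from any of the references.

```python
def partial(f, x, j, h=1e-6):
    """Approximate the j-th partial derivative of f: R^n -> R at x (j is 0-based)."""
    forward = list(x)
    backward = list(x)
    forward[j] += h
    backward[j] -= h
    return (f(forward) - f(backward)) / (2 * h)


def gradient(f, x, h=1e-6):
    """Approximate the gradient as the list of all partial derivatives."""
    return [partial(f, x, j, h) for j in range(len(x))]


def directional(f, x, v, h=1e-6):
    """Approximate D_v f(x) by differencing along the line x + t v."""
    forward = [xi + h * vi for xi, vi in zip(x, v)]
    backward = [xi - h * vi for xi, vi in zip(x, v)]
    return (f(forward) - f(backward)) / (2 * h)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def example(p):
    # Hypothetical test function f(x, y) = x^2 y + 3y, with gradient (2xy, x^2 + 3).
    x, y = p
    return x * x * y + 3 * y


x0 = [1.0, 2.0]
v = [0.5, -1.0]
grad = gradient(example, x0)       # exact value: (4, 4)
lhs = directional(example, x0, v)  # D_v f(x0)
rhs = dot(grad, v)                 # gradient dotted with v; exact value: -2
```

Up to floating-point error, <code>lhs</code> and <code>rhs</code> agree, which is why many books do not define the derivative separately in this case.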

"Total derivative" is used for two different things (which coincide in the special case where they both make sense); see my answer https://math.stackexchange.com/a/3698838/35525 for details.

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || || ||
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
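
To make the shape of the derivative matrix concrete, the following sketch approximates it with symmetric difference quotients and then wraps it as a linear map. The names <code>jacobian</code>, <code>total_derivative</code> and <code>example</code> are hypothetical, and the example deliberately uses <math>n = 2</math>, <math>m = 3</math> so that the <math>m</math> by <math>n</math> shape is visible.

```python
def jacobian(f, x, h=1e-6):
    """Approximate the m-by-n derivative matrix of f: R^n -> R^m at x."""
    n = len(x)
    columns = []
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        # Column j holds the j-th partial derivative of every component of f.
        columns.append([(a - b) / (2 * h) for a, b in zip(f_plus, f_minus)])
    m = len(columns[0])
    # Transpose the list of columns into a list of m rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def total_derivative(f, x, h=1e-6):
    """Return the linear map v -> J v, where J approximates the derivative matrix at x."""
    matrix = jacobian(f, x, h)
    return lambda v: [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


def example(p):
    # Hypothetical f: R^2 -> R^3, f(x, y) = (xy, x + y, y^2).
    x, y = p
    return [x * y, x + y, y * y]


J = jacobian(example, [2.0, 3.0])          # exact value: [[3, 2], [1, 1], [0, 6]]
L = total_derivative(example, [2.0, 3.0])  # the linear map R^2 -> R^3
image = L([1.0, -1.0])                     # exact value: [1, 0, -6]
```

Each row of <code>J</code> is the gradient of one component function, so the matrix has one row per output component and one column per input variable.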

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

* this post does a similar thing: https://reallyeli.com/posts/total_derivative.html

Notational confusion of multivariable derivatives

2020-05-30T23:43:41Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The meaning of partial derivatives depends on which variables you take to be independent; see p. 75 of Folland.
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* Dual basis stuff -- see Tao's explanation of this on p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are four possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Since our point is <math>(x_0, y_0) = (x,x)</math>, we have <math>\begin{pmatrix}2x & 0 \\ 0 & 2x\end{pmatrix}</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.
# There's implicitly a function <math>\phi(x,y) = (x,x)</math>, so <math>\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))</math>. Using the chain rule, this is <math>(f\circ \phi)'(x,y) = f'(\phi(x,y))\phi'(x,y) = \left.\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 & 0 \\ 1 & 0\end{pmatrix} = \begin{pmatrix}2x & 0 \\ 2x & 0\end{pmatrix}</math>.
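
The four readings above can be compared side by side in a short sketch. It hard-codes the derivative matrix of <math>f(x,y) = (x^2, y^2)</math> rather than computing it, and the function names <code>reading_1</code> through <code>reading_4</code> are invented here purely to mirror the numbered list.

```python
def f_prime(x0, y0):
    """Derivative matrix of f(x, y) = (x^2, y^2) at the point (x0, y0)."""
    return [[2 * x0, 0], [0, 2 * y0]]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def reading_1(x):
    # Total derivative of f at the point (x, x): a 2-by-2 matrix.
    return f_prime(x, x)


def reading_2(x, y, dy_dx):
    # Total derivative with respect to x: df/dx = (2x, 0) + (0, 2y) * dy/dx.
    return (2 * x, 2 * y * dy_dx)


def reading_3(x):
    # Differentiate the single-variable function x -> f(x, x) = (x^2, x^2).
    return (2 * x, 2 * x)


def reading_4(x, y):
    # Chain rule with phi(x, y) = (x, x), whose derivative matrix is constant.
    phi_prime = [[1, 0], [1, 0]]
    return matmul(f_prime(x, x), phi_prime)


x = 3
r1 = reading_1(x)        # [[6, 0], [0, 6]]
r2 = reading_2(x, x, 1)  # (6, 6), assuming the relationship y = x
r3 = reading_3(x)        # (6, 6)
r4 = reading_4(x, 5)     # [[6, 0], [6, 0]]; the value of y does not matter
```

Readings 2 and 3 give the same vector once we assume <math>y = x</math>, whereas readings 1 and 4 give two different matrices, so the expression <math>\frac{d}{dx} f(x,x)</math> really is ambiguous.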

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.
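
To illustrate the precedence problem with <math>\nabla f(Ax)</math>, here is a small sketch of the two possible readings: the gradient of <math>f</math> evaluated at <math>Ax</math>, versus the gradient of the composite <math>x \mapsto f(Ax)</math>, which by the chain rule is <math>A^{\mathsf T}</math> times the former. The choice <math>f(y) = y_1^2 + y_2^2</math>, the matrix <code>A</code> and the helper names are assumptions made only for this example.

```python
def grad_f(y):
    """Gradient of the hypothetical function f(y) = y1^2 + y2^2."""
    return [2 * y[0], 2 * y[1]]


def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(column) for column in zip(*a)]


A = [[1, 2], [0, 1]]
x = [1, 1]

# Reading 1: take the gradient of f first, then evaluate it at the point Ax.
grad_then_evaluate = grad_f(matvec(A, x))

# Reading 2: form g(x) = f(Ax) first, then take its gradient (chain rule: A^T grad f(Ax)).
gradient_of_composite = matvec(transpose(A), grad_f(matvec(A, x)))
```

Here <math>Ax = (3, 1)</math>, so the first reading gives <math>(6, 2)</math> and the second gives <math>(6, 14)</math>; the two only agree for special choices of <math>A</math>.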

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.
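
The distinction, and the one-to-one correspondence that lets books blur it, can be sketched in code: a linear transformation is something you can apply to a vector, while a matrix is just a table of numbers. The helper names <code>matrix_of</code> and <code>map_of</code> and the map <code>stretch</code> are hypothetical.

```python
def matrix_of(linear_map, n):
    """Build the matrix of a linear map R^n -> R^m from its values on the standard basis."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(linear_map(e_j))
    m = len(columns[0])
    # Column j of the matrix is the image of the j-th standard basis vector.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def map_of(matrix):
    """Turn a matrix (list of rows) back into the linear map v -> matrix * v."""
    return lambda v: [sum(a * b for a, b in zip(row, v)) for row in matrix]


def stretch(v):
    # Hypothetical linear map R^2 -> R^3.
    x, y = v
    return [2 * x, x + y, -y]


M = matrix_of(stretch, 2)  # [[2, 0], [1, 1], [0, -1]]: a table of numbers
round_trip = map_of(M)     # a function again, agreeing with stretch on every input
```

Going from the map to the matrix and back loses nothing, which is the sense in which the two can be equated even though they are objects of different types.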

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T23:07:30Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are four possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Since our point is <math>(x_0, y_0) = (x,x)</math>, we have <math>\begin{pmatrix}2x & 0 \\ 0 & 2x\end{pmatrix}</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.
# There's implicitly a function <math>\phi(x,y) = (x,x)</math>, so <math>\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))</math>. Using the chain rule, this is <math>(f\circ \phi)'(x,y) = f'(\phi(x,y))\phi'(x,y) = \left.\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 & 0 \\ 1 & 0\end{pmatrix} = \begin{pmatrix}2x & 0 \\ 2x & 0\end{pmatrix}</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T23:07:09Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are four possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Since our point is <math>(x_0, y_0) = (x,x)</math>, we have <math>\begin{pmatrix}2x & 0 \\ 0 & 2x\end{pmatrix}</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.
# There's implicitly a function <math>\phi(x,y) = (x,x)</math>, so <math>\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))</math>. Using the chain rule, this is <math>(f\circ \phi)'(x,y) = f'(\phi(x,y))\phi'(x,y) = \left.\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 & 0 \\ 1 & 0\end{pmatrix} = \begin{pmatrix}2x & 0 \\ 2x & 0\end{pmatrix}</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T22:50:33Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are four possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.
# There's implicitly a function <math>\phi(x,y) = (x,x)</math>, so <math>\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))</math>. Using the chain rule, this is <math>(f\circ \phi)'(x,y) = f'(\phi(x,y))\phi'(x,y) = \left.\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 & 0 \\ 1 & 0\end{pmatrix} = \begin{pmatrix}2x & 0 \\ 2x & 0\end{pmatrix}</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T22:49:54Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are four possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.
# There's implicitly a function <math>\phi(x,y) = (x,x)</math>, so <math>\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))</math>. Using the chain rule, this is <math>(f\circ \phi)'(x,y) = f'(\phi(x,y))\phi'(x,y) = \left.\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 & 0 \\ 1 & 0\end{pmatrix} = \begin{pmatrix}2x & 0 \\ 2x & 0\end{pmatrix}</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T22:48:28Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are four possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.
# There's implicitly a function <math>\phi(x,y) = (x,x)</math>, so <math>\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))</math>. Using the chain rule, this is <math>(f\circ \phi)'(x,y) = f'(\phi(x,y))\phi'(x,y) = \left.\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 & 0 \\ 1 & 0\end{pmatrix}</math>
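To make the difference between these readings concrete, here is a minimal numerical sketch using Tao's example. The helper names (<code>central_diff</code>, <code>partial_x_then_substitute</code>, <code>substitute_then_differentiate</code>) are made up for illustration, and the derivatives are approximated by central differences rather than computed symbolically.

```python
def f(x, y):
    """Tao's example f(x, y) = (x^2, y^2), returned as a tuple."""
    return (x * x, y * y)

def central_diff(g, t, h=1e-6):
    """Central difference of a tuple-valued function g of one real variable."""
    plus, minus = g(t + h), g(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

def partial_x_then_substitute(x):
    # Differentiate in the first slot only, holding the second slot fixed at y = x.
    return central_diff(lambda s: f(s, x), x)

def substitute_then_differentiate(x):
    # First form the one-variable function x -> f(x, x), then differentiate it.
    return central_diff(lambda s: f(s, s), x)

print(partial_x_then_substitute(3.0))      # approximately (6.0, 0.0)
print(substitute_then_differentiate(3.0))  # approximately (6.0, 6.0)
```

The first function corresponds to <math>\frac{\partial f}{\partial x}(x,x) = (2x,0)</math>, and the second to differentiating the function <math>x \mapsto f(x,x)</math>, which gives <math>(2x,2x)</math>.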

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.
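The precedence problem with <math>\nabla f(Ax)</math> can be checked numerically. The sketch below assumes a hypothetical <math>f(y) = y\cdot y</math> and an arbitrary 2 by 2 matrix <math>A</math> (neither is from the sources above). It compares "take the gradient of <math>f</math>, then plug in <math>Ax</math>" with "take the gradient of <math>x \mapsto f(Ax)</math>"; by the chain rule the latter is <math>A^T (\nabla f)(Ax)</math>.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def f(y):
    """Hypothetical example: f(y) = y . y, whose gradient is 2y."""
    return sum(yi * yi for yi in y)

def grad_f(y):
    return [2 * yi for yi in y]

def numerical_gradient(g, x, h=1e-6):
    """Central-difference gradient of a real-valued function g at x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((g(xp) - g(xm)) / (2 * h))
    return grad

A = [[1.0, 2.0], [0.0, 3.0]]
x = [1.0, 1.0]

reading_1 = grad_f(matvec(A, x))                        # (grad f)(Ax)
reading_2 = matvec(transpose(A), grad_f(matvec(A, x)))  # grad of x -> f(Ax)
check = numerical_gradient(lambda v: f(matvec(A, v)), x)

print(reading_1)  # [6.0, 6.0]
print(reading_2)  # [6.0, 30.0]
print(check)      # approximately [6.0, 30.0]
```

The two readings disagree unless <math>A</math> happens to be special (e.g. the identity), which is why the bare expression is ambiguous.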

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.
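One way to see the distinction is in code, where a linear transformation is a function and a matrix is a nested list. The sketch below uses hypothetical helper names (<code>matrix_to_map</code>, <code>map_to_matrix</code>) and the standard basis to go back and forth; the matrix is the derivative matrix of <math>f(x,y) = (x^2,y^2)</math> at an arbitrarily chosen point.

```python
def matrix_to_map(M):
    """Turn a matrix (list of rows) into the linear map v -> Mv."""
    def L(v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    return L

def map_to_matrix(L, n):
    """Recover the matrix of a linear map on R^n: column j is L(e_j)."""
    columns = []
    for j in range(n):
        e = [0.0] * n
        e[j] = 1.0
        columns.append(L(e))
    return [list(row) for row in zip(*columns)]

# Derivative matrix of f(x, y) = (x^2, y^2) at the point (x0, y0).
x0, y0 = 1.5, -2.0
M = [[2 * x0, 0.0], [0.0, 2 * y0]]

L = matrix_to_map(M)         # the total derivative, as a function
print(L([1.0, 1.0]))         # [3.0, -4.0]
print(map_to_matrix(L, 2))   # the same entries as M
```

The round trip recovers the same entries, which is the one-to-one correspondence in action, but <code>L</code> and <code>M</code> are still objects of different types.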

==See also==

* [[Summary table of multivariable derivatives]]

Summary table of multivariable derivatives

2020-05-30T22:16:34Z

IssaRice: /* Real-valued function of Rn */

Notational confusion of multivariable derivatives

2020-05-30T09:25:27Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this on p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T09:25:11Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this on p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related. If we assume the relationship y=x then this reduces to the following.
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Summary table of multivariable derivatives

2020-05-30T09:18:55Z

IssaRice: /* Vector-valued function of Rn */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}
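The "number versus function" distinction in the notes column can be sketched in code. Here <code>derivative_number</code> and <code>derivative_map</code> are hypothetical names, the cubic is an arbitrary example, and the slope is approximated by a central difference.

```python
def derivative_number(f, x0, h=1e-6):
    """f'(x0) as a number: the slope, approximated by a central difference."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

def derivative_map(f, x0):
    """f'(x0) as a linear map R -> R, namely x -> slope * x."""
    slope = derivative_number(f, x0)
    return lambda x: slope * x

def cube(x):
    return x ** 3

as_number = derivative_number(cube, 2.0)  # approximately 12.0
as_map = derivative_map(cube, 2.0)        # a function, not a number

print(as_number)
print(as_map(1.0))  # evaluating the map at 1 recovers the slope
print(as_map(0.5))  # approximately 6.0
```

Evaluating the map at <math>1</math> gives back the number, matching <math>f'(x_0)(1)</math> in the table.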

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math> (more formally, <math>g : \mathbf R \to \mathbf R^n</math> is a function such that the <math>j</math>th component <math>g_j = \mathrm{id}</math> is the identity function). From the chain rule this becomes <math>\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g'(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math> ||
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.
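As a sanity check on the claim that <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, here is a small numerical sketch. The function <math>f(x_1,x_2) = x_1^2 x_2 + x_2</math>, the point, and the direction are arbitrary choices for illustration, and both sides are approximated by central differences.

```python
def f(x):
    """Hypothetical example f(x1, x2) = x1^2 * x2 + x2."""
    return x[0] ** 2 * x[1] + x[1]

def gradient(g, x, h=1e-6):
    """Vector of partial derivatives of g at x (central differences)."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((g(xp) - g(xm)) / (2 * h))
    return grad

def directional_derivative(g, x, v, h=1e-6):
    """D_v g(x): derivative of t -> g(x + t v) at t = 0."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (g(xp) - g(xm)) / (2 * h)

x0 = [1.0, 2.0]
v = [3.0, -1.0]

grad = gradient(f, x0)                        # approximately [4.0, 2.0]
dot = sum(gi * vi for gi, vi in zip(grad, v))
direct = directional_derivative(f, x0, v)

print(dot, direct)  # both approximately 10.0
```

Note that <code>v</code> is not normalized here, consistent with the definition of <math>D_v f</math> in the table.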

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.
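The velocity vector is just the single-variable derivative taken component by component. A minimal sketch, assuming a helix as the (hypothetical) example curve and using central differences:

```python
import math

def curve(t):
    """Hypothetical parametric curve f: R -> R^3, a helix."""
    return (math.cos(t), math.sin(t), t)

def velocity(f, t, h=1e-6):
    """Componentwise central difference: approximates (f_1'(t), ..., f_m'(t))."""
    plus, minus = f(t + h), f(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(plus, minus))

v = velocity(curve, 0.0)
print(v)  # approximately (0.0, 1.0, 1.0)
```

Each component is differentiated separately, so the output has as many entries as the curve has components.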

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || || ||
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
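To illustrate the shape of the derivative matrix, here is a rough finite-difference sketch. The function <math>f(x,y,z) = (xy, y+z)</math> is an arbitrary example with <math>n=3</math> and <math>m=2</math>, and <code>jacobian</code> is a hypothetical helper, not a library function.

```python
def f(x):
    """Hypothetical example f: R^3 -> R^2, f(x, y, z) = (x*y, y + z)."""
    return [x[0] * x[1], x[1] + x[2]]

def jacobian(g, x0, h=1e-6):
    """Approximate the m-by-n derivative matrix of g at x0.

    Column j is the j-th partial derivative of g (a vector in R^m);
    row i collects the partial derivatives of the component g_i.
    """
    columns = []
    for j in range(len(x0)):
        xp, xm = list(x0), list(x0)
        xp[j] += h
        xm[j] -= h
        gp, gm = g(xp), g(xm)
        columns.append([(p - q) / (2 * h) for p, q in zip(gp, gm)])
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

J = jacobian(f, [1.0, 2.0, 3.0])
for row in J:
    print(row)
# approximately [2.0, 1.0, 0.0]
#               [0.0, 1.0, 1.0]
```

The result has <math>m = 2</math> rows and <math>n = 3</math> columns, and each row is the gradient of one component function, which is the sense in which the derivative matrix generalizes the gradient.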

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

Summary table of multivariable derivatives

2020-05-30T09:18:39Z

IssaRice: /* Vector-valued function of Rn */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math> (more formally, <math>g : \mathbf R \to \mathbf R^n</math> is a function such that the <math>j</math>th component <math>g_j = \mathrm{id}</math> is the identity function). From the chain rule this becomes <math>\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g'(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math> ||
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || || ||
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

Notational confusion of multivariable derivatives

2020-05-30T09:12:50Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this on p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>. We can't evaluate this further since we don't know how x and y are related.
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y), where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things is an instance of this.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.
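
The distinction between a linear transformation and its matrix can be made concrete in code, where the two have different types (a function versus a nested list). This is only a sketch; the names <code>as_linear_map</code> and <code>matrix_of</code> are hypothetical, and the matrix used is the derivative matrix of <math>f(x,y) = (x^2,y^2)</math> at the point (1,2).

```python
def as_linear_map(matrix):
    """Turn an m-by-n matrix (a list of rows) into the function v -> matrix * v."""
    def apply(v):
        return [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return apply


def matrix_of(linear_map, n):
    """Recover the matrix of a linear map R^n -> R^m from its values on the standard basis."""
    columns = [linear_map([1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    # Column j of the matrix is the image of e_j; transpose the columns into rows.
    return [list(row) for row in zip(*columns)]


# Derivative matrix of f(x, y) = (x^2, y^2) at (1, 2): diag(2*1, 2*2).
jacobian_at_point = [[2.0, 0.0], [0.0, 4.0]]

total_derivative = as_linear_map(jacobian_at_point)  # a function, not an array
image = total_derivative([1.0, 1.0])                 # apply the map to the vector (1, 1)
recovered = matrix_of(total_derivative, 2)           # back to an array of numbers

print(image)
print(recovered == jacobian_at_point)
```

Going from the matrix to the map and back loses nothing, which is the one-to-one correspondence that lets books blur the two.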

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T09:07:33Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x). We have <math>(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})</math>.
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T09:05:16Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>. Applying this linear map to (x,x), we get <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T09:03:41Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point <math>(x_0,y_0)</math>, then <math>f'(x_0,y_0)</math> is a linear map <math>\mathbf R^2 \to \mathbf R^2</math> defined by the matrix <math>\begin{pmatrix}2x_0 & 0 \\ 0 & 2y_0\end{pmatrix}</math>.
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T09:01:03Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x).
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function. The result is the function <math>x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2</math>.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T08:59:32Z

IssaRice:

I think there are several different confusions that arise from multivariable derivative notation:

* The thing where <math>\frac{\partial w}{\partial t}</math> can mean two different things on LHS and RHS when <math>t</math> is used as both an initial and intermediate variable. (See Folland for details.)
* The thing where if <math>f(x,y) = (x^2,y^2)</math> then <math>\frac{\partial f}{\partial x}(x,x)</math> feels like it might be <math>(2x,2x)</math> even though it's actually <math>(2x,0)</math>. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]
* The ambiguity of expressions like <math>\nabla f(Ax)</math>
* dual basis stuff -- see Tao's explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]

Working off the example from Tao above, let <math>f(x,y) = (x^2,y^2)</math>. What does <math>\frac{d}{dx} f(x,x)</math> mean? Here are three possibilities:

# It's <math>f'</math> (i.e., the total derivative of f) evaluated at the point (x,x).
# It's <math>\frac{df}{dx}</math> (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule <math>\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}</math> to compute) evaluated at the point (x,x).
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function <math>\mathbf R \to \mathbf R^2</math>. We now differentiate this function.

==Big picture==

Why is this notation so confusing? I think there are two (?) big reasons:

* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland's example of <math>\frac{\partial w}{\partial t}</math> meaning two different things.
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with <math>\nabla f(Ax)</math> and <math>\frac{d}{dx} f(x,x)</math>.

==The derivative as a linear transformation in the several variable case and a number in the single-variable case==

* The thing where the total derivative for <math>n=m=1</math> "should" be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&lpg=PR1&pg=PA357 "Appendix A: Perorations of Dieudonne"] (p. 337) in Pugh's ''Real Mathematical Analysis''.

==Total derivative versus derivative matrix==

Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations <math>\mathbf R^n \to \mathbf R^m</math> and <math>m</math> by <math>n</math> matrices, so many books call the total derivative a matrix or equate the two.

A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.

==See also==

* [[Summary table of multivariable derivatives]]

Notational confusion of multivariable derivatives

2020-05-30T08:55:09Z

IssaRice:

Summary table of multivariable derivatives

2020-05-30T08:50:02Z

IssaRice: /* Real-valued function of Rn */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}
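
The two readings of "<math>f'(x_0)</math>" in the last row (a number versus a linear function <math>\mathbf R \to \mathbf R</math>) can be sketched in plain Python. The function names, the sample function <code>cube</code>, and the use of a central difference to approximate the limit are all assumptions made for illustration.

```python
def slope(f, x0, h=1e-6):
    """f'(x0) as a number, approximated by a central difference (h is an arbitrary small step)."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)


def derivative_as_linear_map(f, x0):
    """f'(x0) as a linear function R -> R, namely dx -> f'(x0) * dx."""
    m = slope(f, x0)
    return lambda dx: m * dx


def cube(x):
    """Hypothetical sample function x -> x^3, whose derivative at 2 is 12."""
    return x ** 3


as_number = slope(cube, 2.0)
as_map = derivative_as_linear_map(cube, 2.0)

print(as_number)    # close to 12
print(as_map(1.0))  # evaluating the map at 1 recovers the number
print(as_map(0.5))  # the map scales a displacement dx by the slope
```

In code the type difference is impossible to ignore: <code>as_number</code> is a float, while <code>as_map</code> must be called before it yields one.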

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math> (more formally, <math>g : \mathbf R \to \mathbf R^n</math> is a function such that the <math>j</math>th component <math>g_j = \mathrm{id}</math> is the identity function). From the chain rule this becomes <math>\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g'(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math> ||
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.
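
The coincidence of <math>f'(x_0)(v)</math> with <math>\nabla f(x_0)\cdot v</math> can be checked numerically for a differentiable function. The sketch below uses central differences; the sample function, the point <code>x0</code>, and the direction <code>v</code> are hypothetical choices, not taken from Folland.

```python
def f(x):
    """Hypothetical sample function f(x1, x2) = x1^2 + 3*x1*x2, with gradient (2*x1 + 3*x2, 3*x1)."""
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def directional_derivative(f, x, v, h=1e-6):
    """Central-difference approximation of D_v f(x) = lim (f(x + t v) - f(x)) / t."""
    plus = [xi + h * vi for xi, vi in zip(x, v)]
    minus = [xi - h * vi for xi, vi in zip(x, v)]
    return (f(plus) - f(minus)) / (2 * h)


def gradient(f, x, h=1e-6):
    """Gradient as the list of partial derivatives, i.e. directional derivatives along e_j."""
    n = len(x)
    basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    return [directional_derivative(f, x, e, h) for e in basis]


x0 = [1.0, 2.0]
v = [0.5, -1.0]

grad = gradient(f, x0)                              # close to [8, 3]
lhs = directional_derivative(f, x0, v)              # D_v f(x0)
rhs = sum(g_j * v_j for g_j, v_j in zip(grad, v))   # gradient dotted with v

print(grad, lhs, rhs)
```

Both <code>lhs</code> and <code>rhs</code> come out close to 1 here, since <math>8 \cdot 0.5 + 3 \cdot (-1) = 1</math>.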

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}
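
As an illustration of the componentwise definition in the table, here is a small sketch that approximates the velocity vector of a curve by differentiating each component separately. The helix <code>curve</code> and the helper <code>velocity</code> are hypothetical examples, and the central difference stands in for the exact derivative.

```python
import math


def curve(t):
    """Hypothetical parametric curve f: R -> R^3, a helix (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)


def velocity(f, t, h=1e-6):
    """Approximate (f_1'(t), ..., f_m'(t)) by a central difference in each component."""
    plus, minus = f(t + h), f(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))


v0 = velocity(curve, 0.0)
print(v0)  # close to (-sin 0, cos 0, 1) = (0, 1, 1)
```

The result has one entry per component of the curve, so its length is <math>m</math> (here 3), matching the codomain <math>\mathbf R^m</math>.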

Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
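
To make the shape of the derivative matrix concrete, the following sketch approximates it by central differences for a function <math>\mathbf R^2 \to \mathbf R^3</math>, so that <math>m \ne n</math>. The function <code>f</code> and the helper <code>jacobian</code> are hypothetical names chosen for this illustration.

```python
def jacobian(f, x, h=1e-6):
    """Approximate the m-by-n derivative matrix of f: R^n -> R^m at x by central differences."""
    n = len(x)
    columns = []
    for j in range(n):
        plus = list(x)
        minus = list(x)
        plus[j] += h
        minus[j] -= h
        # Column j holds the j-th partial derivative of every component of f.
        columns.append([(p - m) / (2 * h) for p, m in zip(f(plus), f(minus))])
    # Transpose so that row i is the gradient of the component function f_i.
    return [list(row) for row in zip(*columns)]


def f(x):
    """Hypothetical sample function f(x, y) = (x^2, y^2, x*y) from R^2 to R^3."""
    return [x[0] ** 2, x[1] ** 2, x[0] * x[1]]


J = jacobian(f, [1.0, 2.0])
for row in J:
    print(row)  # rows close to [2, 0], [0, 4], [2, 1]
```

The matrix has <math>m = 3</math> rows and <math>n = 2</math> columns, and each row is the gradient of one component function, which is the sense in which the derivative matrix generalizes the gradient.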

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

Summary table of multivariable derivatives

2020-05-30T08:49:44Z

IssaRice: /* Real-valued function of Rn */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math> (more formally, <math>g \colon \mathbf R \to \mathbf R^n</math> is a function whose <math>j</math>th component <math>g_j = \operatorname{id}</math> is the identity function, and we differentiate <math>f \circ g</math>). From the chain rule this becomes <math>\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g'(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math> ||
|}
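
As a rough illustration of the total derivative row in the table above, the following sketch checks the chain-rule formula on a made-up example. The function <math>f(x_1, x_2) = x_1 x_2</math>, the dependence <math>x_2 = x_1^2</math>, and all helper names are assumptions chosen for illustration; they do not come from any of the references.

```python
# Hypothetical example: f(x1, x2) = x1 * x2, with x2 depending on x1 via x2 = x1**2.
# Here j = 1, so g(t) = (t, t**2) has the identity as its first component.

def f(x1, x2):
    return x1 * x2

def g(t):
    return (t, t ** 2)

def partials_f(x1, x2):
    # (df/dx1, df/dx2), computed by hand for this particular f
    return (x2, x1)

def g_prime(t):
    # (dx1/dx1, dx2/dx1), computed by hand for this particular g
    return (1.0, 2 * t)

def total_derivative(t):
    # Chain rule: gradient of f evaluated at g(t), dotted with g'(t)
    grad = partials_f(*g(t))
    return sum(p * d for p, d in zip(grad, g_prime(t)))

def numeric_total_derivative(t, h=1e-6):
    # Central difference applied to the single-variable function t -> f(g(t))
    return (f(*g(t + h)) - f(*g(t - h))) / (2 * h)

t0 = 2.0
chain_rule_value = total_derivative(t0)
numeric_value = numeric_total_derivative(t0)
```

In this example <math>f(g(t)) = t^3</math>, so both values should agree with <math>3t_0^2</math> up to finite-difference error.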

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.
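
The coincidence of <math>f'(x_0)(v)</math> with <math>\nabla f(x_0)\cdot v</math> can be checked numerically. The sketch below is only an illustration: the sample function, the point, the direction, and the helper names are all assumptions, and the directional derivative is approximated with a central difference rather than computed as an exact limit.

```python
import math

def f(x):
    # Hypothetical sample function f(x1, x2) = x1^2 * x2 + sin(x2)
    x1, x2 = x
    return x1 ** 2 * x2 + math.sin(x2)

def grad_f(x):
    # Gradient computed by hand: (2 x1 x2, x1^2 + cos(x2))
    x1, x2 = x
    return (2 * x1 * x2, x1 ** 2 + math.cos(x2))

def directional_derivative(func, x, v, t=1e-6):
    # Central-difference approximation of lim (f(x + t v) - f(x)) / t
    xp = tuple(xi + t * vi for xi, vi in zip(x, v))
    xm = tuple(xi - t * vi for xi, vi in zip(x, v))
    return (func(xp) - func(xm)) / (2 * t)

x0 = (1.0, 2.0)
v = (3.0, -1.0)

numeric = directional_derivative(f, x0, v)
exact = sum(gi * vi for gi, vi in zip(grad_f(x0), v))  # gradient dotted with v
```

For a differentiable function the two numbers should agree up to approximation error, which is why a separate definition of the derivative adds little in the real-valued case.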

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial and directional derivatives. There is only one variable with respect to which we can differentiate, so there is no choice of direction.

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
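
To make the relationship between the derivative matrix and the gradient concrete, here is a minimal sketch that approximates the derivative matrix of a made-up function <math>f\colon \mathbf R^2 \to \mathbf R^3</math> by central differences. The function and the helper names <code>jacobian</code> and <code>apply_matrix</code> are assumptions for illustration only. Each row of the resulting matrix is the gradient of one component function, and multiplying the matrix by a vector <math>v</math> approximates the total derivative applied to <math>v</math>.

```python
import math

def f(x):
    # Hypothetical example f(x1, x2) = (x1 * x2, x1 + x2^2, sin(x1))
    x1, x2 = x
    return (x1 * x2, x1 + x2 ** 2, math.sin(x1))

def jacobian(func, x, t=1e-6):
    # Column j holds the central-difference approximation of the j-th partial derivative.
    n = len(x)
    cols = []
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += t
        xm[j] -= t
        fp, fm = func(xp), func(xm)
        cols.append([(a - b) / (2 * t) for a, b in zip(fp, fm)])
    m = len(cols[0])
    # Transpose the list of columns into an m-by-n list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def apply_matrix(matrix, v):
    # Matrix-vector product: the linear transformation applied to v
    return [sum(a * vi for a, vi in zip(row, v)) for row in matrix]

x0 = (1.0, 2.0)
J = jacobian(f, x0)

# Rows computed by hand: gradients of the three component functions at x0.
exact = [[2.0, 1.0], [1.0, 4.0], [math.cos(1.0), 0.0]]

v = (1.0, -1.0)
Jv = apply_matrix(J, v)
```

The matrix has <math>m = 3</math> rows and <math>n = 2</math> columns, matching the type <math>\mathcal M_{m,n}(\mathbf R)</math> given in the table.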

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

Summary table of multivariable derivatives

2020-05-30T08:46:34Z

IssaRice: /* Real-valued function of Rn */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math>. From the chain rule this becomes <math>\frac{df}{dx_j} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math> ||
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial and directional derivatives. There is only one variable with respect to which we can differentiate, so there is no choice of direction.

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

Notational confusion of multivariable derivatives

2019-08-10T06:15:20Z

IssaRice:

Summary table of multivariable derivatives

2018-11-03T03:57:53Z

IssaRice: /* Vector-valued function of Rn */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial and directional derivatives. There is only one variable with respect to which we can differentiate, so there is no choice of direction.

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

Notational confusion of multivariable derivatives

2018-11-03T03:48:45Z

IssaRice:

Summary table of multivariable derivatives

2018-11-03T03:44:16Z

IssaRice: /* See also */

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}

==Real-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial and directional derivatives. There is only one variable with respect to which we can differentiate, so there is no choice of direction.

==Vector-valued function of '''R'''''n''==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math>. Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
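
As a small worked example (the function is chosen purely for illustration and does not come from the references), take <math>f\colon \mathbf R^2 \to \mathbf R^2</math> with <math>f(x,y) = (x^2 y,\ x + y)</math> and <math>x_0 = (1,2)</math>. Then

<math>Df(x_0) = \left.\begin{pmatrix} 2xy & x^2 \\ 1 & 1\end{pmatrix}\right|_{(x,y)=(1,2)} = \begin{pmatrix} 4 & 1 \\ 1 & 1\end{pmatrix}</math>

and the total derivative is the linear transformation <math>f'(x_0)(v) = Df(x_0)\,v = (4v_1 + v_2,\ v_1 + v_2)</math>. Each row of the matrix is the gradient of one component function at <math>x_0</math>: here <math>\nabla f_1(x_0) = (4,1)</math> and <math>\nabla f_2(x_0) = (1,1)</math>.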

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==

Notational confusion of multivariable derivatives

2018-11-03T03:43:29Z

IssaRice: move from https://machinelearning.subwiki.org/wiki/Notational_confusion_of_multivariable_derivatives

Summary table of multivariable derivatives

2018-11-03T03:42:13Z

IssaRice: moving from https://machinelearning.subwiki.org/wiki/Summary_table_of_multivariable_derivatives

This page is a '''summary table of multivariable derivatives'''.

* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f\colon \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>\begin{align}f'(x_0) &= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0)\colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0)x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}
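
The distinction between "<math>f'(x_0)</math>" as a number and as a linear function can be sketched in code. This is only an illustration: the names <code>derivative_number</code> and <code>derivative_map</code> and the test function <math>x^3</math> are assumptions made for this example, and the limit is approximated by a central difference rather than computed exactly.

```python
def derivative_number(f, x0, h=1e-6):
    """Approximate f'(x0) as a number using a central difference."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)


def derivative_map(f, x0, h=1e-6):
    """Return f'(x0) viewed as the linear function x -> f'(x0) * x."""
    slope = derivative_number(f, x0, h)
    return lambda x: slope * x


def cube(x):
    # Hypothetical example function; its exact derivative at 2 is 12.
    return x ** 3


slope = derivative_number(cube, 2.0)   # the derivative as a number
linear = derivative_map(cube, 2.0)     # the derivative as a function R -> R
recovered = linear(1.0)                # evaluating at 1 recovers the number
```

Here <code>linear(1.0)</code> returns the same value as <code>slope</code>, mirroring <math>f'(x_0)(1)</math> in the table.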

==Real-valued function of '''R'''<sup>''n''</sup>==

Let <math>f\colon \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0,\ldots,1,\ldots,0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1,\ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 </math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|}

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0)\cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector <math>c</math> appearing in the definition of differentiability at <math>x_0</math> is the gradient.
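
The coincidence of the directional derivative with <math>\nabla f(x_0)\cdot v</math> can be checked numerically. The following is a minimal sketch, assuming a hypothetical example function <math>f(x,y) = x^2 y + 3y</math> and central-difference approximations; the helper names are made up for this illustration.

```python
def f(x):
    # Hypothetical example f: R^2 -> R, f(x, y) = x^2 * y + 3y.
    return x[0] ** 2 * x[1] + 3.0 * x[1]


def partial(f, x, j, h=1e-6):
    """Central-difference estimate of the j-th partial derivative at x."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (f(xp) - f(xm)) / (2 * h)


def gradient(f, x, h=1e-6):
    """Gradient at x: the vector of all partial derivatives."""
    return [partial(f, x, j, h) for j in range(len(x))]


def directional(f, x, v, h=1e-6):
    """Central-difference estimate of D_v f(x)."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return (f(xp) - f(xm)) / (2 * h)


x0 = [1.0, 2.0]
v = [3.0, -1.0]
grad = gradient(f, x0)                          # exact value is (4, 4)
dot = sum(g * vi for g, vi in zip(grad, v))     # gradient dotted with v
dv = directional(f, x0, v)                      # directional derivative
```

For this function the exact gradient at <math>(1,2)</math> is <math>(4,4)</math>, so both <code>dot</code> and <code>dv</code> should be close to <math>8</math>. The agreement relies on <math>f</math> being differentiable at <math>x_0</math>.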

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of '''R'''==

Let <math>f\colon \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial and directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose.
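
For instance (an illustrative example, not taken from the references), the unit circle parametrized by <math>f(t) = (\cos t, \sin t)</math> has <math>m = 2</math> and velocity vector

<math>v(t) = (f_1'(t), f_2'(t)) = (-\sin t, \cos t)</math>

so that <math>v(0) = (0, 1)</math>: at <math>t = 0</math> the curve is at <math>(1, 0)</math> and moving straight up.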

==Vector-valued function of '''R'''<sup>''n''</sup>==

Let <math>f\colon \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="sortable wikitable"
|-
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math>. Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
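
A rough numerical sketch of the derivative matrix may make the <math>m</math> by <math>n</math> shape concrete. The function below, with <math>n = 2</math> and <math>m = 3</math>, and the helper names <code>jacobian</code> and <code>apply_matrix</code> are assumptions made for this illustration; the partial derivatives are approximated by central differences.

```python
def f(x):
    # Hypothetical example f: R^2 -> R^3.
    return [x[0] * x[1], x[0] + x[1] ** 2, 2.0 * x[0]]


def jacobian(f, x, h=1e-6):
    """Estimate the m-by-n derivative matrix of f at x, column by column."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            # Entry (i, j) is the j-th partial derivative of component f_i.
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def apply_matrix(J, v):
    """Apply the linear transformation represented by J to the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]


x0 = [1.0, 2.0]
J = jacobian(f, x0)                      # exact value: [[2, 1], [1, 4], [2, 0]]
first_partial = apply_matrix(J, [1.0, 0.0])   # first column of J
```

Row <math>i</math> of <code>J</code> approximates the gradient of the component <math>f_i</math> at <math>x_0</math>, and applying <code>J</code> to the standard basis vector <math>e_1</math> picks out the partial derivative <math>\partial_1 f(x_0)</math>.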

==See also==

* [[Notational confusion of multivariable derivatives]]
* [[calculus:Relation between gradient vector and partial derivatives]]
* [[calculus:Relation between gradient vector and directional derivatives]]
* [[calculus:Directional derivative]]
* [[Summary table of probability terms]]

==References==

* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==