Summary table of multivariable derivatives

Latest revision as of 20:28, 17 June 2020

This page is a summary table of multivariable derivatives.

* TODO: maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f : \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>f'(x_0) = \lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0) \colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0) x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}
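The limit definition above can be checked numerically with a difference quotient. A minimal sketch (the example function <math>f(x) = x^3</math> and the step size <math>h</math> are illustrative choices, not part of the table):

```python
# Approximate f'(x) = lim_{h -> 0} (f(x + h) - f(x)) / h
# by taking a small but finite h.

def derivative(f, x, h=1e-6):
    """Forward difference quotient approximating f'(x)."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 3
print(derivative(f, 2.0))  # close to f'(2) = 3 * 2^2 = 12
```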

==Real-valued function of <math>\mathbf R^n</math>==

Let <math>f : \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0, \ldots, 1, \ldots, 0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1, \ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x \to x_0} \frac{|f(x) - f(x_0) - c \cdot (x - x_0)|}{\|x - x_0\|} = 0</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math> (more formally, <math>g : \mathbf R \to \mathbf R^n</math> is a function such that the <math>j</math>th component <math>g_j = \mathrm{id}</math> is the identity function). From the chain rule this becomes <math>\frac{df}{dx_j} = \nabla f(x) \cdot g'(x) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math> ||
|}
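The relation in the directional-derivative row, <math>D_v f(x) = \nabla f(x) \cdot v</math> (for differentiable <math>f</math>), can be verified numerically. A small sketch; the example function <math>f(x, y) = x^2 + 3xy</math>, the point, and the direction are my own choices:

```python
# Check D_v f(x) = grad f(x) . v for f(x, y) = x^2 + 3*x*y,
# using the limit definition D_v f(x) = lim_{t->0} (f(x + t*v) - f(x)) / t.

def f(x, y):
    return x ** 2 + 3 * x * y

def directional_derivative(f, x, v, t=1e-6):
    """Difference-quotient approximation of D_v f at the point x."""
    return (f(x[0] + t * v[0], x[1] + t * v[1]) - f(x[0], x[1])) / t

x = (1.0, 2.0)
v = (0.5, -1.0)

# Exact gradient: (2x + 3y, 3x) = (8, 3) at (1, 2), so grad . v = 4 - 3 = 1
grad = (2 * x[0] + 3 * x[1], 3 * x[0])
dot = grad[0] * v[0] + grad[1] * v[1]
print(directional_derivative(f, x, v), dot)  # both close to 1.0
```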

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0) \cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.

"Total derivative" is used for two different things (which coincide in the special case where they both make sense); see my answer https://math.stackexchange.com/a/3698838/35525 for details.

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of <math>\mathbf R</math>==

Let <math>f : \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial/directional derivatives in the above table. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.
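The componentwise definition of the velocity vector can be sketched numerically. The example curve <math>f(t) = (\cos t, \sin t)</math> is an illustrative choice; its velocity is <math>(-\sin t, \cos t)</math>:

```python
# Velocity vector of a parametric curve f: R -> R^m, computed componentwise
# as the tuple of difference quotients (f_1'(t), ..., f_m'(t)).
import math

def velocity(f, t, h=1e-6):
    """Approximate the velocity vector of the curve f at parameter t."""
    ft, fth = f(t), f(t + h)
    return tuple((b - a) / h for a, b in zip(ft, fth))

curve = lambda t: (math.cos(t), math.sin(t))
print(velocity(curve, 0.0))  # close to (-sin 0, cos 0) = (0.0, 1.0)
```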

==Vector-valued function of <math>\mathbf R^n</math>==

Let <math>f : \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || || ||
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
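The derivative matrix can be sketched numerically by assembling difference-quotient partials into an <math>m \times n</math> array. The example map <math>f(x, y) = (xy,\; x + y^2)</math> below is my own choice:

```python
# The Jacobian matrix of f: R^n -> R^m collects the partials: the entry in
# row i, column j is partial_j f_i(x_0). It represents the linear
# transformation appearing in the Frechet derivative.

def f(x, y):
    return (x * y, x + y ** 2)

def jacobian(f, x0, h=1e-6):
    """m x n matrix of difference-quotient partials of f at x0."""
    fx = f(*x0)
    J = []
    for i in range(len(fx)):        # rows: output components f_i
        row = []
        for j in range(len(x0)):    # columns: input variables x_j
            shifted = list(x0)
            shifted[j] += h
            row.append((f(*shifted)[i] - fx[i]) / h)
        J.append(row)
    return J

# Exact Jacobian at (1, 2) is [[y, x], [1, 2y]] = [[2, 1], [1, 4]]
print(jacobian(f, (1.0, 2.0)))
```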

==See also==
* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==
* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==
* this post does a similar thing: https://reallyeli.com/posts/total_derivative.html