Summary table of multivariable derivatives

Latest revision as of 20:28, 17 June 2020

This page is a summary table of multivariable derivatives.

* TODO: maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied

==Single-variable real function==

For comparison and completeness, we give a summary table of the single-variable derivative. Let <math>f : \mathbf R \to \mathbf R</math> be a single-variable real function.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Derivative of <math>f</math> || <math>f'</math> or <math>\frac{df}{dx}</math> || <math>\mathbf R \to \mathbf R</math> || <math>f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}</math> ||
|-
| Derivative of <math>f</math> at <math>x_0 \in \mathbf R</math> || <math>f'(x_0)</math> or <math>\frac{df}{dx}(x_0)</math> or <math>\left.\frac{d}{dx}f(x)\right|_{x=x_0}</math> || <math>\mathbf R</math> || <math>f'(x_0) = \lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}</math> || In the most general multivariable case, <math>f'(x_0)</math> will become a linear transformation, so analogously we may wish to talk about the single-variable <math>f'(x_0)</math> as the function <math>f'(x_0) \colon \mathbf R \to \mathbf R</math> defined by <math>f'(x_0)(x) = f'(x_0) x</math>, where on the left side "<math>f'(x_0)</math>" is a function and on the right side "<math>f'(x_0)</math>" is a number. If "<math>f'(x_0)</math>" is a function, we can evaluate it at <math>1</math> to recover the number: <math>f'(x_0)(1)</math>. This is pretty confusing, and in practice everyone thinks of "<math>f'(x_0)</math>" in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.
|}
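The limit definition above can be checked numerically with a difference quotient. A minimal sketch (the example function <math>f(x) = x^3</math> and the step size <math>h</math> are illustrative choices, not part of the table):

```python
# Approximate f'(x) = lim_{h -> 0} (f(x + h) - f(x)) / h
# by taking a small but finite h.

def derivative(f, x, h=1e-6):
    """Forward difference quotient approximating f'(x)."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 3
print(derivative(f, 2.0))  # close to f'(2) = 3 * 2^2 = 12
```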

==Real-valued function of <math>\mathbf R^n</math>==

Let <math>f : \mathbf R^n \to \mathbf R</math> be a real-valued function of <math>\mathbf R^n</math>.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative of <math>f</math> with respect to its <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> || Here <math>e_j = (0, \ldots, 1, \ldots, 0)</math> is the <math>j</math>th vector of the standard basis, i.e. the vector with all zeroes except a one in the <math>j</math>th spot. Therefore <math>x + te_j</math> can also be written <math>(x_1, \ldots, x_j + t, \ldots, x_n)</math> when broken down into components.
|-
| Gradient || <math>\nabla f</math> || <math>\mathbf R^n \to \mathbf R^n</math> || <math>\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))</math> ||
|-
| Gradient at <math>x_0 \in \mathbf R^n</math> || <math>\nabla f(x_0)</math> || <math>\mathbf R^n</math> or <math>\mathcal M_{1,n}(\mathbf R)</math> || <math>(\partial_1 f(x_0), \ldots, \partial_n f(x_0))</math> or the vector <math>c</math> such that <math>\lim_{x \to x_0} \frac{|f(x) - f(x_0) - c \cdot (x - x_0)|}{\|x - x_0\|} = 0</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> || When <math>v = e_j</math>, this reduces to the <math>j</math>th partial derivative.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || <math>\mathbf R \to \mathbf R</math> || For <math>i \ne j</math>, we treat the variable <math>x_i = g_i(x_j)</math> as a function of <math>x_j</math>, and take the single-variable derivative with respect to <math>x_j</math> (more formally, <math>g : \mathbf R \to \mathbf R^n</math> is a function such that the <math>j</math>th component <math>g_j = \mathrm{id}</math> is the identity function). From the chain rule this becomes <math>\frac{df}{dx_j} = \nabla f(x) \cdot g'(x) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}</math> ||
|}
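The relation in the directional-derivative row, <math>D_v f(x) = \nabla f(x) \cdot v</math> (for differentiable <math>f</math>), can be verified numerically. A small sketch; the example function <math>f(x, y) = x^2 + 3xy</math>, the point, and the direction are my own choices:

```python
# Check D_v f(x) = grad f(x) . v for f(x, y) = x^2 + 3*x*y,
# using the limit definition D_v f(x) = lim_{t->0} (f(x + t*v) - f(x)) / t.

def f(x, y):
    return x ** 2 + 3 * x * y

def directional_derivative(f, x, v, t=1e-6):
    """Difference-quotient approximation of D_v f at the point x."""
    return (f(x[0] + t * v[0], x[1] + t * v[1]) - f(x[0], x[1])) / t

x = (1.0, 2.0)
v = (0.5, -1.0)

# Exact gradient: (2x + 3y, 3x) = (8, 3) at (1, 2), so grad . v = 4 - 3 = 1
grad = (2 * x[0] + 3 * x[1], 3 * x[0])
dot = grad[0] * v[0] + grad[1] * v[1]
print(directional_derivative(f, x, v), dot)  # both close to 1.0
```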

I think in this case, since <math>f'(x_0)(v)</math> coincides with <math>\nabla f(x_0) \cdot v</math>, people don't usually define the derivative separately. For example, Folland in ''Advanced Calculus'' defines ''differentiability'' but not the derivative! He just says that the vector that makes a function differentiable is the gradient.

"Total derivative" is used for two different things (which coincide in the special case where they both make sense); see my answer https://math.stackexchange.com/a/3698838/35525 for details.

TODO: answer questions like "Is the gradient the derivative?"

==Vector-valued function of <math>\mathbf R</math>==

Let <math>f : \mathbf R \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R</math>. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Velocity vector at <math>t</math> || <math>v(t)</math> or <math>Df(t)</math> || <math>\mathbf R \to \mathbf R^m</math> || <math>(f_1'(t), \ldots, f_m'(t))</math> ||
|}

Note the absence of partial/directional derivatives in the above table. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.
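The componentwise definition of the velocity vector can be sketched numerically. The example curve <math>f(t) = (\cos t, \sin t)</math> is an illustrative choice; its velocity is <math>(-\sin t, \cos t)</math>:

```python
# Velocity vector of a parametric curve f: R -> R^m, computed componentwise
# as the tuple of difference quotients (f_1'(t), ..., f_m'(t)).
import math

def velocity(f, t, h=1e-6):
    """Approximate the velocity vector of the curve f at parameter t."""
    ft, fth = f(t), f(t + h)
    return tuple((b - a) / h for a, b in zip(ft, fth))

curve = lambda t: (math.cos(t), math.sin(t))
print(velocity(curve, 0.0))  # close to (-sin 0, cos 0) = (0.0, 1.0)
```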

==Vector-valued function of <math>\mathbf R^n</math>==

Let <math>f : \mathbf R^n \to \mathbf R^m</math> be a vector-valued function of <math>\mathbf R^n</math>. Since the function is vector-valued, some authors use a boldface letter like <math>\mathbf f</math>.

{| class="wikitable"
! Term !! Notation !! Type !! Definition !! Notes
|-
| Partial derivative with respect to the <math>j</math>th variable || <math>\partial_j f</math> or <math>\partial_{x_j} f</math> or <math>\frac{\partial f}{\partial x_j}</math> or <math>f_{x_j}</math> or <math>f_j</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}</math> ||
|-
| Directional derivative in the direction of <math>v</math> || <math>D_v f</math> or <math>\partial_v f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || <math>D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}</math> ||
|-
| Total or Fréchet derivative (sometimes just called the derivative) at point <math>x_0\in \mathbf R^n</math> || <math>f'(x_0)</math> or <math>(Df)_{x_0}</math> or <math>d_{x_0}f</math> || <math>\mathbf R^n \to \mathbf R^m</math> || The linear transformation <math>L</math> such that <math>\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 </math> || The derivative ''at a given point'' is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to "<math>f'</math>" as we can in the single-variable case. Its type would have to be <math>\mathbf R^n \to \mathbf R^n \to \mathbf R^m</math> or more specifically <math>\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)</math> (where <math>\mathcal L(\mathbf R^n, \mathbf R^m)</math> is the set of linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math>). Also the notation <math>f'(x_0)</math> is slightly confusing: if the total derivative is a function, what happens if <math>n=m=1</math>? We see that <math>f'(x_0)\colon \mathbf R \to \mathbf R</math>, so the single-variable derivative isn't actually a number! To get the actual slope of the tangent line, we must evaluate the function at <math>1</math>: <math>f'(x_0)(1) \in \mathbf R</math>. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.
|-
| Derivative matrix, differential matrix, Jacobian matrix at point <math>x_0\in \mathbf R^n</math> || <math>Df(x_0)</math> or <math>\mathcal M(f'(x_0))</math> || <math>\mathcal M_{m,n}(\mathbf R)</math> || <math>\begin{pmatrix}\partial_1 f_1(x_0) & \cdots & \partial_n f_1(x_0) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x_0) & \cdots & \partial_n f_m(x_0)\end{pmatrix}</math> || Since the total derivative is a linear transformation, and since linear transformations from <math>\mathbf R^n</math> to <math>\mathbf R^m</math> have a one-to-one correspondence with real-valued <math>m</math> by <math>n</math> matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative ''is'' the matrix. TODO: talk about gradient vectors as rows.
|-
| Total derivative with respect to the <math>j</math>th variable || <math>\frac{df}{dx_j}</math> || || ||
|}

Note the absence of the gradient in the above table. The generalization of the gradient to the <math>\mathbf R^n \to \mathbf R^m</math> case is the derivative matrix.
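The derivative matrix can be sketched numerically by assembling difference-quotient partials into an <math>m \times n</math> array. The example map <math>f(x, y) = (xy,\; x + y^2)</math> below is my own choice:

```python
# The Jacobian matrix of f: R^n -> R^m collects the partials: the entry in
# row i, column j is partial_j f_i(x_0). It represents the linear
# transformation appearing in the Frechet derivative.

def f(x, y):
    return (x * y, x + y ** 2)

def jacobian(f, x0, h=1e-6):
    """m x n matrix of difference-quotient partials of f at x0."""
    fx = f(*x0)
    J = []
    for i in range(len(fx)):        # rows: output components f_i
        row = []
        for j in range(len(x0)):    # columns: input variables x_j
            shifted = list(x0)
            shifted[j] += h
            row.append((f(*shifted)[i] - fx[i]) / h)
        J.append(row)
    return J

# Exact Jacobian at (1, 2) is [[y, x], [1, 2y]] = [[2, 1], [1, 4]]
print(jacobian(f, (1.0, 2.0)))
```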

==See also==
* [[Notational confusion of multivariable derivatives]]
* [[Relation between gradient vector and partial derivatives]]
* [[Relation between gradient vector and directional derivatives]]
* [[Directional derivative]]
* [[machinelearning:Summary table of probability terms]]

==References==
* Tao, Terence. ''Analysis II''. 2nd ed. Hindustan Book Agency. 2009.
* Folland, Gerald B. ''Advanced Calculus''. Pearson. 2002.
* Pugh, Charles Chapman. ''Real Mathematical Analysis''. Springer. 2010.

==External links==
* this post does a similar thing: https://reallyeli.com/posts/total_derivative.html