<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://calculus.subwiki.org/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=IssaRice</id>
	<title>Calculus - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://calculus.subwiki.org/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=IssaRice"/>
	<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/wiki/Special:Contributions/IssaRice"/>
	<updated>2026-05-14T07:10:44Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.2</generator>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3212</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3212"/>
		<updated>2020-06-17T20:28:46Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* External links */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
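As a minimal sketch of the two readings of the derivative at a point discussed in the table above, the following Python snippet estimates the number with a central difference (a swapped-in numerical technique, not the limit definition itself) and then wraps that number as a linear map. The names derivative_at, derivative_as_linear_map and square are hypothetical and chosen only for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def derivative_at(f, x0, h=1e-6):
    """Central-difference estimate of the derivative of f at x0 (a number)."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)


def derivative_as_linear_map(f, x0, h=1e-6):
    """Return the linear map sending v to slope * v, instead of the bare slope."""
    slope = derivative_at(f, x0, h)
    return lambda v: slope * v


def square(x):
    # Hypothetical example function f(x) = x^2, whose derivative at 3 is 6.
    return x * x


slope = derivative_at(square, 3.0)              # the derivative as a number
linear = derivative_as_linear_map(square, 3.0)  # the derivative as a function
recovered = linear(1.0)                         # evaluating at 1 recovers the number
```
&lt;br /&gt;
Evaluating the linear map at 1 gives back the slope, which is the observation made in the Notes column.&lt;br /&gt;
&lt;br /&gt;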
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || For &amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;, we treat the variable &amp;lt;math&amp;gt;x_i = g_i(x_j)&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;, and take the single-variable derivative with respect to &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (more formally, &amp;lt;math&amp;gt;g : \mathbf R \to \mathbf R^n&amp;lt;/math&amp;gt; is a function such that the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th component &amp;lt;math&amp;gt;g_j = \mathrm{id}&amp;lt;/math&amp;gt; is the identity function). From the chain rule this becomes &amp;lt;math&amp;gt;\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g&#039;(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
&lt;br /&gt;
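The coincidence of the directional derivative with the dot product of the gradient and the direction can be checked numerically. The sketch below uses central differences (an approximation, not the limit definitions in the table) and a hypothetical test function named example. All function names here are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def partial(f, x, j, t=1e-6):
    """Central-difference estimate of the j-th partial derivative of f at x."""
    forward = list(x)
    backward = list(x)
    forward[j] += t
    backward[j] -= t
    return (f(forward) - f(backward)) / (2 * t)


def gradient(f, x, t=1e-6):
    """Vector of all partial derivatives of f at x."""
    return [partial(f, x, j, t) for j in range(len(x))]


def directional(f, x, v, t=1e-6):
    """Central-difference estimate of the directional derivative of f at x along v."""
    forward = [xi + t * vi for xi, vi in zip(x, v)]
    backward = [xi - t * vi for xi, vi in zip(x, v)]
    return (f(forward) - f(backward)) / (2 * t)


def example(p):
    # Hypothetical test function f(x, y) = x^2 * y + y
    return p[0] ** 2 * p[1] + p[1]


x0 = [1.0, 2.0]
v = [3.0, -1.0]
grad = gradient(example, x0)                  # exact gradient is (4, 2)
dot = sum(g * vi for g, vi in zip(grad, v))   # gradient dotted with v
d_v = directional(example, x0, v)             # directional derivative along v
```
&lt;br /&gt;
For this example both dot and d_v should be close to 10, up to the error of the finite-difference approximation.&lt;br /&gt;
&lt;br /&gt;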
&amp;quot;Total derivative&amp;quot; is used for two different things (which coincide in the special case where they both make sense); see my answer https://math.stackexchange.com/a/3698838/35525 for details.&lt;br /&gt;
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real-valued &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || || ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
&lt;br /&gt;
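To make the derivative matrix concrete, here is a minimal sketch that builds an m by n matrix of central-difference partial derivatives and applies it to a vector. The helper names jacobian and apply_matrix and the test map example are hypothetical, and the finite-difference scheme is an approximation rather than the definition given in the table.&lt;br /&gt;
&lt;br /&gt;
```python
def jacobian(f, x, t=1e-6):
    """Central-difference estimate of the m-by-n derivative matrix of f at x."""
    columns = []
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += t
        backward[j] -= t
        f_plus = f(forward)
        f_minus = f(backward)
        # Column j holds the j-th partial derivative of every component of f.
        columns.append([(p - q) / (2 * t) for p, q in zip(f_plus, f_minus)])
    m = len(columns[0])
    # Transpose so that row i is the gradient of the i-th component of f.
    return [[columns[j][i] for j in range(len(x))] for i in range(m)]


def apply_matrix(matrix, v):
    """Matrix-vector product, i.e. the total derivative applied to v."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def example(p):
    # Hypothetical map from R^2 to R^3 sending (x, y) to (x*y, x + y, y^2)
    x, y = p
    return [x * y, x + y, y * y]


J = jacobian(example, [2.0, 3.0])     # exact matrix is [[3, 2], [1, 1], [0, 6]]
Jv = apply_matrix(J, [1.0, -1.0])     # exact result is [1, 0, -6]
```
&lt;br /&gt;
Each row of J is the gradient of one component function, which is the sense in which the derivative matrix generalizes the gradient.&lt;br /&gt;
&lt;br /&gt;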
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;br /&gt;
&lt;br /&gt;
* this post does a similar thing: https://reallyeli.com/posts/total_derivative.html&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3211</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3211"/>
		<updated>2020-05-30T23:43:41Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The meaning of partial derivatives depends on which variables you take to be independent; see p. 75 of Folland.&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are four possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Since our point is &amp;lt;math&amp;gt;(x_0, y_0) = (x,x)&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\begin{pmatrix}2x &amp;amp; 0 \\ 0 &amp;amp; 2x\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# There&#039;s implicitly a function &amp;lt;math&amp;gt;\phi(x,y) = (x,x)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))&amp;lt;/math&amp;gt;. Using the chain rule, this is &amp;lt;math&amp;gt;(f\circ \phi)&#039;(x,y) = f&#039;(\phi(x,y))\phi&#039;(x,y) = \left.\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 &amp;amp; 0 \\ 1 &amp;amp; 0\end{pmatrix} = \begin{pmatrix}2x &amp;amp; 0 \\ 2x &amp;amp; 0\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
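The four readings listed above can be compared side by side at a concrete value of x. The sketch below hard-codes the derivative matrix of f(x, y) = (x^2, y^2) from the list and assumes the relationship y = x for the second reading. The names f_prime, matmul and reading_1 through reading_4 are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def f_prime(x0, y0):
    """Derivative matrix of f(x, y) = (x^2, y^2) at the point (x0, y0)."""
    return [[2 * x0, 0.0], [0.0, 2 * y0]]


def matmul(a, b):
    """Product of two 2-by-2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


x = 3.0

# Reading 1: the total derivative of f at the point (x, x), a 2-by-2 matrix.
reading_1 = f_prime(x, x)

# Reading 2: chain rule with y treated as a function of x (assumed y = x, so dy/dx = 1).
dy_dx = 1.0
reading_2 = [2 * x, 2 * x * dy_dx]

# Reading 3: substitute first to get g(x) = (x^2, x^2), then differentiate g.
reading_3 = [2 * x, 2 * x]

# Reading 4: chain rule through phi(x, y) = (x, x), whose matrix is [[1, 0], [1, 0]].
phi_prime = [[1.0, 0.0], [1.0, 0.0]]
reading_4 = matmul(f_prime(x, x), phi_prime)
```
&lt;br /&gt;
Readings 2 and 3 agree as vectors in this case, while readings 1 and 4 are different 2 by 2 matrices, so the same expression yields objects of different types depending on how it is parsed.&lt;br /&gt;
&lt;br /&gt;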
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3210</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3210"/>
		<updated>2020-05-30T23:07:30Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are four possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Since our point is &amp;lt;math&amp;gt;(x_0, y_0) = (x,x)&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\begin{pmatrix}2x &amp;amp; 0 \\ 0 &amp;amp; 2x\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# There&#039;s implicitly a function &amp;lt;math&amp;gt;\phi(x,y) = (x,x)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))&amp;lt;/math&amp;gt;. Using the chain rule, this is &amp;lt;math&amp;gt;(f\circ \phi)&#039;(x,y) = f&#039;(\phi(x,y))\phi&#039;(x,y) = \left.\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 &amp;amp; 0 \\ 1 &amp;amp; 0\end{pmatrix} = \begin{pmatrix}2x &amp;amp; 0 \\ 2x &amp;amp; 0\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3209</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3209"/>
		<updated>2020-05-30T23:07:09Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are four possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Since our point is &amp;lt;math&amp;gt;(x_0, y_0) = (x,x)&amp;lt;/math&amp;gt;, we have &amp;lt;math&amp;gt;\begin{pmatrix}2x &amp;amp; 0 \\ 0 &amp;amp; 2x\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# There&#039;s implicitly a function &amp;lt;math&amp;gt;\phi(x,y) = (x,x)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))&amp;lt;/math&amp;gt;. Using the chain rule, this is &amp;lt;math&amp;gt;(f\circ \phi)&#039;(x,y) = f&#039;(\phi(x,y))\phi&#039;(x,y) = \left.\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 &amp;amp; 0 \\ 1 &amp;amp; 0\end{pmatrix} = \begin{pmatrix}2x &amp;amp; 0 \\ 2x &amp;amp; 0\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
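&lt;br /&gt;
The readings above can be compared numerically. The following is only a rough sketch using central differences; the helper names (diff, jacobian), the step size h, and the sample point are assumptions made for illustration and are not part of the original discussion.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def f(x, y):
    """The example map f(x, y) = (x^2, y^2)."""
    return (x * x, y * y)

def phi(x, y):
    """The substitution map phi(x, y) = (x, x) from possibility 4."""
    return (x, x)

def diff(g, t, h=1e-6):
    """Central-difference derivative at t of a curve g from R to R^2."""
    a, b = g(t + h), g(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(a, b))

def jacobian(F, x, y, h=1e-6):
    """Central-difference Jacobian at (x, y) of F from R^2 to R^2, as a tuple of rows."""
    dx = diff(lambda t: F(t, y), x, h)  # first column: vary the first slot only
    dy = diff(lambda t: F(x, t), y, h)  # second column: vary the second slot only
    return ((dx[0], dy[0]), (dx[1], dy[1]))

x0 = 3.0  # hypothetical sample point

# Partial derivative in the first slot, evaluated at (x0, x0): approximately (2*x0, 0).
partial_x = diff(lambda t: f(t, x0), x0)

# Possibility 3: substitute first, then differentiate: approximately (2*x0, 2*x0).
substituted = diff(lambda t: f(t, t), x0)

# Possibility 1: total derivative of f at (x0, x0): approximately diag(2*x0, 2*x0).
total = jacobian(f, x0, x0)

# Possibility 4: derivative of f composed with phi; the second column vanishes
# because phi ignores its second argument (5.0 is an arbitrary value for y).
composed = jacobian(lambda x, y: f(*phi(x, y)), x0, 5.0)

print(partial_x, substituted, total, composed)
```

The partial derivative and the substituted derivative differ in their second component, which is exactly the confusion in the example from Tao.&lt;br /&gt;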
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
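&lt;br /&gt;
To make the precedence issue concrete, here is a worked reading of &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;, under the assumption (not stated above) that &amp;lt;math&amp;gt;f\colon \mathbf R^m \to \mathbf R&amp;lt;/math&amp;gt; is differentiable and &amp;lt;math&amp;gt;A&amp;lt;/math&amp;gt; is an &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrix. Write &amp;lt;math&amp;gt;g(x) = f(Ax)&amp;lt;/math&amp;gt; (the name &amp;lt;math&amp;gt;g&amp;lt;/math&amp;gt; is introduced here only for illustration). The two candidate meanings are&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;(\nabla f)(Ax) \in \mathbf R^m \quad \text{versus} \quad \nabla g(x) = A^T (\nabla f)(Ax) \in \mathbf R^n&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the second follows from the chain rule. Even for &amp;lt;math&amp;gt;n = m = 1&amp;lt;/math&amp;gt; they differ: with &amp;lt;math&amp;gt;f(u) = u^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;A = 3&amp;lt;/math&amp;gt;, the first reading gives &amp;lt;math&amp;gt;2(3x) = 6x&amp;lt;/math&amp;gt;, while the second gives &amp;lt;math&amp;gt;\frac{d}{dx}(9x^2) = 18x&amp;lt;/math&amp;gt;.&lt;br /&gt;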
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3208</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3208"/>
		<updated>2020-05-30T22:50:33Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are four possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# There&#039;s implicitly a function &amp;lt;math&amp;gt;\phi(x,y) = (x,x)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))&amp;lt;/math&amp;gt;. Using the chain rule, this is &amp;lt;math&amp;gt;(f\circ \phi)&#039;(x,y) = f&#039;(\phi(x,y))\phi&#039;(x,y) = \left.\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 &amp;amp; 0 \\ 1 &amp;amp; 0\end{pmatrix} = \begin{pmatrix}2x &amp;amp; 0 \\ 2x &amp;amp; 0\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3207</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3207"/>
		<updated>2020-05-30T22:49:54Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are four possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# There&#039;s implicitly a function &amp;lt;math&amp;gt;\phi(x,y) = (x,x)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))&amp;lt;/math&amp;gt;. Using the chain rule, this is &amp;lt;math&amp;gt;(f\circ \phi)&#039;(x,y) = f&#039;(\phi(x,y))\phi&#039;(x,y) = \left.\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 &amp;amp; 0 \\ 1 &amp;amp; 0\end{pmatrix} = \begin{pmatrix}2x &amp;amp; 0 \\ 2x &amp;amp; 0\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3206</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3206"/>
		<updated>2020-05-30T22:48:28Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are four possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# There&#039;s implicitly a function &amp;lt;math&amp;gt;\phi(x,y) = (x,x)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x) = \frac{d}{dx} f(\phi(x,y))&amp;lt;/math&amp;gt;. Using the chain rule, this is &amp;lt;math&amp;gt;(f\circ \phi)&#039;(x,y) = f&#039;(\phi(x,y))\phi&#039;(x,y) = \left.\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\right\vert_{(x_0,y_0) = \phi(x,y)} \begin{pmatrix}1 &amp;amp; 0 \\ 1 &amp;amp; 0\end{pmatrix} = \begin{pmatrix}2x &amp;amp; 0 \\ 2x &amp;amp; 0\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3205</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3205"/>
		<updated>2020-05-30T22:16:34Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* Real-valued function of Rn */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || For &amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;, we treat the variable &amp;lt;math&amp;gt;x_i = g_i(x_j)&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;, and take the single-variable derivative with respect to &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (more formally, &amp;lt;math&amp;gt;g : \mathbf R \to \mathbf R^n&amp;lt;/math&amp;gt; is a function such that the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th component &amp;lt;math&amp;gt;g_j = \mathrm{id}&amp;lt;/math&amp;gt; is the identity function). From the chain rule this becomes &amp;lt;math&amp;gt;\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g&#039;(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Total derivative&amp;quot; is used for two different things (which coincide in the special case where they both make sense); see my answer https://math.stackexchange.com/a/3698838/35525 for details.&lt;br /&gt;
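&lt;br /&gt;
As a small worked instance of the total derivative with respect to a variable (the particular function and relationship below are assumptions chosen only for illustration), take &amp;lt;math&amp;gt;n = 2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;f(x_1, x_2) = x_1 x_2&amp;lt;/math&amp;gt;, and suppose &amp;lt;math&amp;gt;x_2 = g_2(x_1) = x_1^2&amp;lt;/math&amp;gt;. Then&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\frac{df}{dx_1} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_1} + \frac{\partial f}{\partial x_2} \frac{dx_2}{dx_1} = x_2 \cdot 1 + x_1 \cdot 2x_1 = x_1^2 + 2x_1^2 = 3x_1^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which agrees with substituting first and differentiating the single-variable function &amp;lt;math&amp;gt;x_1 \mapsto x_1 \cdot x_1^2 = x_1^3&amp;lt;/math&amp;gt;. By contrast, the partial derivative &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_1} = x_2&amp;lt;/math&amp;gt; holds &amp;lt;math&amp;gt;x_2&amp;lt;/math&amp;gt; fixed.&lt;br /&gt;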
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.&lt;br /&gt;
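&lt;br /&gt;
For instance (an illustrative curve, not taken from the table), with &amp;lt;math&amp;gt;m = 2&amp;lt;/math&amp;gt; and the unit circle parametrized by &amp;lt;math&amp;gt;f(t) = (\cos t, \sin t)&amp;lt;/math&amp;gt;, differentiating each component gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Df(t) = (f_1&#039;(t), f_2&#039;(t)) = (-\sin t, \cos t)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
so at &amp;lt;math&amp;gt;t = 0&amp;lt;/math&amp;gt; the velocity vector is &amp;lt;math&amp;gt;(0, 1)&amp;lt;/math&amp;gt;.&lt;br /&gt;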
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || || ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
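&lt;br /&gt;
As a small illustration (the function here is chosen only for this example and is not part of the table above), take &amp;lt;math&amp;gt;f\colon \mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;f(x,y) = (x^2, y^2)&amp;lt;/math&amp;gt;. Assuming the convention of one row per component function, the derivative matrix at &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt; and the total derivative applied to a vector &amp;lt;math&amp;gt;v = (v_1,v_2)&amp;lt;/math&amp;gt; would be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Df(x_0,y_0) = \begin{pmatrix}\partial_1 f_1(x_0,y_0) &amp;amp; \partial_2 f_1(x_0,y_0) \\ \partial_1 f_2(x_0,y_0) &amp;amp; \partial_2 f_2(x_0,y_0)\end{pmatrix} = \begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}, \qquad f&#039;(x_0,y_0)(v) = \begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}v_1 \\ v_2\end{pmatrix} = \begin{pmatrix}2x_0v_1 \\ 2y_0v_2\end{pmatrix}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Each row of this matrix is the gradient of one component function, which is the sense in which the derivative matrix generalizes the gradient.&lt;br /&gt;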
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3204</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3204"/>
		<updated>2020-05-30T09:25:27Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to (2x,2x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
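&lt;br /&gt;
As a sanity check that the second reading (under the assumption y=x) and the third reading agree, here is a sketch using the chain rule. The helper name &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; is introduced only for this illustration: let &amp;lt;math&amp;gt;h\colon \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt; be &amp;lt;math&amp;gt;h(x) = (x,x)&amp;lt;/math&amp;gt;, so that f(x,x) is &amp;lt;math&amp;gt;(f\circ h)(x)&amp;lt;/math&amp;gt;. Then:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;(f\circ h)&#039;(x) = Df(h(x))\, h&#039;(x) = \begin{pmatrix}2x &amp;amp; 0 \\ 0 &amp;amp; 2x\end{pmatrix}\begin{pmatrix}1 \\ 1\end{pmatrix} = \begin{pmatrix}2x \\ 2x\end{pmatrix}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This matches differentiating &amp;lt;math&amp;gt;f(x,x) = (x^2,x^2)&amp;lt;/math&amp;gt; componentwise, and differs from the first reading, where the linear map is applied to the vector (x,x) rather than to &amp;lt;math&amp;gt;h&#039;(x) = (1,1)&amp;lt;/math&amp;gt;.&lt;br /&gt;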
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things is an instance of this.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. This is the case, for example, with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3203</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3203"/>
		<updated>2020-05-30T09:25:11Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related. If we assume the relationship y=x then this reduces to the following.&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things is an instance of this.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. This is the case, for example, with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3202</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3202"/>
		<updated>2020-05-30T09:18:55Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* Vector-valued function of Rn */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || For &amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;, we treat the variable &amp;lt;math&amp;gt;x_i = g_i(x_j)&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;, and take the single-variable derivative of &amp;lt;math&amp;gt;f \circ g&amp;lt;/math&amp;gt; with respect to &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (more formally, &amp;lt;math&amp;gt;g : \mathbf R \to \mathbf R^n&amp;lt;/math&amp;gt; is a function such that the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th component &amp;lt;math&amp;gt;g_j = \mathrm{id}&amp;lt;/math&amp;gt; is the identity function). From the chain rule this becomes &amp;lt;math&amp;gt;\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g&#039;(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
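&lt;br /&gt;
To see the coincidence concretely, here is a worked sketch with a function chosen only for illustration (it is not taken from the references): &amp;lt;math&amp;gt;f(x,y) = x^2 y&amp;lt;/math&amp;gt;, a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt; and a direction &amp;lt;math&amp;gt;v = (v_1,v_2)&amp;lt;/math&amp;gt;. Expanding the difference quotient from the definition of the directional derivative gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{align}D_v f(x_0,y_0) &amp;amp;= \lim_{t\to0} \frac{(x_0+tv_1)^2(y_0+tv_2) - x_0^2y_0}{t} \\ &amp;amp;= \lim_{t\to0} \left(2x_0y_0v_1 + x_0^2v_2 + t(2x_0v_1v_2 + y_0v_1^2) + t^2v_1^2v_2\right) \\ &amp;amp;= 2x_0y_0v_1 + x_0^2v_2 = \nabla f(x_0,y_0)\cdot v\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
since &amp;lt;math&amp;gt;\nabla f(x_0,y_0) = (2x_0y_0, x_0^2)&amp;lt;/math&amp;gt;.&lt;br /&gt;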
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || || ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3201</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3201"/>
		<updated>2020-05-30T09:18:39Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* Vector-valued function of Rn */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || For &amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;, we treat the variable &amp;lt;math&amp;gt;x_i = g_i(x_j)&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;, and take the single-variable derivative with respect to &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (more formally, &amp;lt;math&amp;gt;g : \mathbf R \to \mathbf R^n&amp;lt;/math&amp;gt; is a function such that the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th component &amp;lt;math&amp;gt;g_j = \mathrm{id}&amp;lt;/math&amp;gt; is the identity function). From the chain rule this becomes &amp;lt;math&amp;gt;\frac{df}{dx_j} = \nabla f(x) \cdot g&#039;(x) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
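&lt;br /&gt;
As a small worked check of this coincidence (the function, point, and direction below are illustrative choices, not taken from the references), let &amp;lt;math&amp;gt;f(x_1,x_2) = x_1^2 x_2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;x_0 = (1,2)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v = (3,4)&amp;lt;/math&amp;gt;. Computing via the gradient and via the limit definition of the directional derivative gives the same number:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{align}\nabla f(x) &amp;amp;= (2x_1x_2,\ x_1^2), \qquad \nabla f(x_0)\cdot v = (4,1)\cdot(3,4) = 16 \\ D_v f(x_0) &amp;amp;= \lim_{t\to0} \frac{(1+3t)^2(2+4t) - 2}{t} = \lim_{t\to0} \left(16 + 42t + 36t^2\right) = 16\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;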
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.&lt;br /&gt;
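&lt;br /&gt;
For a concrete instance (the curve is an arbitrary choice made for illustration, with &amp;lt;math&amp;gt;m=3&amp;lt;/math&amp;gt;), take the helix &amp;lt;math&amp;gt;f(t) = (\cos t,\ \sin t,\ t)&amp;lt;/math&amp;gt;. Differentiating each component with the single-variable derivative gives the velocity vector:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;v(t) = (f_1&#039;(t),\ f_2&#039;(t),\ f_3&#039;(t)) = (-\sin t,\ \cos t,\ 1)&amp;lt;/math&amp;gt;&lt;br /&gt;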
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real-valued &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
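&lt;br /&gt;
To illustrate, here is a small worked example (the function is an arbitrary choice made for illustration, with &amp;lt;math&amp;gt;n=2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m=3&amp;lt;/math&amp;gt;). Let &amp;lt;math&amp;gt;f(x_1,x_2) = (x_1^2,\ x_2^2,\ x_1x_2)&amp;lt;/math&amp;gt;. Its derivative matrix at &amp;lt;math&amp;gt;x = (x_1,x_2)&amp;lt;/math&amp;gt; is the 3 by 2 matrix whose &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;th row is the gradient of the component &amp;lt;math&amp;gt;f_i&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Df(x) = \begin{pmatrix}\partial_1 f_1(x) &amp;amp; \partial_2 f_1(x) \\ \partial_1 f_2(x) &amp;amp; \partial_2 f_2(x) \\ \partial_1 f_3(x) &amp;amp; \partial_2 f_3(x)\end{pmatrix} = \begin{pmatrix}2x_1 &amp;amp; 0 \\ 0 &amp;amp; 2x_2 \\ x_2 &amp;amp; x_1\end{pmatrix}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;m=1&amp;lt;/math&amp;gt; only the first row remains, and the derivative matrix is the gradient written as a row vector.&lt;br /&gt;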
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3200</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3200"/>
		<updated>2020-05-30T09:12:50Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;. We can&#039;t evaluate this further since we don&#039;t know how x and y are related.&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
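&lt;br /&gt;
As a sketch of how the second and third readings relate: if we make the additional assumption that &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; depends on &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; through &amp;lt;math&amp;gt;y = x&amp;lt;/math&amp;gt; (one way of interpreting &amp;quot;evaluated at the point (x,x)&amp;quot;; the notation itself does not force this), then &amp;lt;math&amp;gt;\frac{dy}{dx} = 1&amp;lt;/math&amp;gt; and the chain rule gives the same answer as the third reading:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\frac{df}{dx} = (2x,0) + (0,2y)\frac{dy}{dx} = (2x,0) + (0,2x)\cdot 1 = (2x,2x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By contrast, the partial derivative &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x) = (2x,0)&amp;lt;/math&amp;gt; holds the second argument fixed and so drops the second term.&lt;br /&gt;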
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things is an instance of this.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear, e.g. in &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
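&lt;br /&gt;
For a concrete instance of the distinction, take the function &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; used earlier on this page (the symbols &amp;lt;math&amp;gt;v_1, v_2&amp;lt;/math&amp;gt; are introduced here only for illustration). The total derivative at &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt; is the function on the left, while the derivative matrix is the array on the right; the two are linked by matrix-vector multiplication with respect to the standard bases:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f&#039;(x_0,y_0)\colon (v_1,v_2) \mapsto (2x_0v_1,\ 2y_0v_2), \qquad Df(x_0,y_0) = \begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;&lt;br /&gt;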
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3199</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3199"/>
		<updated>2020-05-30T09:07:33Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x). We have &amp;lt;math&amp;gt;(2x,0) + (0,2y)\frac{dy}{dx} = (2x, 2y\frac{dy}{dx})&amp;lt;/math&amp;gt;.&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things is an instance of this.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear, e.g. in &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3198</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3198"/>
		<updated>2020-05-30T09:05:16Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;. Applying this linear map to (x,x), we get &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}\begin{pmatrix}x \\ x\end{pmatrix} = \begin{pmatrix}2x_0x \\ 2y_0x\end{pmatrix} \in \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things is an instance of this.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear, e.g. in &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3197</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3197"/>
		<updated>2020-05-30T09:03:41Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x). Once we fix a point &amp;lt;math&amp;gt;(x_0,y_0)&amp;lt;/math&amp;gt;, then &amp;lt;math&amp;gt;f&#039;(x_0,y_0)&amp;lt;/math&amp;gt; is a linear map &amp;lt;math&amp;gt;\mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; defined by the matrix &amp;lt;math&amp;gt;\begin{pmatrix}2x_0 &amp;amp; 0 \\ 0 &amp;amp; 2y_0\end{pmatrix}&amp;lt;/math&amp;gt;.&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
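To compare the three readings concretely, here is a worked sketch using the same &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt;. The dependence &amp;lt;math&amp;gt;y = x&amp;lt;/math&amp;gt; (so &amp;lt;math&amp;gt;\frac{dy}{dx} = 1&amp;lt;/math&amp;gt;) in the second reading is an assumption suggested by the point (x,x), not something the notation itself specifies:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{align}\text{Reading 1:}\quad &amp;amp; f&#039;(x,x) = \begin{pmatrix}2x &amp;amp; 0 \\ 0 &amp;amp; 2x\end{pmatrix} \\ \text{Reading 2:}\quad &amp;amp; \frac{df}{dx} = (2x,0) + (0,2y)\frac{dy}{dx} = (2x,2y), \text{ which is } (2x,2x) \text{ at } (x,x) \\ \text{Reading 3:}\quad &amp;amp; \frac{d}{dx}(x^2,x^2) = (2x,2x)\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Under that assumption, readings 2 and 3 give the same vector, while reading 1 gives a linear map (shown here as its matrix) rather than a vector. Applying that matrix to the vector (1,1) recovers (2x,2x), which is the chain rule again.&lt;br /&gt;
&lt;br /&gt;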
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y), where the same symbol z now has two different types (a variable on the left and a function on the right). Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things is an instance of this.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear, e.g. in &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;: it is not clear whether we differentiate first and then substitute, or substitute first and then differentiate.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3196</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3196"/>
		<updated>2020-05-30T09:01:03Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x).&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function. The result is the function &amp;lt;math&amp;gt;x \mapsto (2x,2x) : \mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3195</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3195"/>
		<updated>2020-05-30T08:59:32Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x).&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function.&lt;br /&gt;
&lt;br /&gt;
==Big picture==&lt;br /&gt;
&lt;br /&gt;
Why is this notation so confusing? I think there are two (?) big reasons:&lt;br /&gt;
&lt;br /&gt;
* The notation violates the substitution axiom of equality. We write things like z = z(x,y) where the same symbol z now has two different types. Folland&#039;s example of &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; meaning two different things.&lt;br /&gt;
* The precedence of the differentiation operator is sometimes unclear. e.g. this is the case with &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3194</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3194"/>
		<updated>2020-05-30T08:55:09Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* dual basis stuff -- see Tao&#039;s explanation of this in p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
&lt;br /&gt;
Working off the example from Tao above, let &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;. What does &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x)&amp;lt;/math&amp;gt; mean? Here are three possibilities:&lt;br /&gt;
&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; (i.e., the total derivative of f) evaluated at the point (x,x).&lt;br /&gt;
# It&#039;s &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; (i.e., the total derivative of f with respect to x, in which we treat the variable y as a function of x, and use the chain rule &amp;lt;math&amp;gt;\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{dy}{dx}&amp;lt;/math&amp;gt; to compute) evaluated at the point (x,x).&lt;br /&gt;
# We first compute f(x,x) to get an expression involving only x, which implicitly defines a function &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^2&amp;lt;/math&amp;gt;. We now differentiate this function.&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically, the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence (with respect to the standard bases) between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3193</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3193"/>
		<updated>2020-05-30T08:50:02Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* Real-valued function of Rn */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || For &amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;, we treat the variable &amp;lt;math&amp;gt;x_i = g_i(x_j)&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;, and take the single-variable derivative of &amp;lt;math&amp;gt;f \circ g&amp;lt;/math&amp;gt; with respect to &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (more formally, &amp;lt;math&amp;gt;g : \mathbf R \to \mathbf R^n&amp;lt;/math&amp;gt; is a function such that the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th component &amp;lt;math&amp;gt;g_j = \mathrm{id}&amp;lt;/math&amp;gt; is the identity function). From the chain rule this becomes &amp;lt;math&amp;gt;\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g&#039;(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}&amp;lt;/math&amp;gt;, where the partial derivatives are evaluated at &amp;lt;math&amp;gt;g(x_j)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\frac{dx_j}{dx_j} = 1&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
&lt;br /&gt;
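As a worked illustration of the total derivative with respect to a variable (the function and the dependence below are assumed for the sake of the example, not taken from the references), let &amp;lt;math&amp;gt;n = 2&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;f(x_1,x_2) = x_1 x_2&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;x_2 = g_2(x_1) = x_1^2&amp;lt;/math&amp;gt;. Then the chain rule gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\frac{df}{dx_1} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_1} + \frac{\partial f}{\partial x_2} \frac{dx_2}{dx_1} = x_2 \cdot 1 + x_1 \cdot 2x_1 = x_1^2 + 2x_1^2 = 3x_1^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This agrees with substituting first, since &amp;lt;math&amp;gt;f(x_1, x_1^2) = x_1^3&amp;lt;/math&amp;gt; has single-variable derivative &amp;lt;math&amp;gt;3x_1^2&amp;lt;/math&amp;gt;. By contrast, the partial derivative &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_1} = x_2&amp;lt;/math&amp;gt; ignores the dependence of &amp;lt;math&amp;gt;x_2&amp;lt;/math&amp;gt; on &amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;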
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no choice of direction to make.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
&lt;br /&gt;
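To see how the derivative matrix generalizes the gradient, consider a small worked example (the function here is an assumed illustration, not taken from the references). Let &amp;lt;math&amp;gt;f\colon \mathbf R^2 \to \mathbf R^2&amp;lt;/math&amp;gt; be given by &amp;lt;math&amp;gt;f(x_1,x_2) = (x_1^2 x_2,\ x_1 + x_2)&amp;lt;/math&amp;gt;, with components &amp;lt;math&amp;gt;f_1(x_1,x_2) = x_1^2 x_2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;f_2(x_1,x_2) = x_1 + x_2&amp;lt;/math&amp;gt;. Then&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Df(x_1,x_2) = \begin{pmatrix}\partial_1 f_1 &amp;amp; \partial_2 f_1 \\ \partial_1 f_2 &amp;amp; \partial_2 f_2\end{pmatrix} = \begin{pmatrix}2x_1 x_2 &amp;amp; x_1^2 \\ 1 &amp;amp; 1\end{pmatrix}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first row is &amp;lt;math&amp;gt;\nabla f_1 = (2x_1 x_2, x_1^2)&amp;lt;/math&amp;gt; and the second row is &amp;lt;math&amp;gt;\nabla f_2 = (1,1)&amp;lt;/math&amp;gt;, so each row of the derivative matrix is the gradient of one component function. When &amp;lt;math&amp;gt;m = 1&amp;lt;/math&amp;gt; the matrix has a single row, which is the gradient written as a row vector.&lt;br /&gt;
&lt;br /&gt;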
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3192</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3192"/>
		<updated>2020-05-30T08:49:44Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* Real-valued function of Rn */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || For &amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;, we treat the variable &amp;lt;math&amp;gt;x_i = g_i(x_j)&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;, and take the single-variable derivative of the composite &amp;lt;math&amp;gt;f \circ g&amp;lt;/math&amp;gt; with respect to &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt; (more formally, &amp;lt;math&amp;gt;g\colon \mathbf R \to \mathbf R^n&amp;lt;/math&amp;gt; is a function whose &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th component &amp;lt;math&amp;gt;g_j = \operatorname{id}&amp;lt;/math&amp;gt; is the identity function). From the chain rule this becomes &amp;lt;math&amp;gt;\frac{df}{dx_j} = \nabla f(g(x_j)) \cdot g&#039;(x_j) = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
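&lt;br /&gt;
A quick numerical check of this coincidence may help (the function, point, and direction below are made up for illustration). Take &amp;lt;math&amp;gt;f(x,y) = x^2 y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;x_0 = (1,2)&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;v = (3,4)&amp;lt;/math&amp;gt;. Then &amp;lt;math&amp;gt;\nabla f(x,y) = (2xy, x^2)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\nabla f(x_0) = (4,1)&amp;lt;/math&amp;gt; and&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f&#039;(x_0)(v) = \nabla f(x_0)\cdot v = 4\cdot 3 + 1\cdot 4 = 16&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This agrees with the directional derivative computed from its limit definition: &amp;lt;math&amp;gt;f(x_0 + tv) = (1+3t)^2(2+4t)&amp;lt;/math&amp;gt;, whose derivative with respect to &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;t = 0&amp;lt;/math&amp;gt; is &amp;lt;math&amp;gt;2\cdot 3\cdot 2 + 1\cdot 4 = 16&amp;lt;/math&amp;gt;.&lt;br /&gt;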
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; are in one-to-one correspondence with real &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3191</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3191"/>
		<updated>2020-05-30T08:46:34Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* Real-valued function of Rn */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|-&lt;br /&gt;
| Total derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\frac{df}{dx_j}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || For &amp;lt;math&amp;gt;i \ne j&amp;lt;/math&amp;gt;, we treat the variable &amp;lt;math&amp;gt;x_i = g_i(x_j)&amp;lt;/math&amp;gt; as a function of &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;, and take the single-variable derivative with respect to &amp;lt;math&amp;gt;x_j&amp;lt;/math&amp;gt;. From the chain rule this becomes &amp;lt;math&amp;gt;\frac{df}{dx_j} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dx_j} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dx_j}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose from.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; are in one-to-one correspondence with real &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3176</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3176"/>
		<updated>2019-08-10T06:15:20Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Several different confusions arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
* The dual basis issue: see Tao&#039;s explanation on p. 225 of [https://terrytao.files.wordpress.com/2011/06/blog-book.pdf]&lt;br /&gt;
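&lt;br /&gt;
To spell out the item above about &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt;, here is a short worked computation (the helper name &amp;lt;math&amp;gt;g&amp;lt;/math&amp;gt; is introduced only for this sketch). The partial derivative is taken first, with &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; held fixed, and only then evaluated at &amp;lt;math&amp;gt;(x,x)&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,y) = (2x, 0) \quad\text{so}\quad \frac{\partial f}{\partial x}(x,x) = (2x, 0)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The tempting answer &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; is instead the derivative of the single-variable function &amp;lt;math&amp;gt;g(x) = f(x,x) = (x^2, x^2)&amp;lt;/math&amp;gt;, which by the chain rule picks up both partial derivatives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;g&#039;(x) = \frac{\partial f}{\partial x}(x,x) + \frac{\partial f}{\partial y}(x,x) = (2x, 0) + (0, 2x) = (2x, 2x)&amp;lt;/math&amp;gt;&lt;br /&gt;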
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, once bases are fixed (usually the standard bases), there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
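&lt;br /&gt;
To make the distinction concrete, here is a small example (the function and point are chosen purely for illustration). Let &amp;lt;math&amp;gt;f(x,y) = (x^2 y, x + y)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_0 = (1,2)&amp;lt;/math&amp;gt;. The total derivative at &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is the linear transformation&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;L(h_1, h_2) = (4h_1 + h_2,\ h_1 + h_2),&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
whereas the derivative matrix is the array of numbers&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\begin{pmatrix} 4 &amp;amp; 1 \\ 1 &amp;amp; 1 \end{pmatrix}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Applying &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; to a vector gives the same result as multiplying the matrix by that vector written as a column, which is presumably why the two objects are so often identified.&lt;br /&gt;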
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3170</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3170"/>
		<updated>2018-11-03T03:57:53Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* Vector-valued function of Rn */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
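&lt;br /&gt;
As a sanity check on the claim that &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, take (for illustration only) &amp;lt;math&amp;gt;f(x,y) = x^2 + 3y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;x_0 = (1,2)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v = (2,-1)&amp;lt;/math&amp;gt;. Then &amp;lt;math&amp;gt;\nabla f(x_0) = (2, 3)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v = 4 - 3 = 1&amp;lt;/math&amp;gt;. Computing directly from the limit definition of the directional derivative gives the same number:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\lim_{t\to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = \lim_{t\to 0} \frac{(1+2t)^2 + 3(2-t) - 7}{t} = \lim_{t\to 0} (1 + 4t) = 1.&amp;lt;/math&amp;gt;&lt;br /&gt;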
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; (where &amp;lt;math&amp;gt;\mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt; is the set of linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt;). Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. 
Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real-valued &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
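&lt;br /&gt;
One way to see this generalization is through a small example (the function is an illustrative choice, not from the references). If &amp;lt;math&amp;gt;f(x,y) = (xy, x + y^2)&amp;lt;/math&amp;gt;, then the components &amp;lt;math&amp;gt;f_1(x,y) = xy&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;f_2(x,y) = x + y^2&amp;lt;/math&amp;gt; have gradients &amp;lt;math&amp;gt;\nabla f_1(x,y) = (y, x)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\nabla f_2(x,y) = (1, 2y)&amp;lt;/math&amp;gt;, and the derivative matrix stacks them as rows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Df(x,y) = \begin{pmatrix} y &amp;amp; x \\ 1 &amp;amp; 2y \end{pmatrix}.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When &amp;lt;math&amp;gt;m = 1&amp;lt;/math&amp;gt; there is only one row, and the derivative matrix is just the gradient written as a row vector.&lt;br /&gt;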
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3169</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3169"/>
		<updated>2018-11-03T03:48:45Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on LHS and RHS when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
* The ambiguity of expressions like &amp;lt;math&amp;gt;\nabla f(Ax)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, once bases are fixed (usually the standard bases), there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3168</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3168"/>
		<updated>2018-11-03T03:44:16Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: /* See also */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
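&lt;br /&gt;
To illustrate the two readings of &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; described in the table (the function and the symbol &amp;lt;math&amp;gt;h&amp;lt;/math&amp;gt; are chosen here only as an example), take &amp;lt;math&amp;gt;f(x) = x^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_0 = 3&amp;lt;/math&amp;gt;. Read as a number, &amp;lt;math&amp;gt;f&#039;(3) = 6&amp;lt;/math&amp;gt;. Read as a linear transformation, &amp;lt;math&amp;gt;f&#039;(3)&amp;lt;/math&amp;gt; is the function&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;f&#039;(3)\colon \mathbf R \to \mathbf R, \qquad f&#039;(3)(h) = 6h,&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and evaluating it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; recovers the number: &amp;lt;math&amp;gt;f&#039;(3)(1) = 6&amp;lt;/math&amp;gt;.&lt;br /&gt;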
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt;. Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real-valued &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[Directional derivative]]&lt;br /&gt;
* [[machinelearning:Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3167</id>
		<title>Notational confusion of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Notational_confusion_of_multivariable_derivatives&amp;diff=3167"/>
		<updated>2018-11-03T03:43:29Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: move from https://machinelearning.subwiki.org/wiki/Notational_confusion_of_multivariable_derivatives&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I think there are several different confusions that arise from multivariable derivative notation:&lt;br /&gt;
&lt;br /&gt;
* The thing where &amp;lt;math&amp;gt;\frac{\partial w}{\partial t}&amp;lt;/math&amp;gt; can mean two different things on the left-hand side and the right-hand side of the same equation when &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is used as both an initial and an intermediate variable. (See Folland for details.)&lt;br /&gt;
* The thing where if &amp;lt;math&amp;gt;f(x,y) = (x^2,y^2)&amp;lt;/math&amp;gt; then &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x)&amp;lt;/math&amp;gt; feels like it might be &amp;lt;math&amp;gt;(2x,2x)&amp;lt;/math&amp;gt; even though it&#039;s actually &amp;lt;math&amp;gt;(2x,0)&amp;lt;/math&amp;gt;. (Example from Tao.) See also [https://issarice.com/mathematics-and-notation]&lt;br /&gt;
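&lt;br /&gt;
To spell out the second example (a sketch using only the function given above): the partial derivative is computed first and the point substituted afterwards, so &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,y) = (2x, 0)&amp;lt;/math&amp;gt; and hence &amp;lt;math&amp;gt;\frac{\partial f}{\partial x}(x,x) = (2x, 0)&amp;lt;/math&amp;gt;. Substituting first and differentiating afterwards gives a different object: &amp;lt;math&amp;gt;\frac{d}{dx} f(x,x) = \frac{d}{dx}(x^2, x^2) = (2x, 2x)&amp;lt;/math&amp;gt;.&lt;br /&gt;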
&lt;br /&gt;
==The derivative as a linear transformation in the several variable case and a number in the single-variable case==&lt;br /&gt;
&lt;br /&gt;
* The thing where the total derivative for &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt; &amp;quot;should&amp;quot; be a function but people treat it as a number. Refer to [https://books.google.com/books?id=2NVJCgAAQBAJ&amp;amp;lpg=PR1&amp;amp;pg=PA357 &amp;quot;Appendix A: Perorations of Dieudonne&amp;quot;] (p. 337) in Pugh&#039;s &#039;&#039;Real Mathematical Analysis&#039;&#039;.&lt;br /&gt;
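&lt;br /&gt;
For instance (a hypothetical example, not taken from Pugh), take &amp;lt;math&amp;gt;f(x) = x^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;x_0 = 3&amp;lt;/math&amp;gt;. As a linear transformation, the total derivative is &amp;lt;math&amp;gt;f&#039;(3)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;h \mapsto 6h&amp;lt;/math&amp;gt;, since&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\lim_{x\to 3} \frac{|x^2 - 9 - 6(x-3)|}{|x-3|} = \lim_{x\to 3} \frac{|x-3|^2}{|x-3|} = \lim_{x\to 3} |x-3| = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The number usually called &amp;quot;the derivative&amp;quot; is recovered by evaluating at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(3)(1) = 6&amp;lt;/math&amp;gt;.&lt;br /&gt;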
&lt;br /&gt;
==Total derivative versus derivative matrix==&lt;br /&gt;
&lt;br /&gt;
Technically the total derivative at a point is a linear transformation, whereas the derivative matrix is a matrix, i.e. an array of numbers arranged in a certain order. However, there is a one-to-one correspondence between linear transformations &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, so many books call the total derivative a matrix or equate the two.&lt;br /&gt;
&lt;br /&gt;
A similar confusion exists in the teaching of linear algebra, where sometimes only matrices are mentioned.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Summary table of multivariable derivatives]]&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
	<entry>
		<id>https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3166</id>
		<title>Summary table of multivariable derivatives</title>
		<link rel="alternate" type="text/html" href="https://calculus.subwiki.org/w/index.php?title=Summary_table_of_multivariable_derivatives&amp;diff=3166"/>
		<updated>2018-11-03T03:42:13Z</updated>

		<summary type="html">&lt;p&gt;IssaRice: moving from https://machinelearning.subwiki.org/wiki/Summary_table_of_multivariable_derivatives&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page is a &#039;&#039;&#039;summary table of multivariable derivatives&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* TODO maybe good to have separate rows for evaluated and pre-evaluated versions, for things that are functions/can be applied&lt;br /&gt;
&lt;br /&gt;
==Single-variable real function==&lt;br /&gt;
&lt;br /&gt;
For comparison and completeness, we give a summary table of the single-variable derivative. Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; be a single-variable real function.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x) = \lim_{h\to0} \frac{f(x+h) - f(x)}{h}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; at &amp;lt;math&amp;gt;x_0 \in \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{df}{dx}(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\left.\frac{d}{dx}f(x)\right|_{x=x_0}&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{align}f&#039;(x_0) &amp;amp;= \lim_{h\to0} \frac{f(x_0+h) - f(x_0)}{h} \\ &amp;amp;= \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x-x_0}\end{align}&amp;lt;/math&amp;gt; || In the most general multivariable case, &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; will become a linear transformation, so analogously we may wish to talk about the single-variable &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; as the function &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt; defined by &amp;lt;math&amp;gt;f&#039;(x_0)(x) = f&#039;(x_0)x&amp;lt;/math&amp;gt;, where on the left side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function and on the right side &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a number. If &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; is a function, we can evaluate it at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; to recover the number: &amp;lt;math&amp;gt;f&#039;(x_0)(1)&amp;lt;/math&amp;gt;. 
This is pretty confusing, and in practice everyone thinks of &amp;quot;&amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt;&amp;quot; in the single-variable case as a number, making the notation divergent; see [[Notational confusion of multivariable derivatives#The derivative as a linear transformation in the several variable case and a number in the single-variable case|Notational confusion of multivariable derivatives § The derivative as a linear transformation in the several variable case and a number in the single-variable case]] for more information.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; be a real-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative of &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; with respect to its &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; || Here &amp;lt;math&amp;gt;e_j = (0,\ldots,1,\ldots,0)&amp;lt;/math&amp;gt; is the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th vector of the standard basis, i.e. the vector with all zeroes except a one in the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th spot. Therefore &amp;lt;math&amp;gt;x + te_j&amp;lt;/math&amp;gt; can also be written &amp;lt;math&amp;gt;(x_1,\ldots, x_j + t, \ldots, x_n)&amp;lt;/math&amp;gt; when broken down into components.&lt;br /&gt;
|-&lt;br /&gt;
| Gradient || &amp;lt;math&amp;gt;\nabla f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x) = (\partial_1 f(x), \ldots, \partial_n f(x))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Gradient at &amp;lt;math&amp;gt;x_0 \in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\nabla f(x_0)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M_{1,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(\partial_1 f(x_0), \ldots, \partial_n f(x_0))&amp;lt;/math&amp;gt; or the vector &amp;lt;math&amp;gt;c&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - c\cdot (x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; || When &amp;lt;math&amp;gt;v = e_j&amp;lt;/math&amp;gt;, this reduces to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th partial derivative.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
I think in this case, since &amp;lt;math&amp;gt;f&#039;(x_0)(v)&amp;lt;/math&amp;gt; coincides with &amp;lt;math&amp;gt;\nabla f(x_0)\cdot v&amp;lt;/math&amp;gt;, people don&#039;t usually define the derivative separately. For example, Folland in &#039;&#039;Advanced Calculus&#039;&#039; defines &#039;&#039;differentiability&#039;&#039; but not the derivative! He just says that the vector that makes a function differentiable is the gradient.&lt;br /&gt;
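&lt;br /&gt;
As a small worked check of this coincidence (the function and numbers are a hypothetical example), let &amp;lt;math&amp;gt;f(x,y) = x^2 y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;x_0 = (1,2)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;v = (3,5)&amp;lt;/math&amp;gt;. Then &amp;lt;math&amp;gt;\nabla f(x,y) = (2xy, x^2)&amp;lt;/math&amp;gt;, so &amp;lt;math&amp;gt;\nabla f(x_0) = (4, 1)&amp;lt;/math&amp;gt; and&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\nabla f(x_0)\cdot v = 4\cdot 3 + 1\cdot 5 = 17&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Computing the directional derivative directly from the limit definition gives the same number:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;D_v f(x_0) = \lim_{t\to0} \frac{(1+3t)^2(2+5t) - 2}{t} = \lim_{t\to0} (17 + 48t + 45t^2) = 17&amp;lt;/math&amp;gt;&lt;br /&gt;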
&lt;br /&gt;
TODO: answer questions like &amp;quot;Is the gradient the derivative?&amp;quot;&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R&amp;lt;/math&amp;gt;. A parametric curve (or parametrized curve) is an example of this. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Velocity vector at &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;v(t)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;Df(t)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(f_1&#039;(t), \ldots, f_m&#039;(t))&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of partial/directional derivatives. There is only one variable with respect to which we can differentiate, so there is no direction to choose.&lt;br /&gt;
&lt;br /&gt;
==Vector-valued function of &#039;&#039;&#039;R&#039;&#039;&#039;&amp;lt;sup&amp;gt;&#039;&#039;n&#039;&#039;&amp;lt;/sup&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;f\colon \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; be a vector-valued function of &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt;. Since the function is vector-valued, some authors use a boldface letter like &amp;lt;math&amp;gt;\mathbf f&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;sortable wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Term !! Notation !! Type !! Definition !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| Partial derivative with respect to the &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt;th variable || &amp;lt;math&amp;gt;\partial_j f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_{x_j} f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\frac{\partial f}{\partial x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_{x_j}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;f_j&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\partial_j f(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Directional derivative in the direction of &amp;lt;math&amp;gt;v&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\partial_v f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}&amp;lt;/math&amp;gt; ||&lt;br /&gt;
|-&lt;br /&gt;
| Total or Fréchet derivative (sometimes just called the derivative) at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;(Df)_{x_0}&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;d_{x_0}f&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; || The linear transformation &amp;lt;math&amp;gt;L&amp;lt;/math&amp;gt; such that &amp;lt;math&amp;gt;\lim_{x\to x_0} \frac{\|f(x) - f(x_0) - L(x-x_0)\|}{\|x-x_0\|} = 0 &amp;lt;/math&amp;gt; || The derivative &#039;&#039;at a given point&#039;&#039; is a linear transformation. One might wonder then what the derivative (without giving a point) is, i.e. what meaning to assign to &amp;quot;&amp;lt;math&amp;gt;f&#039;&amp;lt;/math&amp;gt;&amp;quot; as we can in the single-variable case. Its type would have to be &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; or more specifically &amp;lt;math&amp;gt;\mathbf R^n \to \mathcal L(\mathbf R^n, \mathbf R^m)&amp;lt;/math&amp;gt;. Also the notation &amp;lt;math&amp;gt;f&#039;(x_0)&amp;lt;/math&amp;gt; is slightly confusing: if the total derivative is a function, what happens if &amp;lt;math&amp;gt;n=m=1&amp;lt;/math&amp;gt;? We see that &amp;lt;math&amp;gt;f&#039;(x_0)\colon \mathbf R \to \mathbf R&amp;lt;/math&amp;gt;, so the single-variable derivative isn&#039;t actually a number! To get the actual slope of the tangent line, we must evaluate the function at &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;: &amp;lt;math&amp;gt;f&#039;(x_0)(1) \in \mathbf R&amp;lt;/math&amp;gt;. Some authors avoid this by using different notation in the general multivariable case. Others accept this type error and ignore it.&lt;br /&gt;
|-&lt;br /&gt;
| Derivative matrix, differential matrix, Jacobian matrix at point &amp;lt;math&amp;gt;x_0\in \mathbf R^n&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;Df(x_0)&amp;lt;/math&amp;gt; or &amp;lt;math&amp;gt;\mathcal M(f&#039;(x_0))&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\mathcal M_{m,n}(\mathbf R)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\begin{pmatrix}\partial_1 f_1(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_1(x_0) \\ \vdots &amp;amp; \ddots &amp;amp; \vdots \\ \partial_1 f_m(x_0) &amp;amp; \cdots &amp;amp; \partial_n f_m(x_0)\end{pmatrix}&amp;lt;/math&amp;gt; || Since the total derivative is a linear transformation, and since linear transformations from &amp;lt;math&amp;gt;\mathbf R^n&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;\mathbf R^m&amp;lt;/math&amp;gt; have a one-to-one correspondence with real-valued &amp;lt;math&amp;gt;m&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; matrices, the behavior of the total derivative can be summarized in a matrix; that summary is the derivative matrix. Some authors say that the total derivative &#039;&#039;is&#039;&#039; the matrix. TODO: talk about gradient vectors as rows.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note the absence of the gradient in the above table. The generalization of the gradient to the &amp;lt;math&amp;gt;\mathbf R^n \to \mathbf R^m&amp;lt;/math&amp;gt; case is the derivative matrix.&lt;br /&gt;
&lt;br /&gt;
==See also==&lt;br /&gt;
&lt;br /&gt;
* [[Notational confusion of multivariable derivatives]]&lt;br /&gt;
* [[calculus:Relation between gradient vector and partial derivatives]]&lt;br /&gt;
* [[calculus:Relation between gradient vector and directional derivatives]]&lt;br /&gt;
* [[calculus:Directional derivative]]&lt;br /&gt;
* [[Summary table of probability terms]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
* Tao, Terence. &#039;&#039;Analysis II&#039;&#039;. 2nd ed. Hindustan Book Agency. 2009.&lt;br /&gt;
* Folland, Gerald B. &#039;&#039;Advanced Calculus&#039;&#039;. Pearson. 2002.&lt;br /&gt;
* Pugh, Charles Chapman. &#039;&#039;Real Mathematical Analysis&#039;&#039;. Springer. 2010.&lt;br /&gt;
&lt;br /&gt;
==External links==&lt;/div&gt;</summary>
		<author><name>IssaRice</name></author>
	</entry>
</feed>