Gradient descent with decaying learning rate


Revision as of 15:11, 1 September 2014

==Definition==

'''Gradient descent with decaying learning rate''' is a form of [[gradient descent]] where the learning rate varies as a function of the number of iterations, but is not otherwise dependent on the value of the vector at that stage. The update rule is as follows:

<math>\vec{x}^{(k+1)} = \vec{x}^{(k)} - \alpha_k \nabla f\left(\vec{x}^{(k)}\right)</math>

where <math>\alpha_k</math> depends only on <math>k</math> and not on the choice of <math>\vec{x}^{(k)}</math>.
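The update rule above can be sketched in code. This is a minimal illustration, not a reference implementation; the function and parameter names (`gradient_descent_decaying`, `grad_f`, `alpha`) are chosen here for demonstration, and the objective and schedule in the usage example are hypothetical.

```python
import numpy as np

def gradient_descent_decaying(grad_f, x0, alpha, num_iters):
    """Gradient descent where the step size alpha(k) depends only on the
    iteration number k, never on the current iterate x^{(k)}."""
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        # Update rule: x^{(k+1)} = x^{(k)} - alpha_k * grad f(x^{(k)})
        x = x - alpha(k) * grad_f(x)
    return x

# Example: minimize f(x) = ||x||^2 / 2 (gradient: x) using a
# linearly decaying learning rate alpha_k = alpha_0 / (k + 1).
x_min = gradient_descent_decaying(
    grad_f=lambda x: x,
    x0=[4.0, -2.0],
    alpha=lambda k: 0.5 / (k + 1),
    num_iters=100,
)
```

Because `alpha` is a function of `k` alone, any decay schedule can be passed in without changing the descent loop itself.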

==Cases==

{| class="wikitable"
! Type of decay !! Example expression for <math>\alpha_k</math> !! More information
|-
| linear decay || <math>\alpha_k = \frac{\alpha_0}{k+1}</math> || [[Gradient descent with linearly decaying learning rate]]
|-
| quadratic decay || <math>\alpha_k = \frac{\alpha_0}{(k+1)^2}</math> || [[Gradient descent with quadratically decaying learning rate]]
|-
| exponential decay || <math>\alpha_k = \alpha_0 e^{-\beta k}</math> where <math>\beta > 0</math> || [[Gradient descent with exponentially decaying learning rate]]
|}
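The three decay schedules above can each be written as a function of the iteration number <math>k</math> alone. A minimal sketch follows; the values of `alpha0` and `beta` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical schedule parameters for demonstration.
alpha0, beta = 0.5, 0.1

def linear_decay(k):
    # alpha_k = alpha_0 / (k + 1)
    return alpha0 / (k + 1)

def quadratic_decay(k):
    # alpha_k = alpha_0 / (k + 1)^2
    return alpha0 / (k + 1) ** 2

def exponential_decay(k):
    # alpha_k = alpha_0 * e^{-beta * k}, with beta > 0
    return alpha0 * math.exp(-beta * k)
```

All three start at <math>\alpha_0</math> when <math>k = 0</math> and decrease monotonically; quadratic decay shrinks the step size faster than linear decay, while the rate of exponential decay is controlled by <math>\beta</math>.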