Gradient descent with constant learning rate for a quadratic function of one variable
Explicitly, we have a quadratic function:
Recall that the unique point of local minimum, that is also the point of absolute minimum, is . The value of the minimum is .
Quadratic functions of one variable have a nice property: the value of the quadratic function at a point is an increasing function of its distance from the point of local minimum. Therefore, getting a closer point in the domain is equivalent to getting a smaller function value, and we do not need to keep track of the distinction.
Recall also that the derivatives are given by and .
Suppose is a positive real number. The gradient descent with constant learning rate is an iterative algorithm that aims to find the point of local minimum for . The algorithm starts with a guess and updates according to the rule:
Note that we use parenthesized superscripts to denote the iterates, and these should not be confused with exponents. The reason for using superscripts instead of subscripts is to keep notation consistent with the case of functions of multiple variables, where subscripts are used for coordinates and superscripts for iterates.
- Gradient descent with constant learning rate for a quadratic function of multiple variables
- Gradient descent with constant learning rate for a L1-regularized quadratic function of one variable
Closed form expressions for iterates
Recall the above expression:
Plugging in values, we obtain:
This is neater if we subtract the optimum from both sides:
We can now rewrite as:
In other words, the difference with multiplies by in each iteration. We therefore obtain this closed-form expression for the iterates:
Convergence properties based on the learning rate
Assume that the initial point for the gradient descent is not already the point of minimum . Then, the sequence converges if and only if:
This is equivalent to the condition:
or equivalently, that:
Moreover, if so, the convergence is linear convergence with convergence rate:
As before, we assume that the initial guess is not already the point of minimum.
|Case on||Does the sequence converge to the minimum? How does it behave?||Convergence rate|
|No. The sequence stays stuck at the initial point.||--|
|Yes. The sequence converges monotonically to the minimum.||The convergence is linear convergence and the convergence rate is . The closer is to , the faster the convergence. Note that the convergence rate is the same at every step rather than simply being a limiting value.|
|Yes. Convergence happens in one step.||Since convergence occurs in one step, it does not make sense to talk of the convergence rate.|
|Yes. The convergence is oscillatory (the sequence switches sides with respect to the minimum each time), but the value gets closer to the root in each step.||The convergence is linear convergence and the convergence rate is . The closer is to , the faster the convergence. Note that the convergence rate is the same at every step rather than simply being a limiting value.|
|No. The sequence oscillates between the points and , both equidistant from the point of minimum and on opposite sides of it.||--|
|No. The sequence oscillates around the root (each time, it switches side with respect to the minimum) and each member is further from the minimum than its predecessor.||--|