Gradient descent using Newton's method
Gradient descent using Newton's method is a variant of gradient descent where the step size along the gradient direction is determined using Newton's method. In other words, we move the same way that we would move if we were applying Newton's method to the function restricted to the line through the current point in the direction of the gradient vector.
By default, the term refers to gradient descent using one iteration of Newton's method, i.e., we stop Newton's method after one iteration. However, we can also consider the version where multiple iterations of Newton's method are used to determine the step size for each gradient descent step.
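The restriction to a line can be written out as a one-variable problem. The following is a sketch in notation introduced here for illustration (it is not taken from the text above): $x$ is the current point, $f$ is the function being minimized, $v$ is its gradient at $x$, $H$ is its Hessian, $g$ is the restriction of $f$ to the line through $x$ in the direction $-v$, and $t_j$ are the Newton iterates for the step size, started at $t_0 = 0$.

```latex
\begin{align*}
g(t) &= f(x - t\,v), \qquad v = \nabla f(x), \\
g'(t) &= -\,v^{T}\,\nabla f(x - t\,v), \\
g''(t) &= v^{T} H(x - t\,v)\, v, \\
t_{j+1} &= t_j - \frac{g'(t_j)}{g''(t_j)}, \qquad t_0 = 0, \\
t_1 &= -\frac{g'(0)}{g''(0)} = \frac{|v|^2}{v^{T} H(x)\, v}.
\end{align*}
```

With one iteration of Newton's method the next point is $x - t_1 v$. With $m$ iterations it is $x - t_m v$. Both assume that $g''$ is positive at the points where it is evaluated.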
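Explicitly, the learning algorithm is:

$$x^{(k+1)} = x^{(k)} - \frac{\nabla f(x^{(k)})}{D_u^2 f(x^{(k)})}$$

where $\nabla f(x^{(k)})$ is the gradient vector of $f$ at the point $x^{(k)}$ and $D_u^2 f(x^{(k)})$ is the second derivative of $f$ along the gradient vector, i.e., the second-order directional derivative in the direction of the unit vector $u = v/|v|$. Explicitly, if $v = \nabla f(x^{(k)})$, we have:

$$D_u^2 f(x^{(k)}) = \frac{v^T H(x^{(k)})\, v}{|v|^2}, \qquad \text{so that} \qquad x^{(k+1)} = x^{(k)} - \frac{|v|^2}{v^T H(x^{(k)})\, v}\, v$$

Here, $H(x^{(k)})$ is the Hessian matrix of $f$ at the point $x^{(k)}$.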
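The expression in the denominator is justified by the fact that the Hessian matrix defines a bilinear form that outputs second-order directional derivatives: for a unit vector $u$, the second-order directional derivative of the function along $u$ is $u^T H u$, where $H$ is the Hessian matrix at the point.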
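The update can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `newton_gradient_step`, `grad`, `hess`, `newton_iters`, `grad_f` and `hess_f` are hypothetical, the caller is assumed to supply the gradient and Hessian as functions, and raising an error when the curvature along the gradient is not positive is a choice made here rather than part of the method as described.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def newton_gradient_step(
    grad: Callable[[Vector], Vector],
    hess: Callable[[Vector], Matrix],
    x: Vector,
    newton_iters: int = 1,
) -> Vector:
    """One gradient descent step from x, with the step size t found by
    running Newton's method on g(t) = f(x - t * v), where v = grad(x)."""
    v = grad(x)
    if dot(v, v) == 0.0:
        return list(x)  # zero gradient: x is already a critical point
    t = 0.0  # Newton's method on g starts at t = 0, i.e. at x itself
    for _ in range(newton_iters):
        y = [xi - t * vi for xi, vi in zip(x, v)]
        g1 = -dot(v, grad(y))              # g'(t)
        g2 = dot(v, mat_vec(hess(y), v))   # g''(t) = v^T H(y) v
        if g2 <= 0.0:
            # Assumption of this sketch: refuse to step without positive curvature.
            raise ValueError("non-positive curvature along the gradient")
        t -= g1 / g2
    return [xi - t * vi for xi, vi in zip(x, v)]


# Example: f(x, y) = x^2 + 2*y^2, an illustrative quadratic chosen here.
def grad_f(p: Vector) -> Vector:
    return [2.0 * p[0], 4.0 * p[1]]


def hess_f(p: Vector) -> Matrix:
    return [[2.0, 0.0], [0.0, 4.0]]


x1 = newton_gradient_step(grad_f, hess_f, [1.0, 1.0])
```

For this quadratic, starting at $(1, 1)$, the gradient is $v = (2, 4)$, so $|v|^2 = 20$ and $v^T H v = 72$. The step size is $20/72 = 5/18$ and the new point is $(4/9, -1/9)$. Because the restriction of a quadratic to a line is itself quadratic, further Newton iterations leave the step size unchanged in this example.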