Approximate Newton's method for optimization of a function of multiple variables


Definition

The approximate Newton's method (or approximate Newton descent, or quasi-Newton method) for optimization of a function of multiple variables is a variation of Newton's method for optimization of a function of multiple variables in which we do not necessarily use the exact inverse Hessian at each iteration, but instead use a matrix intended to approximate or bound it.

Note that the term quasi-Newton method may be used to describe this type of method, but that term is often restricted in meaning to a narrower subset of approximate Newton's methods.

Iterative step

Note that we use parenthesized superscripts to denote iterates and iteration-specific variables, and subscripts to denote coordinates. Parenthesized superscripts do not denote exponents.

Suppose we are applying the approximate Newton's method to minimize a function $f$ of $n$ variables. We will denote the input to $f$ as a vector $\vec{x} = (x_1, x_2, \dots, x_n)$. Given a guess $\vec{x}^{(k)}$, the next guess $\vec{x}^{(k+1)}$ is computed as follows:

$$\vec{x}^{(k+1)} = \vec{x}^{(k)} - H^{(k)} \nabla f(\vec{x}^{(k)})$$

Here, $H^{(k)}$ is intended as a stand-in (an approximation or a bound of some sort) for the inverse of the Hessian matrix of $f$. If we are using the usual Newton's method for optimization of a function of multiple variables, then $H^{(k)}$ would be the inverse of the Hessian $\nabla^2 f(\vec{x}^{(k)})$ at the current guess. However, in order to guarantee global convergence, $H^{(k)}$ may instead be chosen in terms of global bounds on the Hessian rather than the Hessian at a particular point.
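
To make the iterative step concrete, here is a minimal sketch in Python (assuming NumPy is available). The names `approximate_newton_step`, `f_grad`, and `approx_inv_hessian` are illustrative placeholders, not names from the article: `approx_inv_hessian` stands for whatever rule a particular method uses to produce the matrix $H^{(k)}$.

```python
import numpy as np

def approximate_newton_step(x, f_grad, approx_inv_hessian):
    """One step of the update x^(k+1) = x^(k) - H^(k) * grad f(x^(k)).

    x                  : current guess x^(k), a 1-D NumPy array
    f_grad             : callable returning the gradient of f at a point
    approx_inv_hessian : callable returning H^(k), the stand-in for the
                         inverse Hessian (it may or may not depend on x)
    """
    H = approx_inv_hessian(x)          # approximation/bound for the inverse Hessian
    return x - H @ f_grad(x)

def approximate_newton_descent(x0, f_grad, approx_inv_hessian, num_iterations=100):
    """Iterate the step starting from the initial guess x^(0)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iterations):
        x = approximate_newton_step(x, f_grad, approx_inv_hessian)
    return x

# Illustrative use: f(x) = x1^2 + 2*x2^2, whose Hessian is diag(2, 4).
# Choosing the constant matrix diag(1/2, 1/4) as the approximate inverse Hessian
# gives the "constant approximate inverse Hessian" variant listed below.
grad = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
H_const = np.diag([0.5, 0.25])
x_min = approximate_newton_descent([3.0, -2.0], grad, lambda x: H_const)
```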


Types of approximate Newton's method

| Method | Nature of the matrix $H^{(k)}$ | Is $H^{(k)}$ sparse? | Does $H^{(k)}$ depend on the current value of $\vec{x}^{(k)}$? | Is the rule for choosing $H^{(k)}$ stationary, or dependent on the number of iterations? |
| --- | --- | --- | --- | --- |
| gradient descent with constant learning rate | The scalar matrix $\alpha I$, where $\alpha$ is the constant learning rate | Yes | No | Stationary |
| gradient descent with decaying learning rate | A scalar matrix that changes with each iteration | Yes | No | Dependent on the number of iterations |
| gradient descent using Newton's method | A scalar matrix that changes with each iteration | Yes | Yes | Stationary |
| gradient descent with exact line search | A scalar matrix that changes with each iteration | Yes | Yes | Stationary |
| parallel coordinate descent with constant learning rate | A diagonal matrix with positive entries | Yes | No | Stationary |
| sequential coordinate descent (of various types) | A matrix with exactly one nonzero entry, located on the diagonal; the location of the entry changes with the iteration | Yes | Depends on the type of variant used | Depends on the type of variant used |
| Newton's method for optimization of a function of multiple variables | A symmetric positive-definite matrix | No (not guaranteed) | Yes | Stationary |
| Approximate Newton's method with constant approximate inverse Hessian | A symmetric positive-definite matrix that remains constant across iterations | Not guaranteed in general, though some methods of this type do guarantee sparsity: gradient descent with constant learning rate and parallel coordinate descent with constant learning rate are special cases where the "approximate inverse Hessian" is a scalar matrix and a diagonal matrix respectively | No | Stationary |
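
As a rough illustration of how the rows above are all instances of the same update rule, the self-contained sketch below compares three choices of $H^{(k)}$ on an illustrative quadratic test function: a scalar matrix (gradient descent with constant learning rate), a constant diagonal matrix (parallel coordinate descent with constant learning rate), and the exact inverse Hessian (Newton's method). The test function, learning rate, and iteration count are arbitrary choices for the example, not values from the article.

```python
import numpy as np

# Illustrative quadratic: f(x) = x1^2 + 2*x2^2 + x1*x2, minimized at (0, 0).
A = np.array([[2.0, 1.0],
              [1.0, 4.0]])              # Hessian of f (constant, since f is quadratic)
grad = lambda x: A @ x                  # gradient of f

x0 = np.array([3.0, -2.0])

# 1. Gradient descent with constant learning rate alpha:
#    H^(k) is the scalar matrix alpha * I at every iteration.
alpha = 0.1
H_scalar = lambda x: alpha * np.eye(2)

# 2. Parallel coordinate descent with constant learning rate:
#    H^(k) is a fixed diagonal matrix with positive entries
#    (here, the reciprocals of the Hessian's diagonal entries).
H_diag = lambda x: np.diag(1.0 / np.diag(A))

# 3. Newton's method: H^(k) is the exact inverse Hessian at x^(k)
#    (constant here only because f is quadratic).
H_newton = lambda x: np.linalg.inv(A)

for name, H in [("constant learning rate", H_scalar),
                ("diagonal (parallel coordinate descent)", H_diag),
                ("exact inverse Hessian (Newton)", H_newton)]:
    x = x0.copy()
    for _ in range(50):
        x = x - H(x) @ grad(x)          # the common update rule x - H^(k) grad f(x)
    print(name, x)                      # all three approach the minimizer (0, 0)
```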