Newton's method for root-finding for a function of one variable
- 1 Definition
- 2 Related methods
- 3 Iterative step
- 4 Algebraic observations about step size
- 5 Termination
- 6 Failure analysis
- 7 Sensitivity to the form in which the function is presented
- 8 Convergence rate
- 8.1 Convergence for linear functions
- 8.2 Convergence for quadratic functions
- 8.3 Convergence in general, starting from sufficiently close by
- 8.4 Some cases where it's particularly easy to reason about convergence, even outside an immediate neighborhood
Newton's method (also known as the Newton-Raphson method) is a root-finding algorithm that can be applied to a differentiable function whose derivative function is known and can be calculated at any point. It is closely related to the secant method, but has the advantage that it requires only a single initial guess.
Note that on this page, we use parenthesized superscripts to denote the iterates of the method, rather than subscripts. This is to keep notation consistent with the multiple-variable case, where subscripts are used for coordinates. Parenthesized superscripts should not be confused with exponents.
|Method||Change in problem statement||Change in method|
|Newton's method for root-finding for a vector-valued function of a vector variable||We have multiple real-valued functions, each of multiple variables. We can model this as a vector-valued function of a vector variable.||Division by is replaced by left multiplication by the inverse of the Jacobian matrix of the vector-valued function.|
|Newton's method for optimization of a function of one variable||Instead of finding where the function takes the value zero, we are interested in determining the minimum value of a function.||We apply Newton's method to the derivative of the function, Thus, instead of the iteration , we have .|
|Newton's method for optimization of a function of multiple variables||Optimization, but now for a single real-valued function of multiple variables.||We apply Newton's method for root-finding to the gradient vector of the function. The step is therefore obtained by multiplying the inverse of the Hessian matrix with the gradient vector.|
The initial guess for the root needs to be supplied to initiate the method.
For any nonnegative integer , define:
Geometrically, if we consider the tangent line at to the graph of , then is the -coordinate of the point of intersection of this line with the -axis. In particular, Newton's method would converge in one step if were a linear function.
Some associated terminology:
- We call the quantity the step size. Large step sizes mean big changes to our guesses.
- The step direction is given by the sign of . The step direction depends on the signs of both and .
Algebraic observations about step size
Relation with logarithmic derivative and condition number
- The step size is the absolute value of the reciprocal of the logarithmic derivative of at .
- The fractional change incurred in each iteration is given by the condition number of at .
We typically choose to terminate based on a criterion of the form for some small enough .
The failure analysis could be of the following forms:
- The algorithm throws up an error at some point, i.e., we simply aren't able to compute for some .
- The algorithm cycles between a set of points and therefore does not make progress.
- The algorithm continues just fine without cycling, but it does not converge to a root.
- The algorithm converges, but very slowly, so that the convergence is not helpful in practice.
Throwing up an error
The algorithm throws up an error in computing if and only if is a critical point for the function. There are two subcases:
- The derivative exists and equals zero.
- The derivative is undefined.
In both cases, the formula for calculating does not work.
(Note that point of local extremum implies critical point, so if we start with, or reach, a point of local extremum for the function, we will throw up an error).
The case of a well-defined derivative that equals zero
The case of well-defined nonzero one-sided derivatives
Sensitivity to the form in which the function is presented
Unlike the case of the bisection method, the qualitative behavior of Newton's method is not invariant under pre-composition or post-composition with monotone transformations. This is because the size of the step moved depends on attributes of the function that are not preserved under arbitrary monotone transformations.
Invariance under scalar multiplication
The qualitative behavior of Newton's method (and in particular, the sequence of guesses we get by iterating the method with a given starting point) is invariant under scalar multiplication. In other words, the functions and , for a constant , behave the same way. This is modulo the caveat about numerical precision. Note, however, that given a particular function, multiplying it by a constant can never improve the precision (because we cannot get more information than already exists) so the key point remains that replacing the function by a scalar multiple doesn't help.
Effect of power maps
If we replace the function by a function of the form , then this affects Newton's method in the following predictable manner: the step sizes become smaller, even though the set of roots does not change. Therefore, we should expect convergence to be slower in general.
Explicitly, recall that the step size in Newton's method is the reciprocal of the logarithmic derivative. By the chain rule for logarithmic derivatives, post-composing with a power function just multiplies the logarithmic derivative by the condition number for the power function, which is . So, the Newton's method step gets divided by .
This suggests a remedy: if we know that our function has a zero of multiplicity , we deliberately adjust for the small step sizes of Newton's method by multiplying the suggested step size by .
Covariance under affine transformations of the domain
Consider a function and a a function for and . The functions behave the same way with respect to Newton's method, provided that we adjust the values of by the same affine transformation. In other words, the sequence of guesses for the second function are obtained by mapping the sequence of guesses for the first function under the mapping .
Convergence for linear functions
For a linear function, Newton's method reaches the exact value of the unique root in a single step, i.e., regardless of the value of , the value of is the desired root.
To see this explicitly, consider a linear function:
Then, for any initial value , we have:
Thus, we get:
Also, the unique solution to is .
Geometrically, Newton's method involves constructing the tangent line through the point on the graph corresponding to the current guess for the root, and determining the point of intersection of that line with the -axis.
For a linear function, the tangent line to the graph of the function through any point on it coincides with the graph of the function itself. Therefore, the intersection of the tangent line with the -axis coincides with the intersection of the graph with the -axis, and therefore with the root.
Convergence for quadratic functions
For further information, refer: Newton's method for quadratic functions
Based on the discussion on sensitivity to the form in which the function is presented, there are essentially three types of qualitative behavior:
|Case on nature of quadratic function||Relation between initial guess and root of convergence (assume that we do not start at a root)||Monotonicity of convergence||Order of convergence||Numerical rate of convergence for that order|
|One root with multiplicity two, i.e., the polynomial is for some ,||Converges to the root, but does not reach it in finitely many steps||monotonic from the outset||1||1/2|
|Two distinct real roots, i.e., the polynomial is for some , ,||Converges to the root it is closer to, but does not reach it in finitely many steps. In particular, converges to if , converges to . If , converges to .||monotonic after at most one initial step||2||1/2|
|No real root||No root to converge to, chaotic behavior||--||--||--|
Convergence in general, starting from sufficiently close by
- Newton's method converges quadratically from sufficiently close to a root of multiplicity one
- Newton's method converges linearly from sufficiently close to a root of finite multiplicity greater than one
Some cases where it's particularly easy to reason about convergence, even outside an immediate neighborhood
|Case||What we can conclude about convergence||Further remarks|
|The part of the domain of a function to the immediate right of a root where the function is increasing and concave up (applies as long as the function stays increasing and concave up). The same holds if "increasing and concave up" is replaced by "decreasing and concave down"||Newton's method converges to the root. The sequence of guesses monotonically decreases towards the root.||In particular, for a polynomial function that completely factors over the reals into linear factors, this holds for the right-most root: starting with any point to its right, one converges to that root.|
|The part of the domain of a function to the immediate left of a root where the function is increasing and concave down (applies as long as the function stays that way). The same holds if "increasing and concave down" is replaced by "decreasing and concave up"||Newton's method converges to the root. The sequence of guesses monotonically increases towards the root.||In particular, for a polynomial function that completely factors over the reals into linear factors, this holds for the left-most root: starting with any point to its left, one converges to that root.|
|The part of the domain of a function to the immediate left of the right-most root, that has the further property that the function is increasing and concave up on the immediate left (including where one starts) and the entire domain right of the root. (Everything continues to hold if we replace "increasing and concave up" with "decreasing and concave down")||Newton's method converges to the root. The first iteration of the method gets us to the right of the root, after which it monotonically decreases to the root.||In particular, for a polynomial function that completely factors over the reals into linear factors, the part of the domain to the right of the right-most root of the derivative converges to the right-most root of the function.|
|The part of the domain of a function to the immediate right of the left-most root, that has the further property that the function is increasing and concave down on the immediate right (including where one starts) and the entire domain left of the root. (Everything continues to hold if we replace "increasing and concave down" with "decreasing and concave up")||Newton's method converges to the root. The first iteration of the method gets us to the left of the root, after which it monotonically increases to the root.||In particular, for a polynomial function that completely factors over the reals into linear factors, the part of the domain to the left of the left-most root of the derivative converges to the right-most root of the function.|