Bisection method
Revision as of 15:20, 26 April 2014
Definition
The bisection method, also called the interval halving method, binary search method, and dichotomy method, is a root-finding algorithm.
Summary
Item | Value |
---|---|
Initial condition | Works for a continuous function f (or more generally, a function satisfying the intermediate value property) on an interval [a, b] given that f(a) and f(b) have opposite signs. |
Iterative step | At stage n, we know that f has a root in an interval [a_n, b_n]. We test the sign of f at the midpoint m_n = (a_n + b_n)/2. We then define the new interval as the left half [a_n, m_n] if the signs of f at a_n and m_n oppose one another, and as the right half [m_n, b_n] otherwise. |
Convergence rate | The size of the interval within which we are guaranteed to have a root halves at each step. The distance between the root and our "best guess" at any stage (say, the midpoint of the guaranteed interval) has an upper bound that halves at each step, so we have linear convergence. |
Computational tools needed | Floating-point arithmetic to compute averages; the ability to compute the value of a function at a point, or more minimally, to determine whether the value is positive or negative. |
Termination stage | We may terminate the algorithm based either on the size of the interval in the domain (i.e., we know that we are close to a root) or on the closeness of the function value to zero. How we terminate depends on our goals. |
Initial condition
The bisection method works for a continuous function f (or more generally, a function satisfying the intermediate value property) on an interval [a, b] given that f(a) and f(b) have opposite signs.
The bisection method can be used to find a root of a continuous function on a connected interval if we are able to locate two points in the domain of the function where it has opposite signs. We simply restrict the function to that domain and apply the method.
Iterative step
Let a_0 = a and b_0 = b.
At stage n, for n a nonnegative integer:
Prior knowledge:
- f(a_n) and f(b_n) are both numerically distinguishable from zero (so they have defined signs, positive or negative) and they have opposite signs.
- Combining that with the fact that f is a continuous function, the intermediate value theorem tells us that f has a root on [a_n, b_n].
Iterative step:
- Compute the midpoint m_n = (a_n + b_n)/2 and the value f(m_n).
- If f(m_n) is equal to (or numerically indistinguishable from) zero, then return m_n as the root and terminate the algorithm.
- If f(m_n) has sign opposite to f(a_n) (and therefore the same as f(b_n)), then choose a_{n+1} = a_n and b_{n+1} = m_n, so the new interval (for the next iteration) is [a_n, m_n].
- If f(m_n) has sign opposite to f(b_n) (and therefore the same as f(a_n)), then choose a_{n+1} = m_n and b_{n+1} = b_n, so the new interval (for the next iteration) is [m_n, b_n].
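The iterative step above can be sketched in Python. The function name `bisect`, the tolerance, and the iteration cap are illustrative choices, not part of the method's definition:

```python
def bisect(f, a, b, tol=1e-12, max_iter=200):
    """Find a root of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        m = (a + b) / 2          # midpoint m_n = (a_n + b_n)/2
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:
            return m             # exact root hit, or interval small enough
        if fm * fa < 0:          # sign change in [a_n, m_n]: keep left half
            b, fb = m, fm
        else:                    # sign change in [m_n, b_n]: keep right half
            a, fa = m, fm
    return (a + b) / 2
```

For example, `bisect(lambda x: x*x - 2, 1.0, 2.0)` converges to the square root of 2.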
Convergence rate
At stage n, we can define our best guess as the midpoint m_n = (a_n + b_n)/2.
We can measure the convergence rate by looking at the size of the interval within which a root is guaranteed, and how this changes with time. We notice that the interval size halves at each stage, so that b_n − a_n = (b − a)/2^n. The distance between m_n and the location of an actual root is bounded from above by half the interval length, so this also asymptotically halves with each iteration (at worst, and on average).
This is a form of linear convergence.
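The halving claim is easy to check numerically. The helper name and the sample function x² − 2 on [1, 2] are illustrative assumptions:

```python
# Records the bracketing interval length after each bisection step;
# each length should be exactly half the previous one.
def interval_lengths(f, a, b, steps):
    lengths = []
    for _ in range(steps):
        m = (a + b) / 2
        if f(m) == 0:
            break
        if f(m) * f(a) < 0:
            b = m
        else:
            a = m
        lengths.append(b - a)
    return lengths

lens = interval_lengths(lambda x: x * x - 2, 1.0, 2.0, 10)
# starting from [1, 2], the lengths are 1/2, 1/4, 1/8, ...
```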
Computational tools needed
- We need to be able to compute halves of intervals. This is most easily done using floating-point binary arithmetic.
- We also need to be able to compute function values at particular points, and determine the signs of these values. Note that we do not care about computing the function value per se, but we do need reliable information about its sign.
Termination
Domain-based termination
We may terminate the algorithm when the size of the interval within which a root is guaranteed has fallen below a certain pre-specified length L. Since the guaranteed interval has length (b − a)/2^n after n steps, the number of steps for such termination can be predicted in advance as: n = ⌈log_2((b − a)/L)⌉
At this stage, we may return the interval or the midpoint, depending on the desired format of the answer.
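The predicted step count can be checked against a direct simulation. The helper names and the sample function x² − 2 are illustrative:

```python
import math

# Predicted number of bisection steps to shrink the bracketing interval
# below a target length, following n = ceil(log2((b - a)/L)).
def predicted_steps(a, b, target):
    return math.ceil(math.log2((b - a) / target))

# Runs the bisection loop and counts steps until the interval is
# shorter than the target length.
def actual_steps(f, a, b, target):
    n = 0
    while b - a >= target:
        m = (a + b) / 2
        if f(m) * f(a) < 0:
            b = m
        else:
            a = m
        n += 1
    return n
```

For x² − 2 on [1, 2] with target length 10⁻⁶, both give 20 steps.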
Output-based termination
We may terminate the algorithm once the value of the function at the midpoint is sufficiently close to zero.
The number of stages needed for such termination cannot be computed simply by knowing the length of the domain. However, if f is differentiable and we have an upper bound B on the magnitude of the derivative of f, then we know that if we are within distance d of the root on the domain, the absolute value of the function value is at most Bd. We can therefore put an upper bound on the number of steps necessary.
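A small numerical check of this bound, using the assumed example f(x) = x² − 2 on [1, 2], where |f′(x)| = |2x| is at most B = 4:

```python
# Verifies |f(x)| <= B * d at points within distance d of the root
# sqrt(2), for f(x) = x**2 - 2 with derivative bound B = 4 on [1, 2].
root = 2 ** 0.5
B = 4.0  # upper bound on |f'| over [1, 2]
for d in (0.1, 0.01, 0.001):
    x = root + d  # a point within distance d of the root
    assert abs(x * x - 2) <= B * d
```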
Selection of root found and sensitivity to interval choice
This finds only one root, not all roots
It's worth noting that this process finds only one root, not all roots. Consider, for instance, the function: f(x) = (x − 1)(x − 4)(x − 5)
on the interval [0, 6]. This is negative at 0 and positive at 6. At the midpoint 3, it is positive, so our first iteration picks the left half of the interval, namely [0, 3]. The process will then gradually converge to the root 1. The roots 4 and 5 get missed. The reason they get missed is that, because an even number of them appeared in the test interval [3, 6], we had the same sign of the function at both ends of that interval.
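This behavior can be reproduced in code, using the cubic with roots 1, 4, 5 that matches the signs described (negative at 0, positive at 3 and 6):

```python
# Bisection on f(x) = (x - 1)(x - 4)(x - 5) over [0, 6]: the first
# midpoint 3 gives a positive value, so the method commits to [0, 3]
# and converges to the root 1, never seeing 4 or 5.
def bisect(f, a, b, steps=60):
    for _ in range(steps):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fm * f(a) < 0:
            b = m
        else:
            a = m
    return (a + b) / 2

f = lambda x: (x - 1) * (x - 4) * (x - 5)
root = bisect(f, 0.0, 6.0)  # converges to 1, not 4 or 5
```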
The root found may transition as we move either endpoint of the interval
Consider a function: f(x) = (x − 1)(x − 3)(x − 5)
Suppose we apply the bisection method to determine a root of f on the interval [0, b] where b > 5. Note that the sign of f is negative at 0 and positive at b, so the bisection method is applicable. However, what root it converges to depends on the value of b. Explicitly:
- If 5 < b < 6, then the first iteration yields the interval [0, b/2], with b/2 ∈ (5/2, 3), and therefore we converge to the root 1 (which is the only root in the interval).
- If 6 < b < 10, then the first iteration yields the interval [b/2, b], with b/2 ∈ (3, 5), and therefore we converge to the root 5 (which is the only root in the interval).
- More generally, if 5 · 2^k < b < 6 · 2^k for k a nonnegative integer, we converge to the root 1. On the other hand, if 6 · 2^k < b < 10 · 2^k for k a nonnegative integer, we converge to the root 5.
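A sketch of this endpoint sensitivity, using the cubic f(x) = (x − 1)(x − 3)(x − 5) as a hypothetical function consistent with the description (negative at 0, positive beyond 5, with the found root switching as b crosses 6):

```python
# With b = 5.5 the first midpoint 2.75 gives a positive value, so the
# method keeps [0, 2.75] and finds the root 1; with b = 7 the first
# midpoint 3.5 gives a negative value, so it keeps [3.5, 7] and finds 5.
def bisect(f, a, b, steps=60):
    for _ in range(steps):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fm * f(a) < 0:
            b = m
        else:
            a = m
    return (a + b) / 2

f = lambda x: (x - 1) * (x - 3) * (x - 5)
```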
The qualitative behavior depends purely on the signum of the function
In order to determine how the bisection method works for a particular function f, it suffices to know the function sgn ∘ f, i.e., the composite of the signum function and f. Explicitly, the function that predicts the way the bisection method will unfold is the function: x ↦ sgn(f(x))
From the computational perspective, there is an important caveat to the above: what matters for the signum function is not whether the actual value of is positive, negative, or zero, but rather, whether the value as computed is definitely positive, definitely negative, or numerically indistinguishable from zero.
Modulo this computational caveat, two functions that are positive at the same places and negative at the same places would exhibit the same behavior with respect to the bisection method. In particular, if g = hf where h is a function that is positive everywhere on the interval, then f and g behave identically under the bisection method.
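This equivalence is easy to verify numerically: multiplying f by an everywhere-positive function such as eˣ leaves every bisection iterate unchanged. The helper name `midpoints` is an illustrative choice:

```python
import math

# Records the sequence of midpoints visited by the bisection method;
# two functions with the same sign pattern produce identical sequences.
def midpoints(f, a, b, steps=20):
    ms = []
    for _ in range(steps):
        m = (a + b) / 2
        ms.append(m)
        if f(m) == 0:
            break
        if f(m) * f(a) < 0:
            b = m
        else:
            a = m
    return ms

f = lambda x: x * x - 2
g = lambda x: math.exp(x) * f(x)   # same signum as f everywhere
```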
The process sacrifices smart use of information about the function for a guaranteed convergence rate
By always picking the midpoint, we may be ignoring valuable information about just how far from zero the values of the function at the endpoints are. There are some other methods that are better suited to making use of this information. However, these methods either work only for some functions or take more of a gamble: they can fail more dramatically. Methods of this kind include the secant method and the regula falsi (false position) method.
The case of polynomials
Polynomials with distinct roots
Consider a polynomial of the form: f(x) = r(x)(x − α_1)(x − α_2)⋯(x − α_n)
on an interval [a, b] where r has constant sign on the interval (and in particular, has no root) and a < α_1 < α_2 < ⋯ < α_n < b. Also assume that n is odd, so that f(a) and f(b) have opposite signs. Thus, the bisection method can be applied to the function f.
In order to determine the value of i for which the bisection method converges to α_i, it suffices to know the values:
{(α_1 − a)/(b − a), (α_2 − a)/(b − a), …, (α_n − a)/(b − a)}
Moreover, one of the following must be true for the α_i to which the bisection method converges:
- The quotient (α_i − a)/(b − a) is a dyadic rational, i.e., it has the form t/2^s for an integer t and positive integer s, and the bisection method terminates in finitely many steps at precisely α_i.
- i is odd, so there are an even number of roots in the interval (a, α_i) and an even number of roots in the interval (α_i, b), and the bisection method converges to the root α_i but does not reach it in finitely many steps. The rationale for this case is that every time we narrow the interval, we discard either an even number of roots on the left or an even number of roots on the right.
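Both cases of the dichotomy can be illustrated. The cubic below is a hypothetical example chosen so that, on [0, 4], the root 2 sits at the dyadic fraction (2 − 0)/(4 − 0) = 1/2 of the interval, while on [0, 3.5] none of the normalized root positions is dyadic:

```python
# f(x) = (x - 1)(x - 2)(x - 3): on [0, 4] the very first midpoint is 2,
# an exact root (dyadic case, even index i = 2); on [0, 3.5] the method
# instead converges to the odd-indexed root 1 without ever landing on it.
def bisect(f, a, b, steps=60):
    for _ in range(steps):
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m          # exact termination at a dyadic point
        if fm * f(a) < 0:
            b = m
        else:
            a = m
    return (a + b) / 2

f = lambda x: (x - 1) * (x - 2) * (x - 3)
```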
Polynomials with some root repetition
Consider a polynomial of the form: f(x) = r(x)(x − α_1)^(m_1)(x − α_2)^(m_2)⋯(x − α_n)^(m_n)
on an interval [a, b] where r has constant sign on the interval (and in particular, has no root) and a < α_1 < α_2 < ⋯ < α_n < b.
It is still the case that the behavior can be predicted by knowledge of:
{(α_1 − a)/(b − a), (α_2 − a)/(b − a), …, (α_n − a)/(b − a)}
along with knowledge of the tuple of multiplicities (m_1, m_2, …, m_n). But we can say something stronger:
- In the cases where m_i is even, we can divide f by (x − α_i)^(m_i) without affecting the qualitative behavior, unless (α_i − a)/(b − a) is a dyadic rational. If it is a dyadic rational, there is a possibility that the bisection method will land at that exact point and therefore converge to it. Other than that, it has no effect. From the numerical algorithm perspective, what matters is sufficient proximity to a dyadic rational with a sufficiently small denominator.
- In the case where m_i is odd, we can replace it by 1 without affecting the qualitative behavior of the bisection method.