BB Notes
Branch and bound algorithms are methods for global optimization in nonconvex prob-
lems [?, ?]. They are nonheuristic, in the sense that they maintain a provable upper and
lower bound on the (globally) optimal objective value; they terminate with a certificate prov-
ing that the suboptimal point found is ǫ-suboptimal. Branch and bound algorithms can be
(and often are) slow, however. In the worst case they require effort that grows exponentially
with problem size, but in some cases we are lucky, and the methods converge with much
less effort. In these notes we describe two typical and simple examples of branch and bound
methods, and show some typical results, for a minimum cardinality problem.
We consider the problem of finding the global minimum f ⋆ of a function f over an
m-dimensional rectangle Qinit . For a rectangle Q ⊆ Qinit , let Φmin (Q) denote the
minimum of f over Q, so f ⋆ = Φmin (Qinit ).
The algorithm will use two functions Φlb (Q) and Φub (Q) defined for any rectangle Q ⊆
Qinit . These functions must satisfy the following conditions. First, they are lower and upper
bounds on Φmin (Q), respectively: for any Q ⊆ Qinit ,
Φlb (Q) ≤ Φmin (Q) ≤ Φub (Q).
The second condition is (roughly) that the bounds become tight as the rectangle shrinks
to a point. More precisely, as the maximum half-length of the sides of Q, denoted by size(Q),
goes to zero, the difference between upper and lower bounds uniformly converges to zero,
i.e.,
∀ǫ > 0 ∃δ > 0 ∀Q ⊆ Qinit , size(Q) ≤ δ =⇒ Φub (Q) − Φlb (Q) ≤ ǫ.
Figure 1: Branch and bound example in R2 , after 3 iterations. The partition of
the original rectangle is shown at left; the associated binary tree is shown at right.
We mention a third condition, which is not needed to prove convergence of the branch
and bound algorithm, but is needed in practice: The functions Φub (Q) and Φlb (Q) should
be cheap to compute.
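As a concrete (and purely illustrative) instance of the three conditions, consider f (x) = x1^2 + · · · + xm^2 . A cheap valid pair is the coordinate-wise interval lower bound and the value of f at the center of the rectangle. A Python sketch, with a rectangle represented as a tuple of (l, u) intervals; the names phi_lb and phi_ub are ours, standing in for Φlb and Φub :

```python
# Illustrative bound functions for f(x) = sum_i x_i^2; a rectangle Q is a
# tuple of (l, u) intervals, one per coordinate.

def f(x):
    return sum(v * v for v in x)

def phi_lb(q):
    # Lower bound: minimize each x_i^2 over its interval independently.
    # The minimum is 0 if 0 lies in [l, u], else it is at the nearer endpoint.
    return sum(0.0 if l <= 0.0 <= u else min(l * l, u * u) for l, u in q)

def phi_ub(q):
    # Upper bound: f at any point of Q bounds the minimum of f over Q;
    # we use the center of the rectangle.
    return f([(l + u) / 2.0 for l, u in q])
```

Both bounds cost O(m) to evaluate, they bracket Φmin (Q), and their gap shrinks to zero as the rectangle shrinks, as the two conditions require.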
We now describe the algorithm. We start by computing
L1 = Φlb (Qinit ), U1 = Φub (Qinit ),
which are lower and upper bounds on f ⋆ , respectively. If U1 − L1 ≤ ǫ, the algorithm
terminates. Otherwise we partition Qinit into two rectangles Qinit = Q1 ∪ Q2 , and compute
Φlb (Qi ) and Φub (Qi ), i = 1, 2. Then we have new lower and upper bounds on f ⋆ ,
L2 = min{Φlb (Q1 ), Φlb (Q2 )} ≤ Φmin (Qinit ) ≤ U2 = min{Φub (Q1 ), Φub (Q2 )}.
If U2 − L2 ≤ ǫ, the algorithm terminates. Otherwise, we partition one of Q1 and Q2 into
two rectangles, to obtain a new partition of Qinit into three rectangles, and compute Φlb and
Φub for these new rectangles. We update the lower bound L3 as the minimum over the lower
bounds over the partition of Qinit , and similarly for the upper bound U3 .
At each iteration we split one rectangle into two, so after k iterations (counting the
evaluation of Φlb (Qinit ) and Φub (Qinit ) as the first one), we have a partition of Qinit into k
rectangles, Qinit = Q1 ∪ · · · ∪ Qk . This partition can be represented by a partially filled binary
tree, with the root node corresponding to the original rectangle, and the children of a node
corresponding to the subrectangles obtained by partitioning the parent rectangle.
This is illustrated in figure 1. The original rectangle corresponds to the root of the
tree. We first partition Qinit vertically. At the next iteration, we horizontally partition the
lefthand child; at the third iteration, we vertically partition the bottom child. After three
iterations, we have a partition of the original rectangle into 4 rectangles, which correspond
to the leaves in the tree.
The associated lower and upper bounds on f ⋆ are
Lk = min{Φlb (Qi ) : i = 1, . . . , k}, Uk = min{Φub (Qi ) : i = 1, . . . , k}.
We can assume that the lower and upper bounds of the pair of rectangles obtained by
splitting are no worse than the lower and upper bounds of the rectangle they were formed
from. This implies that Lk is nondecreasing, and Uk is nonincreasing.
To fully specify the algorithm, we need to give the rule for choosing which rectangle to
split at each step, and to specify the edge along which the rectangle is to be split.
One standard method for choosing the rectangle in the current partition to be split is to
choose one with the smallest lower bound, i.e., a rectangle that satisfies Φlb (Q) = Lk . Once
we choose the rectangle to split, we split it along any of its longest edges.
The algorithm is summarized below.
k = 0;
𝓛0 = {Qinit };
L0 = Φlb (Qinit );
U0 = Φub (Qinit );
while Uk − Lk > ǫ, {
    pick Q ∈ 𝓛k for which Φlb (Q) = Lk ;
    split Q along one of its longest edges into QI and QII ;
    form 𝓛k+1 from 𝓛k by removing Q and adding QI and QII ;
    Lk+1 := min{Φlb (Q) : Q ∈ 𝓛k+1 };
    Uk+1 := min{Φub (Q) : Q ∈ 𝓛k+1 };
    k := k + 1;
}
(Here 𝓛k denotes the list of rectangles in the current partition, to distinguish it from
the lower bound Lk .)
The requirement that we split the chosen rectangle along a longest edge controls the
condition number of the rectangles in the partition, which in turn allows us to prove conver-
gence.
As the algorithm proceeds, we can eliminate some rectangles from consideration; they
can be pruned, since Φmin (Qinit ) cannot be achieved in them. This is done as follows. At
each iteration, we eliminate from the list Lk any rectangles that satisfy Φlb (Q) > Uk , since
every point in such a rectangle is worse than the current upper bound on f ⋆ .
Pruning does not affect the algorithm, since any rectangle pruned would never be selected
later for further splitting. But pruning can reduce storage requirements. With pruning, the
union of the list of current rectangles can be regarded as the set of possible minimizers of
f . The number of active rectangles, and their total volume, give some measure of algorithm
progress.
The term pruning comes from the following. The algorithm can be viewed as developing a
partially filled binary tree of rectangles representing the current partition Lk , with the nodes
corresponding to rectangles and the children of a given node representing the two halves
obtained by splitting it. The leaves of the tree give the current partition. By removing a
rectangle from consideration, we prune the tree.
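The whole loop, including pruning, can be sketched in Python. This is a minimal rendering of ours, not code from the notes: a rectangle is a tuple of (l, u) intervals, and the bound functions phi_lb and phi_ub are supplied by the caller.

```python
def branch_and_bound(q_init, phi_lb, phi_ub, eps):
    # Leaves of the tree: each rectangle in the current partition, mapped to
    # its (lower bound, upper bound) pair.
    leaves = {q_init: (phi_lb(q_init), phi_ub(q_init))}
    k = 1
    while True:
        L = min(lb for lb, ub in leaves.values())
        U = min(ub for lb, ub in leaves.values())
        if U - L <= eps:
            return L, U, k
        # Pick a rectangle achieving the current global lower bound ...
        q = min(leaves, key=lambda r: leaves[r][0])
        del leaves[q]
        # ... and split it at the midpoint of one of its longest edges.
        i = max(range(len(q)), key=lambda j: q[j][1] - q[j][0])
        l, u = q[i]
        for half in ((l, (l + u) / 2.0), ((l + u) / 2.0, u)):
            child = q[:i] + (half,) + q[i + 1:]
            leaves[child] = (phi_lb(child), phi_ub(child))
        # Prune rectangles whose lower bound exceeds the global upper bound.
        U = min(ub for lb, ub in leaves.values())
        leaves = {r: b for r, b in leaves.items() if b[0] <= U}
        k += 1
```

For example, with interval bounds for f (x) = x1^2 + x2^2 over [−1, 2]^2 (a lower bound from minimizing each coordinate's square over its interval, an upper bound from evaluating f at the center), the loop certifies the global minimum 0 to within ǫ in relatively few iterations.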
1.1 Convergence analysis
We first show that after a large enough number of iterations, the partition Lk must contain
a rectangle of small volume. We then show that this rectangle has small size, and this in
turn implies that Uk − Lk is small.
The number of rectangles in the partition Lk is k, and the total volume of these rectangles
is vol(Qinit ), so
min{vol(Q) : Q ∈ Lk } ≤ vol(Qinit )/k. (1)
Thus, after a large number of iterations, at least one rectangle in the partition has small
volume. Next, we show that small volume implies small size for a rectangle in any partition.
We define the condition number of a rectangle Q = [l1 , u1 ] × · · · × [lm , um ] as
cond(Q) = maxi (ui − li )/ mini (ui − li ).
Our splitting rule, which requires that we split rectangles along a longest edge, results in
an upper bound on the condition number of rectangles in our partition: for any rectangle
Q ∈ Lk ,
cond(Q) ≤ max{cond(Qinit ), 2}. (2)
To prove this it is enough to show that when a rectangle Q is split into rectangles Q1 and
Q2 , we have
cond(Q1 ) ≤ max{cond(Q), 2}, cond(Q2 ) ≤ max{cond(Q), 2}.
Let νmax be the maximum edge length of Q, and let νmin be the minimum edge length of Q,
so cond(Q) = νmax /νmin . When Q is split into Q1 and Q2 , the maximum edge length of the
children is no more than νmax . The minimum edge length of the children, by our splitting
rule, is no smaller than the minimum of νmax /2 and νmin . It follows that the condition
number of the children is no more than the maximum of 2 and cond(Q).
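This invariant is easy to check numerically. Below is a small illustrative script of ours (the function names are not from the notes):

```python
def cond(q):
    # Condition number of a rectangle: longest edge over shortest edge.
    edges = [u - l for l, u in q]
    return max(edges) / min(edges)

def split_longest(q):
    # Split q at the midpoint of (one of) its longest edges.
    i = max(range(len(q)), key=lambda j: q[j][1] - q[j][0])
    l, u = q[i]
    mid = (l + u) / 2.0
    return (q[:i] + ((l, mid),) + q[i + 1:],
            q[:i] + ((mid, u),) + q[i + 1:])
```

However a rectangle is split by split_longest, the children's condition numbers never exceed max{cond(Q), 2}.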
We note that there are other splitting rules that also result in a uniform bound on the
condition number of the rectangles in any partition generated. One such rule is to cycle
through the index on which we split the rectangle. If Q was formed by splitting its parent
along the ith coordinate, then when we split Q, we split it along the (i + 1) modulo m
coordinate.
We can bound the size of a rectangle Q in terms of its volume and condition number,
since
vol(Q) = ∏i (ui − li )
       ≥ maxi (ui − li ) · (mini (ui − li ))^(m−1)
       = (2 size(Q))^m / cond(Q)^(m−1)
       ≥ (2 size(Q)/cond(Q))^m .
Thus,
size(Q) ≤ (1/2) cond(Q) vol(Q)^(1/m). (3)
Combining equations (1), (2) and (3) we get
min{size(Q) : Q ∈ Lk } ≤ (1/2) max{cond(Qinit ), 2} (vol(Qinit )/k)^(1/m). (4)
Thus the minimum size of the rectangles in the partition Lk converges to zero.
By the second condition above, there exists a δ such that any rectangle of size δ or smaller
satisfies Φub (Q) − Φlb (Q) ≤ ǫ. Take k large enough that at least one rectangle in the
partition has size not exceeding δ. Then at least one rectangle, say Q, in the partition Lk
satisfies Φub (Q) − Φlb (Q) ≤ ǫ. It follows that when this rectangle was added to the list,
the algorithm should have terminated.
2 Mixed Boolean convex problems
We now consider the problem
minimize f0 (x, z)
subject to fi (x, z) ≤ 0, i = 1, . . . , m (5)
zj ∈ {0, 1}, j = 1, . . . , n,
where x and z are the optimization variables. The variables z1 , . . . , zn are, naturally, called
Boolean variables. We assume the functions fi , i = 0, . . . , m, are convex. This problem is
called a mixed Boolean convex problem. We denote the optimal value of this problem as p⋆ .
One way to solve this problem is by exhaustive search. That is, we solve 2^n convex
optimization problems, one for each possible value of the (Boolean) vector z, and then choose
the smallest of these optimal values. This involves solving a number of convex problems that
is exponential in the size of the variable z. For n more than 30 or so, this is clearly not
possible.
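For reference, exhaustive search is a one-liner. Here is a sketch with a toy closed-form objective of our own invention; in a real instance, solve_fixed would call a convex solver for each fixed z:

```python
import itertools

def exhaustive_search(n, solve_fixed):
    # Solve one convex subproblem per Boolean vector z in {0,1}^n and keep
    # the best optimal value (2**n solves in total).
    return min(solve_fixed(z) for z in itertools.product((0, 1), repeat=n))

# Toy stand-in for "solve the convex problem with z fixed": a closed-form
# value rather than an actual solver call.
def toy_value(z):
    return (sum(z) - 2) ** 2 + 0.1 * z[0]
```

With n = 4 this examines all 16 Boolean vectors; the cost doubles with every additional Boolean variable.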
We will use a branch and bound method to solve this problem. In the worst case, we end
up solving the 2^n convex problems, i.e., carrying out an exhaustive search. But with luck,
this does not occur.
The convex relaxation
minimize f0 (x, z)
subject to fi (x, z) ≤ 0, i = 1, . . . , m (6)
0 ≤ zj ≤ 1, j = 1, . . . , n,
with (continuous) variables x and z, is convex, and therefore easily solved. Its optimal value,
which we denote L1 , is a lower bound on the optimal value of (5). This lower bound can be
+∞ (in which case the original problem is surely infeasible) or −∞.
We can also get an upper bound on p⋆ using the relaxation. The simplest method is to
round each of the relaxed Boolean variables zi to 0 or 1. A more sophisticated approach is
to first round each of the relaxed Boolean variables zi to 0 or 1, and then, with these values
of zi fixed, solve the resulting convex problem in the variable x.
The upper bound obtained from either of these methods can be +∞, if this method fails
to produce a feasible point. (The upper bound can also be −∞, in which case the original
problem is surely unbounded below.) We’ll denote this upper bound by U1 . Of course, if we
have U1 − L1 ≤ ǫ, the required tolerance, we can quit.
Now we are going to branch. Pick any index k, and form two problems: The first problem
is
minimize f0 (x, z)
subject to fi (x, z) ≤ 0, i = 1, . . . , m
zj ∈ {0, 1}, j = 1, . . . , n
zk = 0
and the second problem is
minimize f0 (x, z)
subject to fi (x, z) ≤ 0, i = 1, . . . , m
zj ∈ {0, 1}, j = 1, . . . , n
zk = 1.
In other words, we fix the value of zk to 0 in the first problem, and 1 in the second. We
call these subproblems of the original, since they can be thought of as the same problem,
with one variable eliminated or fixed. Each of these subproblems is also a mixed Boolean
convex problem, with n − 1 Boolean variables. The optimal value of the original problem is
the smaller of the optimal values of the two subproblems.
We now solve the two convex relaxations of these subproblems to obtain a lower and upper
bound on the optimal value of each subproblem. We’ll denote these as L̃, Ũ (for zk = 0)
and L̄, Ū (for zk = 1). Each of these two lower bounds must be at least as large as L1 , i.e.,
min{L̃, L̄} ≥ L1 . We can also assume, without loss of generality, that min{Ũ , Ū } ≤ U1 .
From these two sets of bounds, we obtain the following bounds on p⋆ :
L2 = min{L̃, L̄} ≤ p⋆ ≤ U2 = min{Ũ , Ū }.
If U2 − L2 ≤ ǫ, we can quit. Otherwise, we continue, developing a binary tree of
subproblems in which the children of a node are obtained by
fixing a variable that is not fixed in the parent problem. An edge in the tree corresponds to
the value 0 or 1 of a particular variable zi . At the root node of the tree, none of the values of
the zi are specified. A node at depth k in the tree corresponds to a subproblem in which k of
the Boolean variables have fixed values. For each node, we have an upper and lower bound
on the optimal value of the subproblem. After k iterations, we have a tree with exactly k
leaves.
The minimum of the lower bounds, over all the leaf nodes, gives a lower bound on the
optimal value p⋆ ; similarly, the minimum of the upper bounds, over all the leaf nodes, gives
an upper bound on the optimal value p⋆ . We refer to these lower and upper bounds as Lk
and Uk , respectively. The global lower bound Lk is nondecreasing, and the global upper
bound Uk is nonincreasing, so we always have Uk+1 − Lk+1 ≤ Uk − Lk . We can terminate
the algorithm when Uk − Lk ≤ ǫ.
Proving convergence of the algorithm is trivial: it must terminate, with Uk = Lk , in
fewer than 2^n iterations. To see this, note that if a leaf has depth n, it means that all the
Boolean variables are fixed in the subproblem, so by solving the convex relaxation we get
the exact solution. In other words, for any leaf of depth n, we have U = L. The worst thing
that can happen is that we develop a complete binary tree, to depth n, which requires 2^n
iterations, at which point every subproblem lower and upper bound is equal, and therefore
the algorithm terminates. This is nothing more than exhaustive search.
We can carry out pruning, to save memory (but not time), as the algorithm progresses. At
iteration k we have the upper bound Uk on p⋆ . Any node in the tree that has a lower bound
larger than Uk can be pruned, i.e., removed from the tree, since all points corresponding to
that node are worse than the current best point found.
The choice of which leaf to split at a given stage in the algorithm is somewhat arbitrary.
The primary strategy is to split the leaf corresponding to the smallest lower bound. This
means we always get an improvement in the global lower bound. The drawback of this
scheme is that in some situations (such as the cardinality example in the next section) we
may achieve a tight lower bound before getting any optimal feasible points.
The same can be said for the choice of index to use to split a leaf. It can be any index
corresponding to a Boolean variable that has not yet been fixed. One heuristic is to split
along a variable k for which |zk⋆ − 1/2| is smallest, where zk⋆ is the optimal point found in
the relaxation. This corresponds to splitting a variable that is most ‘undecided’ (between
0 and 1) in the relaxation. In some problems, the opposite heuristic works well: we split a
variable that is most ‘decided’. To do this, we choose k for which |zk⋆ − 1/2| is maximum.
In most cases, many of the variables zk⋆ will be 0 or 1; among these, we can choose the one
with the largest associated Lagrange multiplier. (The idea here is that this is the relaxed
variable that is 0 or 1, and has the highest pressure to stay there.)
Figure 2: Branch and bound tree for the small example. Each node shows the
interval [lower bound, upper bound]; each edge shows the variable being fixed, and
its value. The root has bounds [−0.143, ∞); its z1 = 0 child has bounds [∞, ∞] and
its z1 = 1 child has bounds [0.2, ∞], whose children in turn have bounds [∞, ∞]
(for z2 = 0) and [1, 1] (for z2 = 1).
We consider a small mixed Boolean LP, with variables z1 , z2 , and z3 . (The problem
data are not reproduced here.) This problem is small enough to solve by exhaustive
enumeration. Out of 8 possible 3-dimensional Boolean vectors, just one is feasible. This optimal
point is z = (1, 1, 1).
Figure 2 shows the binary tree corresponding to the branch and bound algorithm, which
terminates in 3 iterations, after 5 LP relaxations are solved. Each node shows the lower and
upper bound, and each edge shows the variable that is fixed, and its value. At the first node
we solve the relaxed problem and obtain the lower bound L1 = −0.143. Rounding the zi⋆
found in the relaxation does not, however, yield a feasible solution, so the upper bound is
U1 = ∞.
Setting z1 = 0 we find that the relaxed problem is infeasible, so we have L = ∞ (and
therefore U = ∞). At this point we have determined that, if there is any solution, it must
satisfy z1 = 1. Now we explore the z1 = 1 branch. Solving the associated relaxed LP
we obtain the new lower bound 0.2, an improvement over the previous global lower bound
−0.143. Once again, when we round zi⋆ , we get an infeasible point, so the upper bound at
this node is still ∞. At this point, two iterations into the algorithm, having solved a total
of 3 LPs, we find that L2 = 0.2 and U2 = ∞. (This means that so far, we are not even sure
the problem is feasible.)
Now we split on the variable z2 . Setting z2 = 1 we get a relaxed optimal value of 1. In
the relaxed problem, we have z3⋆ = 1, so (even without rounding) we obtain a feasible point
with objective 1. Thus, the lower and upper bounds at this node are both 1. When we set
z2 = 0, the relaxed problem is infeasible. For this node, then, we have L = U = ∞.
After this third iteration we have
L3 = min{∞, ∞, 1} = 1, U3 = min{∞, ∞, 1} = 1,
which means we have solved the problem globally.
Of course for a simple problem like this we could have simply checked the 8 possible
Boolean vectors for feasibility, and evaluated the objective function of those that are feasible.
This example was meant only to illustrate the branch and bound method.
The relaxation (9) is equivalent to
minimize Σi=1,...,n ((1/Ui )(xi )+ + (−1/Li )(xi )− )
subject to Ax ⪯ b
(assuming Li < 0 and Ui > 0, for i = 1, . . . , n). The objective here is an asymmetric weighted
ℓ1 -norm. Minimizing the ℓ1 -norm over a convex set is a well known heuristic for finding a
sparse solution. We can improve on the lower bound found by the relaxation. Since card(x)
is integer valued, if we know card(x) ≥ l, we know that card(x) ≥ ⌈l⌉, the smallest integer
greater than or equal to l. (For example, if we know that card(x) ≥ 23.4, then in fact we
know that card(x) ≥ 24.)
We find an upper bound on the mixed Boolean-LP (8) by simply recording the cardinality
of the x obtained in the relaxed problem (9).
At each iteration we split a node with the lowest lower bound. This will be the optimal
strategy if we find an optimal point before proving the lower bound. Less clear is the choice of
variable to split; empirical results suggest that splitting along the ‘most ambiguous’ variable
in the relaxation, i.e., zj , where j = argmini (|zi − 1/2|), is an efficient strategy.
Figure 3: Tree with 30 variable problem, after 3 iterations (top left), 5 iterations
(top right), 10 iterations (bottom left), and 124 iterations (bottom right).
Figure 4: Upper bound, lower bound, and ⌈lower bound⌉ on the cardinality, versus
iteration number, for the 30 variable problem.
Figure 5: Portion of solutions remaining possible (not eliminated), versus iteration
number, for the 30 variable problem.
Figure 6: Number of leaves on the tree, versus iteration number, for the 30 variable
problem.
Figure 5 shows the portion of the 2^30 possible values of z that remain possible (not
yet eliminated), versus iteration number. Even near termination, there are still around
10^7 possible values of z ‘in play’.
Figure 6 shows the total number of active (non-pruned) leaves in the tree, which is the
size of the partition, versus iteration number. After k iterations, we can have at most k
leaves (which occurs only if no pruning occurs). So we can see that in this example, we have
typically pruned around half of the leaves.
For this example, direct enumeration of all sparsity patterns is impractical (or at least,
very slow compared to branch and bound), since it would require the solution of 2^30 ≈ 10^9
LPs. We must count ourselves lucky to have solved the problem with (provable) global
optimality by solving only a few hundred LPs.
Figure 7: Upper bound, lower bound, and ⌈lower bound⌉ on the cardinality, versus
iteration number, for the 50 variable problem.
Larger example
We show the same plots for a larger example with 50 variables. It takes 3665 iterations to
find the optimal cardinality of 32. This took approximately 20 minutes on a Linux system
with an Intel Core 2 Duo 1.67 GHz and 2 GB of RAM. Solving the problem by brute
force would, in comparison, involve, as a minimum, eliminating all sparsity patterns with
cardinality 31. In this example, we know that six of the variables cannot possibly be zero,
so we need to eliminate (44 choose 31) > 10^10 sparsity patterns by solving an LP for each.
This is obviously effectively impossible.
This is the same example used in the lecture on ℓ1 -norm methods for cardinality problems.
The simple ℓ1 -norm heuristic yields a point with cardinality 44 (by solving one LP); using
the iterated ℓ1 -norm heuristic, we obtain a point with cardinality 36 (by solving 4 LPs).
The branch and bound method does not produce a better point than this until around 480
iterations (which requires the solution of 1061 LPs), when a point with cardinality 35 is
found. It takes around 1200 iterations before an optimal point is found, and another 2500
iterations after that to prove that it is optimal.
Figure 8: Portion of solutions remaining possible (not eliminated), versus iteration
number, for the 50 variable problem.
Figure 9: Number of leaves on the tree, versus iteration number, for the 50 variable
problem.
Figure 10: Upper bound, lower bound, and ⌈lower bound⌉ on the cardinality, versus
iteration number, for the 200 variable problem.
Figure 11: Portion of solutions remaining possible (not eliminated), 200 variables.
Figure 12: Number of leaves on the tree, versus iteration number, for the 200
variable problem.
Acknowledgments
Arpita Ghosh and Alessandro Magnani helped with an earlier version of this set of notes.