CSE488_Lab6_Optimization
Optimization II
1 Introduction to optimization
In general, optimization is the process of finding and selecting the optimal element from a set
of feasible candidates. In mathematical optimization, this problem is usually formulated as deter-
mining the extreme value of a function on a given domain. An extreme value, or an optimal value,
can refer to either the minimum or maximum of the function, depending on the application and
the specific problem. In this lab we are concerned with the optimization of real-valued functions
of one or several variables, which optionally can be subject to a set of constraints that restricts the
domain of the function. The applications of mathematical optimization are many and varied, and
so are the methods and algorithms that must be employed to solve optimization problems. Since
optimization is a universally important mathematical tool, it has been developed and adapted for
use in many fields of science and engineering, and the terminology used to describe optimization
problems varies between fields. For example, the mathematical function that is optimized may be
called a cost function, loss function, energy function, or objective function, to mention a few. Here
we use the generic term objective function. Optimization is closely related to equation solving
because at an optimal value of a function, its derivative, or gradient in the multivariate case, is
zero.
In this lab we discuss using SciPy’s optimization module optimize for nonlinear optimization
problems, and we will briefly explore using the convex optimization library cvxopt for linear
optimization problems with linear constraints. This library also has powerful solvers for quadratic
programming problems.
cvxopt: The convex optimization library cvxopt provides solvers for linear and quadratic opti-
mization problems.
CSE488: Computational Intelligence Lab08: Optimization II
The second-order derivative of a function, or the Hessian for the multivariate case, can be used to determine whether a stationary point is a local minimum or not. In particular, if the second-order derivative is positive, or the Hessian positive definite, when evaluated at a stationary point x*, then x* is a local minimum. A negative second-order derivative, or a negative definite Hessian, corresponds to a local maximum, and a zero second-order derivative, or an indefinite Hessian, corresponds to a saddle point.
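This classification can be checked numerically. A minimal sketch, using the illustrative function f(x1, x2) = x1² + x2² (not from the lab), whose Hessian is constant and positive definite:

```python
import numpy as np

# Hessian of f(x1, x2) = x1**2 + x2**2, evaluated at the stationary point (0, 0)
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

# A symmetric matrix is positive definite exactly when all of its eigenvalues
# are positive, so the stationary point is a local minimum.
eigenvalues = np.linalg.eigvalsh(H)
is_local_minimum = bool(np.all(eigenvalues > 0))
```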
Exercise 1 As an example for illustrating these techniques, consider the following classic opti-
mization problem:
Minimize the area of a cylinder with unit volume.
Here, suitable variables are the radius r and height h of the cylinder, and the objective function is f([r, h]) = 2πr² + 2πrh, subject to the equality constraint g([r, h]) = πr²h − 1 = 0. As this problem is formulated here, it is a two-dimensional optimization problem with an equality constraint.
However, we can algebraically solve the constraint equation for one of the dependent variables,
for example, h = 1/(πr²), and substitute this into the objective function to obtain an unconstrained one-dimensional optimization problem: f(r) = 2πr² + 2/r.
To solve this problem using SciPy’s numerical optimization functions, we first define a Python
function f that implements the objective function. To solve the optimization problem, we then
pass this function to, for example, optimize.brent. Optionally we can use the brack keyword
argument to specify a starting interval for the algorithm:
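The code cell itself is not reproduced here; a minimal sketch of this step might look as follows, where the bracketing interval (0.1, 4) is an assumption chosen for illustration:

```python
import numpy as np
from scipy import optimize

def f(r):
    # surface area of a cylinder with unit volume, as a function of the radius
    return 2 * np.pi * r**2 + 2 / r

# brack specifies a starting interval for the algorithm (interval assumed here)
r_min = optimize.brent(f, brack=(0.1, 4))
```

The value of r_min then matches the output shown below.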
[48]: 0.5419260772557135
[49]: f(r_min)
[49]: 5.535810445932086
Instead of calling optimize.brent directly, we could use the generic interface for scalar minimiza-
tion problems optimize.minimize_scalar. Note that to specify a starting interval in this case, we
must use the bracket keyword argument:
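A sketch of the equivalent call, with the same assumed starting interval as in the brent example:

```python
import numpy as np
from scipy import optimize

def f(r):
    # surface area of a cylinder with unit volume
    return 2 * np.pi * r**2 + 2 / r

# note the keyword is bracket here, not brack as in optimize.brent
res = optimize.minimize_scalar(f, bracket=(0.1, 4))
```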
All these methods agree that the radius that minimizes the area of the cylinder is approximately
0.54, with a minimum area of approximately 5.54. The objective function that we minimized in
this example is plotted below, where the minimum is marked with a red star. When possible,
it is a good idea to visualize the objective function before attempting a numerical optimization,
because it can help in identifying a suitable initial interval or a starting point for the numerical
optimization routine.
[52]: r = np.arange(0.01, 2, 0.01)  # start slightly above zero to avoid division by zero in f
f_r = f(r)
plt.plot(r, f_r)
plt.plot(r_min, f(r_min), 'r*', markersize=12)
In Newton's method, compared with the steepest descent method, the gradient is replaced by the gradient multiplied from the left by the inverse of the Hessian matrix of the function. In general this alters both the direction and the length of the step, so this method is not strictly a steepest descent method and may not converge if started too far from a minimum. However, when close to a minimum, it converges quickly. As usual, there is a trade-off between convergence rate and stability. As formulated here, Newton's method requires both the gradient and the Hessian of the function.
In SciPy, Newton’s method is implemented in the function optimize.fmin_ncg. This function
takes the following arguments: a Python function for the objective function, a starting point, a
Python function for evaluating the gradient, and (optionally) a Python function for evaluating the
Hessian.
Exercise 2 To see how this method can be used to solve an optimization problem, we consider
the following problem:
min_x f(x)
where the objective function is f(x) = (x1 − 1)⁴ + 5(x2 − 1)² − 2x1x2 (this is the function implemented by f_plot further below).
To apply Newton's method, we need to calculate the gradient and the Hessian. For this particular case, this can easily be done by hand. However, it can also be computed using the SymPy library.
Now the functions f, f_diff, and f_H are vectorized Python functions in the form that, for example, optimize.fmin_ncg expects, and we can proceed with a numerical optimization of the problem at
hand by calling this function. In addition to the functions that we have prepared, we also need to
give a starting point for the Newton method. Here we use (0, 0) as the starting point.
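The corresponding code cells are not reproduced here. A sketch of the whole procedure is given below, using the objective f(x) = (x1 − 1)⁴ + 5(x2 − 1)² − 2x1x2 from the exercise; the helper names _f, _grad, and _hess are illustrative:

```python
import numpy as np
import sympy
from scipy import optimize

x1, x2 = sympy.symbols("x_1 x_2")
f_sym = (x1 - 1)**4 + 5 * (x2 - 1)**2 - 2 * x1 * x2

# symbolic gradient and Hessian
grad_sym = sympy.Matrix([f_sym.diff(v) for v in (x1, x2)])
hess_sym = sympy.hessian(f_sym, (x1, x2))

_f = sympy.lambdify((x1, x2), f_sym, "numpy")
_grad = sympy.lambdify((x1, x2), grad_sym, "numpy")
_hess = sympy.lambdify((x1, x2), hess_sym, "numpy")

# wrappers taking a single array argument, in the form fmin_ncg expects
def f(X):
    return _f(X[0], X[1])

def f_diff(X):
    return np.asarray(_grad(X[0], X[1])).flatten()

def f_H(X):
    return np.asarray(_hess(X[0], X[1]))

x_opt = optimize.fmin_ncg(f, (0, 0), fprime=f_diff, fhess=f_H)
```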
[10]: x_opt
[10]: array([1.88292613, 1.37658523])
The routine found a minimum point at ( x1 , x2 ) = (1.88292613, 1.37658523), and diagnostic infor-
mation about the solution was also printed to standard output, including the number of iterations
and the number of function, gradient, and Hessian evaluations that were required to arrive at the
solution.
[53]: def f_plot(X,Y):
return (X-1)**4 + 5*(Y-1)**2 - 2*X*Y
In practice, it may not always be possible to provide functions for evaluating both the gradient
and the Hessian of the objective function, and it is often convenient to use a solver that only requires function evaluations. For such cases, several methods exist to numerically estimate the
gradient or the Hessian or both. Methods that approximate the Hessian are known as quasi-Newton methods, and there are also alternative iterative methods that completely avoid using the
Hessian. Two popular methods are the Broyden-Fletcher-Goldfarb-Shanno (BFGS) and the con-
jugate gradient methods, which are implemented in SciPy as the functions optimize.fmin_bfgs
and optimize.fmin_cg.
The BFGS method is a quasi-Newton method that can gradually build up numerical estimates of
the Hessian, and also the gradient, if necessary. The conjugate gradient method is a variant of the
steepest descent method and does not use the Hessian, and it can be used with numerical estimates
of the gradient obtained from only function evaluations. With these methods, the number of func-
tion evaluations that are required to solve a problem is much larger than for Newton’s method,
which on the other hand also evaluates the gradient and the Hessian. Both optimize.fmin_bfgs
and optimize.fmin_cg can optionally accept a function for evaluating the gradient, but if it is not
provided, the gradient is estimated from function evaluations.
The preceding problem, which was solved with Newton's method, can also be solved using optimize.fmin_bfgs and optimize.fmin_cg, without providing a function for the Hessian:
[13]: x_opt = optimize.fmin_bfgs(f, (0, 0), fprime=f_diff)
[14]: x_opt
[15]: x_opt = optimize.fmin_cg(f, (0, 0), fprime=f_diff)
[16]: x_opt
Note that here, as shown in the diagnostic output from the optimization solvers in the preceding
section, the number of function and gradient evaluations is larger than for Newton’s method. As
already mentioned, both of these methods can also be used without providing a function for the
gradient as well, as shown in the following example using the optimize.fmin_bfgs solver:
[17]: x_opt = optimize.fmin_bfgs(f, (0, 0))
[18]: x_opt
Notes:
In general, the BFGS method is often a good first approach to try, in particular if neither the gra-
dient nor the Hessian is known. If only the gradient is known, then the BFGS method is still the
generally recommended method to use, although the conjugate gradient method is often a com-
petitive alternative to the BFGS method. If both the gradient and the Hessian are known, then
Newton’s method is the method with the fastest convergence in general. However, it should be
noted that although the BFGS and the conjugate gradient methods theoretically have slower con-
vergence than Newton’s method, they can sometimes offer improved stability and can therefore
be preferable.
Each iteration is also typically more computationally demanding with Newton's method than with quasi-Newton methods or the conjugate gradient method, so, especially for large problems, the latter can be faster in spite of requiring more iterations.
Exercise 3 A brute force search over a coordinate grid, as provided by optimize.brute, is a systematic way to obtain a good starting point for an iterative solver. To illustrate this method, consider the problem of minimizing the function
f(x) = 4 sin(πx1) + 6 sin(πx2) + (x1 − 1)² + (x2 − 1)²,
which has a large number of local minima. This can make it tricky to pick a suitable initial point for an iterative solver. To solve this optimization problem with SciPy, we first define a Python function for the objective function:
To systematically search for the minimum over a coordinate grid, we call optimize.brute with
the objective function f as the first parameter and a tuple of slice objects as the second argument,
one for each coordinate. The slice objects specify the coordinate grid over which to search for
a minimum value. Here we also set the keyword argument finish=None, which prevents the
optimize.brute from automatically refining the best candidate.
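The cells defining the objective and performing the grid search are not shown here. A sketch, using the objective function stated in the exercise and an assumed search grid covering [−3, 5) with step 0.5 in each coordinate:

```python
import numpy as np
from scipy import optimize

def f(X):
    x, y = X
    return 4 * np.sin(np.pi * x) + 6 * np.sin(np.pi * y) + (x - 1)**2 + (y - 1)**2

# finish=None prevents optimize.brute from refining the best grid candidate
x_start = optimize.brute(f,
                         (slice(-3, 5, 0.5), slice(-3, 5, 0.5)),
                         finish=None)
```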
[68]: f(x_start)
[68]: -9.5
On the coordinate grid specified by the given tuple of slice objects, the optimal point is (x1, x2) = (1.5, 1.5), with corresponding objective function minimum -9.5. This is now a good starting point for a more sophisticated iterative solver, such as optimize.fmin_bfgs:
[69]: x_opt = optimize.fmin_bfgs(f, x_start)
[70]: x_opt
[71]: f(x_opt)
[71]: -9.520229273055016
Here the BFGS method gave the final minimum point (x1, x2) = (1.47586906, 1.48365788), with the minimum value of the objective function -9.52022927306. For this type of problem, guessing the initial starting point easily results in the iterative solver converging to a poor local minimum, so the systematic approach that optimize.brute provides is frequently useful.
In this section, we have explicitly called functions for specific solvers, for example, optimize.fmin_bfgs. However, as for scalar optimization, SciPy also provides a unified interface for all multivariate optimization solvers with the function optimize.minimize, which dispatches to the solver-specific functions depending on the value of the method keyword argument (recall that the univariate minimization function that provides a unified interface is optimize.minimize_scalar). For clarity, here we have favored explicitly calling functions for specific solvers, but in general it is a good idea to use optimize.minimize, as this makes it easier to switch between different solvers.
For example, the previous problem could have been solved as result = optimize.minimize(f, (0, 0), method='BFGS', jac=f_diff), which returns the result in an optimize.OptimizeResult instance (only part of its display is reproduced here):
hess_inv: array([..., [4.61008275e-06, 1.63490348e-02]])
jac: array([-7.15255737e-07, -7.15255737e-07])
message: 'Optimization terminated successfully.'
nfev: 21
nit: 4
njev: 7
status: 0
success: True
x: array([1.47586906, 1.48365787])
[75]: result.x
As an example of nonlinear least square fitting, consider fitting the model function
f(x, β) = β0 + β1 exp(−β2 x²)
to observed data. Once the model function is defined, we generate randomized data points that simulate the observations.
[29]: beta = (0.25, 0.75, 0.5)
def f(x, b0, b1, b2):
    return b0 + b1 * np.exp(-b2 * x**2)
xdata = np.linspace(0, 5, 50)
y = f(xdata, *beta)
ydata = y + 0.05 * np.random.randn(len(xdata))
With the model function and observation data prepared, we are ready to start solving the nonlinear least square problem. The first step is to define a function for the residuals given the data and the model function, which is specified in terms of the yet-to-be-determined model parameters β.
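The residual function and the call to optimize.leastsq are not reproduced here. A self-contained sketch, repeating the data generation with the true parameters (0.25, 0.75, 0.5) quoted below, with a fixed random seed added for reproducibility and an assumed starting guess of (1, 1, 1):

```python
import numpy as np
from scipy import optimize

beta = (0.25, 0.75, 0.5)  # true model parameters

def f(x, b0, b1, b2):
    return b0 + b1 * np.exp(-b2 * x**2)

np.random.seed(0)  # fixed seed so this sketch is reproducible
xdata = np.linspace(0, 5, 50)
ydata = f(xdata, *beta) + 0.05 * np.random.randn(len(xdata))

# residuals between the observations and the model for given parameters
def g(beta):
    return ydata - f(xdata, *beta)

beta_opt, status = optimize.leastsq(g, (1, 1, 1))
```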
Here the best fit is quite close to the true parameter values (0.25, 0.75, 0.5), as defined earlier. By plotting the observation data and the model function for the true and fitted parameters, we can visually confirm that the fitted model seems to describe the data well.
The SciPy optimize module also provides an alternative interface to nonlinear least square
fitting, through the function optimize.curve_fit. This is a convenience wrapper around
optimize.leastsq, which eliminates the need to explicitly define the residual function for the
least square problem. The previous problem could therefore be solved more concisely using the
following:
Exercise 5 As an example of solving a bounded optimization problem with the L-BFGS-B solver,
consider minimizing the objective function
f(x) = (x1 − 1)² − (x2 − 1)²
ax.add_patch(bound_rect)
ax.set_xlabel(r"$x_1$", fontsize=18)
ax.set_ylabel(r"$x_2$", fontsize=18)
plt.colorbar(c, ax=ax)
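The cells that set up and solve this bounded problem are not reproduced here; in particular the bound region is not shown, so the bounds 2 ≤ x1 ≤ 3 and 0 ≤ x2 ≤ 2 in this sketch are an assumption for illustration:

```python
import numpy as np
from scipy import optimize

def f(X):
    return (X[0] - 1)**2 - (X[1] - 1)**2

# assumed rectangular bounds: 2 <= x1 <= 3 and 0 <= x2 <= 2
bounds = [(2, 3), (0, 2)]

# L-BFGS-B handles simple variable bounds; start from an interior point
res = optimize.minimize(f, [2.5, 1.5], method='L-BFGS-B', bounds=bounds)
```

With these bounds, the saddle structure of the unconstrained f is cut off and the solver converges to a corner of the feasible rectangle, here (2, 2).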
Constraints that are defined by equalities or inequalities that include more than one variable are somewhat more complicated to deal with. However, there are general techniques for this type of problem as well. One of these techniques is the method of Lagrange multipliers. This method can be extended to handle inequality constraints, and there exist various numerical methods based on this approach. One example is the method known as sequential least squares programming, abbreviated as SLSQP, which is available in SciPy as the optimize.fmin_slsqp function and via optimize.minimize with method='SLSQP'. The optimize.minimize function takes the keyword argument constraints, which should be a list of dictionaries, each specifying a constraint. The allowed keys in such a dictionary are type ('eq' or 'ineq'), fun (the constraint function), jac (the Jacobian of the constraint function), and args (additional arguments to the constraint function and to the function for evaluating its Jacobian).
Exercise 6 To illustrate this technique, consider the problem of maximizing the volume of a rectangular box with sides of length x0, x1, and x2, subject to the constraint that the total surface area should be unity:
g(x) = 2x1x2 + 2x0x2 + 2x1x0 − 1 = 0.
[36]: def f(X):
return -X[0] * X[1] * X[2]
def g(X):
return 2 * (X[0]*X[1] + X[1] * X[2] + X[2] * X[0]) - 1
Note that since the SciPy optimization functions solve minimization problems, and here we are
interested in maximization, the function f is here the negative of the original objective function.
Next we define the constraint dictionary for g(x) = 0 and finally call the optimize.minimize function:
[37]: constraint = dict(type='eq', fun=g)
result = optimize.minimize(f, [0.5, 1, 1.5], method='SLSQP',
                           constraints=[constraint])
result
[38]: result.x
By symmetry, the optimal box is a cube with all sides equal to 1/√6 ≈ 0.408, which is what the solver returns up to numerical precision.
To solve problems with inequality constraints, all we need to do is to set type='ineq' in the
constraint dictionary and provide the corresponding inequality function.
Consider minimizing
f(x) = (x0 − 1)² + (x1 − 1)²
subject to the inequality constraint
g(x) = x1 − 1.75 − (x0 − 0.75)⁴ ≥ 0.
As usual, we begin by defining the objective function and the constraint function, as well as the
constraint dictionary:
Next, we are ready to solve the optimization problem by calling the optimize.minimize function.
For comparison, here we also solve the corresponding unconstrained problem.
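The corresponding cells are not reproduced here; a self-contained sketch, where the starting point (0, 0) is an assumption:

```python
import numpy as np
from scipy import optimize

def f(X):
    return (X[0] - 1)**2 + (X[1] - 1)**2

def g(X):
    # inequality constraint, required to be non-negative at the solution
    return X[1] - 1.75 - (X[0] - 0.75)**4

constraints = [dict(type='ineq', fun=g)]

# constrained problem and, for comparison, the unconstrained problem
x_cons = optimize.minimize(f, (0, 0), method='SLSQP', constraints=constraints).x
x_unc = optimize.minimize(f, (0, 0), method='SLSQP').x
```

The unconstrained minimum is at (1, 1); the constraint pushes the solution up onto the boundary curve x2 = 1.75 + (x1 − 0.75)⁴.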
For optimization problems with only inequality constraints, SciPy provides an alternative solver using the constrained optimization by linear approximation (COBYLA) method. This solver is accessible either through optimize.fmin_cobyla or optimize.minimize with method='COBYLA'. The previous example could just as well have been solved with this solver, by replacing method='SLSQP' with method='COBYLA'.
Consider the problem of minimizing the objective function
f(x) = −x0 + 2x1 − 3x2
subject to the inequality constraints
x0 + x1 ≤ 1, −x0 + 3x1 ≤ 2, and −x1 + x2 ≤ 3.
In standard form (minimize cᵀx subject to Ax ≤ b), we have c = (−1, 2, −3), b = (1, 2, 3), and
A = [[1, 1, 0], [−1, 3, 0], [0, −1, 1]].
To solve this problem, here we use the cvxopt library, which provides the linear programming
solver with the cvxopt.solvers.lp function. This solver expects as arguments the c, A, and b
vectors and matrix used in the standard form introduced in the preceding text, in the given order.
The cvxopt library uses its own classes for representing matrices and vectors, but fortunately they
are interoperable with NumPy arrays via the array interface and can therefore be cast from one
form to another using the cvxopt.matrix and np.array functions.
To solve the stated example problem using the cvxopt library, we therefore first create NumPy
arrays for the A matrix and the c and b vectors and convert them to cvxopt matrices using the
cvxopt.matrix function:
[82]: c = np.array([-1.0, 2.0, -3.0])
A = np.array([[ 1.0, 1.0, 0.0], [-1.0, 3.0, 0.0],[ 0.0, -1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
A_ = cvxopt.matrix(A)
b_ = cvxopt.matrix(b)
c_ = cvxopt.matrix(c)
The cvxopt-compatible matrices and vectors c_, A_, and b_ can now be passed to the linear programming solver cvxopt.solvers.lp:
[83]: sol = cvxopt.solvers.lp(c_, A_, b_)
[84]: sol
[45]: x = np.array(sol['x'])
x
[45]: array([[0.25],
[0.75],
[3.75]])
[46]: sol['primal objective']
[46]: -10.0
The solution to the optimization problem is given in terms of the vector x, which in this particular example is x = (0.25, 0.75, 3.75) and corresponds to the objective function value f(x) = −10. With this method and the cvxopt.solvers.lp solver, linear programming problems with hundreds or thousands of variables can readily be solved. All that is needed is to write the optimization problem in standard form and create the c, A, and b arrays.
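As an aside, when cvxopt is not available, the same standard-form data can be fed to SciPy's optimize.linprog as a cross-check; the explicit bounds=(None, None) are needed because linprog assumes x ≥ 0 by default:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, 2.0, -3.0])
A = np.array([[1.0, 1.0, 0.0], [-1.0, 3.0, 0.0], [0.0, -1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# minimize c @ x subject to A @ x <= b, with unbounded variables
res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 3)
```

This LP has several optimal vertices, so res.x need not equal the cvxopt solution, but the optimal value res.fun is the same, −10.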