002 2012 Intro To Optimal Control

1. An introduction to dynamic optimization -- Optimal Control


002 Math Econ - Summer 2012

I. Overview of optimization
Optimization is the unifying paradigm in almost all economic analysis. So before we
start, let’s think about optimization. The tree below provides a very nice general
representation of the range of optimization problems that you might encounter. There
are two things to take from this. First, all optimization problems have a great deal in
common: an objective function, constraints, and choice variables. Second, there are
lots of different types of optimization problems and how you solve them will depend
on the branch on which you find yourself.

In this part of the course we will use both analytical and numerical methods to solve a
certain class of optimization problems. This class focuses on a set of optimization problems
that have two common features: the objective function is a linear aggregation over
time, and a set of variables, called the state variables, are constrained across time.
And so we begin …

Static Optimization: finds a single optimal magnitude for each choice variable; it does not
entail a schedule or optimal sequence of actions.
Dynamic Optimization: the solution takes the form of an optimal time path for every choice
variable (today, tomorrow, etc.), determining the optimal magnitude at each point in time.

II. Introduction – A simple 2-period consumption model


Consider the simple consumer's optimization problem:
max_z u(z_a, z_b)
s.t. p_a z_a + p_b z_b ≤ x
[pay attention to the notation: z is the vector of choice variables and x is the
consumer's exogenously determined income.]

Solving the one-period problem should be familiar to you. What happens if the
consumer lives for two periods, but has to survive off of the income endowment
provided at the beginning of the first period? That is, what happens if her problem is
max_z U(z_{1a}, z_{1b}, z_{2a}, z_{2b}) = U(z_1, z_2)
s.t. p'z_1 + p'z_2 ≤ x_1


where the constraint uses matrix notation: p = [p_a, p_b] is the price vector and
z_1 = [z_{1a}, z_{1b}]. We now have a problem of dynamic optimization. When we choose
z1, we must take into account how it will affect our choices in period 2.

We're going to make a huge (though common) assumption and maintain that
assumption throughout the course: utility is additively separable across time:
u(z) = u(z_1) + u(z_2)

Clearly one way to solve this problem would be just as we would a standard static
problem: set up a Lagrangian and solve for all optimal choices simultaneously. This
may work here, where there are only 2 periods, but if we have 100 periods (or even an
infinite number of periods) then this could get really messy. This course will develop
methods to solve such problems.

The Dynamic Optimization problem has 4 basic ingredients –


1. A given initial point and a given terminal point, X(0) and X(T);
2. A set of admissible paths from the initial point X(0) to the terminal point X(T);
3. A set of path values serving as performance indices (cost, profit, etc.)
associated with the various paths; and
4. A specified objective - either to maximize or to minimize the path value or
performance index by choosing the optimal path.

The Concept of a Functional


The relationship between paths and path values deserves our close attention, for it
represents a special sort of mapping: not a mapping from real numbers to real numbers
as in the usual function, but a mapping from paths (curves) to real numbers
(performance indices). Let us think of the paths in question as time paths, and denote
them by y_I(t), y_II(t), and so on, with V_I, V_II representing the associated path values. The
general notation for the mapping should therefore be V[y(t)]. But it must be
emphasized that this symbol fundamentally differs from the composite-function
symbol g[f(x)]. In the latter, g is a function of f, and f is in turn a function of x; thus, g
is in the final analysis a function of x. In the symbol V[y(t)], on the other hand, the
y(t) component comes as an integral unit (to indicate time paths), and therefore we
should not take V to be a function of t. Instead, V should be understood to be a
function of "y(t)" as such.

This is a good point to introduce some very important terminology:


• All dynamic optimization problems have a time horizon. In the problem above
t is discrete, t={1,2}, but t can also be continuous, taking on every value
between t0 and T, and we can solve problems where T → ∞
• xt is what we call a state variable because it is the state that the decision-maker
faces in period t. Note that xt is parametric (i.e., it is taken as given) to the
decision-maker's problem in t, and xt+1 is parametric to the choices in period
t+1. However, xt+1 is determined by the choices made in t. The state variables in
a problem are the variables upon which a decision maker bases his or her
choices in each period. Another important characteristic of state variables is that
typically the choices you make in one period will influence the value of the state
variable in the next period.


• A state equation defines the intertemporal changes in a state variable.
• z_t is the vector of t-th period control (or choice) variables. Choice variables
determine the (expected) payoff in the current period and the (expected) state
next period.
• pa and pb are parameters of the model. They are held constant or change
exogenously and deterministically over time.
• Finally, we have what is called intermediate variables. These are variables that
are really functions of the state and control variables and the parameters. For
example, in the problem considered here, one-period utility might be carried as
an intermediate variable. In firm problems, production or profit might be other
intermediate variables while productivity or profitability (a firm’s capacity to
generate output or profits) could be state variables. Do you see the difference?
This is very important. When you formulate a problem it is very important to
distinguish state variables from intermediate variables.
• The benefit function [here u(zt)] tells the instantaneous or single period net
benefits that accrue to the planner during the planning horizon. Despite its
name, the benefit function can take on positive or negative values. For example,
a function that defines the cost in each period can be the benefit function.
• In many problems there are benefits (or costs) that accrue after the planning
horizon. This is captured in models by including a salvage value, which is
usually a function of the terminal stock. Since the salvage value occurs after the
planning horizon, it can not be a function of the control variables, though it can
be a separate optimization problem in which choices are made.
• The sum (or integral) over the planning horizon plus the salvage value
determines the objective function(al). We usually use discounting when we
sum up over time.
• All of the problems that we will study in this course fall into the general
category of Markov decision processes (MDP). In an MDP the probability
distribution over the states in the next period is wholly determined by the
current state and current actions. One important implication of limiting
ourselves to MDPs is that, typically, history does not matter, i.e. xt+1 depends on
zt and xt, irrespective of the value of xt-1. When history is important in a problem
then the relevant historical variables must be explicitly included as state
variables.

In sum, the problems that we will study will have the following features. In each
period or moment in time the decision maker looks at the state variables (xt), then
chooses the control variables (zt). The combination of xt and zt generates immediate
benefits and costs. They also determine the probability distribution over x in the next
period or moment.

Instead of using brute force to find the solutions of all the z’s in one step, we
reformulate the problem. Let x1 be the endowment which is available in period 1, and
x2 be the endowment that remains in period 2. Following from the budget constraint,
we can see that x2= x1 – p'z1, with x2 ≥ 0. In this problem x2 defines the state that the
decision maker faces at the start of period 2. The equation which describes the change
in x from period 1 to period 2, x_2 − x_1 = −p'z_1, is called the state equation. This
equation is also sometimes referred to as the equation of motion or the transition
equation.

We now rewrite our consumer’s problem, this time making use of the state equation:
max_{z_t} ∑_{t=1}^{2} u_t(z_t)   s.t.
x_{t+1} − x_t = −p'z_t and x_{t+1} ≥ 0,   t = 1, 2      (1.1)
x_1 is fixed.
We now have a nasty little optimization problem with four constraints, two of them
inequality constraints – not fun. This course will help you solve and understand these
kinds of problems. Note that this formulation is quite general in that you could easily
write the n-period problem by simply replacing the 2’s in (1) with n.
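To see what the brute-force approach looks like in practice, here is a minimal numerical sketch. It assumes log utility and particular values for the prices and the endowment (none of which are specified in the notes) and solves the 2-period problem as one static problem:

```python
import numpy as np
from scipy.optimize import minimize

p = np.array([1.0, 2.0])   # assumed prices p_a, p_b (constant across periods)
x1 = 10.0                  # assumed initial endowment

# Additively separable log utility over z = (z_1a, z_1b, z_2a, z_2b)
def neg_utility(z):
    return -np.sum(np.log(z))

# Lifetime budget constraint: p'z_1 + p'z_2 <= x_1
budget = {"type": "ineq", "fun": lambda z: x1 - p @ z[:2] - p @ z[2:]}

res = minimize(neg_utility, x0=np.full(4, 1.0),
               bounds=[(1e-6, None)] * 4, constraints=[budget])
print(res.x)   # -> [2.5, 1.25, 2.5, 1.25]: expenditure split evenly over goods and periods
```

Even this toy version asks the solver to handle four choice variables at once; with 100 periods it would be 200, which is why the reformulation in terms of state variables pays off.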

III. The OC (optimal control) way of solving the problem


We will solve dynamic optimization problems using two related methods. The first of
these is called optimal control. Optimal control makes use of Pontryagin's maximum
principle.

To see this approach, first note that for most specifications, economic intuition tells us
that x_2 > 0 and x_3 = 0. Hence, for t = 1 (t+1 = 2), we can suppress the inequality constraint in
(1). We’ll use the fact that x3=0 at the very end to solve the problem.

Write out the Lagrangian of (1):


L = ∑_{t=1}^{2} [u_t(z_t, x_t) + λ_t(x_t − x_{t+1} − p'z_t)]      (1.2)
where we include x_t in u(·) for completeness, though ∂u/∂x = 0.

More terminology
In optimal control theory, the variable λt is called the co-state variable and,
following the standard interpretation of Lagrange multipliers, at its optimal value λt
is equal to the marginal value of relaxing the constraint. In this case, that means it is
the marginal value of the state variable, xt. The co-state variable plays a critical role in
dynamic optimization.

The FOCs for (2) are standard:


∂L/∂z_{ti} = ∂u/∂z_{ti} − λ_t p_i = 0,   i = a, b;  t = 1, 2
∂L/∂x_2 = ∂u/∂x_2 − λ_1 + λ_2 = 0
∂L/∂λ_t = x_t − x_{t+1} − p'z_t = 0,   t = 1, 2

We now use a little notation change that simplifies this problem and adds some
intuition (we'll see how the intuition arises in later lectures). That is, we define a
function known as the Hamiltonian, where
H_t = u(z_t, x_t) + λ_t(−p'z_t).

Some things to note about the Hamiltonian:


• the t-th Hamiltonian includes only z_t and λ_t,
• unlike in a Lagrangian, only the RHS of the state equation appears in the parentheses.

In the left column of the table below we present the first-order conditions of the
Lagrangian specification. On the right we present the derivative of the Hamiltonian
with respect to the same variables. Comparing the two, we can see what we would have
to place on the right-hand side of each Hamiltonian derivative so that the Hamiltonian
approach reaches the same optimum as the Lagrangian approach.

Lagrangian FOC                              Hamiltonian derivative
∂L/∂z_{ti} = ∂u/∂z_{ti} − λ_t p_i = 0       ∂H_t/∂z_{ti} = ∂u/∂z_{ti} − λ_t p_i
∂L/∂x_2 = ∂u/∂x_2 − λ_1 + λ_2 = 0           ∂H_t/∂x_t = ∂u/∂x_t
∂L/∂λ_t = x_t − x_{t+1} − p'z_t = 0         ∂H_t/∂λ_t = −p'z_t

Hence, we see that for the solution using the Hamiltonian to yield the same maximum,
the following conditions must hold:
1. ∂H/∂z_t = 0 ⇒ The Hamiltonian should be maximized w.r.t. the control variable
at every point in time.
2. ∂H/∂x_t = λ_{t−1} − λ_t, for t > 1 ⇒ The co-state variable changes over time at a rate
equal to minus the marginal value of the state variable to the Hamiltonian.
3. ∂H/∂λ_t = x_{t+1} − x_t ⇒ The state equation must always be satisfied.

When we combine these with a 4th condition, called the transversality condition
(how we transverse over to the world beyond t=1,2) we're able to solve the problem.
In this case the condition that x3 =0 (which for now we will assume to hold without
proof) serves that purpose. We'll discuss the transversality condition in more detail in
a few lectures.

These four conditions are the starting points for solving most optimal control
problems and sometimes the FOCs alone are sufficient to understand the economics
of a problem. However, if we want an explicit solution, then we would solve this
system of equations.
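For instance, under the added assumption of log utility, u_t(z_t) = ln z_{ta} + ln z_{tb} (the notes keep u general), the system of FOCs together with the state equations and x_3 = 0 can be solved symbolically:

```python
import sympy as sp

pa, pb, X1 = sp.symbols('p_a p_b x_1', positive=True)
z1a, z1b, z2a, z2b, lam1, lam2, x2 = sp.symbols(
    'z_1a z_1b z_2a z_2b lambda_1 lambda_2 x_2', positive=True)

eqs = [
    1/z1a - lam1*pa, 1/z1b - lam1*pb,   # dH_1/dz_1i = 0
    1/z2a - lam2*pa, 1/z2b - lam2*pb,   # dH_2/dz_2i = 0
    lam1 - lam2,                        # dH_2/dx_2 = lambda_1 - lambda_2 (du/dx = 0)
    x2 - X1 + pa*z1a + pb*z1b,          # state equation for t = 1
    -x2 + pa*z2a + pb*z2b,              # state equation for t = 2 with x_3 = 0
]
sol = sp.solve(eqs, [z1a, z1b, z2a, z2b, lam1, lam2, x2], dict=True)
print(sol)   # z_ti = x_1/(4 p_i), lambda_1 = lambda_2 = 4/x_1, x_2 = x_1/2
```

Consumption of each good is the same in both periods, as we should expect with no discounting and a constant marginal value of wealth.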

Although in this class most of the OC problems we’ll face are in continuous time, the
parallels should be obvious when we get there.

IV. The DP (Dynamic programming) way of solving the problem


The second way that we will solve dynamic optimization problems is using Dynamic
Programming. DP is about backward induction–thinking backwards about problems.
Let's see how this is applied in the context of the 2-period consumer's problem.

Imagine that the decision-maker is now in period 2, having already used up part of her
endowment in period 1, leaving x2 to be spent. In period 2, her problem is simply
V_2(x_2) = max_{z_2} u_2(z_2)   s.t.   p'z_2 ≤ x_2

If we solve this problem, we can easily obtain the function V(x2), which tells us the
maximum utility that can be obtained if she arrives in period 2 with x2 dollars
remaining. The function V(.) is equivalent to the indirect utility function with pa and
pb suppressed. The period 1 problem can then be written
max_{z_1} u(z_1) + V_2(x_2)   s.t.   x_2 = x_1 − p'z_1      (1.3)

Note that we've implicitly assumed an interior solution so that the constraint requiring
that x3≥0 is assumed to hold with an equality and can be suppressed. Once we know
the functional form of V(.), (3) becomes a simple static optimization problem and its
solution is straightforward. Assume for a moment that the functional form of V(x2)
has been found. We can then write out Lagrangian of the first period problem,

L = u(z_1) + V_2(x_2) + λ_1(x_1 − p'z_1 − x_2).

Again, we see that the economic meaning of the costate variable, λ_1, is just as in the
OC setup, i.e., it is equal to the marginal value of a unit of x_1.

Of course the problem is that we do not have an explicit functional form for V(.) and
as the problem becomes more complicated, obtaining a functional form becomes
more difficult, even impossible for many problems. Hence, the trick to solving DP
problems is to find the function V(.).
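To make the backward-induction logic concrete, here is a small sympy sketch, again under the assumption of log utility (my choice; the notes leave u(·) general). It uses the known period-2 solution to build V_2(·) and then solves the period-1 problem:

```python
import sympy as sp

za, zb, x1, x2, pa, pb = sp.symbols('z_a z_b x_1 x_2 p_a p_b', positive=True)

# Period 2: max ln(z_a) + ln(z_b) s.t. p_a z_a + p_b z_b = x_2.
# With log utility the consumer spends x_2/2 on each good, so
V2 = sp.log(x2 / (2 * pa)) + sp.log(x2 / (2 * pb))

# Period 1: max ln(z_a) + ln(z_b) + V2(x_2), using the state equation x_2 = x_1 - p'z_1
obj = sp.log(za) + sp.log(zb) + V2.subs(x2, x1 - pa * za - pb * zb)
sol = sp.solve([sp.diff(obj, za), sp.diff(obj, zb)], [za, zb], dict=True)
print(sol)   # z_a = x_1/(4 p_a), z_b = x_1/(4 p_b), matching the OC solution
```

Once V_2(·) is in hand, the period-1 problem really is just a static optimization, which is the whole point of the DP approach.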

V. Summary
• OC problems are solved using the vehicle of the Hamiltonian, which must be
maximized at each point in time.
• DP is about backward induction.
• Both techniques are equivalent to standard Lagrangian techniques, and the
interpretation of the shadow price, λ, is the same.

Copyright 2011 Richard T. Woodward

2. Introduction to Optimal Control


002 - Math. Econ. Summer 2012

I. Why we're not studying Calculus of Variations.


1) OC is better & much more widely used.
2) Parallels to DP are clearer in OC.
3) COV is tough but you can study it in Kamien and Schwartz (1991) Part I

II. Optimal Control problems always contain


z_t ⇒ the (set of) choice variable(s),
x_t ⇒ the (set of) state variable(s),
ẋ = f(t, x, z) ⇒ the state equation(s),
V = ∫_0^T F(t, x, z) dt ⇒ an objective function in which F(·) is the benefit function introduced previously,
x_0 ⇒ an initial condition for the state variable,
and sometimes explicit intratemporal constraints, e.g., g(t, x, z) ≤ 0.

As we saw in the two-period discrete-time model in lecture 1, OC problems can be solved
more easily using the vehicle of the Hamiltonian. In the next lecture we'll see more formally
why this holds and then explore the economic intuition behind the Hamiltonian. For now,
take my word for it.

Generally, the Hamiltonian takes the form: H = F(t, x, z) + λ_t f(t, x, z).

The maximum principle, due to Pontryagin, states that the following conditions, if satisfied,
guarantee a solution to the problem (you should commit these conditions to memory):
1. max_z H(t, x, z, λ) for all t ∈ [0, T]
2. ∂H/∂x = −λ̇ (where λ̇ = ∂λ/∂t)
3. ∂H/∂λ = ẋ
4. Transversality condition (such as λ(T) = 0)
Points to note:
• the maximization condition, 1, is not equivalent to ∂H/∂z = 0, since corner solutions are
admissible and non-differentiable problems can be considered.
• the maximum criteria include 2 sets of differential equations (2 & 3), so there's one set of
differential equations that was not present in the original problem.
• ∂H/∂λ = the state equation, by the definition of H.
• There are no second-order partial differential equations.

In general the transversality condition is a condition that specifies what happens as we
transverse to time outside the planning horizon. Above we state λ(T) = 0 as the condition for a
problem in which there is no binding constraint on the terminal value of the state variable(s).
This condition makes intuitive sense: since λt is the marginal value of the state variable at
time t, if you have complete flexibility in choosing xT, you would want to choose that level so
that its marginal value is zero, i.e., λT=0. We will spend more time discussing the meaning
and derivation of transversality conditions in the next lecture.

III. The solution of an optimal control problem (An example from Chiang (1991) with slight
notation changes).

max_{z_t} ∫_0^T −(1 + z_t²)^{1/2} dt
s.t. ẋ_t = z_t
and x_0 = A, x_T free

The Hamiltonian of this problem is
H = −(1 + z²)^{1/2} + λz

Note that we can use the standard interior solution for the maximization of the Hamiltonian
since the benefit function is concave and continuously differentiable. Hence, our
maximization equations are
1. ∂H/∂z = −(1/2)(1 + z²)^{−1/2}·2z + λ = 0
(if you check the 2nd order conditions you can verify we've got a maximum)
2. ∂H/∂x = 0 = −λ̇
3. ∂H/∂λ = z = ẋ
4. λ_T = 0, the transversality condition of this problem (because of the free value for x_T).

Solving this problem is really easy.
1. Condition 2 means that λ is constant.
2. Together with 4, this means that λ is constant at 0, i.e., λ_t = 0 for all t.
3. To find z_t*, solve 1 after dropping out λ, and we see that the only way
−(1/2)(1 + z²)^{−1/2}·2z = 0 is if z_t* = 0.
4. Plug this into the state equation, 3, and we find that x remains constant at A.

Now that was easy, but not very interesting. Let's try something a little more challenging.

IV. A simple consumption problem


max_{z_t} ∫_0^1 ln[z_t 4x_t] dt
s.t. ẋ_t = 4x_t(1 − z_t)
and x_0 = 1, x_1 = e²

What would a phase diagram for x in x-z space look like?



What is the transversality condition here?

The Hamiltonian for this problem is
H = ln[z_t 4x_t] + λ[4x_t(1 − z_t)]

Maximum conditions:
1. ∂H/∂z = 1/z_t − λ4x_t = 0 (check the 2nd order condition)
2. λ̇ = −∂H/∂x = −[1/x_t + λ4(1 − z_t)]
3. ẋ_t = ∂H/∂λ = 4x_t(1 − z_t)
4. x_1 = e²

Simplifying the first equation yields
1/(λ_t 4x_t) = z_t.
At this point one can almost always get some economic intuition from the solution.
For example, in this problem we find that current consumption is
inversely related to the product of the state and costate variables. Does this make intuitive
sense?

Substituting for z_t in 2:
λ̇_t = −1/x_t − λ_t 4(1 − 1/(λ_t 4x_t))
λ̇_t = −1/x_t − (λ_t 4 − 1/x_t)
λ̇_t = −λ_t 4

Can you solve the differential equation to obtain λt as a function of t?

Now, substituting for z_t in the state equation, we obtain
ẋ_t = 4x_t(1 − 1/(λ_t 4x_t))

So our three simplified equations are
5. 1/(λ_t 4x_t) = z_t
6. λ̇_t = −λ_t 4
7. ẋ_t = 4x_t − 1/λ_t
Is there an equilibrium where both λ̇ and ẋ equal zero?

Notice that 6 involves one variable, 7 involves two variables and 5 involves three variables.
This suggests an order in which we might want to solve the problem – start with 6.

The differential equation in 6 can be solved directly to obtain
8. λ_t = λ_0 e^{−4t}
(where λ_0 is the constant of integration, but clearly is also the value of λ when t = 0).
[⇒ check: λ̇_t = −4λ_0 e^{−4t} = −4λ_t]

This solution can then be substituted into 7 to get
ẋ_t = 4x_t − e^{4t}/λ_0,
a linear FODE. Recall that the way we solve linear FODEs is as follows:
e^{−4t}(ẋ_t − 4x_t) = −1/λ_0
e^{−4t}ẋ_t − 4e^{−4t}x_t = −1/λ_0

We can integrate both sides of this equation over t:
LHS: ∫ [e^{−4t}ẋ_t − 4e^{−4t}x_t] dt = x_t e^{−4t} + A_1
RHS: ∫ −(1/λ_0) dt = −t/λ_0 + A_2
so
x_t e^{−4t} + A_1 = −t/λ_0 + A_2
or
e^{−4t}x_t = −t/λ_0 + A
or
9. x_t = −t e^{4t}/λ_0 + A e^{4t}
where A is an unknown constant.
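As a quick check on this algebra, the same linear FODE can be handed to a computer algebra system; a minimal sympy sketch:

```python
import sympy as sp

t = sp.symbols('t')
lam0 = sp.symbols('lambda_0', positive=True)
x = sp.Function('x')

# Equation 7 after substituting lambda_t = lambda_0 e^{-4t}: x' = 4x - e^{4t}/lambda_0
ode = sp.Eq(x(t).diff(t), 4 * x(t) - sp.exp(4 * t) / lam0)
print(sp.dsolve(ode, x(t)))   # x(t) = (C1 - t/lambda_0)*exp(4*t), i.e., equation 9 with A = C1
```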

We are close to the solution, but we aren't finished until the values for all constants of
integration have been identified. To do this we use the initial and terminal conditions (a.k.a.
transversality condition).
Substituting in x_0 = 1 and t = 0 yields
1 = −(0·e^{4·0})/λ_0 + A·e^{4·0} = A
so A = 1.

Now use the condition x_1 = e²:
e² = −e^{4·1}/λ_0 + e^{4·1}
e⁴/λ_0 = e⁴ − e²
λ_0 = e⁴/(e⁴ − e²)
λ_0 ≈ 1.156
Now plug the values for A and λ_0 into 8 and 9 to get the complete time line for λ and x:
λ_t = 1.156 e^{−4t} and x_t = e^{4t} − 0.865 t e^{4t}. These can then be substituted into 5 to get
z_t = 1/(4.624 − 4t)
So this is the solution to the problem, which can be graphed as follows.
[Figure: time paths of x_t, z_t, and λ_t for t from 0 to 1.2; x_t grows from 1 toward e² ≈ 7.4, z_t rises slowly from about 0.22, and λ_t decays from about 1.16 toward 0.]
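Here is a short matplotlib reconstruction of that figure from the closed-form paths (equations 5, 8, and 9 with A = 1 and λ_0 = e⁴/(e⁴ − e²)):

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 200)
lam0 = np.exp(4) / (np.exp(4) - np.exp(2))           # ≈ 1.156
lam = lam0 * np.exp(-4 * t)                          # equation 8
x = np.exp(4 * t) - (1 / lam0) * t * np.exp(4 * t)   # equation 9 with A = 1
z = 1 / (4 * lam0 - 4 * t)                           # equation 5

# Boundary conditions: x(0) = 1 and x(1) = e^2
assert np.isclose(x[0], 1.0) and np.isclose(x[-1], np.exp(2))

for path, label in [(x, 'x_t'), (z, 'z_t'), (lam, 'lambda_t')]:
    plt.plot(t, path, label=label)
plt.xlabel('t'); plt.legend(); plt.show()
```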

Are these curves consistent with our intuition?



V. An infinite horizon resource management problem


Consider the case of a fishery in which the stock of fish in a lake, x_t, changes continuously
over time according to the following equation of motion:
ẋ_t = ax_t − b(x_t)² − z_t
where a > 0 and b > 0 are parameters of the species' biological growth function and z_t is the rate
of harvest. Society's utility comes from fish consumption at the rate ln(z_t), and the goal is to
maximize the discounted present value of its utility over an infinite horizon, discounting at the
rate r.

A formal statement of the planner's problem, therefore, is:
max_{z_t} ∫_0^∞ e^{−rt} ln(z_t) dt   s.t.
ẋ_t = ax_t − bx_t² − z_t
x_t ≥ 0
We solve this problem using a Hamiltonian:
H = e^{−rt} ln(z_t) + λ_t(ax_t − b(x_t)² − z_t)
yielding the first-order conditions:
1. e^{−rt}/z_t = λ_t
2. λ_t(a − 2bx_t) = −λ̇_t
3. ẋ_t = ax_t − b(x_t)² − z_t
4. lim_{t→∞} λ_t = 0

In this case, let's jump directly to the phase diagram exploring the dynamics of the system.
The state equation tells us the dynamic relationship between x_t and z_t. We can use FOCs
1 and 2 to uncover the dynamics of z_t. Using 2 we see that
−λ̇_t/λ_t = a − 2bx_t
We can then use 1 to identify the 1:1 relationship between λ̇_t and ż_t:
λ_t = e^{−rt}/z_t
ln(λ_t) = −rt − ln(z_t)
λ̇_t/λ_t = −r − ż_t/z_t
Hence we can write
r + ż_t/z_t = a − 2bx_t  ⇒  ż_t = (a − r − 2bx_t)z_t.

The two equations for our phase diagram, therefore, are
ż_t = (a − r − 2bx_t)z_t   and   ẋ_t = ax_t − b(x_t)² − z_t
ż_t ≥ 0 ⇒ (a − r − 2bx_t)z_t ≥ 0; since z_t > 0 by the ln(·) function,
⇒ a − r − 2bx_t ≥ 0 ⇒ (a − r)/(2b) ≥ x_t
ẋ_t ≥ 0 ⇒ ax_t − bx_t² − z_t ≥ 0 ⇒ ax_t − bx_t² ≥ z_t

[Phase diagram in (x_t, z_t) space: the vertical line ż_t = 0 at x_t = (a − r)/(2b) and the hump-shaped curve ẋ_t = 0 (z_t = ax_t − bx_t²) divide the plane into four regions, labeled I (upper left), II (upper right), III (lower right), and IV (lower left).]
It is clear from the diagram that we have a saddlepath equilibrium with paths in quadrants II
and IV, but all of the dynamics presented in the phase diagram are consistent with the first
order conditions 1 – 3. However, we can now use the constraint xt≥0 and the transversality
condition to show that only points that are actually on the saddlepaths are optimal by ruling
out all other points.

First, in quadrant I all paths lead to decreasing values of x and increasing values of z. Along
such paths ẋ_t = ax_t − b(x_t)² − z_t is negative and growing in absolute value; eventually x would
have to become negative. But this violates the constraint on x, so such paths are not
admissible in the optimum.

In quadrant III, harvests are declining and the stock is increasing. Eventually this will lead to
a point where x reaches the biological steady state where natural growth is zero, so harvests, z_t,
must also be zero. This will occur in finite time. But that means at such a point λ_t = ∞, which
violates the transversality condition. Hence, as with quadrant I, no point in quadrant III is
consistent with the optimum.

Finally, we can also rule out any point in quadrants II or IV that are not on the saddle path
because if the path does not lead to the equilibrium it will cross over to quadrant I or III.
Hence, only points on the separatrices are optimal.
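The qualitative dynamics can also be checked by plotting the vector field numerically. A sketch under assumed parameter values (a, b, and r are left symbolic in the notes):

```python
import numpy as np
import matplotlib.pyplot as plt

a, b, r = 1.0, 1.0, 0.05    # assumed values for illustration only
x, z = np.meshgrid(np.linspace(0.05, 1.0, 25), np.linspace(0.02, 0.35, 25))
xdot = a * x - b * x**2 - z             # state equation
zdot = (a - r - 2 * b * x) * z          # from FOCs 1 and 2

plt.streamplot(x, z, xdot, zdot, density=1.2)
xs = np.linspace(0.0, 1.0, 100)
plt.plot(xs, a * xs - b * xs**2, label='xdot = 0')         # hump-shaped isocline
plt.axvline((a - r) / (2 * b), ls='--', label='zdot = 0')  # vertical isocline
plt.xlabel('x'); plt.ylabel('z'); plt.legend(); plt.show()
```

The plotted streamlines show the saddle-point structure described above: only trajectories entering along the separatrices in regions II and IV converge to the steady state.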

VI. References
Chiang, Alpha C. 1991. Elements of Dynamic Optimization. New York: McGraw-Hill.

VII. Readings for next class


Chiang pp. 181-184 (book on reserve)
Léonard & van Long Chapter 7
Copyright 2011 Richard T. Woodward

3. End points and transversality conditions


002 - Math. Econ. Summer 2012

In this lecture we consider a variety of alternative ending conditions for continuous-time
dynamic optimization problems. For example, it might be that the state variable, x, must
equal zero at the terminal time T, i.e., x_T = 0, or it might be that it must be less than some
function of t, x_T ≤ φ(T). We also consider problems where the ending time is flexible or
T → ∞. In the process, we will provide a more formal development of Pontryagin's maximum
principle.

I. Transversality conditions for a variety of ending points


(Based on Chiang pp. 181-184)

A. Vertical or Free-endpoint problems
[Figure: a state path x_t ending on a vertical terminal line at t = T; x_T is free.]
By vertical end point, we mean that T is fixed and x_T can take on any value. This would be
appropriate if you are managing an asset or set of assets over a fixed horizon and it doesn't
matter what condition the assets are in when you reach T. This case we have considered
previously. When looked at from the perspective of the beginning of the planning horizon,
the value that x takes on at T is free and, moreover, it has no effect on what happens in the
future. So it is a fully free variable and we would maximize V over x_T. Hence, it follows that
the shadow price of x_T must equal zero, giving us our transversality condition, λ_T = 0.

We will now confirm this intuition by deriving the transversality condition for this particular
problem and at the same time giving a more formal presentation of Pontryagin’s maximum
principle.
The objective function is
V ≡ ∫_0^T F(t, x, z) dt
Now, setting up an equation as a Lagrangian with the state-equation constraint, we have
L = ∫_0^T [F(t, x, z) + λ_t(f(t, x, z) − ẋ_t)] dt.
We put the constraint inside the integral because it must hold at every point in time. Note that
the shadow price variable, λ_t, is actually not a single variable, but is instead defined at every
point in time in the interval 0 to T. Since the state equation must be satisfied at each point in
time, at the optimum it follows that λ_t(f(t, x, z) − ẋ_t) = 0 at each instant t, so that the value
of L must equal the value of V. Hence, we might write instead
V = ∫_0^T [F(t, x, z) + λ_t(f(t, x, z) − ẋ_t)] dt
or
V = ∫_0^T [{F(t, x, z) + λ_t f(t, x, z)} − λ_t ẋ_t] dt
V = ∫_0^T [H(t, x, z, λ) − λ_t ẋ_t] dt

It will be useful to reformulate the last term, λ_t ẋ_t, by integrating by parts:
∫ u dv = vu − ∫ v du
With λ = u and x = v, so that dv = ẋ dt, we get
−∫_0^T λ_t ẋ_t dt = −[λ_t x_t]_0^T + ∫_0^T λ̇_t x_t dt = ∫_0^T λ̇_t x_t dt + λ_0 x_0 − λ_T x_T
so we can rewrite V as
1.  V = ∫_0^T [H(t, x, z, λ) + λ̇_t x_t] dt + λ_0 x_0 − λ_T x_T

Derivation of the maximum conditions (Based on Chiang chapter 7)

From 1, we can easily derive the first two conditions of the maximum principle.
Assuming an interior solution and twice-differentiability, a necessary condition
for an optimum is that the first derivatives with respect to the choice variables equal zero.
First consider our choice variable, z_t. At each point in time it must be that
∂V/∂z_t = 0. This reduces to ∂H/∂z = 0, which is the first of the conditions
stated without proof in lecture 2.
Next, for all t ∈ [0, T], x_t is also a choice variable in 1, so it must also hold that
∂V/∂x_t = 0. This reduces to −H_x = λ̇, which is the second of the conditions
stated in lecture 2.
Finally, the FOC with respect to λ_t is more directly derived from the Lagrangian
above: ∂L/∂λ_t = f(t, x, z) − ẋ_t, so this implies that
∂L/∂λ_t = 0 ⇒ ẋ_t = f(t, x, z).

If the terminal condition is that x_T can take on any value, then it must be that the marginal
value of a change in x_T equals zero, i.e., ∂V/∂x_T = 0. Hence, the first-order condition
with respect to x_T is
∂V/∂x_T = ∫_0^T [H_t ∂t/∂x_T + H_x ∂x_t/∂x_T + H_z ∂z_t/∂x_T + H_λ ∂λ_t/∂x_T + λ̇_t ∂x_t/∂x_T + x_t ∂λ̇_t/∂x_T] dt − λ_T = 0
Several terms in this derivative must equal zero. First, clearly it holds that ∂t/∂x_T = 0, so
H_t ∂t/∂x_T = 0.
Second, as stated above when we converted from L to V, λ_t will have no effect on V as long
as the constraint is satisfied, i.e., as long as the state equation is satisfied. Hence, the terms
that involve ∂λ_t/∂x_T or ∂λ̇_t/∂x_T can be ignored. Hence,
∂V/∂x_T = ∫_0^T [H_x ∂x_t/∂x_T + H_z ∂z_t/∂x_T + λ̇_t ∂x_t/∂x_T] dt − λ_T = 0
or
∂V/∂x_T = ∫_0^T [(H_x + λ̇_t) ∂x_t/∂x_T + H_z ∂z_t/∂x_T] dt − λ_T = 0

As we derived above, the maximum principle requires that H_x = −λ̇_t and H_z = 0, so both of the
terms inside the integral equal zero at the optimum. Hence, we are left with
∂V/∂x_T = −λ_T = 0.
The minus sign on the LHS is there because it reflects the marginal cost of leaving a marginal
unit of the stock at time T. In general, we can show that λ_t is the value of an additional unit of
the stock at time t. Setting this FOC equal to zero, we obtain the transversality condition,
λ_T = 0.

This confirms our intuition: since we're attempting to maximize V over our planning
horizon, from the perspective of the beginning of that horizon x_T is a variable to be chosen, and it
must hold that λ_T, the marginal value of an additional unit of x_T, equals zero. Note that
this is the marginal value to V, i.e., to the sum of all benefits over time from 0 to T, not the value
to the benefit function, F(·). Although an additional unit may add value if it arrived at time T,
i.e., ∂F(·)/∂x_T > 0, the costs that are necessary for that marginal unit of x to arrive at T must
exactly balance the marginal benefit.
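To see the free-endpoint logic numerically, here is a sketch that solves a made-up vertical-terminal-line problem (my own example, not from the notes) by integrating the maximum-principle ODEs with scipy's boundary-value solver, imposing x(0) = 1 and the transversality condition λ(T) = 0:

```python
import numpy as np
from scipy.integrate import solve_bvp

# Example: max ∫_0^1 -(x^2 + z^2) dt,  x_dot = z,  x(0) = 1,  x(1) free.
# H = -(x^2 + z^2) + lam*z, so H_z = 0 gives z = lam/2 and lam_dot = -H_x = 2x.
def rhs(t, y):
    x, lam = y
    return np.vstack([lam / 2, 2 * x])     # [x_dot, lam_dot]

def bc(ya, yb):
    return np.array([ya[0] - 1.0, yb[1]])  # x(0) = 1, lam(1) = 0

t = np.linspace(0, 1, 50)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))
print(sol.y[0, -1])   # x(1) ≈ 0.648 = 1/cosh(1), the analytic answer
```

Because x(1) is free, the solver is pinned down by λ(1) = 0 instead of a terminal value for the state, exactly as the derivation above says it should be.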

B. Horizontal terminal line or fixed-endpoint problem
[Figure: a state path rising to a required level x_T; the terminal time is free.]
In this case there is no fixed terminal time, but the ending state variable must reach a given level.
For example, you can keep an asset as long as you wish, but at the end of your use it must be
in a certain state. Again, we will use equation 1:
V = ∫_0^T [H(t, x, z, λ) + λ̇_t x_t] dt + λ_0 x_0 − λ_T x_T.

Now, if we have the right terminal time, it must be the case that ∂V/∂T = 0, for otherwise it
would certainly be the case that a change in T would increase V; if ∂V/∂T > 0 we would want to
increase the time horizon, and if ∂V/∂T < 0 it should be shortened. (Note that this is a necessary,
but not sufficient condition -- for the sufficient condition we'll have to wait until we introduce
an infinite horizon framework.) Evaluating this derivative (remember Leibniz's rule), we get
∂V/∂T = [H(T, x_T, z_T, λ_T) + λ̇_T x_T] − (λ̇_T x_T + λ_T ẋ_T) = 0
The second and third terms cancel and, since we are restricted to have x_T equal to a specific
value, it follows that ẋ_T = 0. Hence, the condition reduces to H(T, x_T, z_T, λ_T) = 0, i.e.,
H = F(T, x_T, z_T) + λ_T f(T, x_T, z_T) = 0

C. Fixed Terminal Point

In this case both x_T and T are fixed. Such would be the case if you're managing the asset and,
at the end of a fixed amount of time, you have to have the asset in a specified condition. A
simple case: you rent a car for 3 days and at the end of that time the gas tank has to have 5
gallons in it. There's nothing complicated about the transversality condition here; it is
satisfied by the constraints on T and x_T, i.e., x_3 = 5.
[Figure: a state path ending at the fixed terminal point (T, x_T).]
When added to the other optimum criteria, this transversality equation gives you enough
equations to solve the system and identify the optimal path.

D. Terminal Curve
[Figure: a state path ending on the terminal curve x_T = φ(T).]
In this case the terminal condition is a function, x_T = φ(T). Again, we use
1.  V = ∫_0^T [H(t, x, z, λ) + λ̇_t x_t] dt + λ_0 x_0 − λ_T x_T.
Taking the derivative with respect to T and substituting in ẋ_T = φ'(T):
∂V/∂T = H(T, x_T, z_T, λ_T) + λ̇_T x_T − λ̇_T x_T − λ_T φ'(T) = 0
which can be simplified to the transversality condition,
∂V/∂T = H(T, x_T, z_T, λ_T) − λ_T φ'(T) = 0

E. Truncated Vertical Terminal Line
[Figure: a vertical terminal line at t = T, truncated below at the level x.]
In this case the terminal time is fixed, but x_T can only take on a set of values, e.g., x_T ≥ x. This
would hold, for example, in a situation where you are using a stock of inputs that must be
used before you reach time T and x_T ≥ 0. You can use the input from 0 to T, but x_t can never be
negative.

For such problems there are two possible transversality conditions. If x_T > x, then the
transversality condition λ_T = 0 applies. On the other hand, if the optimal path is to reach the
constraint on x, then the terminal condition would be x_T = x. In general, the Kuhn-Tucker
specification is what we want. That is, our maximization objective is the same, but we now
have an inequality constraint, i.e., we're seeking to maximize
V = ∫_0^T [H(t, x, z, λ) + λ̇_t x_t] dt + λ_0 x_0 − λ_T x_T   s.t. x_T ≥ x.
The Kuhn-Tucker conditions for the optimum then are:
λ_T ≥ 0, x_T ≥ x, and (x_T − x)λ_T = 0
where the last of these is the complementary slackness condition of the Kuhn-Tucker
conditions.

As a practical matter, rather than burying the problem in calculus and algebra, I suggest that
you typically take a guess: is x_T going to be greater than x? If you think it is, then
solve the problem first using λ_T = 0. If your solution satisfies x_T ≥ x, you're done. If not,
substitute in x_T = x and solve again. This will usually work. When would this approach not
work?

F. Truncated Horizontal Terminal Line
[Figure: a horizontal terminal line at the level x_T, truncated at t = T_max.]
In this case the time is flexible up to a point, e.g., T ≤ T_max, but the state is fixed at a given
level, say x_T is fixed. Again there are two possibilities, T = T_max or T < T_max. Using the
horizontal terminal line results from above, the transversality condition takes on a form
similar to the Kuhn-Tucker conditions above:
T ≤ T_max, H(T, x_T, z_T, λ_T) ≥ 0, and (T − T_max)H_T = 0.

II. First, a word on salvage value


The problems above have assumed that all benefits and costs accrue during the planning
horizon. However, for finite horizon problems it is often the case that there are benefits or
costs that are functions of xT at T. For example, operating a car is certainly a dynamic
problem and there is typically some value (perhaps negative) to your vehicle when you're
finally finished with it. Similarly, farm production problems might be thought of as a
dynamic optimization problem in which there are costs during the growing season, followed
by a salvage value at harvest time.

Values that accrue to the planner outside of the planning horizon are referred to as salvage
values. The general optimization problem with salvage value becomes
max_z ∫_0^T F(t, x, z) dt + S(x_T, T)   s.t.
ẋ_t = f(t, x, z)
x(0) = x_0
Rewriting equation 1 with the salvage value, we obtain:
1'.  V = ∫_0^T [H(t, x, z, λ) + λ̇_t x_t] dt + λ_0 x_0 − λ_T x_T + S(T, x_T).
Following the same derivation as for the vertical end-point problem above, we can obtain
λ_T = ∂S(T, x_T)/∂x_T.

Intuitively, this makes sense: λ_T is the marginal value of the stock and ∂S(T, x_T)/∂x_T is the
marginal value of the stock outside the planning horizon. When these are equal, it means that
the marginal value of the stock over the planning horizon is equal to zero and all of the value
is captured by the salvage value.

Note that the addition of the salvage value does not affect the Hamiltonian, nor will it affect
the first 3 of the criteria that must be satisfied. What would be the transversality condition for
a horizontal end-point problem with a salvage value?

III. An important caveat

Most of the results above will not hold exactly if there are additional constraints on the
problem or if there is a salvage value. However, you should be able to derive similar
transversality conditions using equation 1 and similar logic.

IV. Infinite horizon problems


It is frequently the case (I would argue, usually the case) that the true problem of interest has
an infinite horizon. The optimality conditions for an infinite horizon problem are identical to
those of a finite horizon problem with the exception of the transversality condition. Hence, in
solving the problem the most important change is how we deal with the need for the
transversality conditions. [Obviously, in infinite horizon problems the mnemonic of transversing to the
other side doesn't really work because there is no "other side" to which we might transverse.]

A. Fixed finite x
If we have a value of x at which we must arrive, i.e., x_∞ ≡ lim_{t→∞} x_t = k, then the problem is
identical to the horizontal terminal line case considered above.

B. Flexible x_T
Recall from above that for the finite horizon problem we used equation 1:
V = ∫_0^T [H(t, x, z, λ) + λ̇_t x_t] dt + λ_0 x_0 − λ_T x_T.
In the infinite horizon case this equation is rewritten:
V = ∫_0^∞ [H(t, x, z, λ) + λ̇_t x_t] dt + λ_0 x_0 − lim_{t→∞} λ_t x_t
and, for problems in which x_∞ is free, the condition analogous to the transversality condition in
the finite horizon case is lim_{t→∞} λ_t = 0. Note that if our objective is to maximize the present
value of benefits, this means that the present value of the marginal value of an additional unit
of x must go to zero as t goes to infinity. Hence, the current value (at time t) of an additional
unit of x must either be finite or grow at a rate slower than r so that the discount factor, e^{−rt},
pushes the present value to zero.

One way that we frequently present the results of infinite horizon problems is to evaluate the
equilibrium where λ̇ = ẋ = 0. Using these equations (and evaluating convergence and
stability via a phase diagram) we can then solve the problem. See the fishery problem in
lecture 2.

V. Summary
The central idea behind all transversality conditions is that if there is any flexibility at the end
of the time horizon, then the marginal benefit from taking advantage of that flexibility must
be zero at the optimum. You can apply this general principle to problems with more than one
variable, to problems with constraints and, as we have seen, to problems with a salvage value.

VI. Reading for next class


Dorfman, Robert. 1969. An Economic Interpretation of Optimal Control Theory. American
Economic Review 59(5):817-31.
Copyright 2011 Richard T. Woodward

4. An economic understanding of optimal control


as explained by Dorfman (1969)
002 - Math. Econ. Summer 2012

The purpose of this lecture and the next is to help us understand the intuition behind the
optimal control framework. We draw first on Dorfman's seminal article in which he
explained OC to economists.

(For this lecture, I will use Dorfman's notation so k is the state variable and x is the
choice variable)

A. The problem
Dorfman's problem is to maximize
(1)  W(k_t, x) = ∫_t^T u(k, x, τ) dτ
where x is the stream of all choices made between t and T. The state equation is
k̇ = ∂k/∂t = f(k, x, t)

B. Step 1. Divide time into two pieces


In order to help us understand this problem, Dorfman divides the time from t to T into
two pieces, from t to t+∆ and from t+∆ to T. If ∆ is small, then there is little loss of
accuracy if we linearize utility over the interval from t to t+∆, i.e., assume that u(k, x, t) is
constant over this interval. Technically, all the "=" signs below should be replaced by
"≈" signs, but we will assume the approximation error is trivial. Hence, we rewrite
W(k_t, x) = u(k, x_t, t)·∆ + ∫_{t+∆}^T u(k, x, τ) dτ
Let's look just at this second term. If we assume that we maximize over the second
interval from t+∆ to T, then we can eliminate the control variable, x, from the second
term to obtain
V*(k_{t+∆}, t+∆) = max_x W(k_{t+∆}, x, t+∆) = ∫_{t+∆}^T u(k*, x*, τ) dτ,
where k* and x* are the optimal paths of the state and control variables.

Following a policy of x_t constant for the initial period from t to t+∆, and then optimizing
beyond that point, can then be written
(2)  V(k_t, x_t, t) = u(k_t, x_t, t)∆ + V*(k_{t+∆}, t+∆).
(Note that the V on the LHS does not have a *, i.e., it is not necessarily at the optimum.)

C. Step 2. Evaluate the FOC w.r.t. the control variable, x_t

Problem (2) can be solved by applying standard tools of calculus. Dorfman takes the
FOC directly with respect to the choice variable x_t:
(3)  ∆·∂u(k, x_t, t)/∂x_t + ∂V*(k_{t+∆}, t+∆)/∂x_t = 0.
We can then rewrite the second term
(4)  ∂V*/∂x_t = (∂V*/∂k_{t+∆})·(∂k_{t+∆}/∂x_t)
Since we assume that the interval ∆ is quite short, we can approximate the state equation
k_{t+∆} = k_t + k̇∆ = k_t + f(k, x_t, t)∆
so that
(5)  ∂k_{t+∆}/∂x_t = 0 + (∂f/∂x_t)∆
Dorfman then substitutes (5) into (4), and also writes V' = λ, so that (3) can be rewritten
(∂u/∂x_t)∆ + λ_{t+∆}(∂f/∂x_t)∆ = 0.

Note: we can get the same results if we start with a Lagrangian, i.e.,
L = u(k_t, x_t, t)∆ + V*(k_{t+∆}, t+∆) − λ_{t+∆}[k_{t+∆} − (k_t + f(k, x_t, t)∆)]
and then the FOCs would be
(∂u/∂x_t)∆ + λ_{t+∆}(∂f/∂x_t)∆ = 0, and
∂V(·)/∂k_{t+∆} = λ_{t+∆}.
In the context of the Lagrangian we know that λ is the value of marginally
relaxing the constraint, i.e., the change in V that would be achieved by an extra
unit of k. Hence, V' and λ are equivalent.

If we take the limit as ∆→0, λ_{t+∆} = λ_t. Then ∆ can be canceled to obtain
(6)  ∂u/∂x_t = −λ_t ∂f/∂x_t
This is the first of the optimality conditions of the maximum principle (i.e., ∂H/∂z = 0).
Dorfman (822-23) provides a clear and succinct economic interpretation of this term:
[Equation (6)] says that the choice variable at every instant should be
selected so that the marginal immediate gains are in balance with the value
of the marginal contribution to the accumulation of capital.
Put another way, z should be increased as long as the marginal immediate benefit is
greater than the marginal future costs. In problems where z is discrete or constrained, it
may not be possible to actually achieve the equi-marginal condition, but the intuition
remains the same.

So now we've got a nice intuitive explanation for the first of the maximum conditions.
The central principle of dynamic optimization is that optimal choices are made when a
balance is struck between the immediate and future marginal consequences of our
choices.

D. Step 3. Look at the value of λ_t by taking ∂V*/∂k_t

We now assume that the optimal choice of x has been made over our short interval, t to t+∆:
V*(k_t, t) = u(k_t, x_t*, t)∆ + V*(k_{t+∆}, t+∆)
Differentiating this expression w.r.t. k and substituting λ_t for V_t', we get
λ_t = (∂u/∂k)∆ + (∂/∂k)V*(k_{t+∆}, t+∆)
λ_t = (∂u/∂k)∆ + (∂V*(k_{t+∆}, t+∆)/∂k_{t+∆})·(∂k_{t+∆}/∂k)
λ_t = (∂u/∂k)∆ + λ_{t+∆}·(∂k_{t+∆}/∂k)
Since this is over a short period, we can approximate
λ_{t+∆} = λ_t + λ̇∆ and k_{t+∆} = k_t + ∆k̇, so that ∂k_{t+∆}/∂k = 1 + (∂f/∂k)∆
Hence,
λ_t = (∂u/∂k)∆ + (λ_t + λ̇∆)(1 + (∂f/∂k)∆)
λ_t = (∂u/∂k)∆ + λ_t + λ̇∆ + λ_t(∂f/∂k)∆ + λ̇(∂f/∂k)∆²
0 = (∂u/∂k)∆ + λ̇∆ + λ_t(∂f/∂k)∆ + λ̇(∂f/∂k)∆²
or,
−λ̇ = ∂u/∂k + λ_t ∂f/∂k + λ̇(∂f/∂k)∆.
Taking the limit as ∆→0, the last term falls out and we're left with
(7)  −λ̇ = ∂u/∂k + λ ∂f/∂k
which is the second maximum condition, −λ̇ = ∂H/∂k.

What does Dorfman (p. 821) tell us about the economic intuition behind this equation?
To an economist, [λ̇] is the rate at which the capital is appreciating.
−λ̇ is therefore the rate at which a unit of capital depreciates at time t. …
In other words, [1] a unit of capital loses value or depreciates as time
passes at the rate at which its potential contribution to profits becomes its
past contribution. … [or] [2] Each unit of the capital good is gradually
decreasing in value at precisely the same rate at which it is giving rise to
valuable outputs. [3] We can also interpret −λ̇ as the loss that would be
incurred if the acquisition of a unit of capital were postponed for a short
time [which at the optimum must be equal to the instantaneous marginal
value of that unit of capital].
So we see that the value of the capital stock at the beginning of the problem is equal
to the sum of the contributions of the capital stock across time. As we move across time,
therefore, the capital stock's ability to contribute to V is "used up".

E. Step 4. Summing up
Hence, each of the optimality conditions associated with the Hamiltonian has a clear
economic interpretation. Let H = u(k, x, t) + λ_t f(k, x, t).

FOC       Equation         Interpretation
Choice    ∂H/∂x = 0        Finds the optimal balance between current and future welfare.
State     ∂H/∂k = −λ̇       The marginal value of the state variable is decreasing at the same
                           rate at which it is generating benefits.
Costate   ∂H/∂λ = k̇        The state equation must hold.
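As a mechanical illustration of reading these three conditions off a Hamiltonian, here is a tiny sympy sketch for an invented specification, u = ln(x) and f = ak − x (my choices, not Dorfman's):

```python
import sympy as sp

k, x, lam, a = sp.symbols('k x lambda a', positive=True)

H = sp.log(x) + lam * (a * k - x)    # H = u(k, x) + lambda * f(k, x)

print(sp.solve(sp.diff(H, x), x))    # choice FOC: x = 1/lambda
print(-sp.diff(H, k))                # = lambda_dot by the state condition: -a*lambda
print(sp.diff(H, lam))               # = k_dot by the costate condition: a*k - x
```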

II. A word about discounting

Discounting: Recall that if r is the annual rate of discount, then (1 + r)^{−T} is the
discount factor applied to benefits or costs T years in the future. If we break
each year into n periods, then the periodic discount rate becomes r/n, so over
n periods (i.e., a year) the one-year discount factor becomes (1 + r/n)^{−n}. As
n → ∞, this converges to e^{−r}, the continuous-time discount factor.
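A quick numerical check of this limit (the rate r = 0.05 is just an illustration):

```python
import numpy as np

r = 0.05
for n in (1, 12, 365, 100_000):
    print(n, (1 + r / n) ** (-n))   # 0.9524, 0.9513, ... approaching the limit
print(np.exp(-r))                   # e^{-0.05} ≈ 0.951229
```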

Consider a modification of Dorfman's problem with the assumption that we will
maximize the present value of u(k, x, t) = e^{−rt}w(k, x) over the interval 0 to T, i.e.,
W = ∫_0^T e^{−rt} w(k, x) dt
This is a restrictive specification of (1), so the optimality conditions must still hold. The
Hamiltonian now is
(8)  H = e^{−rt}w(k, x) + λ_t f(k, x, t)
The interpretation of λ_t is the same: it is a measure of the contribution to W of an
additional unit of k in period t. However, because of discounting there is a tendency for
λ_t to fall over time. If W_t is the present value (back to year zero) of all the benefits from t
to T, then W_t will tend to be much smaller far in the future than it is for t close to zero.
Correspondingly, ∂W_t/∂k_t = λ_t will also tend to fall over time.

Hence, the value of λt is influenced by two effects: the current (in period t) marginal
value of k, which could either be increasing or decreasing, and the discounting effect,
which is always falling. Hence, even if the marginal value of capital is increasing over
time (in current dollars), λ might be falling. Because of these two factors, it often
happens that the economic meaning of λt is not easily seen. An alternative way to specify
discounted optimal control problems that leads to a more helpful solution is called the
current value Hamiltonian.

A. The Current Value Hamiltonian

We begin by defining an alternative shadow price variable, µ_t, which is equal to the value
of an additional unit of k to the benefit stream, valued in period t units, i.e.,
µ_t = e^{rt}λ_t
That is, to get µ_t we have to inflate λ_t to put it into period t (current) values.

How could we solve for µt directly in Dorfman's model?


The current value Hamiltonian is obtained by inflating (8) to obtain
(9)  H_c = w(k, x) + µ_t f(k, x, t) = H·e^{rt}.
As a simple matter of algebra, we can derive the maximum conditions corresponding to
H_c and µ instead of H and λ.
The first condition can be rewritten
∂H/∂x = e^{−rt} ∂H_c/∂x
so ∂H/∂x = 0 if and only if ∂H_c/∂x = 0.
Hence the analogous principle holds w.r.t. the control variable, i.e.,
1')  ∂H_c/∂x = 0
or, more generally, maximize H_c with respect to x.

Now look at the FOC w.r.t. the state variable. The standard formulation is
∂H/∂k = −λ̇.
Looking at the LHS of this equation, we see that for the current value Hamiltonian, H_c,
∂H/∂k = e^{−rt} ∂H_c/∂k
and, on the RHS, since λ_t = e^{−rt}µ_t,
−λ̇ = −(−re^{−rt}µ_t + e^{−rt}µ̇_t) = re^{−rt}µ_t − e^{−rt}µ̇_t
Putting the LHS and RHS together, we get
∂H/∂k = −λ̇_t
e^{−rt} ∂H_c/∂k = re^{−rt}µ_t − e^{−rt}µ̇_t
Cancelling e^{−rt} gives the second optimum condition,
2')  ∂H_c/∂k = rµ_t − µ̇_t.
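This bookkeeping is easy to verify symbolically; a minimal sympy sketch (symbol names are mine):

```python
import sympy as sp

t, r = sp.symbols('t r', positive=True)
mu = sp.Function('mu')
lam = sp.exp(-r * t) * mu(t)    # lambda_t = e^{-rt} mu_t

# -lambda_dot minus e^{-rt}(r*mu - mu_dot) should vanish identically
diff = -lam.diff(t) - sp.exp(-r * t) * (r * mu(t) - mu(t).diff(t))
print(sp.simplify(diff))        # 0
```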

Obviously the third condition, that the state equation must hold, remains unchanged. The
transversality condition might change by a discount factor, but in many cases analogous
conditions hold. For example, if the TC is λ_T = 0, and λ_T = µ_T e^{−rT}, then it must also hold that
µ_T = 0. (Note that if T = ∞, then for r > 0, this would be satisfied if µ_t does not go to infinity
as t→∞.)

Hence, we can use the current value Hamiltonian, but it is important to use the correct
optimality conditions.

In summary: We seek to maximize
W = ∫_0^T e^{−rt} w(k, x) dt subject to the state equation k̇ = f(k, x, t).
We can do this using the vehicle of the current value Hamiltonian,
H_c = w(k, x) + µ_t f(k, x, t),
where the maximum criteria are:
1')  ∂H_c/∂x = 0
2')  ∂H_c/∂k = rµ_t − µ̇_t
3')  ∂H_c/∂µ = k̇

B. An economic interpretation of the current-value Hamiltonian


As in the standard case, the condition that Hc be maximized over time requires that we
strike a balance at every point in time; the only difference is that now we’re considering
this tradeoff at future points in time, rather than in present value terms.

The second condition is a bit trickier. Recall that 2' requires
∂H_c/∂k = ∂u/∂k + µ ∂f/∂k = rµ − µ̇
which we will rewrite
∂u/∂k + µ ∂f/∂k + µ̇ = rµ
The three terms on the LHS of this equation reflect the benefits of holding a marginal unit of
the capital stock for an instant longer. The first term indicates the marginal immediate
benefit of the capital stock. The second term is the capital stock's marginal value, in
terms of its contribution to future benefits. Finally, the third term indicates that the
marginal value of the capital increases over time. The sum of these three tells us the
benefit of holding a marginal unit of capital for one more instant. The RHS, rµ, can
be thought of as the opportunity cost of holding capital. For example, suppose that our
capital good can be easily transformed into dollars and we discount at the rate r because it
is the market interest rate. Then rµ is the immediate opportunity cost of holding capital,
since we could sell it and earn interest at the rate r. Hence, at the optimum, we will hold
our state variable up to the point where its marginal value is equal to the marginal cost.

[Footnote 1: Manseung Han, who took my class in 2002, greatly helped me in figuring out a clear presentation of this part of the problem.]

C. Summary
The current value formulation is very attractive for economic analysis because current
values are usually more interesting than discounted values. For example, in a simple
economy, the market price of a capital stock will equal the current-value co-state
variable. As economists we are usually more interested in such actual prices than we are
in their discounted present value. Hence, very often the current-value Hamiltonian is
more helpful than the present-value variety.

Also, as a practical matter, for analysis it is often the case that the differential equation
for µ will be autonomous (independent of t) while that for λ will not be. Hence, the
dynamics of a system involving µ can be interpreted using phase-diagram and steady-state
analysis, while this does not hold for λ.

One note of caution: we have stated and derived many of the basic results for the present-
value formulation (e.g., transversality conditions). When you are using the current-value
formulation, you need to be careful to ensure that everything is modified consistently.

III. Reference
Dorfman, Robert. 1969. An Economic Interpretation of Optimal Control Theory. American
Economic Review 59(5):817-31.
Copyright 2011 Richard T. Woodward

5. Lessons in the optimal use of natural resources from optimal control theory

002 - Math. Econ. Summer 2012

I. The model of Hotelling 1931


Hotelling's 1931 article, "The Economics of Exhaustible Resources," is a classic that
provides very important intuition that applies not only to natural resources, but to any form
of depletable asset. Hotelling does not use the methodology of optimal control (it had not
yet been developed), but this methodology is easily applicable to the problem.

A. The basic Hotelling model


Hotelling considers the problem of a depletable resource and how might it be optimally
used over time. What are the state and control variables of such a problem?

Let xt be the stock of the resource remaining at time t and let zt be the rate at which the
stock is being depleted. For simplicity, first assume that extraction costs are zero, and
that the market is perfectly competitive. In this case, the representative owner of the
resource will receive ptzt from the extraction of zt in period t and this will be pure profit
or, more accurately, quasi-rents.
Definitions (from http://www.bized.ac.uk/)
Economic rent: A surplus paid to any factor of production over its supply
price. Economic rent is the difference between what a factor of production is
earning (its return) and what it would need to be earning to keep it in its
present use. It is, in other words, the amount a factor is earning over and above
what it could be earning in its next best alternative use (its transfer earnings).
Quasi-rent: Short-term economic rent arising from a temporary inelasticity of
supply.

[Figure: inverse demand curve D for extractions z_t. The area under the curve up to z_t equals u(x_t, z_t, t) = ∫_0^{z_t} p(z) dz, split at the price P_t into consumer surplus (above) and producer surplus = quasi-rent (below).]
We consider the problem of a social planner who wants to maximize the present value of
consumer surplus plus rents (= producer surplus in this case). CS + PS at any instant in
time is equal to the area under the inverse demand curve, i.e., u(x_t, z_t, t) = ∫_0^{z_t} p(z) dz,
where p(z) is the inverse demand curve for extractions of the resource.

The problem is constrained by the fact that the original supply of the resource is finite,
x(t=0) = x_0, and any extraction of the resource will reduce the available stock, ẋ = −z. We
know that in any period x_t ≥ 0, and simple intuition assures us that x_T = 0. Do you see why
x_T = 0?

A formal statement of the planner's problem, then, is as follows:
max_{z_t} ∫_0^T e^{−rt} u(x_t, z_t, t) dt = max_{z_t} ∫_0^T e^{−rt} [∫_0^{z_t} p(z) dz] dt   s.t.
ẋ_t = −z_t
x(t=0) = x_0
x_t ≥ 0

The Hamiltonian of this problem is, therefore,
H = e^{−rt}u(·) + λ(−z_t)
and the maximization criteria are:
1. H_z = 0: e^{−rt}u'(·) − λ_t = 0 ⇒ e^{−rt}p(z_t) − λ_t = 0
2. H_x = −λ̇: −λ̇ = 0
3. H_λ = ẋ: ẋ_t = −z_t
The transversality condition in this case is found by the terminal point condition,
4. x_T = 0

Looking at 1 and using the intuition developed by Dorfman, we see that the marginal
benefit of extraction in t, e^{−rt}p(z_t), must be equal to the marginal cost in terms of foregone
future net benefits, λ_t.

From 2 we see that λ is constant at, say, $\lambda_0$, so we can drop the subscript. This is true in any dynamic optimization problem in which neither the benefit function nor the state equation depends on the state variable. This too is consistent with the intuition of Dorfman – since the state variable does not give rise to benefits at t, its marginal value does not change over time.

Substituting $\lambda_t = \lambda$ into 1, we obtain
$p(z_t) = \lambda e^{rt}$.
This is important. It shows that the optimal price will grow at the discount rate, and this is
true regardless of the demand function (as long as we have an interior solution). [Note
that in this example the marginal extraction cost is set at zero so that the price is equal to
the marginal quasi-rents earned by the producer. More generally, the marginal quasi-
rents would be equal to price minus marginal cost, and this would grow at the rate of
interest.]

Another thing that is interesting in this model is that the value of λ does not change over
time. That means that the marginal increment to the objective function (the whole
integral) of a unit of the resource stock never changes. In other words, looking at the
entire time horizon, the planner would be completely indifferent between receiving a
marginal unit of the resource at time 0 and the instant before T, as long as it is known in
advance that at some point the unit will be arriving. However, note that this is the present
value co-state variable, λ. What would the path of the current-value costate variable look
like? How does the economic meaning of µ differ from that of λ?

If we want to proceed further, it is necessary to define a particular functional form for our demand equation. Suppose that $p(z) = e^{-\gamma z}$, so that the inverse demand curve looks like the figure above.
Hence, from 1, $H_z = 0 \Rightarrow e^{-rt}e^{-\gamma z_t} = \lambda$, or $e^{-\gamma z_t} = \lambda e^{rt}$, so that
$-\gamma z_t = \ln\lambda + rt$
or
5. $z_t = -\dfrac{\ln\lambda + rt}{\gamma}$
At any point in time it will always hold that $x_t = x_0 + \int_{\tau=0}^{t} \dot{x}_\tau\,d\tau$. Hence, from our transversality condition, 4,
$x_T = 0 \Rightarrow x_0 = -\int_{\tau=0}^{T} \dot{x}_\tau\,d\tau$.

T T ln (λ ) + rt 
From 3 and 5 this can be rewritten ∫
0
zt dt = x0 or ∫  −
0
 γ
dt = x0 .

Evaluating this integral leads to
$\dfrac{1}{\gamma}\left[-\ln(\lambda)t - \dfrac{r}{2}t^2\right]_0^T = x_0$
$\left(-\ln\lambda - \dfrac{r}{2}T\right)\dfrac{T}{\gamma} = x_0$
$-\ln\lambda = \dfrac{\gamma}{T}x_0 + \dfrac{r}{2}T$
Hence, we can solve for the unknown value of λ,
$\lambda = e^{-\frac{\gamma}{T}x_0 - \frac{r}{2}T}$.

In this case we can then solve explicitly for z by substituting into 5, yielding

$z_t = -\dfrac{\ln\lambda + rt}{\gamma}$
$z_t = -\dfrac{\ln\left(e^{-\frac{\gamma}{T}x_0 - \frac{r}{2}T}\right) + rt}{\gamma}$
$z_t = \dfrac{\gamma}{\gamma T}x_0 + \dfrac{r}{2\gamma}T - \dfrac{r}{\gamma}t$
6. $z_t = \dfrac{x_0}{T} + \dfrac{r}{2\gamma}T - \dfrac{r}{\gamma}t$
To verify that this is correct, check the integral of this from 0 to T:
$\int_0^T z_t\,dt = \dfrac{x_0}{T}T + \dfrac{r}{2\gamma}T^2 - \dfrac{r}{2\gamma}T^2 = x_0$.
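It is easy to check this solution numerically as well. The following Python sketch (all parameter values are invented for illustration, chosen so that $z_t > 0$ throughout) evaluates λ and the extraction path in 6, then confirms both that the stock is exactly exhausted and that the implied price $p(z_t) = e^{-\gamma z_t}$ grows at the rate of interest:

    import numpy as np

    # Illustrative parameters only (not from the text); chosen so z_t > 0 on [0, T]
    r, gamma, x0, T = 0.05, 1.0, 10.0, 4.0

    lam = np.exp(-gamma / T * x0 - r / 2 * T)           # closed-form costate value
    t = np.linspace(0.0, T, 100_001)
    z = x0 / T + r / (2 * gamma) * T - (r / gamma) * t  # extraction path, equation 6

    # The stock should be exactly exhausted: the integral of z over [0, T] is x0
    total = np.sum(0.5 * (z[1:] + z[:-1]) * np.diff(t))
    print(total)                                        # ~10.0 = x0

    # Hotelling rule: p(z_t) = e^{-gamma z_t} should equal lambda * e^{rt}
    p = np.exp(-gamma * z)
    print(np.max(np.abs(p - lam * np.exp(r * t))))      # ~0, up to float error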
Looking at 6, we see that the rate of consumption at any point in time is determined by two parts: a constant portion of the total stock, $x_0/T$, plus a portion that declines linearly over time, $\dfrac{r}{\gamma}\left(\dfrac{T}{2} - t\right)$. This second portion is greater than zero until $t = T/2$, and is then less than zero for the remainder of the period.

Note that $z_T < 0$ if
7. $T > \sqrt{\dfrac{2\gamma x_0}{r}}$.
If this inequality is satisfied, then along the optimal path defined by 6, $x_t$ will become negative and would then have to be rebuilt so that it reaches zero at T. This violates the constraint $x_t \ge 0$. Hence, if 7 holds, we need to re-solve the problem with an explicit constraint on the optimization problem. We will evaluate how to solve such constrained problems later on.

B. Some variations on the theme and other results


Hotelling's analysis certainly doesn't end here.

Q: Consider again the question, “What would happen if we used the current-value instead of the present-value Hamiltonian?”
A: Well, you can be sure that the current-value co-state variable, $\mu_t$, would not be constant over time – how would this shadow price evolve over time? What's the economic interpretation of µ?

Q: What if there are costs to extraction c(zt) so that the planner's problem is to maximize
the area under the demand curve minus the area under the marginal cost curve?

A: First recognize that if we define $\tilde{u}(\cdot) = \int_0^{z_t}\left[p(z) - c'(z)\right]dz$, where $c'$ is the marginal cost function, then the general results will be exactly the same as in the original case after substituting “marginal quasi-rents” for “price”. That is, in this case the marginal surplus will rise at the rate of interest. Obviously, getting a nice clean closed-form solution for $z^*$ will not be as easy as it was in the first case, but the economic intuition does not change. This economic principle is central to a wide body of economic analysis.

Q: Would the social optimum be achieved in a competitive market?


A: First, assuming that both consumers and producers are interested in maximizing the present value of their respective welfare, we've maximized total surplus, i.e., the outcome is Pareto efficient. So we can then ask, do the assumptions of the 2nd Welfare Theorem hold? If they do, then there exists a price vector under which the Pareto efficient allocation is a competitive equilibrium, so finding the Pareto optimal allocation also gives us a competitive equilibrium. Hence, our findings are not only normative but, more importantly, positive; i.e., a prediction of the choices that would actually occur in a perfectly competitive economy.

Now, let's look at this question a little more intuitively. We know that one of the basic results is that the price (or marginal quasi-rents) grows at the rate of interest. Is this likely to occur in a competitive economy as well? In the words of Hotelling, “it is a matter of indifference to the owner of a mine whether he receives for a unit of his product a price $p_0$ now or a price $p_0 e^{\gamma t}$ after time t” (p. 140). That is, price takers will look at the future and decide between extracting a unit today or extracting it tomorrow at a higher price. The price must increase by at least the rate of interest in this simple model because, if it did not, the market would face a glut today. If the price rose faster than the rate of interest, then the owners would choose to extract nothing today. Assuming that the inverse-demand curve is downward sloping, supply and demand can be equal only if each individual is completely indifferent as to when he or she extracts, which also explains the constancy of λ.

This also gets at an important difference between profit and rents. We all know that in a
perfectly competitive economy with free entry, profits are pushed to zero -- so why do the
holders of the resource still make money in this case? Because there is not free entry.
The total resource endowment is fixed at x0. An owner of a portion of that stock is able
to make resource rents because he or she has access to a restricted profitable input.
Further, the owner is able to exploit the tradeoffs between current and future use to make
economic gains. This is what is meant by Hotelling rents.

II. Hartwick's model of national accounting and the general interpretation of the
Hamiltonian
Hartwick (1990) has a very nice presentation of the Hamiltonian's intuitive appeal as a
measure of welfare in a growth economy. The analogies to microeconomic problems will
be considered at the end of this section. Hartwick’s paper builds on Weitzman (1976)
and is a generalization of his more often cited 1977 paper.

A. The general case


We'll first present the general case and then look at some of Hartwick's particulars.
Consider the problem of optimal growth in an economy maximizing
$\int_0^\infty U(C)e^{-\rho t}\,dt$
subject to a state equation for a malleable capital stock, $x_0$, that can either be consumed or saved for the next period,
$\dot{x}_0 = g_0(x,z) - C$,
and n additional state equations for the n other assets in the economy (e.g., infrastructure, human capital, environmental quality, etc.),
$\dot{x}_i = g_i(x,z)$, i = 1, …, n.

Please excuse the possibly confusing notation. Here the subscript is an index
of the good and the time subscript is suppressed.

z is a vector of control variables and C is the numeraire choice variable (think


consumption). The vector of state variables is denoted x.

The general current value Hamiltonian of this optimization problem is
$H^c = U(C) + \mu_0\left(g_0(x,z) - C\right) + \sum_{j=1}^{n}\mu_j g_j(x,z)$.¹

This is our first exposure to the problem of optimal control with multiple state and
control variables, but the maximization conditions are the simple analogues of the single
variable case:
$\dfrac{\partial H}{\partial C} = \dfrac{\partial H}{\partial z_i} = 0$ for all i [or in general, maximize H with respect to C and all the $z_i$'s]
$\dfrac{\partial H}{\partial x_j} = \rho\mu_j - \dot{\mu}_j$ for all j
$\dfrac{\partial H}{\partial \mu_j} = \dot{x}_j$ for all j
Given the specification of utility, $\dfrac{\partial H}{\partial C} = U' - \mu_0 = 0 \Rightarrow \mu_0 = U'$.
(Remember, $\mu_0$ is the costate variable on the numeraire good, not the costate variable at t = 0.)

Similar to the approach used by Dorfman, Hartwick uses a linear approximation of


current utility, U(C)≈U'⋅C, and, if we measure consumption in terms of dollars, U' is the
marginal utility of income. He then presents an approximation of the Hamiltonian in
terms of the marginal utility of consumption.

1 Again, to write more concisely, H here is the current-value Hamiltonian, which we typically write $H^c$.

$\dfrac{H}{U'} = C + \dot{x}_0 + \sum_{j=1}^{n}\dfrac{\mu_j}{\mu_0}\dot{x}_j$

If you look at the RHS of this equation, you will see that this is equivalent to net national product in a closed economy without government. NNP is equal to the value of goods and services (C) plus the net change in the value of the assets of the economy, $\left(\dot{x}_0 + \sum_{j=1}^{n}\dfrac{\mu_j}{\mu_0}\dot{x}_j\right)$.

The first lesson from this model, therefore, is a general one and, as we will discuss below,
it carries over quite nicely to microeconomic problems: maximizing the Hamiltonian is
equivalent to maximizing NNP, which seems like a pretty reasonable goal.

Using some simplistic economies, Hartwick helps us understand what the appropriate shadow prices on changes in an economy's assets should be, i.e., what are the ratios $\mu_j/\mu_0$?

B. The case of a non-renewable resource


The first case to consider is an economy in which there are two state variables.
• First there's the fungible capital stock, x0 which we will now call K.
• Second, there's a nonrenewable resource or mine, S, which falls as the resource is extracted, R, and grows when there are discoveries, D. Extractions R are used in the production function F(⋅) but cost f(R,S).
• Discovery costs rise over time as a function of cumulative discoveries so that the
marginal cost of finding more of the resource increases over time. The total cost of
discovery in a period is v(D), linearly approximated as vD ⋅ D with vD changing over
time.2
• Hartwick also includes labor, L, although since the economy is always assumed to be
at full employment and the growth rate of labor is exogenous, labor can be treated as
an intermediate variable and can, therefore, be largely ignored.

The three state equations are, therefore,


Capital stock: $\dot{K} = F(K,L,R) - C - f(R,S) - v_D D$
Resource stock: $\dot{S} = -R + D$
Discovery cost: $\dot{v}_D = g(D)$
and the resulting current value Hamiltonian is
$H = U(C) + \mu_K\left[F(K,L,R) - C - f(R,S) - v_D D\right] + \mu_S\left[-R + D\right] + \mu_D g(D)$
The FOCs w.r.t. the choice variables are:
$H_C = 0$: $U' = \mu_K$
$H_R = 0$: $\mu_K\left[F_R - f_R\right] - \mu_S = 0$.

2 This is a refinement of the specification in Hartwick (1990), as proposed by Hamilton (1994).

$H_D = 0$: $-\mu_K v_D + \mu_S + \mu_D g' = 0$.

A linear approximation of the current-value Hamiltonian can be written
$H = U'C + \mu_K\dot{K} + \mu_S\left[-R + D\right] + \mu_D g'D$
Dividing by $U' = \mu_K$, we get
$\dfrac{H}{U'} = C + \dot{K} - \dfrac{\mu_S}{\mu_K}R + \dfrac{\mu_S}{\mu_K}D + \dfrac{\mu_D}{\mu_K}g'D$
Using the $H_R$ and $H_D$ conditions, it follows that $\dfrac{\mu_S}{\mu_K} = \left[F_R - f_R\right]$ and
$\mu_D = \dfrac{\mu_K v_D}{g'} - \dfrac{\mu_S}{g'}$, or $\mu_D = \dfrac{\mu_K v_D}{g'} - \dfrac{\mu_K\left[F_R - f_R\right]}{g'}$.
Hence the linear approximation of the Hamiltonian can be rewritten
$\dfrac{H}{U'} = C + \dot{K} - \dfrac{\mu_K\left[F_R - f_R\right]}{\mu_K}R + \dfrac{\mu_K\left[F_R - f_R\right]}{\mu_K}D + \left(\dfrac{\mu_K v_D}{\mu_K g'} - \dfrac{\mu_K\left[F_R - f_R\right]}{\mu_K g'}\right)g'D$
$\dfrac{H}{U'} = C + \dot{K} - \left[F_R - f_R\right]R + v_D D$
We know that in a competitive economy, the price paid for the resource would equal $F_R$ (resources are paid their marginal value product). Hence, to arrive at NNP, current “Hotelling rents” from extractions, namely $\left[F_R - f_R\right]R$, should be netted out of GNP, and discoveries, priced at the marginal cost of discovery, should be added back in.³
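To make the adjustment concrete, here is a minimal numerical sketch of the accounting. Everything in it (the Cobb-Douglas form for F, the linear extraction cost, and all numbers) is invented for illustration; the only point is the bookkeeping $NNP = C + \dot{K} - \left[F_R - f_R\right]R + v_D D$:

    # Hypothetical snapshot of the resource economy; all forms and numbers invented.
    K, L, R, D = 100.0, 50.0, 4.0, 1.0

    F = K**0.3 * L**0.4 * R**0.3       # output, assuming a Cobb-Douglas technology
    F_R = 0.3 * F / R                  # marginal product of extractions
    f_R = 0.5                          # assumed (constant) marginal extraction cost
    v_D = 2.0                          # assumed marginal discovery cost
    C = 10.0                           # consumption

    K_dot = F - C - f_R * R - v_D * D  # capital accumulation (state equation)
    rents = (F_R - f_R) * R            # Hotelling rents on current extractions

    gnp = C + K_dot
    nnp = gnp - rents + v_D * D        # net out depletion, add back discoveries
    print(f"GNP = {gnp:.2f}, green NNP = {nnp:.2f}")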

Is this common practice in national accounting? No. The depreciation of natural resource assets is ignored in the system of national accounts, leading to a misrepresentation of national welfare. One reason for this is the difficulty of actually implementing the necessary accounting practice. Hartwick elaborates, “The principal problem of implementing the accounting rule above is in obtaining marginal extraction costs for minerals extracted.”

C. An economy with annoying pollution


The final example that Hartwick presents is that of an economy in which there is a disutility associated with pollution. The case Hartwick considers is one where national welfare is affected by changes in the pollution stock: if the stock of pollution is increasing, welfare goes down; if the stock of pollution is falling, welfare goes up. In this case we would have $U = U(C, \dot{X})$, where $\dot{X}$ is the change in the pollution stock, with $\dfrac{\partial U}{\partial \dot{X}} < 0$.

Production is assumed to be affected by pollution, i.e., F(K,L,X) so, for example, more pollution makes production more difficult. The pollution stock is assumed to increase with production at the rate γ, and to decrease with choices made regarding the level of cleanup, b, which costs f(b), i.e., $\dot{X} = -bX + \gamma F(K,L,X)$, and the evolution of the numeraire capital stock follows $\dot{K} = F(K,L,X) - C - f(b)$.

3 This result differs from that presented in Hamilton (1994). I have not attempted to determine where the difference comes from.

The current value Hamiltonian with this stock change incorporated in the utility function, therefore, is
$H = U(C, -bX + \gamma F(K,L,X)) + \mu_K\left[F(K,L,X) - C - f(b)\right] + \mu_X\left[-bX + \gamma F(K,L,X)\right]$

Again the FOCs w.r.t. the control variables, C and b, yield
$\dfrac{\partial H}{\partial C} = 0 \Rightarrow U_C = \mu_K$
$\dfrac{\partial H}{\partial b} = 0 \Rightarrow -U_{\dot{X}}X - \mu_K f_b - \mu_X X = 0 \Rightarrow -\dfrac{U_{\dot{X}}}{\mu_K} - \dfrac{f_b}{X} = \dfrac{\mu_X}{\mu_K}$
Using the linear approximation of H, therefore, yields
$\dfrac{H}{U'} = C + \dot{K} + \dfrac{\mu_X}{\mu_K}\dot{X}$
$\dfrac{H}{U'} = C + \dot{K} - \left(\dfrac{U_{\dot{X}}}{\mu_K} + \dfrac{f_b}{X}\right)\dot{X}$
Hence, if we want to correctly incorporate changes in the stock of pollution in the
calculation of welfare, the price that should be placed on these changes is a function not
only of the marginal damage of changes in the stock of pollution, but the marginal cost of
clean-up as well.
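A tiny numerical illustration of this price (all values are invented; note that its sign depends on the relative sizes of the marginal disutility and the marginal cleanup cost):

    # Shadow price on pollution-stock changes: -(U_Xdot/mu_K + f_b/X).
    U_Xdot = -0.2   # marginal disutility of a rising pollution stock (< 0)
    mu_K = 1.0      # marginal utility of consumption (= U_C at the optimum)
    f_b = 2.0       # marginal cost of cleanup effort
    X = 5.0         # current pollution stock

    price_on_Xdot = -(U_Xdot / mu_K + f_b / X)
    print(price_on_Xdot)   # -0.2: each unit of Xdot lowers measured NNP by 0.2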

D. Implications beyond the realm of national income accounting


If you're not particularly interested in the national income accounts or environmental and
natural resource economics, the above discussion may seem academic. However, clearly,
the correct measurement of income is not an academic pursuit limited to the national
income accounts.

Hicks (1939, Value and Capital) defined income as, to paraphrase, the maximum amount that an individual can consume in a week without diminishing his or her ability to consume next week. Clearly, just as for a national account, farmers and managers also
need to be aware of the distinction between investment, capital consumption, and true
income. Hartwick's Hamiltonian formulation of NNP, therefore, with its useful
presentation of the correct prices for use in the calculation of income, might readily be
applied to a host of microeconomic problems of concern to applied economists.

III. References
Hartwick, John M. 1977. Intergenerational Equity and the Investing of Rents from
Exhaustible Resources. American Economic Review 67(5):972-74.

6. Optimal control with constraints and MRAP/Bang-Bang problems


002 - Math. Econ. Summer 2012

We now return to an optimal control approach to dynamic optimization. This means that
our problem will be characterized by continuous time and will be deterministic.
It is usually the case that we are not Free to Choose.1 The choice set faced by decision
makers is almost always constrained in some way and the nature of the constraint
frequently changes over time. For example, a binding budget constraint or production
function might determine the options that are available to the decision maker at any point
in time. In general, this implies that we will need to reformulate the simple Hamiltonian
problem to take account of the constraints. Fortunately, in many cases, economic
intuition will tell us that the constraint will not bind (except for example at t=T), in which
case our life is much simplified. We consider here cases where we're not so lucky, where
the constraints cannot be ruled out ex ante.
We will assume throughout that a feasible solution exists to the problem. Obviously, this
is something that needs to be confirmed before proceeding to waste a lot of time trying to
solve an infeasible problem.
In this lecture we cover constrained optimal control problems rather quickly, looking at the important conceptual issues. For technical details I refer you to Kamien & Schwartz, which covers the solution of constrained optimal control problems in various chapters. We then go on to consider a class of problems where the constraints play a particularly central role in the solution.

I. Optimal control with equality constraints


A. Theory
Consider a simple dynamic optimization problem
$\max_z \int_0^T e^{-rt}u(z,x,t)\,dt$ s.t.
$\dot{x} = g(z,x,t)$
$h(z,x,t) = c$
$x(0) = x_0$
In this case we cannot use the Hamiltonian alone, because this would not take account of the constraint, h(z,x,t) = c. Rather, we need to maximize the Hamiltonian subject to a constraint ⇒ so we use a Lagrangian² in which $H^c$ is the objective function, i.e.,
$L = H^c + \phi\left(c - h(z,x,t)\right)$
$\;\; = u(z,x,t) + \mu g(z,x,t) + \phi\left(c - h(z,x,t)\right)$.
Equivalently, you can think about embedding a Lagrangian within a Hamiltonian, i.e.,
$H^c = u(z,x,t) + \phi\left(c - h(z,x,t)\right) + \mu g(z,x,t)$. We'll use the first notation here.

1 This is an obtuse reference to the first popular book on economics I ever read, Free to Choose by Milton and Rose Friedman.
2 This Lagrangian is given a variety of names in the literature. Some call it an augmented Hamiltonian, some a Lagrangian, some just a Hamiltonian. As long as you know what you're talking about, you can pretty much call it whatever you like.

Assuming that everything is continuously differentiable and that concavity assumptions hold, the FOCs of this problem, then, are:
1. $\dfrac{\partial L}{\partial z} = 0$
2. $\dfrac{\partial L}{\partial x} = r\mu - \dot{\mu}$
and, of course, the constraints must be satisfied:
$\dfrac{\partial L}{\partial \mu} = \dot{x}$
$\dfrac{\partial L}{\partial \phi} = c - h(z,x,t) = 0$
Let's look at these in more detail. The FOC w.r.t. z is
1'. $\dfrac{\partial L}{\partial z} = \dfrac{\partial u}{\partial z} + \mu\dfrac{\partial g}{\partial z} - \phi\dfrac{\partial h}{\partial z} = 0$
which can be rewritten
1''. $\dfrac{\partial u}{\partial z} - \phi\dfrac{\partial h}{\partial z} = -\mu\dfrac{\partial g}{\partial z}$.
As Dorfman showed us, the FOC w.r.t. the control variable tells us that at the optimum we balance off the marginal current benefit and marginal future costs. In this case the RHS is the cost to future benefits of a marginal increase in z. The LHS, therefore, must indicate the benefit to current utility from marginal increments to z. If $\partial u/\partial z >$ RHS, then this implies that there is a cost to the constraint, and $\phi\frac{\partial h}{\partial z}$ is the cost to current utility of the intratemporal constraint, h. If h(⋅) were marginally relaxed, then z could be changed to push it closer to balancing off the value of z in the future.

[Figure: the marginal benefit du/dz (downward sloping) and marginal future cost −µ dg/dz of the control; the constrained choice of z lies below the unconstrained choice, with the gap between du/dz and −µ dg/dz at the constrained optimum equal to the wedge φ dh/dz.]

In principle, the problem can then be solved based on these equations. It is important to
note that φ will be a function of time and will typically change over time. What is the
economic significance of φ?

B. Optimal control with multiple equality constraints


The extension to the case of multiple equality constraints is easy; with n constraints the Lagrangian will take the form
$L = u(z,x,t) + \mu g(z,x,t) + \sum_{i=1}^{n}\phi_i\left(c_i - h_i(z,x,t)\right)$.
Obviously, if n is greater than the cardinality of z, there may not be a feasible solution unless some of the constraints do not bind or are redundant.

C. Example: The political business cycle model (Chiang's (Elements of Dynamic Optimization) presentation of Nordhaus 1975)
This model looks at macroeconomic policy. Two policy variables are available: U, the rate of unemployment, and p, the rate of inflation. It is assumed that there is a trade-off between these two, so that support for the current administration can be defined by the equation
$v = v(U, p)$
and the relationship between the two policies can be described by the iso-vote curves in the figure below.
[Figure: iso-vote curves in (U, p) space; v is constant along each curve, and vote share increases toward the origin (lower unemployment and lower inflation).]
Following standard Phillips-curve logic, there is an assumed trade-off between these two objectives,
$p = \gamma(U) + \alpha\pi$
where π is the expected rate of inflation. Expectations evolve according to the differential equation
$\dot{\pi} = b(p - \pi)$

We assume that the votes obtained at time T are a weighted sum of the support that is obtained from 0 to T, with support nearer to the voting date being more important. Votes obtained at T are equal to $\int_0^T v(U,p)e^{rt}\,dt$.
The optimization problem then is
$\max_{U,p} \int_0^T v(U,p)e^{rt}\,dt$ s.t.
$p = \gamma(U) + \alpha\pi$
$\dot{\pi} = b(p - \pi)$
$\pi(0) = \pi_0$, and $\pi(T)$ free.
Now clearly the first constraint could be used to substitute out for p and convert the problem to a single-control problem, but let's consider the alternative, explicitly including the constraint.
The Lagrangian for this optimal control problem would be
$L = v(U,p)e^{rt} + \lambda\left(b(p - \pi)\right) + \phi\left[\gamma(U) + \alpha\pi - p\right]$
The optimum conditions would then be
$\dfrac{\partial L}{\partial p} = \dfrac{\partial v}{\partial p}e^{rt} + \lambda b - \phi = 0$
$\dfrac{\partial L}{\partial U} = \dfrac{\partial v}{\partial U}e^{rt} + \phi\gamma' = 0$
$\dfrac{\partial L}{\partial \phi} = \gamma(U) + \alpha\pi - p = 0$
$\dot{\lambda} = \lambda b - \phi\alpha$
$\dot{\pi} = b(p - \pi)$
If we specify a functional form (see Chiang, chapter 7), we can find the optimal path for policy, which shows that the political process creates a business cycle. In most problems, however, it is easier to find the solution by using equality constraints to eliminate variables before getting started.
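For the curious, the path can be traced numerically once functional forms are chosen. The sketch below assumes, purely for illustration, a quadratic vote function $v = -U^2 - hp$ and a linear Phillips relation $\gamma(U) = j - kU$ (plausible stand-ins for the forms in Chiang's chapter 7), with invented parameter values. With p substituted out, a little algebra (not shown in the text) gives $U_t = \frac{hk}{2} - \frac{bk\lambda_t}{2}e^{-rt}$ from the FOC, and the costate equation becomes $\dot{\lambda} = h\alpha e^{rt} + b(1-\alpha)\lambda$, which can be integrated backward from $\lambda(T) = 0$; π then follows forward.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Invented parameters for v = -U^2 - h*p and gamma(U) = j - k*U
    h, alpha, b, j, k, r = 1.0, 0.5, 0.6, 1.0, 1.0, 0.05
    T, pi0 = 4.0, 1.0

    # Costate ODE, integrated backward from the transversality condition lam(T) = 0
    lam = solve_ivp(lambda t, y: [h * alpha * np.exp(r * t) + b * (1 - alpha) * y[0]],
                    (T, 0.0), [0.0], dense_output=True, rtol=1e-8)

    def U(t):   # optimal unemployment from the FOC, after substituting out p
        return h * k / 2 - lam.sol(t)[0] * b * k * np.exp(-r * t) / 2

    # Expectations evolve forward: pi' = b(j - k*U + (alpha - 1)*pi)
    pi = solve_ivp(lambda t, y: [b * (j - k * U(t) + (alpha - 1) * y[0])],
                   (0.0, T), [pi0], dense_output=True, rtol=1e-8)

    for s in np.linspace(0.0, T, 5):
        print(f"t={s:.1f}  U={U(s):.3f}  pi={pi.sol(s)[0]:.3f}")
    # Unemployment is run high early in the term and falls toward the election
    # at T: the political business cycle.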

II. Optimal control with inequality constraints

A. Theory
Suppose now that the problem we face is one in which we have inequality constraints,
$h_i(t,x,z) \le c_i$, with i = 1, …, n
for n constraints, where x and z are assumed to be vectors of the state and control variables respectively. For each $x_j \in x$, the state equation takes the form $\dot{x}_j = g_j(t,x,z)$.
As with standard constrained optimization problems, the Kuhn-Tucker conditions will
yield a global maximum if any one of the Arrow-Hurwicz-Uzawa constraint
qualifications is met (see Chiang p. 278). The way this is typically satisfied in most
economic problems is for the gj to be concave or linear in the control variables.

Assuming that the constraint qualification is met, we can then proceed to use the Lagrangian specification using a Hamiltonian, which takes the form
$H = u(t,x,z) + \sum_{j=1}^{m}\lambda_t^j g_j(t,x,z)$
which we then plug into the Lagrangian with the constraints,
$L = u(t,x,z) + \sum_{j=1}^{m}\lambda_t^j g_j(t,x,z) + \sum_{i=1}^{n}\phi_t^i\left(c_i - h_i(t,x,z)\right)$.

Note: For maximization problems I always write the constraint term of the
Lagrangian so that the argument inside the parentheses is greater than zero, or for
minimization problems you write it so that the argument is less than zero. If you
follow this rule, your Lagrange multiplier will always be positive.

The FOCs for this problem are:
$\dfrac{\partial L}{\partial z_k} = 0 \Rightarrow \dfrac{\partial u}{\partial z_k} + \sum_{j=1}^{m}\lambda_t^j\dfrac{\partial g_j}{\partial z_k} - \sum_{i=1}^{n}\phi_t^i\dfrac{\partial h_i}{\partial z_k} = 0$ for all $z_k \in z$
$\dfrac{\partial L}{\partial x_j} = -\dot{\lambda}_t^j$ for all j
$\dfrac{\partial L}{\partial \lambda_j} = \dot{x}_j$
and, for the constraints,
$\dfrac{\partial L}{\partial \phi_i} \ge 0 \Rightarrow h_i(x_t, z_t) \le c_i$
with the complementary slackness conditions:
$\phi_i \ge 0$ and $\phi_i\dfrac{\partial L}{\partial \phi_i} = 0$ for all i.

As with all such problems, the appropriate transversality conditions must be used and, if
you choose to use a current-value Hamiltonian, the necessary adjustments must be made.
Note that in the current value specification, the interpretation of both the costate variable
and the shadow price on the intratemporal constraint would be altered.

B. Example: Hotelling’s optimal extraction problem


We return to Hotelling's problem. The planner's problem is to maximize
$\max_z \int_0^T e^{-rt}\left[\int_0^{z_t} p(z)\,dz\right]dt$ s.t.
$\dot{x} = -z$
$x(0) = x_0, \quad x_t \ge 0$

Economic intuition tells us that $x_T = 0$. Hence, $x_t \ge 0$ for all t if $z_t \ge 0$. Hence, we can convert the problem to one of having a constraint on the control variable. The associated Lagrangian would then be
$L = e^{-rt}u(\cdot) + \lambda(-z_t) + \phi_t z_t$.

(Note that we've converted a state constraint to a control constraint. We cover constraints on the state variable below.)

The associated maximization criteria are:
3. $L_z = 0$: $e^{-rt}u'(\cdot) - \lambda_t + \phi_t = 0 \Rightarrow e^{-rt}p(z_t) - \lambda_t + \phi_t = 0$
4. $L_x = -\dot{\lambda}$: $-\dot{\lambda} = 0$
5. $L_\lambda = \dot{x}$: $\dot{x}_t = -z_t$
6. $L_\phi \ge 0$: $z_t \ge 0$
7. $\phi_t \ge 0$
8. $\phi_t z_t = 0$
The transversality condition is $x_T = 0$.

From 4 it still holds that λ is constant. However, 3 can be rewritten
$p(z_t) = (\lambda - \phi_t)e^{rt}$.
Using the assumed functional form for the inverse demand curve, $p(z) = e^{-\gamma z}$, we obtain $e^{-\gamma z} = (\lambda - \phi_t)e^{rt}$. Taking logs we get $-\gamma z = \ln(\lambda - \phi_t) + rt$
or
9. $z = -\dfrac{\ln(\lambda - \phi_t) + rt}{\gamma}$.
Now, using the complementary slackness conditions, we know that if z > 0 then φ = 0, and that φ can be positive only where z = 0. The path can, therefore, be broken into two parts: the first part from 0 to $T_1$, during which z > 0, and the second part, from $T_1$ to T, where z = 0 and φ > 0.

From 0 to $T_1$,
$z = -\dfrac{\ln(\lambda - 0) + rt}{\gamma} = -\dfrac{\ln(\lambda) + rt}{\gamma}$
and from $T_1$ to T,
$0 = -\dfrac{\ln(\lambda - \phi_t) + rt}{\gamma} \Rightarrow \ln(\lambda - \phi_t) = -rt$
10. $\phi_t = \lambda - e^{-rt}$.
Now, we can speculate about the solution. It seems likely that at $T_1$, φ will equal zero and will then increase over time from that point onward. If not, then the paths of z and φ will be discontinuous at $T_1$. So let's use this possibility and then later confirm that it holds.
If $\phi_{T_1} = 0$, then
11. $\lambda = e^{-rT_1}$.
Furthermore, we must exhaust the resource by $T_1$, so that
$\int_0^{T_1} z_t\,dt = x_0$ or $\int_0^{T_1}\left[-\dfrac{\ln(\lambda) + rt}{\gamma}\right]dt = x_0$,
which we solved in the previous lecture to obtain
$\lambda = e^{-\frac{\gamma}{T_1}x_0 - \frac{r}{2}T_1}$.
Now, substituting in from 11, we obtain
$e^{-rT_1} = e^{-\frac{\gamma}{T_1}x_0 - \frac{r}{2}T_1}$
$\dfrac{r}{2}T_1 = \dfrac{\gamma}{T_1}x_0$
$T_1^2 = \dfrac{2\gamma}{r}x_0$
$T_1 = \sqrt{\dfrac{2\gamma x_0}{r}}$
Hence, if our assumption that φ = 0 at T1 is valid, the optimal solution is for consumption
to decline from 0 to T1 and then stay constant at zero from that point onward.

Is the assumption correct? Without a formal proof, we can see using economic intuition that it is. Suppose $z_{T_1} > 0$. A feasible option would be to reduce $z_{T_1}$ and consume for a little longer. Since u(⋅) is concave (u'' < 0), it will hold that the marginal cost of a slight reduction in z at $T_1$ will be less than the marginal benefit of a slight increase in z a moment later. Hence, it will never be optimal to consume a strictly positive amount of z at $T_1$, so the assumption that φ = 0 at $T_1$ is valid and our solution is the optimum.
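The two-phase solution is easy to verify numerically. In this sketch (illustrative parameters; same demand $p(z) = e^{-\gamma z}$), T is deliberately set beyond $\sqrt{2\gamma x_0/r}$ so that the constraint matters; extraction declines linearly, hits zero exactly at $T_1$, and exhausts the stock there:

    import numpy as np

    r, gamma, x0, T = 0.05, 1.0, 10.0, 40.0  # illustrative; T > sqrt(2*gamma*x0/r)
    T1 = np.sqrt(2 * gamma * x0 / r)         # switch point; = 20 here
    lam = np.exp(-r * T1)                    # from condition 11

    t = np.linspace(0.0, T, 400_001)
    z = np.where(t <= T1, -(np.log(lam) + r * t) / gamma, 0.0)  # eq. 9 with phi_t = 0

    print(z[0], z[np.searchsorted(t, T1)])   # z falls from r*T1/gamma to ~0 at T1
    total = np.sum(0.5 * (z[1:] + z[:-1]) * np.diff(t))
    print(total)                             # ~10.0 = x0: exhausted exactly at T1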

III. Constraints on the state space

A. Theory
Suppose now that we have constraints on the state variables which define a feasible range. This is likely to be common in economic problems. You may, for example, have limited storage space, so that you cannot accumulate inventory forever. Or, if you were dealing with a biological problem, you might be constrained to keep your stock of a species above a lower bound, below which reproduction begins to fail, and below an upper bound, above which epidemics are common.

The approach to such problems is similar to that used for constraints on the control variables.


Suppose we have an objective function
$\max \int_0^T u(t,x,z)\,dt$ s.t.
$\dot{x} = g(t,x,z),\; x(0) = x_0$ and
$h(t,x) \ge 0$.
The augmented Hamiltonian for this problem is
$L = u(t,x,z) + \lambda g(t,x,z) + \phi h(t,x)$

and the necessary conditions for optimality include the constraints plus
$\dfrac{\partial L}{\partial z} = 0$
$\dot{\lambda} = -\dfrac{\partial L}{\partial x}$
$\phi \ge 0$ and $\phi h = 0$
and the transversality condition.

Solving problems like this by hand can be quite difficult, even for very simple problems (see K&S p. 232 if you want to convince yourself). An alternative approach presented in Chiang (p. 300) is often easier, and we follow this approach below. For much applied analysis, however, there may be no alternative to setting a computer to the problem to find a numerical solution.

B. Example: Hotelling’s optimal extraction problem


Clearly, Hotelling's problem can also be modeled with the restriction that $x_t \ge 0$. In this case our Lagrangian would take the form
$L = e^{-rt}u(\cdot) + \lambda(-z_t) + \phi_t x_t$.
And the associated maximization criteria are:
12. $L_z = 0$: $e^{-rt}u'(\cdot) - \lambda_t = 0 \Rightarrow e^{-rt}p(z_t) - \lambda_t = 0$
13. $L_x = -\dot{\lambda}$: $-\dot{\lambda} = \phi_t$
14. $L_\lambda = \dot{x}$: $\dot{x}_t = -z_t$
15. $L_\phi \ge 0$: $x_t \ge 0$
16. $\phi_t \ge 0$
17. $\phi_t x_t = 0$

We won't solve this problem in all its detail, but the solution method would follow a similar path. We divide time into two portions: from 0 to $T_1$, where φ = 0 and λ is constant, and from $T_1$ to T, where $x_t = 0$ and λ falls as φ rises. To solve the problem we note that $\phi_{T_1} = 0$ and then solve for $T_1$.
One thing that is interesting in this specification is that the costate variable is no longer constant over time. This makes sense: between 0 and $T_1$ we're indifferent about when we get an extra unit of the resource. But after $T_1$ it clearly makes a difference – the sooner we get the additional unit, the more valuable (in PV terms) it will be. When $t > T_1$, we know that $z_t = 0 \Rightarrow p = 1$ and $\lambda_t = e^{-rt}$. A marginal increase in the stock over this range would allow the immediate sale of that stock at a price of 1, and the present value of this marginal change in stock would, therefore, be $e^{-rt}$.

IV. An example of constrained optimal control


A clear and useful example of applied constrained optimal control is the paper by
Chavas, Kliebenstein and Crenshaw (1985).

V. Bang-bang OC problems
There are some problems for which the optimal path does not involve a smooth approach
to the steady state or gradual changes over time. Two important classes of such problems
are known as "bang-bang" problems and most rapid approach problems. In such
problems the constraints play a central role in the solution.

A. Bang-bang example #1: A state variable constraint


Consider the following problem in which we seek to maximize discounted linear utility obtained from a nonrenewable stock (sometimes referred to as a cake-eating problem):
$\max_z \int_0^T e^{-rt}z_t\,dt$ s.t.
$\dot{x} = -z$
$x(t) \ge 0$
$x(0) = x_0$
What does intuition suggest about the solution to the problem? Will we want to consume
the resource stock x gradually? Why or why not? Let's check our intuition.

Following the framework from above, we set up the Lagrangian by adding the constraint on the state variable to the Hamiltonian, i.e., L = H + φ(constraint). Using the current-value specification, this gives us
$L = z_t - \mu_t z_t + \phi_t x_t$

The FOCs for the problem are:


$\dfrac{\partial L}{\partial z} = 0$: $1 - \mu_t = 0$ (i)
$\dfrac{\partial L}{\partial x} = r\mu_t - \dot{\mu}_t$: $\phi_t = r\mu_t - \dot{\mu}_t$ (ii)
Because of the constraint, the complementary slackness condition must also hold:
$\phi_t x_t = 0$ (iii).

The first of these implies that $\mu_t = 1$. Since this holds no matter the value of t, we know that $\dot{\mu}_t = 0$ for all t. Conditions i and ii together indicate that
$\mu_t = 1$ and $\phi_t = r$.
The second of these is most interesting. It shows us that $\phi_t$, the Lagrange multiplier, is always positive. From the complementary slackness condition, it follows that $x_t$ must equal 0 always. But wait! We know this isn't actually true at t = 0. However, at t = 0, $x_t$ is not variable – it is parametric to our problem, so that point in time doesn't count. But at every instant except the immediate starting value, $x_t = 0$.

So how big is z at zero? The first thought is that it must equal $x_0$, but this isn't quite right. To see this, suppose that we found that the constraint started to bind, not immediately, but after 10 seconds. To get x to zero in 10 seconds, z per second would have to equal $x_0/10$. Now take the limit of this as the denominator goes to zero ⇒ z goes to infinity.
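The limit argument can be seen in a two-line sketch:

    x0 = 10.0
    for delta in (10.0, 1.0, 0.01, 1e-6):
        print(delta, x0 / delta)   # required rate x0/delta blows up as delta -> 0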

Hence, what happens is that for one instant there is a spike of zt of infinite height and zero
length that pushes x exactly to zero. This type of solution is known as a bang-bang
problem because the state variable jumps discontinuously at a single point – BANG-
BANG! Since, in the real world it's pretty difficult to push anything to infinity, we would
typically interpret this solution as "consume it as fast as you can." This is formalized in
the framework of most-rapid-approach path problems below.

B. Bang-bang example #2: A control variable constraint (based on Kamien and Schwartz p. 205)
Let xt be a productive asset that generates output at the rate rxt. This output can either be
sold or reinvested. The portion that is reinvested will be called zt so [1-zt] is the portion
that is sold. We assume that the interest can be consumed, but the principal cannot be
touched.3 Our question is, What portion of the interest should be invested and what
portion should be consumed over the interval [0,T]?

Formally, the problem is:


$\max_z \int_0^T [1 - z_t]rx_t\,dt$ s.t.
$\dot{x}_t = z_t r x_t$
$0 \le z_t \le 1$
$x(0) = x_0$

This time we have two constraints: $z_t \le 1$ and $z_t \ge 0$. Hence, our Lagrangian is
$L = [1 - z_t]rx_t + \lambda z_t r x_t + \phi_{1t}(1 - z_t) + \phi_{2t}z_t$
So the necessary conditions are
$\dfrac{\partial L}{\partial z} = 0 \Leftrightarrow -rx_t + \lambda rx_t - \phi_1 + \phi_2 = 0$
$\dfrac{\partial L}{\partial x} = -\dot{\lambda}_t \Leftrightarrow -\dot{\lambda} = [1 - z_t]r + \lambda z_t r$
The transversality condition in this problem is $\lambda_T = 0$, since $x_T$ is unconstrained, with the Kuhn-Tucker conditions
KT1: $\phi_1 \ge 0$ & $\phi_1(1 - z_t) = 0$, and
KT2: $\phi_2 \ge 0$ & $\phi_2 z = 0$.
From KT1, we know that if $\phi_1 > 0$, then the first constraint binds and $z_t = 1$. Similarly, from KT2, if $\phi_2 > 0$, then the second constraint binds and z = 0. I.e.,
$\phi_1 > 0 \Rightarrow z = 1$; $\quad \phi_1 = 0 \Leftarrow z < 1$
$\phi_2 > 0 \Rightarrow z = 0$; $\quad \phi_2 = 0 \Leftarrow z > 0$
Clearly, it is not possible for both $\phi_1$ and $\phi_2$ to be positive at the same time.

3 This problem is very similar to one looked at in Lecture 3. Comparing the two, you'll see one key difference is that here utility is linear, while in Lecture 3 utility was logarithmic.

The first FOC can be rewritten
$(\lambda_t - 1)rx_t - \phi_1 + \phi_2 = 0$.
We know that $rx_t$ will always be positive since consumption of the capital stock is not allowed. Hence, we can see that three cases are possible:
1) if λ = 1 ⇒ $\phi_1 = 0$, $\phi_2 = 0$ ⇒ no constraint binds
2) if λ > 1 ⇒ $\phi_1 > 0$, $\phi_2 = 0$ ⇒ $z_t = 1$
3) if λ < 1 ⇒ $\phi_1 = 0$, $\phi_2 > 0$ ⇒ $z_t = 0$.

From the second FOC,
$\dot{\lambda} = -\left\{[1 - z_t]r + \lambda_t z_t r\right\}$.
Since everything in the brackets is positive, the RHS of the equation is negative ⇒ λ is always falling.

By the transversality condition we know that eventually λ must hit $\lambda_T = 0$. Hence, eventually we'll reach case 3, where $\lambda_t < 1$ and $z_t = 0$, and we sell all of our output. But when do we start selling – right away, or after x has grown for a while? We know from the case analysis above that at $\lambda_t = 1$ neither constraint binds.
• Suppose that at t = n, $\lambda_t = 1$.
• For t < n, $\lambda_t > 1$ and z = 1.
• For t > n, $\lambda_t < 1$ and z = 0.

An important question, then, is when is n? We can figure this out by working backwards from $\lambda_T = 0$. From the second FOC, we know that in the final period (when $\lambda_t < 1$) z = 0, in which case
$\dot{\lambda} = -r$.
Solving this differential equation yields
$\lambda_t = -rt + A$.
Using the transversality condition,
$\lambda_T = -rT + A = 0$
$A = rT$
$\lambda_t = -rt + rT = r(T - t)$
Hence, $\lambda_n = 1$ if
$r(T - n) = 1$
$n = (rT - 1)/r$
Hence, we find that the optimal strategy is to invest everything from t = 0 until $t = n = (rT - 1)/r$. After t = n, consume all of the interest. If $(rT - 1)/r < 0$, then it would be optimal to sell everything from the very outset.

For $(rT - 1)/r > 0$, we can graph the solution:
[Figure: time paths of z, x, and λ over [0, T]: z equals 1 until n and 0 thereafter; x grows exponentially until n and is constant thereafter; λ falls throughout, passing through 1 at n and reaching 0 at T.]
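A simulation makes the switch point concrete. The values of r and T below are invented; the λ path for t < n, $\lambda_t = e^{r(n-t)}$, comes from integrating $\dot{\lambda} = -\lambda r$ (the z = 1 case) backward from $\lambda_n = 1$, a step not carried out in the text:

    import numpy as np

    r, T, x0 = 0.1, 20.0, 1.0
    n = (r * T - 1) / r                  # switch date; = 10 here

    t = np.linspace(0.0, T, 2001)
    z = np.where(t < n, 1.0, 0.0)        # reinvest everything, then sell everything
    x = np.where(t < n, x0 * np.exp(r * t), x0 * np.exp(r * n))
    lam = np.where(t < n, np.exp(r * (n - t)), r * (T - t))

    sales = (1.0 - z) * r * x            # output sold (consumed) at each instant
    total = np.sum(0.5 * (sales[1:] + sales[:-1]) * np.diff(t))
    print(n, x[-1], total)               # total sales = r*(T-n)*x0*e^{rn} = x0*e^{rn}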
What would be the solution as T→∞? Does this make intuitive sense? What is it about
the specification of the problem that makes it inconsistent with our economic intuition?

VI. Most Rapid Approach Path problems


Bang-bang problems fit into a general class of problems that are commonly found in economics: most-rapid-approach path (MRAP) problems.4 Here, the optimal policy is to get as quickly as possible to a steady state where benefits are maximized. Consider the first bang-bang example above. Wouldn't a solution in which we move toward the equilibrium as fast as possible, rather than impossibly fast, be more intuitively appealing?

A. MRAP example (Kamien & Schwartz p. 211)


A very simple firm generates output from its capital stock with the function $f(x_t)$, with the property that $\lim_{x \to 0} f'(x) = \infty$. The profit rate, therefore, is
$\pi_t = p \cdot f(x_t) - c \cdot z_t$
where x is the firm's capital stock, z is investment, and p and c are exogenously evolving unit price and unit cost respectively. The capital stock, which starts at $x(0) = x_0$, depreciates at the rate b, so that
$\dot{x}_t = z_t - bx_t$.
The firm's problem, therefore, is to maximize the present value of its profits,
$\int_0^\infty e^{-rt}\left[p \cdot f(x_t) - c \cdot z_t\right]dt$ subject to
$\dot{x}_t = z_t - bx_t$,
with three additional constraints:
i) $x(t) \ge 0$
ii) $z_t \ge 0$
iii) $p \cdot f(x_t) - c \cdot z_t \ge 0$
Let's use economic intuition to help us decide whether we need to explicitly include all the constraints in solving the problem.
4 Sometimes the term “bang-bang” is also used to describe MRAP problems.

• The constraint on x almost certainly does not need to be imposed because as long as f'
gets big as x→0, the optimal solution will always avoid zero.
• The constraints on z, on the other hand, might be relevant. But we'll start by assuming that neither constraint binds, and then see if we can figure out the actual solution based on the assumed interior solution; if not, we'll need to use the Kuhn-Tucker specification. Note that if there does exist a steady state in x, then, as long as b > 0, z must be greater than zero. Hence, we anticipate that much might be learned from the interior solution.
• Similarly, the profit constraint might also bind, but we would expect that in the long run profits would be positive. So again, we start by solving for an interior solution, assuming $\pi > 0$ where $\pi = p \cdot f(x_t) - c \cdot z_t$.

B. The interior solution


The current value Hamiltonian of the problem (assuming an interior solution w.r.t. z and x with π > 0) is
$H^c = p \cdot f(x_t) - c \cdot z_t + \mu_t(z_t - bx_t)$
The necessary conditions for an interior solution are:
$\dfrac{\partial H^c}{\partial z_t} = 0 \Rightarrow -c + \mu_t = 0$
$\dfrac{\partial H^c}{\partial x_t} = r\mu_t - \dot{\mu}_t \Rightarrow p\dfrac{\partial f(x_t)}{\partial x_t} - \mu_t b = r\mu_t - \dot{\mu}_t$
Over any range where the constraints on z do not bind, therefore, we have
$c = \mu_t$
and, therefore, it must also hold that
$\dot{\mu}_t = \dot{c} = 0$.

Substituting c for µ and rearranging, the second FOC becomes
18. $p\dfrac{\partial f(x_t)}{\partial x_t} = (r + b)c - \dot{c}$
over any interval where z > 0.

We see, therefore, that the optimum conditions tell us about the optimal level of x, say x*. We can then use the state equation to find the value of z that maintains this relation.

Since c and p are constant, this means that the capital stock will be held at a constant level, and 18 reduces to $\dfrac{pf'(x)}{r + b} = c$. This is known as the modified golden rule.
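As a quick sketch (assuming, for illustration, $f(x) = \sqrt{x}$ and invented constants), the modified golden rule pins down x*, and the state equation then gives the investment that maintains it, z* = bx*:

    import numpy as np
    from scipy.optimize import brentq

    p, c, r, b = 2.0, 1.0, 0.05, 0.10          # invented constants
    f_prime = lambda x: 0.5 / np.sqrt(x)       # f(x) = sqrt(x), assumed

    # Modified golden rule: p f'(x*) = (r + b) c
    x_star = brentq(lambda x: p * f_prime(x) - (r + b) * c, 1e-9, 1e9)
    print(x_star, (p / (2 * (r + b) * c))**2)  # numeric root vs. closed form, ~44.4

    # In a steady state, investment just offsets depreciation: z* = b x*
    print(b * x_star)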

Let's think about this condition for a moment.
• In a static economy, the optimal choice would be to choose x where the marginal product of increasing x is equal to the marginal cost, i.e., where $pf' = c$.
• In an infinite-horizon economy, if we could increase x at all points in time, this would have a discounted present value of $\dfrac{pf'}{r}$. However, since the capital stock depreciates over time, this depreciation rate diminishes the present value of the gains that can be obtained from an increase in x today; hence the present value of the benefit of a marginal increase in $x_t$ is $\dfrac{pf'}{r + b}$.

If p and c are not constant but grow in a deterministic way (e.g., constant and equal
inflation) then we could de-trend the values and find a real steady state. If p and c both
grow at a constant rate, say w, then there will be a unique and steady optimal value of x
for all z>0.

C. Corner solutions
All of the discussion above assumed that we are at an interior solution, where $0 < z_t < p \cdot f(x_t)/c$. But we ended up finding that the interior solution only holds when the state variable x is at the point defined by equation 18. Hence, if we're not at x* at t = 0, then it must be that we're at a corner solution, either $z_t = 0$ or $p \cdot f(x_t) - c \cdot z_t = 0$.

If $x_0 > x^*$, then it will follow that z will equal zero until $x_t$ depreciates to x*. If $x_0 < x^*$, then z will be as large as possible, $z_t = \dfrac{p}{c}f(x_t)$, until x* is reached.
Hence, economic intuition and a good understanding of the steady state can tell us where we want to get and how we're going to get there – in the most rapid approach possible.
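A minimal simulation of this most rapid approach, under the same illustrative assumptions ($f(x) = \sqrt{x}$, constants as in the sketch above): starting below x*, all output is reinvested; starting above, none is; once x* is reached, z = bx* holds the stock there.

    import numpy as np

    p, c, r, b = 2.0, 1.0, 0.05, 0.10
    x_star = (p / (2 * (r + b) * c))**2      # modified golden rule level, ~44.4

    def z_mrap(x):
        if x < x_star:
            return p * np.sqrt(x) / c        # zero-profit maximum: p f(x) = c z
        if x > x_star:
            return 0.0                       # invest nothing; let x depreciate
        return b * x_star                    # maintain the steady state

    for x0 in (5.0, 80.0):                   # start below and above x*
        x, t, dt = x0, 0.0, 0.01
        while abs(x - x_star) > 1e-3 and t < 200.0:
            step = (z_mrap(x) - b * x) * dt  # Euler step on xdot = z - b x
            # do not let a discrete step overshoot the steady state
            if (x < x_star) == (x + step > x_star):
                x = x_star
            else:
                x += step
            t += dt
        print(f"x0 = {x0:5.1f}: reaches x* ~ {x_star:.1f} at t ~ {t:.2f}")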

D. Some theory and generalities regarding MRAP problems


The general principles of MRAP problems are discussed by Wilen (1985, p. 64):
Spence and Starrett show that for any problem whose augmented integrand (derived by substituting the dynamic constraint into the original integrand) can be written as
$\Pi^A(K, \dot{K}) = M(K) + N(K)\dot{K}$
the optimal solution reduces to one of simply reaching a steady state level K = K* as quickly as possible.
where K is the state variable and by "integrand" they mean the objective function, profits in the case considered here.

How does this rule apply here? The integrand is $p_t f(x_t) - c_t z_t$. Using the state equation, $bx_t + \dot{x}_t = z_t$, the integrand can be written
$p_t f(x_t) - c_t(bx_t + \dot{x}_t) = p_t f(x_t) - c_t bx_t - c_t\dot{x}_t$.
Converting this to the notation used by Wilen,
$M(K) = p_t f(x_t) - c_t bx_t$
and
$N(K)\dot{K} = -c_t\dot{x}_t$.
Hence this problem fits into the general class of MRAP problems.

VII. References

Kamien, Morton I. and Schwartz, Nancy Lou. 1991. Dynamic Optimization : The
Calculus of Variations and Optimal Control in Economics and Management. New
York, N.Y. : Elsevier.

Wilen, James E. 1985. Bioeconomics of Renewable Resource Use, In A.V. Kneese and
J.L. Sweeney (eds.) Handbook of Natural Resource and Energy Economics, vol. I.
New York: Elsevier Science Publishers B.V.

Spence, Michael and David Starrett. 1975. Most Rapid Approach Paths in Accumulation
Problems. International Economic Review 16(2):388-403.

Chiang, Alpha C. 1991. Elements of Dynamic Optimization. McGraw-Hill.

Chavas, J.-P., J. Kliebenstein, and T.D. Crenshaw. 1985. Modeling Dynamic Agricultural Production Response: The Case of Swine Production. American Journal of Agricultural Economics 67:636-46.

Hotelling, Harold. 1931. The Economics of Exhaustible Resources. The Journal of Political Economy 39(2):137-175.

