7.1 Discrete Dynamic Programming

When most people read the word ‘programming’ they typically think of computer programming, creating a set of instructions that tell a computer how to perform a task. The term ‘mathematical programming’, by contrast, refers to algorithms (methods), executed by computers or manually, for solving constrained optimization problems. In Chap. 4, the hill climbing method was introduced as an approach for solving discrete optimization problems. Hill climbing is one of many mathematical programming methods. Recall that this method works only if the functions to be maximized are continuous and concave, or continuous and convex if they are to be minimized. But what if those conditions are not satisfied? A mathematical programming method that is available for solving discrete optimization problems whose objective functions can be discontinuous, and of any shape, is called discrete dynamic programming.

Dynamic programming is an approach that transforms discrete multi-variable multi-stage optimization problems into networks of nodes and links and then solves for the best paths through such networks. Stages could be time periods or locations or activities. The nodes represent discrete states of the system that can exist at each stage either before or after a decision has been made. The links connecting those nodes in successive stages represent discrete decisions that are feasible, given the state of the system.

For example, recall the resource allocation problem introduced in previous chapters. The problem involved finding the allocations of resources to multiple users that maximized the total benefits derived from those allocations. Think of each user as being at a different location and an allocation decision process that proceeds in steps from one user to the next. The first step begins with deciding how many resources to allocate to the first user. Then, with the resources remaining, the second step involves making an allocation to the second user. Finally, with what resources remain, the third step is to make an allocation to the third user. Each step is called a stage of the dynamic programming process. The remaining available resources are a state of the system, represented by nodes. The links represent allocation decisions. A network representation of this process defines all possible discrete alternative allocations at each stage to each remaining user. The discrete dynamic programming procedure is a way of identifying the best path through this discrete network.

Converting an optimization problem into a discrete network of nodes and links representing different discrete states and decisions at each stage is the main challenge in using dynamic programming. Solving for the sequence of best decisions once a network is constructed is relatively easy, as will be shown for the following several example optimization problems.

7.1.1 Traveling Problem

Figure 7.1 could represent a map showing possible routes from the first state, node 1, to the end state, node 10. The problem is to find the best route from node 1 to 10. In this case, the states are just locations. The links are possible routes between two locations in each time step, or stage. The numbers on the links could represent travel time, or costs, or some relative measure of benefits. Suppose these link numbers represent costs and we wish to minimize the total cost of going from location 1 to location 10. Using a dynamic programming procedure, we can do this without having to consider all possible combinations of routes from node 1 to node 10.

Fig. 7.1

A dynamic programming network showing nodes as locations, links as routes between two successive locations, and stages as the succession of decisions made over time or space

Referring to Fig. 7.1, we cannot immediately see how best to travel from node 1 to node 10. However, if we could determine the best (cheapest cost to node 10) link to take from each node in the network, then it would be easy to determine how to go from node 1 to node 10 the cheapest way. Dynamic programming provides an efficient way of doing that without the need to look at all possible alternative routes. To start the dynamic programming procedure, we begin where the decision is obvious, say at nodes 8 and 9, and then work backward, from right to left, toward node 1. At each node, we want to determine and record the cheapest way to go from that node to node 10. Let F(j) denote the cheapest cost of going from node j to node 10. We also want to keep track of the best decision, or link, at each node j.

We begin at the last stage by determining how best to travel from node 8 to node 10, and from node 9 to node 10. There is only one choice at each of those nodes. The results of those decisions are shown in Fig. 7.2.

Fig. 7.2

Results of dynamic programming for finding the best decision at each node at the beginning of the last stage, 4. The F(j) values are the total minimum costs of going from node j to node 10

Moving backward to the previous stage, stage 3, we can find the minimum total cost of going to node 10 from nodes 5, 6, and 7. F(5) = min{5 + F(8), 3 + F(9)} = min{5 + 6, 3 + 7} = 10. F(6) = min{7 + F(8), 8 + F(9)} = min{7 + 6, 8 + 7} = 13. F(7) = min{2 + F(8), 4 + F(9)} = min{2 + 6, 4 + 7} = 8. We can mark the best decision in each case with an arrow (→), as shown in Fig. 7.3. Keep in mind that the F(j) values are the minimum costs of proceeding from node j to node 10.

Fig. 7.3

Results of dynamic programming for finding the best decision at all nodes at the beginning of the third stage

Note that we cannot compute the values of the minimum costs at each node at the beginning of stage 3 without first computing those values for each node at the end of stage 3 or equivalently at the beginning of stage 4. The same applies to each remaining stage, namely stages 2 and 1. In general, for each node or state (location) j at the beginning of a stage that is linked to node k at the end of the stage:

$$ F(j) = \min_{k} \left\{ \text{cost of link from } j \text{ to } k + F(k) \right\} \quad \text{for each node } j \text{ in each stage.} $$

In this case, at stage 3, the beginning nodes j are 5, 6, and 7, and the ending nodes k are 8 and 9.
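This recursion is easy to code. The sketch below (in Python, with hypothetical function and variable names) applies it to the only part of Fig. 7.1 whose link costs are stated in the text, stages 3 and 4, and reproduces the values F(5) = 10, F(6) = 13, and F(7) = 8:

```python
def backward_dp(links, end_node):
    """Backward recursion: F[j] = min over links (j, k) of link cost + F[k]."""
    F = {end_node: 0}   # F[j] = minimum cost from node j to the end node
    best = {}           # best[j] = cheapest next node to move to from j
    # Process nodes from right to left; in Fig. 7.1 higher-numbered
    # nodes lie in later stages, so descending node order works here.
    for j in sorted(links, reverse=True):
        F[j], best[j] = min((cost + F[k], k) for k, cost in links[j])
    return F, best

# Only the links whose costs are stated in the text (stages 3 and 4).
links = {
    5: [(8, 5), (9, 3)],
    6: [(8, 7), (9, 8)],
    7: [(8, 2), (9, 4)],
    8: [(10, 6)],
    9: [(10, 7)],
}
F, best = backward_dp(links, end_node=10)
print(F[5], F[6], F[7])   # 10 13 8, matching Fig. 7.3
```

Applying the same recursion to the remaining stages, once their link costs are included, yields F(1) = 15 for the full network.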

Moving to stage 2, we can compute the minimum cost of going from nodes 2, 3, and 4 to node 10 in a similar manner, again denoting the best decision by an arrow (→). Note that these total remaining minimum-cost values, F(j), computed for each node j at the beginning of each stage, can be determined without looking beyond the stage we are in, because we already know the minimum cost F(k) of proceeding onward from each ending node k:

$$ F(j) = \min_{\text{links } (j,k)} \left\{ \text{cost of link from } j \text{ to } k + F(k) \right\} $$

Once we know the minimum costs of going from nodes 2, 3, and 4, namely F(2), F(3), and F(4), to node 10, we can move backward to the first stage and determine the total minimum cost to travel from node 1 to node 10, and the best decision to make at node 1 to achieve that minimum total cost, F(1). For this example, the minimum total cost is 15.

Now we can determine the optimal (minimum cost) path just by following the arrows beginning at node 1. This path is 1, 4, 5, 9, and 10 for a total cost of 3 + 2 + 3 + 7 = 15.

What has just been demonstrated is how discrete dynamic programming breaks down a multiple-variable optimization problem into many single-variable optimization problems. The exact same procedure works for a maximization problem; the maximum value at each node is recorded instead of the minimum. Because the problem is discrete, there is no restriction on the shape of any cost, benefit, or other objective function. There could be restrictions or constraints limiting the possible decisions, or links, at any node, and hence only feasible decisions should be included in a dynamic programming network. In other words, for this example, going from a beginning node j to an ending node k in any stage has to be feasible.

The sequence of steps shown in Figs. 7.2, 7.3, 7.4 and 7.5 is called a backward-moving approach for solving a dynamic programming network model. We began where we wanted to end up and worked backward, from right to left, over each state in each successive stage, until reaching the initial state, namely node 1 at the beginning of stage 1. Once we know the best decision to make at each node in the network, we can begin at node 1 and work our way through the network, following the arrows from node to node until finally reaching node 10. When solving for the best decisions at each node in any stage, there is no need to consider any of the link costs in other stages.

Fig. 7.4

Results of dynamic programming for finding the best decision at all nodes at the beginning of stage 2

Fig. 7.5

Final stage of dynamic programming approach for finding best decision at node 1 to go to node 10 and the route to take

7.1.2 Resource Allocation

Consider the previously defined resource allocation problem in which 6 resources are to be allocated among three users, each allocation yielding net benefits. Let X be the allocation to the first user, user #1. The net benefits are 6X − X², with a maximum at X = 3; allocating more than that reduces the net benefits. Let Y be the allocation to user #2. The net benefits are 7Y − 1.5Y², with a maximum at Y = 7/3. Allocating Z to user #3 yields net benefits of 8Z − 0.5Z², with a maximum at Z = 8. The sum of all the desired allocations is 13.33. If the available resources are less than 13.33, solving the following optimization model will identify the allocations that maximize the total net benefits.
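As a quick check of these unconstrained maxima, a few lines of Python (the names B1, B2, and B3 simply label the three net-benefit functions; they are not from the text) confirm the zero-derivative points and their sum:

```python
# Each user's net-benefit function; the comment gives the allocation
# at which its derivative is zero (the unconstrained maximum).
B1 = lambda x: 6*x - x**2        # 6 - 2X = 0  =>  X = 3
B2 = lambda y: 7*y - 1.5*y**2    # 7 - 3Y = 0  =>  Y = 7/3
B3 = lambda z: 8*z - 0.5*z**2    # 8 - Z  = 0  =>  Z = 8
optima = [3, 7/3, 8]
print(round(sum(optima), 2))     # 13.33, more than the 6 available
```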

$$ \begin{aligned} & \text{Maximize } B_1(X) + B_2(Y) + B_3(Z) \\ & \quad \text{Subject to:} \\ & \quad \quad B_1(X) = 6X - X^2;\quad B_2(Y) = 7Y - 1.5Y^2;\quad B_3(Z) = 8Z - 0.5Z^2 \\ & \quad \quad X + Y + Z \le 6. \end{aligned} $$

Making discrete (e.g., integer) allocations allows us to draw a network of this allocation problem such as shown in Fig. 7.6.

Fig. 7.6

Network representing the resource allocation problem with integer allocations. The numbers in the nodes are the resources available for subsequent allocations. Each link’s allocation is the difference between the two node state values. The numbers on the links are the benefits gained if that particular allocation decision is taken. Missing links are ones clearly not feasible or optimal

The nodes of Fig. 7.6 represent the amount of resources available for the remaining allocations, and the links represent the allocation to a particular user. The numbers on the links are the benefits resulting from that decision. The problem is to find the best path from the initial node, representing 6 resources available to allocate, to an ending node after allocations have been made to all three users. Since the total of all the allocations the users would like is 13.33, clearly the final state of the system after all allocations are made will be 0. There will not be any unallocated resources in an optimal solution.

Assuming a backward moving approach, designate Fi(S) as the maximum net benefits that can be obtained in remaining allocations given S resources available at the beginning of stage i. Starting at stage 3, we compute all the F3(S) values before moving to compute all the F2(S) values, and finally, we compute F1(S = 6).

$$ F_i(S) = \max_{\text{integer allocations} \, \le \, S} \left\{ \text{allocation benefits} + F_{i+1}(S - \text{allocation}) \right\} \quad \text{for all values of } S. $$

We also keep track of the best decision at each node (shown by an arrow). This backward moving approach is illustrated in Fig. 7.7.

Fig. 7.7

The backward moving approach to solving the resource allocation problem. The numbers next to each node are the maximum remaining benefits, Fi(S), and the arrows signify the best allocation link given the available resources, the numbers in the nodes. The link benefits in stage 3 are the F3(S) values shown next to the nodes at the beginning of stage 3

The maximum total benefits that can be obtained from allocating the 6 available resources is F1(6) = 34.5. Arrows in Fig. 7.7 show that the allocation to user #1 is X = 1, leaving 5 resources; the allocation to user #2 is Y = 1, leaving 4 resources; and hence the allocation to user #3 is Z = 4.
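The backward recursion and the arrow-following backtrack just described can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the text; it reproduces F1(6) = 34.5 and the allocations X = 1, Y = 1, Z = 4:

```python
benefit = [
    lambda x: 6*x - x**2,        # user 1: B1(X)
    lambda y: 7*y - 1.5*y**2,    # user 2: B2(Y)
    lambda z: 8*z - 0.5*z**2,    # user 3: B3(Z)
]

def backward_allocation(total=6):
    n = len(benefit)
    F = [dict() for _ in range(n + 1)]   # F[i][S]: best remaining benefits
    best = [dict() for _ in range(n)]    # best[i][S]: best allocation there
    F[n] = {s: 0.0 for s in range(total + 1)}   # after the last stage
    for i in range(n - 1, -1, -1):       # stages 3, 2, 1
        for s in range(total + 1):
            # choose the integer allocation a <= s with the best payoff
            F[i][s], best[i][s] = max(
                (benefit[i](a) + F[i + 1][s - a], a) for a in range(s + 1)
            )
    return F, best

F, best = backward_allocation()
s, plan = 6, []
for i in range(3):                       # follow the arrows forward
    plan.append(best[i][s])
    s -= best[i][s]
print(F[0][6], plan)   # 34.5 [1, 1, 4]
```

The same F and best tables could be reused, without recomputation, for any initial amount of resources up to 6.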

Discrete dynamic programming models can often be solved using a forward- rather than a backward-moving approach. In this case, we begin at the initial node(s) in the network and for each node find fi(S), the maximum net benefits that can be obtained from the allocation decisions already made, given S resources available at the end of stage i. All values of f1(S) are computed before moving on to compute all values of f2(S), and finally f3(S = 0), keeping track (e.g., using an arrow) of the best decision that gets you to where you are at the end of a stage. At each node, the question is: what is the best node to have come from to get to where you are? This approach is illustrated in Fig. 7.8.

Fig. 7.8

Solving the resource allocation problem using the forward-moving approach of dynamic programming. The numbers at the bottom of each node represent the maximum benefits obtained from previous allocations given the resources remaining, the numbers in the nodes. The link (allocation) benefit values are not shown here but are as shown in Fig. 7.6 and used to compute the maximum benefits obtainable given the remaining resources to allocate to the remaining uses

To backtrack to find the optimal allocations, note that the best allocation to user #3 is 4. Therefore, the optimal state to be in at the beginning of that last stage is 4, which is the same as the state at the end of stage 2. The arrow into that state shows that the best state to have come from is state 5, and the best way to reach state 5 at the end of stage 1 is to come from state 6. Hence, the best allocation to user #2 is 1, and to user #1 is 1, for the same total benefits of 34.5.
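The forward-moving approach can be sketched similarly. In this hypothetical Python sketch (again, names and structure are illustrative assumptions), fi(S) is built stage by stage, and the arrows record, for each state, the best state to have come from:

```python
benefit = [
    lambda x: 6*x - x**2,        # user 1
    lambda y: 7*y - 1.5*y**2,    # user 2
    lambda z: 8*z - 0.5*z**2,    # user 3
]

def forward_allocation(total=6):
    f = {total: 0.0}             # before any allocation, all resources remain
    came_from = []               # arrows: best predecessor state, per stage
    for b in benefit:            # stages 1, 2, 3
        nxt, arrows = {}, {}
        for s, val in f.items():
            for a in range(s + 1):          # allocate a, leaving s - a
                if val + b(a) > nxt.get(s - a, float("-inf")):
                    nxt[s - a] = val + b(a)
                    arrows[s - a] = s       # best state to have come from
        came_from.append(arrows)
        f = nxt
    # Backtrack from the final state (0 resources left) to recover the
    # states visited, then convert state differences to allocations.
    states, s = [0], 0
    for arrows in reversed(came_from):
        s = arrows[s]
        states.append(s)
    allocs = [states[i + 1] - states[i] for i in range(3)][::-1]
    return f[0], allocs

value, allocs = forward_allocation()
print(value, allocs)   # 34.5 [1, 1, 4]
```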

7.1.3 Capacity Expansion

Public works departments are often faced with determining when and how much infrastructure capacity to add to meet increasing demands over time. Why is this an issue? Why not just add the amount of capacity needed when it is needed? The answer is shown in Fig. 7.9.

Fig. 7.9
figure 9

Typical demand and cost functions for infrastructure capacity

The costs of adding capacity to meet the increasing demand over time are not defined by nice continuous convex functions. If they were, one could simply add the capacity needed when it is needed and not be concerned with the uncertainty of future demands and costs. Typical infrastructure capacity cost functions have a fixed component and exhibit economies of scale, i.e., decreasing average and marginal costs with increasing capacity additions. A fixed cost is incurred whenever any capacity is added; otherwise it is 0. The more times capacity is increased, the greater the sum of the fixed costs incurred. The fixed cost is a function of existing capacity, among other factors. Hence, it makes economic sense to overbuild, that is, to add more capacity than is currently needed, so as to reduce the number of times capacity must be added and to achieve lower average costs.
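To see the effect, consider a hypothetical cost function with a fixed charge and economies of scale (the numbers here are assumptions for illustration only, not data from the chapter):

```python
# A hypothetical capacity cost function (an illustrative assumption):
# a fixed charge of 10 whenever capacity is added, plus a concave term.
def cost(added):
    return 0.0 if added == 0 else 10 + 3 * added ** 0.7

# The average cost per unit falls as more capacity is added at once,
# which is why overbuilding can pay off.
for a in (2, 4, 8):
    print(a, "units, average cost per unit:", round(cost(a) / a, 2))
```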

The dilemma, of course, is that we are uncertain of both future demands and costs. We will return to that issue later. First, consider an example in which future demands are assumed known and must be met. Meeting demand is the condition most public works departments treat as a constraint. A general capacity expansion model that can be used to identify least-cost expansion schedules that meet future demands can be stated as

Minimize the present value of future expansion costs subject to meeting future demands.

Let A(t) be the capacity added to the existing capacity K(t) in period t at a cost of Ct(K(t), A(t)) that is to be paid at the beginning of period t. Let r(t) be the discount rate in period t, D(t) the capacity demanded by the end of period t, and n, the number of time periods being considered. A basic capacity expansion model (assuming no capacity decay over time) can be written as

$$ \begin{aligned} & \text{Minimize} \sum_{t=1}^{n} C_t(K(t), A(t)) \left[ 1/(1 + r(t)) \right]^{t-1} \\ & \quad \text{Subject to:} \\ & \quad \quad K(0) = \text{existing capacity at beginning of period 1} \\ & \quad \quad K(t) + A(t) = K(t+1) \ge D(t), \quad t = 1, 2, \ldots, n \end{aligned} $$

The data needed to solve a discrete example of this model are specified in Table 7.1.

Table 7.1 Data showing future demands and costs of a capacity expansion problem

The capacity expansion problem whose data are shown in Table 7.1 can be solved using discrete dynamic programming. The problem assumes four construction periods of 5 years each, and the table provides estimates of the present value of the costs of the additional capacity needed at the end of each 5-year period for the next 20 years.

The discrete options in the first 5-year period are to add 2, 4, 6, 8, or 10 units of capacity. In period 2, one can add any even amount of capacity up to a total capacity of 10 units; hence, if the capacity at the beginning of the period is 2, at least 4 and at most 8 units can be added. And so on to the last period, which must begin with a capacity of at least 8; if it is exactly 8, two more units can be added to reach the required total of 10 units.

The dynamic programming network for this example problem is shown in Fig. 7.10.

Fig. 7.10

A network representation of the capacity expansion problem defined in Table 7.1. Links represent possible discrete feasible capacity expansion alternatives given the existing capacity at the beginning of each construction period. The numbers on the links are the present values of the costs of expansion

Solving this problem, using either a backward- or a forward-moving approach, yields two different least-cost solutions, each with a total present value of 26. The added capacities in successive construction periods are either 10, 0, 0, 0 or 6, 0, 4, 0. Which is better, and why? They both cost 26, so the decision has to be based on other criteria.
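Because this network is so small, the result can be verified by brute-force enumeration of every feasible expansion schedule, using the Table 7.1 costs and demands, rather than the dynamic programming recursion itself. The sketch below is illustrative only:

```python
import itertools

demand = [2, 6, 8, 10]          # required total capacity at end of periods 1..4
cost = [                        # cost[t][added] = present value of adding
    {2: 12, 4: 15, 6: 18, 8: 23, 10: 26},   # capacity in period t + 1
    {2: 8, 4: 11, 6: 13, 8: 15},
    {2: 6, 4: 8},
    {2: 4},
]

def least_cost_schedules(max_cap=10):
    best_cost, schedules = float("inf"), []
    # Enumerate every sequence of even capacity additions over 4 periods.
    for adds in itertools.product([0, 2, 4, 6, 8, 10], repeat=4):
        cap, total, feasible = 0, 0, True
        for t, a in enumerate(adds):
            if a and a not in cost[t]:      # no cost listed: not an option
                feasible = False
                break
            cap += a
            if cap > max_cap or cap < demand[t]:
                feasible = False
                break
            total += cost[t][a] if a else 0
        if not feasible:
            continue
        if total < best_cost:
            best_cost, schedules = total, [adds]
        elif total == best_cost:
            schedules.append(adds)
    return best_cost, schedules

print(least_cost_schedules())   # (26, [(6, 0, 4, 0), (10, 0, 0, 0)])
```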

How should we deal with the uncertainty of future demands and costs? How should we deal with the assumed time horizon of 4 periods, as clearly there is a future after that time in which additional capacity may be needed? In other words, how should we use a model like the one just presented?

Perhaps the answer to these questions will become clearer by asking another question. What should we do after implementing the first period’s decision, A(1)? Wait five years and then refer to this model’s output and implement the second decision, A(2)? Obviously not. Conditions may have changed and there are new estimates of future costs, demands, interest rates, and time horizons.

What is of interest when using a model such as this one is what to do now. How does the assumed time horizon and estimates of future demands and costs and interest rates impact this first decision, A(1)? If they do not, one can be more confident in the robustness of that first decision, at least with respect to the assumed objective, which in this case is minimizing the present value of the total cost.

7.2 Conclusions

Dynamic programming, like all optimization methods, has its advantages as well as its limitations. It is well suited to optimization problems that can be viewed as a sequence of decisions and in which there are only a limited number of state variables and discrete state values, such as the existing capacity, or the resources available to allocate, in the examples just discussed. Unlike some of the other methods previously discussed, it does not depend on the form of the objective function. While network representations of dynamic programming optimization problems were used in this chapter to illustrate the two solution approaches, mathematical recursion equations can be written for finding the best decisions at each state (node) in each successive stage of a problem. These equations can be incorporated into a spreadsheet and used for solving larger problems than those presented in this chapter. They will be developed for solving more complex problems presented in later chapters (Fig. 7.11).

Fig. 7.11

The shortest distance problem. User: Dcoetzee, Wikimedia Commons CC0 1.0 Universal Public Domain Dedication

Exercises

  1.

    Consider the allocation problem of allocating resources to three users. The allocations are X, Y, and Z. User 1 total revenue is 6X − X². User 2 total revenue is 7Y − 1.5Y². User 3 total revenue is 8Z − 0.5Z². The goal is to determine the values of X, Y, and Z that maximize {6X − X² + 7Y − 1.5Y² + 8Z − 0.5Z²} given 6 units of resources available.

    Show how to solve this allocation problem using discrete dynamic programming with integer allocations. Show how the dynamic programming network would be modified to be able to consider 8 integer resources as well as 6 resources to allocate to the three users having the same net benefit (total return) functions. What would the integer allocations and total returns be given 8 available resources? Show how this can be solved using the forward-moving and backward-moving approaches.

    To show that DP was used, show all F(S) values for each node representing a state S, and the best decision (arrow or heavy line) if more than one possible decision.

  2.
    (a)

      Using dynamic programming (network), solve the following capacity expansion problem for the next 20 years (four 5-year construction periods) using forward- and backward-moving approaches.

      The following table provides estimates for the costs of additional water treatment plant capacity needed at the end of each 5-year period for the next 20 years. Find the capacity expansion schedule that minimizes the present values of the total future costs. If there is more than one least-cost solution, indicate which one you think is better, and why.

        

      Discounted costs of additional capacity

      Period  Years    2    4    6    8   10   Total additional capacity
                                               required at end of period
      1       1–5     12   15   18   23   26              2
      2       6–10     8   11   13   15    –              6
      3       11–15    6    8    –    –    –              8
      4       16–20    4    –    –    –    –             10

      (Column headings 2, 4, 6, 8, and 10 are the units of additional capacity added.)

      Note: The discrete options in the first 5-year period are to add 2, 4, 6, 8, or 10 units of capacity. In period 2, one can add any even amount of capacity up to a total capacity of 10 units; so if the beginning-of-period capacity is 2, at least 4 and at most 8 units can be added. And so on to the last period, which must begin with a capacity of at least 8; if it is exactly 8, only two more units can be added to reach the 10 units total.

    (b)

      The cost in each period t must be paid at the beginning of the period. What discount factor was used to convert the cost at the beginning of each period t (say C(t)) to the present value (discounted) costs shown above? In other words, how would a cost at the beginning of period t be discounted to the beginning of period 1, given an annual interest rate of r? (Only the algebraic expression of the discount factor is asked for, not the numerical value of r.)

    (c)

      How would you deal with the uncertainty of future demands and costs? In other words, how would you use a model like the one you developed?

  3.

    Water Quality Management Model:

    Find the wastewater treatment efficiencies at sites 1 and 2 that meet stream quality standards at sites 2 and 3 at a total minimum cost. Currently, there is no treatment. All the wastewater is discharged into the stream.

    figure a

    Available Data:

    Streamflow = 1000 m3/day at all sites. 1 kg/day/1000 m3/day = 1 mg/l;

    Fraction of waste discharged into the stream at site 1 that reaches site 2: 0.25.

    Fraction of waste discharged at site 1 that reaches site 3: 0.15.

    Fraction of the waste in the stream at site 2, together with the waste discharged at site 2, that reaches site 3: 0.60.

    Limits of treatment: at least 30% removal is required, but no more than 90% is possible, at both sites. The initial concentration just upstream of site 1 is 32 mg/l.

    The marginal cost of treatment at site 1 is 30 over the range of possible treatment fractions.

    The marginal cost of treatment at site 2 is 20 over the range of possible treatment fractions.

    Find the least-cost solution that meets the quality standards using dynamic programming.

  4.

    Blueberries

    There are three farmers’ markets that sell organically and locally grown blueberries. The farmer who grows these blueberries gets 90% of the income from their sales; the markets get the other 10%. The demand for blueberries differs in each market. Some smart economist has determined that the demand (unit price) functions for blueberries at the three markets (m = 1, 2, 3) are 6/(1 + Q1), 7/(1 + 1.5Q2), and 8/(1 + 0.5Q3), respectively.

    figure b

    At each market m, the unit price varies each week depending on the amount of blueberries available, Qm, to be sold. How should the farmer distribute a crop ranging from 1 to 6 bushels of blueberries each week to maximize the total amount of income received from all three markets?

    Solve for the maximum revenue obtainable from a total of 6 bushels using discrete dynamic programming, assuming integer allocations. Use both backward and forward approaches. Show your work on a network, not just the solution.