13.1 Introduction

Fig. 13.1

An example of a stochastic process involving uncertain outcomes over time. Public Domain. File:DJIA 2000s graph (log).svg, https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average#/media/File:DJIA_2000s_graph_(log).svg

A stochastic process refers to a system whose outputs are random over time. The sequence of newly infected people with a particular disease in a city, a sequence of coin tosses, the daily flows in the Danube River at Vienna, or the number of customers seeking driver license renewals at a local motor vehicle office each weekday are all examples of stochastic processes. While we cannot predict the outcome of a stochastic process precisely, we may be able to predict the probabilities of its various outcomes, including how those probabilities are influenced by any decisions made affecting the system's operation.

The examples presented in this chapter will be limited to simple first-order discrete stochastic processes. These are defined by conditional probabilities of being in some state St+1 in period t+1 given the state St in period t. We cannot predict what future states may be, but we assume we can predict the probabilities of being in various future states based on the current state. These predictions, expressed as conditional probabilities, Pr(St+1 | St), may be based on historical time series data whose statistical characteristics may apply in the future as well. What is also implied by using conditional probabilities is that the probability of some state value St+1 in period t+1 is dependent only on the actual value of the state St in the previous period t and not on previous state values. Hence, the use of the term ‘first-order’. The validity of such an assumption may largely depend on the duration of the time periods being modeled.

13.2 Changing Weather

Fig. 13.2

Good weather days and bad weather days. They happen and are only temporary. Public domain. https://i.pinimg.com/originals/e3/0c/21/e30c2162f96bf54a059876d092906358.jpg

For example, consider two types of weather, good, G, and bad, B. Based on the following sequence of 20 days of observations, GGGBBBBGGGBBBGGGBBBB, a matrix of conditional probabilities can be created. The rows of this matrix represent the possible values of the weather in day t, St, and the columns represent the possible values of the weather in the next day t + 1, St+1 (Fig. 13.3). Out of the 19 transitions from one state to another in this time series, 6 were from Good to Good and 3 were from Good to Bad, for a total of 9 transitions from Good. From the state of Bad, 2 became Good the next day, and 8 remained in a Bad state. Dividing each number of transitions from a Good state by the total number of transitions from Good, and doing the same for transitions from a Bad state, defines the conditional probabilities; they must sum to 1 in each row of the matrix. These conditional probabilities are also called transition probabilities: the probability of making a transition from one state in period t to another state in the next period, t + 1.

Fig. 13.3

The matrix of conditional or transition probabilities above resulting from the recorded time series of good and bad days. It is called a first-order Markov chain whose rows sum to 1
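For those who prefer to let a computer do the tallying, the short Python sketch below counts the transitions in the 20-day record and normalizes each row of counts; it is illustrative only and assumes nothing beyond the observed sequence given above.

```python
from collections import Counter

sequence = "GGGBBBBGGGBBBGGGBBBB"  # the 20 observed days

# Count transitions from each day to the next day
counts = Counter(zip(sequence, sequence[1:]))

states = ["G", "B"]
for s in states:
    total = sum(counts[(s, s2)] for s2 in states)
    row = {s2: counts[(s, s2)] / total for s2 in states}
    print(s, row)

# From Good: about 0.667 stay Good and 0.333 turn Bad (6 and 3 of 9 transitions);
# from Bad: about 0.2 become Good and 0.8 stay Bad (2 and 8 of 10 transitions).
```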

Using these conditional probabilities, shown in Fig. 13.3, one can compute the probabilities of having a good or bad day in successive days t+1, t+2, t+3… given the current state of the weather in day t.

$$\Pr(\text{G in } t+1) = \Pr(\text{G in } t)\Pr(\text{G in } t+1 \mid \text{G in } t) + \Pr(\text{B in } t)\Pr(\text{G in } t+1 \mid \text{B in } t), \quad t = 1, 2, 3, 4, \ldots$$
$$\Pr(\text{B in } t+1) = \Pr(\text{G in } t)\Pr(\text{B in } t+1 \mid \text{G in } t) + \Pr(\text{B in } t)\Pr(\text{B in } t+1 \mid \text{B in } t), \quad t = 1, 2, 3, 4, \ldots$$

Eventually, the predicted probabilities will not change significantly from one day to the next, as one would expect. The probability of the state of weather a month from now is not likely to be influenced by the weather today.
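A minimal Python sketch of this iteration, assuming today (day t) is a Good day, illustrates both the recursion and its convergence:

```python
# Transition probabilities from Fig. 13.3: rows are today's state, columns tomorrow's
p_gg, p_gb = 6/9, 3/9    # from Good
p_bg, p_bb = 2/10, 8/10  # from Bad

pr_g, pr_b = 1.0, 0.0    # assume day t is Good
for day in range(1, 15):
    pr_g, pr_b = (pr_g * p_gg + pr_b * p_bg,
                  pr_g * p_gb + pr_b * p_bb)
    print(day, round(pr_g, 3), round(pr_b, 3))

# The probabilities converge to about 0.375 Good and 0.625 Bad,
# regardless of the state assumed for day t.
```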

13.3 The Stock Market

For another example, consider successive states of the stock market. Assume the stock market can be in one of three states: 1 = bear market, 2 = strong bull market, and 3 = weak bull market. Historically, a certain mutual fund gained −3%, 28%, and 10% annually when the market was in states 1, 2, and 3, respectively. The state transition matrix defining each Pr(St+1 | St) is shown in Fig. 13.4.

Fig. 13.4

Markov chain showing transition probabilities for three states of the stock market

Referring to these conditional or transition probabilities, we can determine what the probabilities of future states may be given the present state, as shown in Fig. 13.5. Assume the present state, S1, is 1.

Fig. 13.5

Probabilities of the state of the stock market for three successive years

The process shown in Fig. 13.5 continues until it converges to 0.333, 0.200, and 0.467 for states 1, 2, and 3, respectively. These are termed steady-state values that do not change in subsequent periods. They are the unconditional probabilities of each state, and as one might guess, they are not influenced by the starting state in period 1. The state of this mutual fund 10 years from now will not likely depend on what it is now. These same steady-state values will result from any assumed state in year 1.

These steady-state values can be computed directly using the same equations used above to compute successive probabilities, but with the probabilities of each state treated as unknowns.

Thus, for this example, solving any two of the following three equations:

$$\Pr(S=1) = \Pr(S=1)(0.90) + \Pr(S=2)(0.05) + \Pr(S=3)(0.05),$$
$$\Pr(S=2) = \Pr(S=1)(0.02) + \Pr(S=2)(0.85) + \Pr(S=3)(0.05),$$
$$\Pr(S=3) = \Pr(S=1)(0.08) + \Pr(S=2)(0.10) + \Pr(S=3)(0.90),$$

together with the equation expressing the fact that Pr(S=1) + Pr(S=2) + Pr(S=3) = 1 will determine the steady-state values of each Pr(S), namely 0.333, 0.200, and 0.467 for S = 1, 2, and 3, respectively.

In general, for any Markov chain having rows i and columns j with transition probabilities TP(Sj|Si),

$$\Pr(S_j) = \sum_i \Pr(S_i)\,\text{TP}(S_j \mid S_i) \quad \forall j,$$
$$\sum_i \Pr(S_i) = 1.$$

Using the unconditional steady-state probabilities Pr(Si) (such as those found by solving the above equations), the expected annual yield is

$$-3(0.333) + 28(0.2) + 10(0.467) = 9.3\%/\text{year}.$$

The expected yield, i10, over 10 years is (1.093)^10 − 1 = 2.433 − 1, or 143%. Hence, investing $1 in this mutual fund, one can expect to have about $2.43 in 10 years.
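As a check, both the steady-state probabilities and the expected yield can be reproduced with a few lines of Python. The sketch below assumes NumPy is available and simply solves the balance and normalization equations.

```python
import numpy as np

# Transition matrix from Fig. 13.4 (rows: current state 1, 2, 3; columns: next state)
P = np.array([[0.90, 0.02, 0.08],
              [0.05, 0.85, 0.10],
              [0.05, 0.05, 0.90]])

# Steady state: pi = pi P together with sum(pi) = 1.
# Replace one balance equation with the normalization equation.
A = np.vstack([(P.T - np.eye(3))[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)
print(pi)                             # approximately [0.333, 0.200, 0.467]

yields = np.array([-3.0, 28.0, 10.0])  # % per year in each state
annual = pi @ yields
print(annual)                          # about 9.27, i.e., the 9.3%/year above
print((1 + annual / 100) ** 10 - 1)    # about 1.43, i.e., roughly 143% over 10 years
```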

13.4 Human Health

The state of one’s health is also a stochastic process. Consider for this example four discrete states of health. Using data from the public health department, the following Markov chain shows the conditional probabilities of an average person being in any state of health given a previous state (Fig. 13.6).

Fig. 13.6

Transition probabilities for states of health from one period to the next

We can use Excel, for example, to find the progression of state probabilities from some assumed initial state, solving successive equations:

$$\Pr(S_j)_{t+1} = \sum_i \Pr(S_i)_t\,\text{TP}(S_j \mid S_i) \quad \forall j, \;\; t = 2, 3, 4, \ldots$$

Alternatively, we can find the steady-state probabilities of being in any state of health by solving

$$\Pr(S_j) = \sum_i \Pr(S_i)\,\text{TP}(S_j \mid S_i) \quad \forall j,$$
$$\sum_i \Pr(S_i) = 1,$$

directly for the steady-state probabilities Pr(Sj) for each Sj.

These steady-state probabilities are shown in Table 13.1.

Table 13.1 Steady-state probabilities of various states of health

Next consider another state of health: death. Assume the Markov chain defining the transition probabilities for states of health is as shown in Fig. 13.7.

Fig. 13.7

Transition probabilities for successive states of health

Solving the same set of equations as shown above defines the steady-state probabilities for these five states of health. They are as expected: all are 0 except death, whose steady-state probability is 1. Such is life (or rather death). In the long run, we all are certain to die. Once dead, we cannot transition to another state of health (as far as we know). Mathematicians call this a trapping (or absorbing) state. Once in it, you cannot get out.
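This trapping behavior is easy to demonstrate numerically. The sketch below uses a hypothetical five-state transition matrix (the actual values are those shown in Fig. 13.7 and are not reproduced here); repeatedly applying the matrix drives all of the probability into the trapping state.

```python
import numpy as np

# Hypothetical transition matrix for five health states; the last state ("dead")
# is a trapping state because its row places all probability on itself.
# (These numbers are placeholders, not the values in Fig. 13.7.)
P = np.array([
    [0.80, 0.15, 0.04, 0.01, 0.00],   # healthy
    [0.30, 0.50, 0.15, 0.04, 0.01],   # minor illness
    [0.10, 0.20, 0.50, 0.15, 0.05],   # serious illness
    [0.05, 0.10, 0.20, 0.45, 0.20],   # critical
    [0.00, 0.00, 0.00, 0.00, 1.00],   # dead: cannot leave this state
])

pi = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # start healthy
for _ in range(5000):                       # propagate many periods
    pi = pi @ P
print(np.round(pi, 3))   # essentially [0, 0, 0, 0, 1]: all mass ends in the trap
```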

13.5 Reducing Crime

This section presents an example of building stochastic linear and dynamic programming optimization models that incorporate transition probabilities.

A community center provides recreation facilities for people. One impact on the community is a lower crime rate. Assume, again for simplicity, there are two states of the crime rate: low (L) and high (H). Observed crime rates over time show that if the crime rate is low in any month, the probability of having a low rate the following month is 0.7, and the probability of a high-crime-rate month following a low-crime-rate month is 0.3. If the crime rate is high in a month, the probability of a high crime rate the following month is 0.6, and thus the probability of a low crime rate is 0.4. These probabilities apply if the community center does not advertise its services and facilities. This is the do-nothing policy (Policy n). These conditional probabilities are shown on the left of Fig. 13.8.

Fig. 13.8

Transition probabilities between low and high crime rates associated with the two policies ‘n’ (do-nothing) and ‘a’ (advertise)

However, if the center advertises its recreation programs (Policy a), the conditional probabilities change to those shown on the right of Fig. 13.8.

There are costs involved in advertising as well as additional costs associated with high crime rates. These costs, denoted as C(j,k) associated with crime rate j and policy k, are listed in Table 13.2.

Table 13.2 Costs associated with the crime rate and policy

The objective is to find the policy associated with each state that minimizes the expected value of the monthly total cost. Letting the unknown joint probability of any combination of crime rate i followed by crime rate j under policy k be Pr(i,j,k), the objective can be written as the sum, over all values of i, j, and k, of the associated costs, C(j,k), times their joint probabilities, Pr(i,j,k):

$$\text{Minimize } \sum_i \sum_j \sum_k C(j,k)\,\Pr(i,j,k).$$

To determine the steady-state values of each joint probability Pr(i,j,k), we can first define the marginal probabilities Pr(j,k) by summing the joint probabilities Pr(i,j,k) over all initial crime rates i.

$$\Pr(j,k) = \sum_i \Pr(i,j,k) \quad \forall j, k.$$

Each joint probability Pr(i,j,k) equals Pr(i,k) at time t times the known transition probability, TP(i,j,k), of state j at time t+1 given state i in period t and policy k.

$$\Pr(i,j,k) = \Pr(i,k)\,\text{TP}(i,j,k) \quad \forall i, j, k.$$

Combining these two equations gives

$$\Pr(j,k) = \sum_i \Pr(i,k)\,\text{TP}(i,j,k) \quad \forall j, k.$$

This, together with the definition of the steady-state probability of each crime state,

$$\Pr(i) = \sum_k \Pr(i,k) \quad \forall i,$$

and the normalization \( \sum_i \Pr(i) = 1 \), completes the model's constraints.

The result is a linear optimization model that, when solved, gives the optimal policy k for each crime state as well as the minimum expected monthly total cost.

For each state i, the policy k whose joint probability Pr(i,k) (either Pr(i,n) or Pr(i,a)) is non-zero will be the best policy. Its conditional probability, Pr(k|i), will equal 1. Otherwise, it will equal 0 unless it doesn’t matter what policy is chosen.

$$\Pr(k \mid i) = \Pr(i,k)/\Pr(i).$$

The solution of this model is

Objective value: Minimum monthly expected cost = 8.33.

Pr(L) = 0.667 = steady-state probability of low crime rate if optimal policy followed.

Pr(H) = 0.333 = steady-state probability of high crime rate if optimal policy followed.

Pr(L, n) = 0.667 implies that if in state L, do not advertise.

Pr(L, a) = 0.0 implies that if in state L, do not advertise.

Pr(H, n) = 0.0 implies that if in state H, advertise.

Pr(H, a) = 0.333 implies that if in state H, advertise.

These values are derived from the values of the joint probabilities Pr(i,j,k) listed in Table 13.3.

Table 13.3 Optimal values of joint probabilities Pr(i,j,k)

An alternative linear programming model based on Fig. 13.8 is perhaps more straightforward. Let the joint probabilities Pr(state, policy), denoted here as PLn, PLa, PHn, and PHa, indicate the best policy given the state. Again, the one that is non-zero for a given state indicates the best policy. The probabilities of the states, Pr(L) and Pr(H), denoted as PL and PH in the model below, result if the optimal policy is followed.

$$\begin{aligned} \text{Minimize } & \;\text{PLn}\,(\text{TPLLn}\cdot\text{CL} + \text{TPLHn}\cdot\text{CH}) + \text{PLa}\,(\text{A} + \text{TPLLa}\cdot\text{CL} + \text{TPLHa}\cdot\text{CH}) \\ & + \text{PHn}\,(\text{TPHLn}\cdot\text{CL} + \text{TPHHn}\cdot\text{CH}) + \text{PHa}\,(\text{A} + \text{TPHLa}\cdot\text{CL} + \text{TPHHa}\cdot\text{CH}). \end{aligned}$$
$$\text{PL} = (\text{PLn}\cdot\text{TPLLn} + \text{PHn}\cdot\text{TPHLn}) + (\text{PLa}\cdot\text{TPLLa} + \text{PHa}\cdot\text{TPHLa});$$
$$\text{PH} = (\text{PLn}\cdot\text{TPLHn} + \text{PHn}\cdot\text{TPHHn}) + (\text{PLa}\cdot\text{TPLHa} + \text{PHa}\cdot\text{TPHHa});$$
$$\text{PL} + \text{PH} = 1;$$
$$\text{PL} = \text{PLn} + \text{PLa};$$
$$\text{PH} = \text{PHn} + \text{PHa}.$$

Transition probabilities:

TPLLn = 0.7; TPLLa = 0.8;

TPLHn = 0.3; TPLHa = 0.2;

TPHLn = 0.4; TPHLa = 0.6;

TPHHn = 0.6; TPHHa = 0.4.

Costs: CL = 0; CH = 20; advertising cost A = 5.
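Any linear programming solver can handle this model. The following sketch uses SciPy's linprog as one possible choice; the variable and data names mirror those above, and it reproduces the solution listed below.

```python
from scipy.optimize import linprog

# Data from Fig. 13.8 and Table 13.2
TPLLn, TPLHn, TPHLn, TPHHn = 0.7, 0.3, 0.4, 0.6   # do-nothing policy
TPLLa, TPLHa, TPHLa, TPHHa = 0.8, 0.2, 0.6, 0.4   # advertising policy
CL, CH, A = 0.0, 20.0, 5.0

# Decision variables: x = [PLn, PLa, PHn, PHa] (all >= 0 by default bounds)
c = [TPLLn*CL + TPLHn*CH,        # expected cost if in L and not advertising
     A + TPLLa*CL + TPLHa*CH,    # ... in L and advertising
     TPHLn*CL + TPHHn*CH,        # ... in H and not advertising
     A + TPHLa*CL + TPHHa*CH]    # ... in H and advertising

# Steady-state balance for state L (the balance for H is then redundant):
# PLn + PLa = PLn*TPLLn + PLa*TPLLa + PHn*TPHLn + PHa*TPHLa
A_eq = [[1 - TPLLn, 1 - TPLLa, -TPHLn, -TPHLa],
        [1, 1, 1, 1]]            # probabilities sum to 1
b_eq = [0.0, 1.0]

res = linprog(c, A_eq=A_eq, b_eq=b_eq)
print(res.fun)   # about 8.33
print(res.x)     # about [0.667, 0.0, 0.0, 0.333] = [PLn, PLa, PHn, PHa]
```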

The solution to this model is

Objective value: 8.333333.

Variable   Value   Reduced cost
PLn        0.667   0.000
PLa        0.000   2.222
PHn        0.000   0.556
PHa        0.333   0.000
PL         0.667   0.000
PH         0.333   0.000

The two models containing probabilities as unknown variables presented above are solved using linear programming. From the values of these probabilities, we can identify the best policy given any state of the system. One can also use stochastic dynamic programming to find the best advertising policy directly given the current crime state. Each stage of the network is as shown in Fig. 13.9. The network clearly shows that no matter what policy is chosen, the ending states remain random.

Fig. 13.9

Network representation of each stage of the stochastic dynamic programming model for crime reduction

Using dynamic programming, we need to compute the minimum expected cost of all remaining months at each node or state for each successive remaining month m. Let Fm(S) represent that value for any state S (L or H) and remaining number of months m. Working backwards from right to left and beginning with F0(S) = 0,

$$F_1(L) = \min\left\{\left[0.7F_0(L) + 0.3(F_0(H) + 20)\right]_n,\ \left[5 + 0.8F_0(L) + 0.2(20 + F_0(H))\right]_a\right\} = \min(6, 9) = 6.$$

The best policy given state L with one month remaining is not to advertise.

$$F_1(H) = \min\left\{\left[0.4F_0(L) + 0.6(F_0(H) + 20)\right]_n,\ \left[5 + 0.6F_0(L) + 0.4(20 + F_0(H))\right]_a\right\} = \min(12, 13) = 12.$$

Again, the best policy given state H with one month remaining is not to advertise.

Continuing backward, the general recursion equations for each successive remaining month m are:

$$F_{m+1}(L) = \min\left\{\left[0.7F_m(L) + 0.3(F_m(H) + 20)\right]_n,\ \left[5 + 0.8F_m(L) + 0.2(20 + F_m(H))\right]_a\right\};$$
$$F_{m+1}(H) = \min\left\{\left[0.4F_m(L) + 0.6(F_m(H) + 20)\right]_n,\ \left[5 + 0.6F_m(L) + 0.4(20 + F_m(H))\right]_a\right\}.$$

The process can stop when the minimum cost policies k (decisions n or a) remain the same for the same state in two successive months or when the differences Fm+1(S) – Fm(S) equal the same constant for both values of S. This constant in this example will be the minimum monthly expected cost, 8.33.

The results from solving a succession of 10 recursive equations for each state are given in Table 13.4. Instead of using subscripts for the remaining months m, that value is included in the function: Fm(S) is shown as F(S,m), and F(S,m) = min_k F(S,m,k), where F(S,m,k) is the expected cost of choosing policy k in state S with m months remaining.

Table 13.4 Selected model solutions showing minimum expected costs given rate of crime and months remaining
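A compact way to generate such a table is to code the recursions directly. The Python sketch below is an illustration (not necessarily how Table 13.4 was produced); it iterates the two equations and records the best policy at each stage.

```python
# Backward recursion; F[s] holds F(S, m) for S in {L, H}.
TP = {                     # TP[(state, policy)] = (prob of L next, prob of H next)
    ("L", "n"): (0.7, 0.3), ("L", "a"): (0.8, 0.2),
    ("H", "n"): (0.4, 0.6), ("H", "a"): (0.6, 0.4),
}
cost = {"L": 0.0, "H": 20.0}    # cost incurred in next month's crime state
advert = {"n": 0.0, "a": 5.0}   # advertising cost of each policy

F = {"L": 0.0, "H": 0.0}        # F(S, 0) = 0
for m in range(1, 11):          # remaining months 1..10
    new_F, best = {}, {}
    for s in ("L", "H"):
        options = {}
        for k in ("n", "a"):
            pL, pH = TP[(s, k)]
            options[k] = advert[k] + pL*(cost["L"] + F["L"]) + pH*(cost["H"] + F["H"])
        best[s] = min(options, key=options.get)
        new_F[s] = options[best[s]]
    print(m, {s: round(v, 2) for s, v in new_F.items()}, best)
    F = new_F
# The best policy settles to 'n' in state L and 'a' in state H, and
# F(S, m+1) - F(S, m) approaches the minimum expected monthly cost, 8.33.
```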

This expected monthly cost of 8.33 can be compared to the expected monthly cost if one decided never to advertise. The difference between the two expected cost values identifies the expected monthly benefit of adopting the optimal advertising policy (i.e., advertise only when in state H). The non-advertising expected monthly cost can be determined by solving the sequence of recursive equations:

$$F_{m+1}(L) = 0.7F_m(L) + 0.3(F_m(H) + 20), \quad \text{where } F_0(L) = 0,$$
$$F_{m+1}(H) = 0.4F_m(L) + 0.6(F_m(H) + 20), \quad \text{where } F_0(H) = 0,$$

until the difference Fm+1(S) - Fm(S) equals the same constant for each value of the crime state S.

Rounding to the nearest tenth,

$$\begin{gathered} F_1(L) = 0.7(0) + 0.3(0 + 20) = 6. \\ F_1(H) = 0.4(0) + 0.6(0 + 20) = 12. \end{gathered}$$
$$\begin{gathered} F_2(L) = 0.7(6) + 0.3(12 + 20) = 13.8. \\ F_2(H) = 0.4(6) + 0.6(12 + 20) = 21.6. \end{gathered}$$
$$\begin{gathered} F_3(L) = 0.7(13.8) + 0.3(21.6 + 20) = 22.1. \\ F_3(H) = 0.4(13.8) + 0.6(21.6 + 20) = 30.5. \end{gathered}$$
$$\begin{gathered} F_4(L) = 0.7(22.1) + 0.3(30.5 + 20) = 30.6. \\ F_4(H) = 0.4(22.1) + 0.6(30.5 + 20) = 39.1. \end{gathered}$$
$$\begin{gathered} F_5(L) = 0.7(30.6) + 0.3(39.1 + 20) = 39.2. \\ F_5(H) = 0.4(30.6) + 0.6(39.1 + 20) = 47.7. \end{gathered}$$

Note that the difference F5(L) − F4(L) = 8.6 and the difference F5(H) − F4(H) = 8.6; thus, the expected monthly benefit of the optimal advertising policy is approximately 8.6 − 8.3 = 0.3.
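Carrying the recursion a few steps further, as in the sketch below, shows both differences settling near 8.57 (= 60/7) per month, which rounds to the 8.6 used above.

```python
# Evaluate the do-nothing policy by iterating the two recursions above
FL, FH = 0.0, 0.0
for m in range(1, 31):
    FL_new = 0.7 * FL + 0.3 * (FH + 20)
    FH_new = 0.4 * FL + 0.6 * (FH + 20)
    print(m, round(FL_new - FL, 3), round(FH_new - FH, 3))
    FL, FH = FL_new, FH_new
# Both differences approach 60/7 = 8.571 per month, the steady-state
# expected monthly cost of never advertising.
```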

Finally, given any policy, optimal or not, one can compute the probabilities of being in any state. For this problem in which advertising is only implemented when in a high crime state, the transition probabilities from one state to another are shown in Fig. 13.10.

Fig. 13.10

Transition probabilities if an optimal policy is followed

Solving for the steady-state probabilities of L and H

$$\begin{gathered} \Pr(L) = \Pr(L)(0.7) + \Pr(H)(0.6) \quad \text{or} \quad \Pr(H) = \Pr(L)(0.3) + \Pr(H)(0.4), \\ \text{and} \quad \Pr(L) + \Pr(H) = 1, \end{gathered}$$

results in

$$\Pr(L) = 0.667 \quad \text{and} \quad \Pr(H) = 0.333,$$

as previously determined using the linear model involving unknown joint probabilities.

This illustrates that one can obtain both operating policies (k given S) and state probabilities (Pr(S)) solving either linear or dynamic programming models of this or similar stochastic optimization problems. In one case, we find the optimal joint probabilities of states and policies and derive the operating policies from them. In the other case, we find the optimal policies and derive their joint probabilities. Neat! (Fig. 13.11).

Fig. 13.11

The game of squash racquets, another example of a stochastic process

Exercises

  1.

    Predicting weather.

    The mayor is considering having a $100-a-plate dinner to increase the funds available for the homeless. His problem is that he doesn't know how many people might come. Experience suggests that attendance largely depends on whether or not it rains.

    The probability of a dry day depends on the past day’s condition. The local weather service has provided the following conditional probabilities of dry and wet days:

    figure a

    Invitations must be sent out at least two weeks in advance.

    (a)

      What is the probability of the selected day being a dry one?

    (b)

      Should the guests be encouraged to bring an umbrella? For this problem, make up convenient ‘benefits or costs’ for each possibility. For example, if it is dry and they do not bring an umbrella, or if it is wet and they do bring an umbrella, the benefit could be 10. If it rains and they do not have an umbrella, the benefit is −10. If it is dry and they have one, it is 5.

  2.

    Gambling

    You are given an opportunity to begin with an investment of $1 in a succession of gambles where in each iteration there is a 90% chance of doubling your money and a 10% chance of losing all the money won plus your initial $1. You can quit playing at any time. What are your expected earnings and the probability of having them for successive iterations, and when, and why, would you stop playing?

  3.

    Crime Reduction

    A community center provides recreation facilities for young people. Among the benefits to the community are lower crime rates. Assume there are two states of crime rates, low (L) and high (H). Observed crime rates over time show that if the crime rate is low in any month, the probability of having a low rate the following month is 0.5, and the probability of a high-rate month following a low-rate month is 0.5. If the crime rate is high in a month, the probability of a high rate the following month is 0.9, and thus the probability of a low rate the next month is 0.1. These probabilities apply if the community center does not advertise. This is the ‘do-nothing’ policy (Policy n). These conditional probabilities are shown in Fig. 1. However, if the center advertises its recreation programs (Policy a), the conditional probabilities change to those shown in Fig. 2.

    The community center can change its policy at the beginning of each month. The high crime month costs 20 more than the low crime month, and advertising costs 10 per month.

    figure b

    Show how you would determine what policy to implement following each type of month (low or high crime rate) to minimize the total expected cost of crime and advertising expense.

    Hint: You can use dynamic programming along with the network below if you wish. Work backward. Stop when the minimum cost policies (decisions) remain the same in two successive months.

    figure c

    Solve for the steady-state policy that doesn't change, given the state (H or L), over time. You can solve the problem represented by the network above using dynamic programming, or using linear programming in which the variables are the joint probabilities of states and decisions.

  4.

    You are considering a 3-day trail maintenance project in a state park. The weather for the last 10 days has been the following:

    Good, Good, Good, Bad, Bad, Good, Good, Bad, Good, Good.

    (a)

      Compute the probability of having three consecutive days of good weather.

    (b)

      Compute the probability of having at least one bad weather day in those three days.