Risk minimising strategies for revenue management problems with target values

Abstract Consider a risk-averse decision maker in the setting of a single-leg dynamic revenue management problem with revenue controlled by limiting capacity for a fixed set of prices. Instead of focussing on maximising the expected revenue, the decision maker has the main objective of minimising the risk of failing to achieve a given target revenue. Interpreting the revenue management problem in the framework of finite Markov decision processes, we augment the state space of the risk-neutral problem definition and change the objective function to the probability of failing a specified target revenue. This enables us to obtain a dynamic programming solution that generates the policy minimising the risk of not attaining this target revenue. We compare this solution with recently proposed risk-sensitive policies in a numerical study and discuss advantages and limitations.


Introduction
Revenue management systems have become a standard tool in various industries beyond the original airline industry, ranging from cruise lines, rental cars, media advertising, and medical services to event management (see eg Talluri and van Ryzin, 2005).
We consider a typical revenue management model: a firm operating in a monopolistic setting offers multiple products. These products consume a fixed resource of limited capacity. The firm sells the products over a finite time horizon, at the end of which the salvage value of the resource is assumed to be 0.
The firm can influence its revenue stream by allocating capacity to different classes of demand. Its objective is to find a policy which optimises an objective function. Normally, this objective function is risk-neutral, and the policy is chosen to maximise expected revenue. Such a risk-neutral objective can be motivated by the law of large numbers if the revenue process repeats itself very often, for example, a daily operating airline flight connection.
However, a risk-neutral policy might not be desired in all scenarios, and a risk-averse policy might be advantageous for the decision maker. Lancaster (2003) remarks that a risk-neutral model is often not sufficient, even in the airline industry, as a stable revenue might be preferable because of financial constraints.
In practice, decision makers exhibit some level of risk aversion in revenue management, as mentioned by Bitran and Caldentey (2003). Weatherford (2004) reports the same experience. He observed that airline analysts feel uncomfortable with the recommendations of their (risk-neutral) revenue management systems, in particular while waiting for high-fare passengers a few days before flight departure.
In recent papers by Barz and Waldmann (2007), Huang and Chang (2011) and Koenig and Meissner (2015), risk-neutral and risk-sensitive policies are analysed. The results show that an appropriate risk-averse policy can be selected if the decision maker knows the parameters representing his level of risk aversion. Such parameters have to be determined, which is not straightforward in any of the published approaches, whether the underlying concept is an exponential utility function or a discount factor relaxing an optimality condition. Usually, the parameters have to be estimated by running numerical experiments and evaluating risk measures, such as mean-variance or conditional-value-at-risk, on the results.
Thus, we propose using the target-percentile risk measure, discussed by Boda and Filar (2006), as the objective function. The target-percentile risk measure computes the probability of the return failing to achieve a previously given fixed target. There are several advantages of using this measure. First, one important structural property is its time consistency, which requires that the optimality of decisions depends only on the future. Time consistency is a desirable property for multi-period risk measures, as it allows their use in dynamic programming, as shown, for example, by the work of Shapiro (2009). Second, it does not assume a special kind of revenue distribution, as it measures the percentile of the given target level. Third, numerical computation schemes are available, as described by Wu and Lin (1999). Fourth, Boda and Filar (2006) show that multi-stage versions of the well-established risk measure value-at-risk can be developed using the target-percentile measure.
Fifth, and importantly, it is easily interpreted by practitioners and does not require a risk sensitivity parameter which might be difficult to assess. Practitioners know the cash constraints of their businesses, which ensure financial liquidity and operational freedom. Thus, they know the desired target level for their businesses, which they can use as an input parameter in our model.
The structure of the paper is as follows. We review the relevant literature in Section 2. In Section 3, we describe our model as a Markov decision process and its extension to apply the target-percentile risk measure. This section also contains some implementation details. Section 4 shows numerical results of our approach and provides a comparison with results of other approaches. Finally, we conclude the paper in Section 5.

Related work
Most revenue management models use a risk-neutral objective function. We refer to the work of Talluri and van Ryzin (2005) for an overview of these kinds of models. In general, revenue management models are often categorised as capacity control or dynamic pricing models. However, Maglaras and Meissner (2006) discuss the similarities between both categories and give a common formulation in a risk-neutral setting.
The risk-neutral model of dynamic capacity control, which we consider here, was introduced by Lee and Hersh (1993). The corresponding Markov decision process is described by Lautenbacher and Stidham (1999).
The approaches for incorporating risk in revenue management models are analogous to those for general decision making under risk: expected utility theory, mean-variance considerations, and probabilistic constraints.
Expected utility theory as an element for reflecting risk in revenue management is recommended by Weatherford (2004). He states that the assumption of risk neutrality does not hold in many practical scenarios and proposes expected utility theory as a risk-averse alternative. Instead of the well-adopted (risk-neutral) expected marginal seat revenue model, a standard algorithm introduced by Belobaba (1989), the expected marginal seat utility heuristic can reflect risk sensitivity in decision making. Weatherford and Belobaba (2002) also show how forecasting errors affect the revenue.
Recent works of Barz and Waldmann (2007), Feng and Xiao (2008) and Xiong et al (2011) employ expected utility theory, too. All three papers support the application of an exponential utility function to account for risk aversion. Barz and Waldmann (2007) use the Markov decision process formulation of static and dynamic capacity control models, whereas Feng and Xiao (2008) provide closed form solutions from a more general point of view, and Xiong et al (2011) consider overbooking in their model.
As the first revenue management model with risk considerations, the model of Feng and Xiao (1999) uses variance as its risk measure; in particular, the variance of sales because of price changes. In order to integrate risk into their objective function, they combine expected revenue with a weighted penalty function for the sales variance. The risk sensitivity of the decision maker can be adjusted by the weighting.
Recently, Huang and Chang (2011) presented a risk-sensitive modification of the optimality condition for the dynamic capacity control model and investigated their method by measuring mean versus standard deviation in simulation runs. They offer a ranking of their risk-sensitive policies using a Sharpe ratio of revenue and standard deviation.
Illustrating the vulnerability of risk-neutral revenue management because of demand forecast inaccuracy, Lancaster (2003) recommends a revenue-per-available-seat-mile-at-risk metric, which integrates risk measurement with the value at risk (V@R) metric. This metric is the expected maximum of underperformance over a time horizon at a chosen confidence level.
That the cost of price changes should be considered from a risk perspective is demonstrated by Koenig and Meissner (2010), who compare the suitability of two different pricing strategies using the risk measures standard deviation and conditional value at risk (CV@R). In a further paper, Koenig and Meissner (2015) evaluate a range of risk-sensitive policies for the dynamic capacity control model. Gönsch and Hassler (2014) propose a heuristic for computing a CV@R-optimal policy in a recent paper. Their approach solves a knapsack problem for each state in their value function.
Risk sensitivity is incorporated by Levin et al (2008) into a dynamic pricing model of perishable products. Their objective function consists of the maximum expected revenue constrained by a desired minimum level of revenue with a minimum acceptable probability. This constraint is similar to a V@R formulation. The authors formulate a hybrid objective function which combines the risk-neutral objective of expected revenue with a penalty term representing risk aversion. In principle, they approach this risk-adjusted maximisation problem by using an additional state variable which a risk-neutral dynamic pricing model does not require. This state keeps track of the revenue gained so far.
In a capacity control setting, we base our risk incorporation on a state space expansion, too, but our underlying model is derived from a Markov decision process formulation.

Description of model
In the following, we describe the dynamic capacity control problem as a Markov decision process, in a similar way as previously done by Lautenbacher and Stidham (1999) and Barz and Waldmann (2007). The state space of this model is then expanded so that a risk-minimising policy can be applied. To achieve this, we follow the approach of Wu and Lin (1999). Our objective function is the target-percentile dynamic risk measure. Finally, we point out some aspects of implementing this approach.

Markov decision process for dynamic capacity control model
We consider the capacity control model stated by Lee and Hersh (1993), which is often referred to as dynamic capacity control. Although originally developed for airline revenue management, it can be transferred to other industries. We describe the model in terms of its original airline revenue management context in order to keep the exposition intuitive.
We assume that booking requests follow a Poisson arrival process. Thus, the booking period for a single-leg flight is divided into N decision periods in such a way that the probability of more than one request per period can be ignored. The decision periods are denoted by n ∈ {0, …, N}; the departure is at n = 0. Where it supports understanding, we will use n as a subscript and omit it otherwise. Further, there are k booking classes with fares F_1 > F_2 > ⋯ > F_k. The probability of a request for fare class i in decision period n is given by p_{n,i}. Further, we set the probabilities for n = 0 to zero for all fare classes, p_{0,i} = 0; this simply supports our model setting, as the last decision is made at time n = 1. The probability of no request in period n is p_{n,0} = 1 − Σ_{i=1}^{k} p_{n,i}. The initial capacity is C seats, and the remaining seats in a time period are given by c ⩽ C.
We have a finite-state, discrete-time, Markov decision process Γ = (S, A, R, P) with state space S and action space A. Further, R denotes the reward set and P, the set of transition probabilities. Time runs in discrete steps and represents the remaining time before flight departure.
The state space S contains all possible configurations of remaining capacity c and a request for fare class i. Thus S = {0, 1, …, C} × {0, 1, …, k}, and a state (c, i) ∈ S means that we have c seats left and a request for fare class i. We introduce the artificial fare class 0 with fare F_0 = 0, representing the absence of a request, as is common practice.
Our action space A(c, i) corresponds to the 'reject' (a = 0) and 'accept' (a = 1) decisions for a given state. Accepting is only allowed for the valid fare classes while seats remain, not for the artificial class i = 0; that is, A(c, i) = {0, 1} for c, i > 0 and A(c, i) = {0} otherwise. Overbooking is not allowed.
Let R be the set of rewards (fares) obtained when accepting a booking. Rewards are denoted by r_n(s, a) ∈ R with s ∈ S, a ∈ A, where r_n((c, i), a) = aF_i for n, c > 0 and 0 otherwise. The transition probabilities p ∈ P are defined for states (c, i), (c − a, j) ∈ S with a ∈ A(c, i) by p_n((c − a, j) | (c, i), a) = p_{n−1,j} for n = N, N − 1, …, 1, and 0 otherwise.
A decision maker decides on a sequence of decision rules a_n = d_n(c_n, i_n), which determine a policy π = {d_N, d_{N−1}, …, d_1}. Thus, a policy determines whether a booking request is accepted or rejected in state (c_n, i_n). Now let ρ^π_N(c, i) = Σ_{n=0}^{N} r_n denote the random variable of the revenue gained under a particular policy π, beginning with capacity c and request i at N remaining time steps. The expected revenue is given by E[ρ^π_N(c, i)]. The maximal expected revenue and its associated risk-neutral policy can be computed by the Bellman equation for this problem. However, we are interested in a policy which minimises the time-consistent dynamic risk measure of not achieving a target revenue x in the accumulated return.
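To make the risk-neutral benchmark concrete, the Bellman recursion for the maximal expected revenue can be sketched as follows. This is a minimal Python sketch under our own conventions; the function name and the list layout `p[n][i]`, with `p[n][0]` as the no-request probability, are illustrative assumptions and not notation from the model definition.

```python
def max_expected_revenue(N, C, fares, p):
    # V[n][c]: maximal expected revenue with n periods left and c seats.
    # fares[i-1] is F_i; p[n][i] is the class-i request probability in
    # period n (p[n][0]: no request).  Terminal condition V[0][c] = 0,
    # as the salvage value at departure is assumed to be zero.
    k = len(fares)
    V = [[0.0] * (C + 1) for _ in range(N + 1)]
    for n in range(1, N + 1):
        for c in range(C + 1):
            v = p[n][0] * V[n - 1][c]  # no request: nothing to decide
            for i in range(1, k + 1):
                reject = V[n - 1][c]
                accept = fares[i - 1] + V[n - 1][c - 1] if c > 0 else reject
                v += p[n][i] * max(reject, accept)
            V[n][c] = v
    return V[N][C]
```

On the stylised two-class example given below (fares 200 and 100, N = 2, C = 1), this recursion returns the maximal expected revenue of 81.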

Markov decision process for minimising risk of failing target
We are interested in minimising the risk of not attaining a specified target revenue x_N for the dynamic capacity control model. Thus, we want to find a policy π̃ which minimises the objective function representing the probability of not achieving the previously specified target level x_N. In order to derive this objective function, we follow the approaches of White (1988), Wu and Lin (1999) and Boda and Filar (2006) and expand the Markov decision process Γ by a larger state space. The extended Markov decision process Γ̃ = (S̃, Ã, R̃, P̃) is similar to Γ: the action and reward sets carry over unchanged (Ã = A, R̃ = R), while the state space and transition probabilities are extended as described below.
The state space S is replaced by the new state space S̃ = S × ℝ = {0, 1, …, C} × {0, 1, …, k} × ℝ. It consists of states combining the configurations of remaining capacity c and a request for fare class i with, additionally, a revenue target x. All state variables are updated over time; for example, the revenue target x decreases by the realised fare price whenever c is decremented by selling a seat.
We are interested in the probability that the obtained total revenue does not attain a target level x. Let Π̃ be the set of deterministic Markovian policies, and let ρ^π̃_N((c, i), x) = Σ_{n=0}^{N} r̃_n be the random variable of the cumulative reward when applying policy π̃ ∈ Π̃, beginning with capacity c, request i, remaining time steps N, and target x. For the policy π̃, the target-percentile risk measure is defined as

V^π̃_N((c, i), x) = P(ρ^π̃_N((c, i), x) < x),    (1)

where P denotes a probability. The time consistency property of the target-percentile risk measure can be shown as demonstrated by Boda and Filar (2006). Thus, we now look for an optimal policy π̃* for each objective function V^π̃_N((c, i), x) that minimises the risk of failing target level x:

V^π̃*_N((c, i), x) = min_{π̃ ∈ Π̃} V^π̃_N((c, i), x).    (2)

The associated percentile (minimum risk level for x) is denoted V^π̃*_N. Following Wu and Lin (1999) and Boda and Filar (2006), we can derive the dynamic programming equations for the computation of the minimum percentile V^π̃*_N (see Appendix). We obtain the following equations for x ∈ ℝ and all (c, i) ∈ S:

V^π̃*_n((c, i), x) = min_{a ∈ A(c, i)} Σ_{j=0}^{k} p_{n−1,j} V^π̃*_{n−1}((c − a, j), x − aF_i),  n = 1, …, N.    (3)

At time n = 0, the initial probabilities are one for a target x > 0 (as there is no remaining time for earning any value) and 0 for a target x ⩽ 0 (as this will definitely be met because our initial revenue is zero). Note that the final percentile over all time periods is determined by V^π̃*_{N+1}((c, i), x) = Σ_{j=0}^{k} p_{N,j} V^π̃*_N((c, j), x), in which the request class at time N is not yet known; in V^π̃*_N((c, i), x), we already know the requested class i at time N.
The optimal policy π̃* can be computed from the minimum percentile V^π̃*_N by Equation (2) for a given target level x. It should be pointed out that an optimal policy describes one way to obtain the target level, but several optimal policies might exist. Therefore, if more than one decision rule can be chosen in a certain state in order to achieve the minimum percentile, we select the decision rule which contributes most to the revenue. In particular, we prefer to accept a request if both possible decisions a ∈ {0, 1} yield equal probabilities when determining the minimum in Equation (3), that is,

Σ_{j=0}^{k} p_{n−1,j} V^π̃*_{n−1}((c, j), x) = Σ_{j=0}^{k} p_{n−1,j} V^π̃*_{n−1}((c − 1, j), x − F_i),

and the risk-neutral solution would accept the request, too. In this manner, we achieve the same probability regarding the target but with the policy which yields the greater expected revenue.
Furthermore, if the target level has been achieved in some state, the subsequent states can be handled arbitrarily. In practice, the policy for these remaining states should then be optimised under another criterion, such as the expected revenue. Moreover, if the target can never be attained in the given setting, all policies are equally improper and no optimal target-percentile policy exists (technically, all policies are optimal but none is proper). In both cases, we apply the risk-neutral policy which maximises the expected revenue throughout this paper, if not otherwise stated.
For an efficient implementation, we apply a usual transformation of the dynamic programming formulation of Equation (3). Introducing the operator T_n by (T_n V)(c, x) := Σ_{i=0}^{k} p_{n,i} V_n((c, i), x) helps to reduce the state space by the variable representing the fare class of an arrival. Defining W_n(c, x) := (T_n V_n)(c, x), we transform Equation (3) as follows, for x ∈ ℝ and c ∈ {0, …, C}:

W_n(c, x) = p_{n,0} W_{n−1}(c, x) + Σ_{i=1}^{k} p_{n,i} min{W_{n−1}(c, x), W_{n−1}(c − 1, x − F_i)} for c > 0,
W_n(0, x) = W_{n−1}(0, x),

with W_0(c, x) = 1 for x > 0 and 0 for x ⩽ 0. The computation over all possible cumulative rewards given by the variable x could, for larger problems, be reduced by working on a suitable grid, as described in the works of Wu and Lin (1999) and Boda et al (2004). In this paper, we do not apply the grid reduction.
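A direct implementation of this recursion is straightforward when the fares are integers, since all reachable targets x then lie on an integer grid. The following memoised Python sketch is our own illustration (the function name and argument layout are assumptions), not the implementation used for the experiments:

```python
from functools import lru_cache

def min_fail_prob(N, C, fares, p, target):
    """Minimum probability of failing `target`, via the W-recursion.

    p[n][i] is the probability of a class-i request in period n
    (i = 0 denotes no request); fares[i-1] is F_i.  A sketch under
    the assumption of integer fares, so the reachable targets x
    stay on an integer grid.
    """
    @lru_cache(maxsize=None)
    def W(n, c, x):
        if x <= 0:   # remaining target already met
            return 0.0
        if n == 0:   # no time left, positive target remains
            return 1.0
        val = p[n][0] * W(n - 1, c, x)          # no request arrives
        for i, F in enumerate(fares, start=1):
            reject = W(n - 1, c, x)
            if c > 0:  # accepting is only possible with a seat left
                val += p[n][i] * min(reject, W(n - 1, c - 1, x - F))
            else:
                val += p[n][i] * reject
        return val

    return W(N, C, target)
```

For the stylised example below, `min_fail_prob(2, 1, [200, 100], p, 200)` returns the minimum failure probability of 0.72.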
Example. In order to illustrate the method, we give a stylised example. Consider only two classes with fares F_1 = 200 and F_2 = 100, two remaining time periods N = 2, one seat left C = 1, and the arrival probabilities p_{1,1} = 0.10, p_{1,2} = 0.15, p_{2,1} = p_{2,2} = 0.20. Thus, for example, the probability of a request for fare class 2 in period 1 before departure is 15%. We have a few scenarios in this setting: if a request for a distinct fare class comes in period 2 before departure, we can accept it, or we can reject this fare class, wait for possible arrivals in the last period and, if they appear, accept. It is easy to see that the policy which always accepts (expected revenue of 81) is best in terms of expected revenue. However, suppose now that we want the best policy for a target value of 200. The expected revenue maximising policy fails that target with probability 0.74. A better choice for this target would be to accept only the highest fare class, a policy which fails with a likelihood of only 0.72.
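These numbers can be checked by exhaustively enumerating the nine request scenarios of the two periods. The Python sketch below is purely illustrative; representing a policy as a callable is our own assumption, not part of the model.

```python
import itertools

fares = {0: 0, 1: 200, 2: 100}
# p[n][i]: request probability for class i in period n before departure
p = {2: {0: 0.60, 1: 0.20, 2: 0.20}, 1: {0: 0.75, 1: 0.10, 2: 0.15}}

def evaluate(policy, target=200, C=1):
    """Exact expected revenue and failure probability of a fixed policy."""
    exp_rev = fail = 0.0
    for i2, i1 in itertools.product((0, 1, 2), repeat=2):
        prob = p[2][i2] * p[1][i1]
        seats, rev = C, 0
        for n, i in ((2, i2), (1, i1)):  # play both periods in order
            if i > 0 and seats > 0 and policy(n, seats, i):
                rev += fares[i]
                seats -= 1
        exp_rev += prob * rev
        fail += prob * (rev < target)
    return exp_rev, fail

accept_all = lambda n, seats, i: True       # first-come-first-serve
only_high  = lambda n, seats, i: i == 1     # accept only fare class 1
```

Here `evaluate(accept_all)` reproduces the expected revenue of 81 and the failure probability 0.74, and `evaluate(only_high)` the failure probability 0.72, up to floating-point rounding.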

Numerical simulation and results
In their introductory paper about dynamic capacity control, Lee and Hersh (1993) used an example which also served for illustration in the recent papers of Barz and Waldmann (2007), Huang and Chang (2011) and Koenig and Meissner (2015). Thus, we can also demonstrate the proposed target-percentile policy in the same exemplary setup.

Exemplary simulation setup
There are N = 30 time periods before departure, and the initial number of seats is C = 10. The four fare classes are F 1 = 200, F 2 = 150, F 3 = 120, F 4 = 80. The probabilities for a request of a fare class in a given time period are shown in Table 1.
In order to see how the target-percentile policy works, we conducted an experiment with 1000 sample runs. Random arrivals were simulated in a Monte Carlo manner using the values of Table 1. For comparisons with other proposed policies, the same sample paths (random arrivals) were used.
A single simulation run is initialised with values for remaining seats, time periods before departure, and a policy. The policy contains, for each state, the acceptable fare classes. The state is described by remaining time periods, remaining seats, and a remaining target value. The simulation then loops over the time periods until the departure time zero is reached. Inside the loop, a random generator simulates requests for fare classes, which are accepted if the current policy allows acceptance of the class and rejected otherwise. The state is updated as follows: time periods are always decremented by one, seats are decremented only if a fare is accepted, and the target value is decremented by the gained fare price.
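A single run of this loop can be sketched as follows. This is our own minimal Python illustration: the callable `policy` stands in for the policy table described above, and inverse-transform sampling draws the request class.

```python
import random

def simulate_run(N, C, fares, p, target, policy, rng):
    """One Monte Carlo sample path of the booking horizon.

    p[n][i] is the request probability for class i in period n
    (i = 0: no request); policy(n, seats, x, i) returns True if a
    class-i request is accepted in that state (an illustrative
    stand-in for the policy table described in the text).
    """
    seats, x, revenue = C, target, 0
    for n in range(N, 0, -1):            # loop until departure at n = 0
        u, cum, req = rng.random(), 0.0, 0
        for i in range(len(fares) + 1):  # inverse-transform sampling
            cum += p[n][i]
            if u < cum:
                req = i
                break
        if req > 0 and seats > 0 and policy(n, seats, x, req):
            revenue += fares[req - 1]    # accept: update all state variables
            seats -= 1
            x -= fares[req - 1]
    return revenue
```

The state update mirrors the description above: the period counter always decreases, while the seats and the remaining target are updated only on acceptance.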
Policy illustration. Figure 1 visualises the policy π̃* for the described example. We see slices through a three-dimensional matrix which displays the index of the maximum allowed fare class for each state (c, x) at time n, with an initial target of 1200. In order to use the policy, we start in the state (10, 1200) with 30 time periods to go. This is the top corner on the right hand side of the presented box. The entry at this position in the matrix gives the maximum allowed fare class, which determines how to act at this point in time before departure. Only fare classes with a price higher than or equal to that of the associated class shown are accepted. As time marches on, one always moves one step further along the time dimension towards departure time zero; this is parallel to the south-west direction in the figure.
The policy now determines how to move in the two other dimensions. Acceptance of a request causes a move downwards along the dimension of the capacity, orthogonally downwards in the matrix. Finally, the price of an accepted fare dictates how to move in the target direction, along the north-west direction in the figure. Thus, considering the figure, the simulation will generate random trajectories from the top corner on the right hand side to the bottom corner on the left hand side. Of course, the end of each trajectory will often differ because of the random realisations, but it always ends at coordinate n = 0.
We illustrate the effect of changing the target level on the policy in Figure 2. The figure shows, for two different target levels (1200, 1400), the corresponding policies when no revenue has yet been gained. The effect of increasing the target can be observed on the right hand side of the matrices, which show the indices of the maximum allowed fare class for each state. For example, a capacity of at least six seats is required for a target of 1200, but a capacity of at least seven seats is required for a target of 1400. Only the highest fare class is accepted when only six seats are available for the target of 1200, respectively when only seven seats are available for the target of 1400. Figure 2 illustrates the pure target policies for 1200 and 1400. As already mentioned, we apply the risk-neutral policy in states from which a target can no longer be achieved; this is not shown in the figure.
Evaluation. As the proposed policy optimises the target-percentile, we start our evaluation with different (obtainable) target revenues, comparing the theoretical and the simulation results. As mentioned in Section 3.2, there are scenarios in which the target revenue is achieved while time is remaining and one or more seats are left. We present the average remaining time and seats for such cases as well. Further, the averaged revenue is computed by switching to the risk-neutral policy or the first-come-first-serve (FCFS) policy once the target has been achieved. Table 2 shows the results for seven different targets. The observed relative frequency of failed cases in the simulation agrees, within numerical error, with the theoretical percentile, validating that the policy behaves as expected.
The expected revenue of the risk-neutral policy for the analysed problem is 1407.2. Looking at the results of Table 2, we see that a policy which aims at a target revenue lower than this expected value accepts an upcoming request early in time. Decisions are made soon and not postponed to later periods. This effect is easily observable as the average remaining time and seats decrease while the target increases. Policies with lower targets have a greater probability of reaching the target. A lower target can more easily be attained by accepting requests early, which leaves more time to compensate for periods without profitable requests. For the very low target of 800, the target policy is similar to the FCFS policy and every request is taken early in order to achieve the low target. For the very high target of 1900, the target policy 'speculates' on unlikely combinations of requests for high fares and departs with empty seats.
The effect of switching to the risk-neutral or the FCFS policy for the remaining time after achieving the target can be observed in the revenue and its standard deviation. Of course, there is no impact on the failed-target frequency or the remaining time and seats. With decreasing remaining time (or increasing target), the difference between the revenue and standard deviation of using the risk-neutral or FCFS policy for the remaining time diminishes. The average revenues of the target policies are in each case lower than that of the risk-neutral policy but greater than that of the FCFS policy. The standard deviation of these revenues grows with an increasing target, although, compared with the risk-neutral and FCFS policies, the target policies fail their targets less often. This can be explained by comparing the distribution histograms of the revenues of the policies. In the following, we apply the risk-neutral policy once a target is reached. Figure 3 shows the distribution histograms of 1000 simulation runs of three policies: one with the low target 1200, one with the high target 1400, and a risk-neutral one maximising expected revenue.
The distribution associated with the low target has its peak above its target value 1200 and a slight negative skew. It has only small frequencies for values lower than 1200 but also for values higher than 1500, as its standard deviation in Table 2 also emphasises. It has a peak at 1300. The risk-neutral solution shows a negatively skewed distribution with a peak at 1500 and a long tail towards very low values, though with some high revenues at 1800. Compared with the policy with target 1200, its revenues are more often below 1200; however, given that the revenue exceeds 1200, it is better off. Its risk of falling below 1200 remains higher than that of the low target policy.
The distribution of the policy with high target 1400 has a strong negative skew with a long tail towards low values, too. The peak of the distribution is at 1400. Compared with the risk-neutral counterpart, this policy shifts frequency from 1300 to 1400 revenue. The target is achieved mainly at the expense of revenues of 1300 and of more than 1500. Further, it also shows higher frequencies for low revenue than both other policies. Hence, if it fails the target, there is a greater risk of obtaining only low revenue. The histogram demonstrates that the policy with the low target attains a lower average revenue and a smaller variance, but the policy with the higher target, near the expected revenue of the risk-neutral solution, does not.
In order to evaluate the performance of target revenue policies in more detail, we compare them with the risk-sensitive policies derived from expected utility theory, as in Barz and Waldmann (2007). We select the latter policies for comparison as they result from optimising the dynamic capacity control model using an exponential utility and no heuristics. Following the recent works of Huang and Chang (2011) and Koenig and Meissner (2015), we consider the mean, standard deviation, and CV@R of the policies. The CV@R is a measure for the expected revenue given that the revenue is below a certain quantile specified by a confidence level α; it is the expected value in the α percent of worst cases. Table 3 compares both types of risk-sensitive policies. Beyond the mean, standard deviation, and CV@R with confidence level 5%, the observed relative frequency of failing the 1000 target is given. We see that the target policy for 1000 has the least risk of failing it. However, it is also observable that the target policies only limit the risk of failing their particular target and do not provide more preferable results in terms of the other measures. The expected utility based policies have higher average revenues than most target policies. If the target policies aim at a level greater than 1200, the CV@R drops as the level increases further. The CV@R of the policies employing an exponential utility function decreases with a decreasing level of risk aversion. The standard deviation decreases with a higher target and a higher level of risk sensitivity for both types of policies, with the exception of the 1400 target. As already discussed for Figure 3, the CV@R results also show that the target policies do not limit the risk of obtaining only low revenues in the worst cases. Further, it is interesting that the policies aimed at targets different from 1000 do not guarantee good performance regarding the 1000 target.
This effect is more clearly observable in the distribution histograms of the 1000 revenue target policy and the expected utility policy with high risk aversion γ = 0.01, as shown in Figure 4. The target policy has a lower average revenue, a higher 5% CV@R, and a higher standard deviation than the exponential utility policy, and it achieves a revenue of at least 1000 in more cases. The frequencies for the revenues 800 and 900 are lower for the target policy than for the exponential utility policy. The target policy has higher frequencies for revenues between 1000 and 1200 and between 1700 and 1800. It has lower frequencies between 1300 and 1600 than its counterpart. This explains the lower mean revenue. Figures 3 and 4 show that the target policies dent the distribution slightly below the target. Thereby, the whole distribution, for values lower and greater than the target, is influenced. Frequencies below this dent may increase just as frequencies for the target do. In particular, the distribution below the target need not be modified in a favourable manner regarding the lowest revenues, that is, the worst cases.
The results of Table 3 show that decision makers should choose a policy according to their prioritisation of measures. For example, a risk-averse policy is appropriate for decision makers whose business would be hurt more by a few worst-case scenarios than by forgoing revenue on average.
Further numerical experiments. We performed further numerical experiments beyond the previous illustrative one. In order to investigate the target policy, we present five more scenarios which differed with respect to their load factor. The load factor is given by λ = (1/C) Σ_{n=0}^{N} Σ_{j=1}^{k} p_{n,j} and indicates demand in relation to capacity. The previous example had a load factor of λ = 1.32.
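As a small helper, the load factor of a scenario can be computed as follows (a Python sketch under our own conventions; `probs[n]` lists the request probabilities p_{n,1}, …, p_{n,k} for period n):

```python
def load_factor(C, probs):
    # lambda = (1/C) * sum over all periods n and classes j of p_{n,j}
    return sum(sum(period) for period in probs.values()) / C

# The stylised two-period example with C = 1 has load factor
# 0.20 + 0.20 + 0.10 + 0.15 = 0.65:
example = {2: [0.20, 0.20], 1: [0.10, 0.15]}
lf = load_factor(1, example)
```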
We changed only the request probabilities of the previous example and held the other parameters fixed to obtain further scenarios. To this end, we built the further scenarios by choosing random request probabilities which yielded different load factors. We then simulated 1000 sample runs for each scenario. Table 4 shows the results of the risk-neutral policy and of the target policies applied to the scenarios. For each scenario, we selected targets of 115, 100 and 85% of the expected revenue of the risk-neutral policy. As Table 4 shows, the target policies achieved the desired target more often than the risk-neutral policies in the numerical simulations. This advantage of the target policies increased with increasing load factor.

Conclusions
A risk-averse policy minimising the risk of failing a previously defined revenue target has been proposed for a revenue management problem, namely the dynamic capacity control setting. This policy is derived by extending the state space of the Markov decision process formulation of the problem. We have discussed aspects of implementing the policy numerically. In numerical experiments, we have analysed the proposed policy and evaluated it against risk-neutral and other risk-sensitive policies. We have compared the mean, standard deviation, and conditional-value-at-risk of those policies. The optimal policy for a given target revenue focuses on minimising the likelihood of failing this particular target but does not compensate for other risk measures.
The analysis of the revenue distributions of the target-revenue-aimed policies in numerical experiments discloses how important a correct understanding of such a policy is when it is applied. The decision maker must be aware of its limitations; in particular, it is the policy with the lowest probability of failing the target, but the probability of the worst outcomes is not eliminated. However, using a low target revenue helps to limit such risk. The presented approach can be further developed in order to achieve a policy which optimises value-at-risk, as proposed by Boda and Filar (2006). Furthermore, it also offers a basis for the development of policies balancing mean revenue against target achievement.

Table 4 Results of the numerical simulation with scenarios which differed by their load factors λ. Revenues are averaged over 1000 sample runs. The given differences are the observed relative frequencies of failed-target instances of the risk-neutral policy minus those of the target policy; they show how much more often the target-level policies achieved the target than the risk-neutral ones. For example, in the last row of the table, the target policy failed in 1.4% of all sample runs and the risk-neutral one in 7.1%.
λ = 0.89, expected revenue = 1157.2, simulation revenue = 1155.9 Target

Appendix

The sets A, S, R are all finite. A finite convex combination and the minimum of distribution functions are distribution functions, too. Thus, V^π̃*_n((c, i), ·) is a distribution function.
Theorem 2 For each n ⩾ 1, the sequence {V^π̃