The multi-stage dynamic stochastic decision process with unknown distribution of the random utilities

We consider a decision maker who performs a stochastic decision process over multiple stages, where the choice alternatives are characterized by random utilities with an unknown probability distribution. The decisions are nested within each other, i.e. the decision taken at each stage is affected by the decisions at the subsequent stages. The problem consists in maximizing the total expected utility of the overall multi-stage stochastic dynamic decision process. By means of some results of extreme value theory, the probability distribution of the total maximum utility is derived and its expected value is found. This value is proportional to the logarithm of the accessibility of the decision maker to the overall set of alternatives in the different stages at the start of the decision process. It is also shown that the choice probability of selecting alternatives becomes a Nested Multinomial Logit model.

random utility models [15,16]. According to [23], in these models "a decision maker $i$ faces a choice among $N$ alternatives and will assign a certain level of utility to each alternative. The utility that the decision maker assigns to alternative $j$ is $\tilde{u}_{ij}$, $j = 1, \dots, N$. The decision maker will choose the alternative that provides the greatest utility, i.e. choose alternative $j$ if and only if $\tilde{u}_{ij} > \tilde{u}_{ik}$, $\forall k \neq j$. Consider now an external observer. The observer does not observe the decision maker's utility $\tilde{u}_{ij}$, but he observes just some attributes of the alternatives as faced by the decision maker, labeled $v_{ij}$, $\forall j$. Since there are aspects of the utility that the observer cannot catch, $v_{ij} \neq \tilde{u}_{ij}$. Utility is then decomposed as $\tilde{u}_{ij} = v_{ij} + \tilde{x}_{ij}$, where $\tilde{x}_{ij}$ captures the factors that affect utility but are not included in $v_{ij}$. The observer does not know $\tilde{x}_{ij}$ and therefore treats these terms as random variables". The aforementioned setting is typical of several applications of operations management, where the decisions must be taken in advance and only limited knowledge of some quantities is available, as in project management, supply chain optimization, service network design, logistics, and transportation (see, e.g., [3,4,13,14]).
In this paper, we consider a decision process evolving over multiple stages (e.g., over a discrete time horizon) in which a decision maker is asked to solve several random utility models consecutively, i.e. he needs to select, at each stage, an alternative among a finite set of mutually exclusive choices. A certain level of utility, depending on stochastic variables with unknown probability distribution, is associated with each alternative, and the decision maker wants to maximize the expected value of the total utility originating from the overall decision process. However, the decision process cannot be decomposed per stage because decisions are nested within each other, i.e., the utility of an alternative at each stage is affected by the utilities associated with the alternatives selected in the subsequent stages.
In the special case in which only a single stage exists (i.e., when the decision maker has only a static set of alternatives to choose from), it is well known that the choice probability reduces to a Multinomial Logit (MNL) model under the assumption that the random utilities are independent and identically distributed (i.i.d.) and the common distribution is a Gumbel function (see [1,2,5,12]). In the past, some contributions have shown that the assumption of a Gumbel distribution for the random utilities is too restrictive when the number of alternatives becomes large. Actually, an MNL model can still be derived under the mild assumption that the common distribution of the i.i.d. random utilities has an asymptotically exponential behavior in its tail [9,10,22]. The effectiveness of such an asymptotic approximation has been proved in several applications in the context of location, routing, loading, packing, and other logistics operations [17-21].
The main contribution of this paper is to generalize the above theory to a multi-stage dynamic stochastic decision process in which decisions are nested within each other. We will show that, by using some results of extreme value theory, the probability distribution of the total maximum utility can be asymptotically approximated and its expected value can be found. Moreover, we will show that the choice probability of selecting alternatives becomes a Nested Multinomial Logit model. A similar result was obtained in [11], but there the authors considered a static multi-level nested location problem with a known Gumbel distribution for the random utilities.
The rest of the paper is organized as follows. In Sect. 2, we formally define the process under study and the necessary notation. Section 3 derives an asymptotic approximation for the probability distribution of the random utilities in the multi-stage dynamic stochastic decision process, while Sect. 4 focuses on how to model the choice probability as a Nested Multinomial Logit. Finally, conclusions are drawn in Sect. 5.

Problem formulation
Let us consider a multi-stage dynamic random utility model described by the following notation:
- $t = 0, \dots, T$: stages;
- $N_t = \{1, \dots, n_t\}$: set of choice alternatives at stage $t$;
- $N_0 = \{0\}$: initial start of the decision process, containing a singleton alternative;
- $L_j(t)$: set of scenarios for alternative $j$ at stage $t$; $l_j(t) = |L_j(t)|$: number of scenarios for alternative $j$ at stage $t$;
- $L(t) = \bigcup_{j \in N_t} L_j(t)$: total set of scenarios of the decision process at stage $t$; $l(t) = |L(t)| = \sum_{j \in N_t} l_j(t)$: total number of scenarios of the decision process at stage $t$;
- $v_{ij}(t)$: deterministic utility of alternative $j$ at stage $t$ when the decision process starts from alternative $i$ at stage $t - 1$;
- $\tilde{\theta}_j^l(t)$: random oscillation of the utility of alternative $j$ at stage $t$ under scenario $l \in L_j(t)$.
Let us assume that $\tilde{\theta}_j^l(t)$ are independent and identically distributed (i.i.d.) stochastic variables in $j$, $l$, and $t$, with the following common unknown probability distribution
$$F(x) = \Pr\{\tilde{\theta}_j^l(t) \le x\}. \tag{1}$$
The general structure underlying the multi-stage dynamic stochastic decision process we want to study can be represented as in Fig. 1. This type of decision process structure and its optimization perspective are very common in practical applications.
A straightforward example is the Critical Path Method (CPM) for the solution of project scheduling problems [8]. In these problems, a set of tasks (each one with its own duration) must be performed to complete a project as soon as possible. Since precedence constraints exist among the tasks, the decision maker would like to minimize the make-span (i.e. the completion time of the last task) while satisfying the precedence constraints. The CPM leads to an optimal plan by focusing on the concept of critical path, i.e. the sequence of tasks that are the most critical for the entire project. The method basically works in two phases. In the first one, tasks become nodes of a network clustered into ranks according to the precedence constraints, and in the second one, the longest path is found throughout this network. It is easy to see that, in the case of stochastic and time-dependent task durations, the decision process reduces to the one presented in this paper, in which tasks are the alternatives, ranks are the stages, and a longest path is a sequence of decisions throughout the stages that minimizes the make-span.
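The two CPM phases described above can be sketched in a few lines. This is only an illustration for deterministic durations, not the stochastic model of this paper; the function name `critical_path`, the task names, and the durations are hypothetical, and the tasks are assumed to be listed in an order compatible with the precedence constraints.

```python
from typing import Dict, List, Optional, Tuple


def critical_path(
    durations: Dict[str, float],
    predecessors: Dict[str, List[str]],
) -> Tuple[float, List[str]]:
    """Return (make-span, critical path) of a project network.

    Assumption: `durations` lists tasks in an order compatible with the
    precedence constraints (every predecessor appears before its successors),
    which plays the role of the rank clustering of the first CPM phase.
    """
    finish: Dict[str, float] = {}          # earliest finish time of each task
    parent: Dict[str, Optional[str]] = {}  # predecessor on the longest path

    for task, duration in durations.items():
        preds = predecessors.get(task, [])
        # A task can start only when its latest predecessor has finished.
        latest = max(preds, key=lambda p: finish[p], default=None)
        finish[task] = (finish[latest] if latest is not None else 0.0) + duration
        parent[task] = latest

    # Second phase: walk back from the task that finishes last.
    node: Optional[str] = max(finish, key=lambda k: finish[k])
    makespan = finish[node]
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return makespan, path[::-1]


# Hypothetical four-task project: C needs A and B, D needs C.
durations = {"A": 3.0, "B": 2.0, "C": 4.0, "D": 1.0}
predecessors = {"C": ["A", "B"], "D": ["C"]}
makespan, path = critical_path(durations, predecessors)
```

In this toy instance the longest path is A, C, D with a make-span of 8 time units; task B has slack and is not critical.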
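Let $\tilde{v}_{ij}(t)$ be the random utility of alternative $j$ at stage $t$ when the decision process starts from alternative $i$ at stage $t - 1$, $t = 1, \dots, T$. We assume that the decision process is efficiency-based so that, for any alternative $j \in N_t$, $t = 1, \dots, T$, among the different scenarios $l \in L_j(t)$, the one which maximizes the random choice utility will be considered. The random utility $\tilde{v}_{ij}(t)$ then becomes
$$\tilde{v}_{ij}(t) = v_{ij}(t) + \tilde{\theta}_j(t) + U_j(t), \tag{2}$$
where $\tilde{\theta}_j(t)$ is defined as the maximum of the random utility oscillations $\tilde{\theta}_j^l(t)$ among all possible alternative scenarios $l \in L_j(t)$, i.e.
$$\tilde{\theta}_j(t) = \max_{l \in L_j(t)} \tilde{\theta}_j^l(t), \tag{3}$$
and $U_i(t)$ is the expected utility of alternative $i$ at stage $t$, i.e.
$$U_i(t) = \mathbb{E}_{\tilde{\theta}}\Bigl[\max_{j \in N_{t+1}} \tilde{v}_{ij}(t+1)\Bigr], \quad t = 0, \dots, T-1. \tag{4}$$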
Equation (2) shows that the alternative $j$ at stage $t$ is evaluated not only by its own utility $v_{ij}(t) + \tilde{\theta}_j(t)$ but also by the utility $U_j(t)$ of the future selected alternatives. In such a way, the utilities become nested over the stages. Note that, since $F(x)$ is unknown, $\tilde{\theta}_j(t)$ is still a random variable with the following unknown probability distribution
$$B_j(x, t) = \Pr\{\tilde{\theta}_j(t) \le x\}. \tag{5}$$
Given the definition in (3), $\tilde{\theta}_j(t) \le x \iff \tilde{\theta}_j^l(t) \le x$, $\forall l \in L_j(t)$. Since $\tilde{\theta}_j^l(t)$ are independent, using (1), (5) becomes
$$B_j(x, t) = \prod_{l \in L_j(t)} \Pr\{\tilde{\theta}_j^l(t) \le x\} = [F(x)]^{l_j(t)}, \tag{6}$$
where $l_j(t)$ is the total number of scenarios for alternative $j$ at stage $t$.
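Let us define the maximum random utility attainable from alternative $i$ at stage $t$ as
$$\tilde{v}_i(t) = \max_{j \in N_{t+1}} \tilde{v}_{ij}(t+1). \tag{7}$$
Eq. (4) becomes
$$U_i(t) = \mathbb{E}_{\tilde{\theta}}\bigl[\tilde{v}_i(t)\bigr] \tag{8}$$
and the maximum utility $U$ of the whole multi-stage dynamic stochastic decision process is
$$U = U_0(0) = \mathbb{E}_{\tilde{\theta}}\bigl[\tilde{v}_0(0)\bigr]. \tag{9}$$
However, the calculation of $U_0(0)$ requires the calculation of $\mathbb{E}_{\tilde{\theta}}[\tilde{v}_0(0)]$, which in turn requires knowing the probability distribution of $\tilde{v}_0(0)$, or, because of the nested structure of the utilities, of $\{\tilde{v}_i(t), i \in N_t, t = 0, \dots, T-1\}$. Let us denote the probability distribution of $\tilde{v}_i(t)$ by
$$G_i(x, t) = \Pr\{\tilde{v}_i(t) \le x\}, \tag{10}$$
which is still unknown, since $\{\tilde{\theta}_j^l(t)\}$ have an unknown probability distribution. Nevertheless, the asymptotic approximation of $G_i(x, t)$, i.e. an approximation valid when the total number $l(t+1)$ of scenarios of the decision process at stage $t+1$ becomes large, will be derived in the next section.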
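As a quick numerical illustration of the fact used above, namely that the maximum of $l_j(t)$ independent oscillations with common distribution $F(x)$ has distribution $[F(x)]^{l_j(t)}$, the following sketch compares the closed form with a seeded Monte Carlo estimate. The unit exponential distribution is chosen only for illustration (the paper leaves $F$ unknown), and the helper names are hypothetical.

```python
import math
import random
from typing import Callable


def cdf_of_max(F: Callable[[float], float], n_scenarios: int) -> Callable[[float], float]:
    """CDF of the maximum of n_scenarios i.i.d. draws with common CDF F."""
    return lambda x: F(x) ** n_scenarios


def empirical_cdf_of_max(
    sampler: Callable[[random.Random], float],
    n_scenarios: int,
    x: float,
    n_runs: int = 20000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of Pr{max of n_scenarios draws <= x}."""
    rng = random.Random(seed)
    hits = sum(
        max(sampler(rng) for _ in range(n_scenarios)) <= x
        for _ in range(n_runs)
    )
    return hits / n_runs


def exponential_cdf(x: float) -> float:
    """Illustrative choice of F: unit exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


exact = cdf_of_max(exponential_cdf, 5)(2.0)
approx = empirical_cdf_of_max(lambda rng: rng.expovariate(1.0), 5, 2.0)
```

With five scenarios, the two values should agree up to sampling noise, which shrinks as `n_runs` grows.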

The asymptotic approximation of $G_i(x, t)$
Let us assume that the probability distribution of $\{\tilde{\theta}_j^l(t)\}$, named $F(x)$ in (1), is asymptotically exponential in its right tail, i.e.
$$\exists \beta > 0 \ \text{such that} \ \lim_{y \to +\infty} \frac{1 - F(x + y)}{1 - F(y)} = e^{-\beta x}. \tag{11}$$
By using some results of asymptotic extreme value theory [6], we will show that under assumption (11) the distribution $G_i(x, t)$ asymptotically converges to a Gumbel function as the total number of scenarios $l(t+1)$ of the decision process at stage $t+1$ becomes large.
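First note that, because of (5), (6), (2), and (7), Eq. (10) becomes
$$G_i(x, t) = \prod_{j \in N_{t+1}} \Pr\{\tilde{\theta}_j(t+1) \le x - v_{ij}(t+1) - U_j(t+1)\} = \prod_{j \in N_{t+1}} \bigl[F\bigl(x - v_{ij}(t+1) - U_j(t+1)\bigr)\bigr]^{l_j(t+1)}. \tag{12}$$
Moreover, note that we can set the origin for the utility scale arbitrarily, i.e., the choice probabilities are unaffected by a shift in the utility scale and any additive constant to the utilities can be ignored. Let us choose this constant as the root $a_{l(t+1)}$ of the equation
$$1 - F\bigl(a_{l(t+1)}\bigr) = \frac{1}{l(t+1)}, \tag{13}$$
where $l(t+1)$ is the total number of scenarios of the decision process at stage $t+1$. By replacing $\tilde{v}_{ij}(t+1)$ with $\tilde{v}_{ij}(t+1) - a_{l(t+1)}$ in (12) one has
$$G_i(x, t \mid l(t+1)) = \prod_{j \in N_{t+1}} \bigl[F\bigl(x - v_{ij}(t+1) - U_j(t+1) + a_{l(t+1)}\bigr)\bigr]^{l_j(t+1)}, \tag{14}$$
where $G_i(x, t \mid l(t+1))$ is used to underline the dependency of $G_i(x, t)$ on $l(t+1)$. Let us consider the ratio
$$\alpha_j(t+1) = \frac{l_j(t+1)}{l(t+1)} \tag{15}$$
and assume that this ratio remains constant for each pair $(j, t+1)$ while the values of $l(t+1) = 1, 2, \dots$ increase. Then, Eq. (14) can be written as
$$G_i(x, t \mid l(t+1)) = \prod_{j \in N_{t+1}} \bigl[F\bigl(x - v_{ij}(t+1) - U_j(t+1) + a_{l(t+1)}\bigr)\bigr]^{\alpha_j(t+1)\, l(t+1)}. \tag{16}$$
Now, let us assume that $l(t+1)$ is large enough to use $\lim_{l(t+1) \to +\infty} G_i(x, t \mid l(t+1))$ as an approximation of $G_i(x, t)$. Then, the following theorem holds.
Theorem 1 Under condition (11), the probability distribution $G_i(x, t)$ becomes the following Gumbel function
$$G_i(x, t) = \exp\bigl(-A_i(t)\, e^{-\beta x}\bigr), \tag{17}$$
where
$$A_i(t) = \sum_{j \in N_{t+1}} \alpha_j(t+1)\, e^{\beta (v_{ij}(t+1) + U_j(t+1))} \tag{18}$$
is the accessibility in the sense of Hansen [7] of alternative $i$ at stage $t$ to the overall set of alternatives at stage $t+1$.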
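The limit behind Theorem 1 can be outlined in three steps. This is a sketch under assumption (11), not the authors' full proof. The shorthand $n = l(t+1)$ and $y_j = x - v_{ij}(t+1) - U_j(t+1)$ is introduced here for brevity; the shift $a_n$ is read as the root of $1 - F(a_n) = 1/n$ (so that $a_n \to +\infty$ as $n$ grows, assuming $F(x) < 1$ for every finite $x$), and $\alpha_j(t+1) = l_j(t+1)/l(t+1)$ is the ratio assumed constant above.

```latex
\begin{align*}
1 - F(y_j + a_n)
  &\approx e^{-\beta y_j}\,\bigl[1 - F(a_n)\bigr]
   = \frac{e^{-\beta y_j}}{n}
   && \text{tail assumption (11) and } 1 - F(a_n) = 1/n, \\
\bigl[F(y_j + a_n)\bigr]^{\alpha_j(t+1)\, n}
  &\approx \Bigl(1 - \frac{e^{-\beta y_j}}{n}\Bigr)^{\alpha_j(t+1)\, n}
   \xrightarrow[n \to +\infty]{} \exp\bigl(-\alpha_j(t+1)\, e^{-\beta y_j}\bigr), \\
\prod_{j \in N_{t+1}} \exp\bigl(-\alpha_j(t+1)\, e^{-\beta y_j}\bigr)
  &= \exp\Bigl(-e^{-\beta x} \sum_{j \in N_{t+1}} \alpha_j(t+1)\,
     e^{\beta (v_{ij}(t+1) + U_j(t+1))}\Bigr).
\end{align*}
```

The sum in the last line is the accessibility of alternative $i$ at stage $t$, so the product has the Gumbel form stated in the theorem.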

Expected value calculation
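Having now an explicit form for $G_i(x, t)$, we can calculate $\mathbb{E}_{\tilde{\theta}}[\tilde{v}_i(t)]$ in (8) as follows
$$\mathbb{E}_{\tilde{\theta}}\bigl[\tilde{v}_i(t)\bigr] = \int_{-\infty}^{+\infty} x \, dG_i(x, t) = \int_{-\infty}^{+\infty} x \exp\bigl(-A_i(t)\, e^{-\beta x}\bigr) A_i(t)\, \beta\, e^{-\beta x} \, dx. \tag{24}$$
By substituting $z = A_i(t)\, e^{-\beta x}$, one gets
$$\mathbb{E}_{\tilde{\theta}}\bigl[\tilde{v}_i(t)\bigr] = -\frac{1}{\beta} \int_0^{+\infty} \ln\frac{z}{A_i(t)}\, e^{-z} \, dz = \frac{1}{\beta}\bigl(\ln A_i(t) + \gamma\bigr), \tag{25}$$
where $\gamma = -\int_0^{+\infty} e^{-z} \ln z \, dz \approx 0.5772$ is the Euler constant. Because of (25), and disregarding the constant $\gamma/\beta$, the maximum utility $U$ of the whole multi-stage dynamic stochastic decision process in (9) becomes
$$U = U_0(0) = \frac{1}{\beta} \ln A_0(0). \tag{26}$$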
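Since each $A_i(t)$ depends on the expected utilities of the next stage, the result above suggests a backward recursion over the stages, $U_i(t) = \frac{1}{\beta} \ln A_i(t)$ with $A_i(t) = \sum_{j} \alpha_j(t+1)\, e^{\beta (v_{ij}(t+1) + U_j(t+1))}$. A minimal sketch follows. The terminal condition $U_j(T) = 0$, the data layout, and all numerical values are assumptions made here for illustration, and $\alpha_j(t+1)$ is taken as the share of scenarios of alternative $j$ at stage $t+1$.

```python
import math
from typing import List


def expected_utilities(
    v: List[List[List[float]]],
    alpha: List[List[float]],
    beta: float,
) -> List[List[float]]:
    """Backward recursion U_i(t) = ln(A_i(t)) / beta.

    v[t][i][j]  : deterministic utility v_ij(t+1) of moving from alternative i
                  at stage t to alternative j at stage t+1, for t = 0..T-1.
    alpha[t][j] : scenario share alpha_j(t+1) of alternative j at stage t+1.
    Assumption  : terminal utilities U_j(T) are zero.
    Returns U with U[t][i] = U_i(t) for t = 0..T.
    """
    T = len(v)
    U: List[List[float]] = [[] for _ in range(T + 1)]
    U[T] = [0.0] * len(alpha[T - 1])
    for t in range(T - 1, -1, -1):
        for row in v[t]:
            # log of each term alpha_j * exp(beta * (v_ij + U_j))
            terms = [
                math.log(a) + beta * (vij + uj)
                for a, vij, uj in zip(alpha[t], row, U[t + 1])
            ]
            # log-sum-exp with a max shift for numerical stability
            m = max(terms)
            log_A = m + math.log(sum(math.exp(x - m) for x in terms))
            U[t].append(log_A / beta)
    return U


# Hypothetical instance: T = 2, one start alternative, two alternatives per stage.
v = [[[1.0, 2.0]], [[0.5, 0.0], [0.0, 1.0]]]
alpha = [[0.5, 0.5], [0.5, 0.5]]
U = expected_utilities(v, alpha, beta=1.0)
total_utility = U[0][0]  # U_0(0), up to the disregarded constant gamma / beta
```

Working with logarithms of the terms avoids overflow when $\beta (v_{ij} + U_j)$ is large; with a single alternative per stage and $\alpha = 1$, the recursion reduces to summing the deterministic utilities along the only path.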

A Nested Multinomial Logit model for the choice probability
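The choice probability $p_{ij}(t+1)$ that a decision maker who has selected alternative $i$ at stage $t$ selects alternative $j$ at stage $t+1$ can be determined as follows. The decision maker will choose alternative $j$ at stage $t+1$ if and only if that alternative has the largest utility among all alternatives at that stage, i.e.
$$\tilde{v}_{ij}(t+1) = \max_{k \in N_{t+1}} \tilde{v}_{ik}(t+1). \tag{27}$$
Then
$$p_{ij}(t+1) = \Pr\Bigl\{\tilde{v}_{ij}(t+1) \ge \max_{k \in N_{t+1},\, k \neq j} \tilde{v}_{ik}(t+1)\Bigr\}. \tag{28}$$
By using (5), and writing $B_k(x, t)$ for the probability distribution of $\tilde{\theta}_k(t)$ defined there, one gets
$$\Pr\{\tilde{v}_{ik}(t+1) \le x\} = \Pr\{\tilde{\theta}_k(t+1) \le x - v_{ik}(t+1) - U_k(t+1)\} = B_k\bigl(x - v_{ik}(t+1) - U_k(t+1),\, t+1\bigr) \tag{29}$$
and, since $\{\tilde{\theta}_k(t+1), k \in N_{t+1}, t = 0, \dots, T-1\}$ are independent,
$$\Pr\Bigl\{\max_{k \in N_{t+1},\, k \neq j} \tilde{v}_{ik}(t+1) \le x\Bigr\} = \prod_{k \in N_{t+1},\, k \neq j} B_k\bigl(x - v_{ik}(t+1) - U_k(t+1),\, t+1\bigr). \tag{30}$$
Now, from the Total Probability Theorem, Eq. (28) becomes
$$p_{ij}(t+1) = \int_{-\infty}^{+\infty} \Bigl[\prod_{k \in N_{t+1},\, k \neq j} B_k\bigl(x - v_{ik}(t+1) - U_k(t+1),\, t+1\bigr)\Bigr] dB_j\bigl(x - v_{ij}(t+1) - U_j(t+1),\, t+1\bigr). \tag{31}$$
The following theorem holds.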

Theorem 2
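The choice probability $p_{ij}(t+1)$ that a decision maker who has selected alternative $i$ at stage $t$ selects alternative $j$ at stage $t+1$ is given by
$$p_{ij}(t+1) = \frac{\alpha_j(t+1)\, e^{\beta (v_{ij}(t+1) + U_j(t+1))}}{\sum_{k \in N_{t+1}} \alpha_k(t+1)\, e^{\beta (v_{ik}(t+1) + U_k(t+1))}}, \tag{32}$$
where $\alpha_k(t+1) = l_k(t+1)/l(t+1)$ is the share of scenarios of alternative $k$ at stage $t+1$. This is a Nested Multinomial Logit model.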
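To make the logit structure concrete, the sketch below evaluates the choice probabilities for one fixed alternative $i$, reading Theorem 2 as weights $\alpha_j(t+1)\, e^{\beta (v_{ij}(t+1) + U_j(t+1))}$ normalised by their sum, with $\alpha_j(t+1)$ the share of scenarios of alternative $j$. The function name and the numerical inputs are hypothetical.

```python
import math
from typing import List


def choice_probabilities(
    v_row: List[float],
    U_next: List[float],
    alpha_next: List[float],
    beta: float,
) -> List[float]:
    """Nested logit probabilities p_ij(t+1) for a fixed alternative i.

    v_row[j]      : deterministic utility v_ij(t+1)
    U_next[j]     : expected utility U_j(t+1) of the future decisions
    alpha_next[j] : scenario share alpha_j(t+1)
    """
    # Work with log-weights and subtract the maximum to avoid overflow.
    scores = [
        math.log(a) + beta * (vij + uj)
        for a, vij, uj in zip(alpha_next, v_row, U_next)
    ]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-alternative stage with equal scenario shares and no
# future utility: the second alternative has weight twice the first one.
p = choice_probabilities([0.0, math.log(2.0)], [0.0, 0.0], [0.5, 0.5], beta=1.0)
# p is approximately [1/3, 2/3]
```

The nesting enters only through `U_next`: an alternative with a modest immediate utility can still be likely to be chosen if it gives access to valuable alternatives in later stages.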

Conclusions
In this paper, we have considered a multi-stage dynamic decision process in which decisions are nested within each other. The process is tackled as a random utility model in which decision utilities depend on i.i.d. stochastic variables with unknown probability distribution, and the decision maker aims at maximizing the expected value of the total utility of the process. By using some results of extreme value theory, we have derived the asymptotic approximation for the probability distribution of the total utility and calculated its expected value. Interestingly, the resulting expected value is proportional to the logarithm of the accessibility in the sense of Hansen [7], i.e. the accessibility of the decision maker to the overall set of alternatives at the different stages at the start of the decision process. Moreover, we have also shown that the choice probability of selecting alternatives becomes a Nested Multinomial Logit model.
In the near future, the theoretical outcomes of the present paper are expected to be applied and experimentally validated in different operational settings. The main advantages of such an approach with respect to other ways of dealing with multi-stage dynamic problems under uncertainty (e.g., Stochastic Programming and Robust Optimization) are the computational tractability of the deterministic approximation and the very mild assumptions needed on the probability distribution of the stochastic variables involved.