## Abstract

We consider a decision maker who performs a stochastic decision process over multiple stages, where the choice alternatives are characterized by random utilities with unknown probability distribution. The decisions are nested, i.e., the decision taken at each stage is affected by the decisions taken at subsequent stages. The problem consists in maximizing the total expected utility of the overall multi-stage stochastic dynamic decision process. By means of results from extreme value theory, the probability distribution of the total maximum utility is derived and its expected value is found. This value is proportional to the logarithm of the accessibility of the decision maker to the overall set of alternatives in the different stages at the start of the decision process. It is also shown that the probability of choosing each alternative follows a Nested Multinomial Logit model.

## Introduction

Discrete choice models that assume utility-maximizing behavior by the decision maker and uncertainty in the estimation of the utility values are called *random utility models* [15, 16]. According to [23], in these models “a decision maker \(i\) faces a choice among \(N\) alternatives and will assign a certain level of utility to each alternative. The utility that the decision maker assigns to alternative \(j\) is \(\tilde{u}_{ij}, j = 1, \ldots , N\). The decision maker will choose the alternative that provides the greatest utility, i.e. choose alternative \(j\) if and only if \(\tilde{u}_{ij} > \tilde{u}_{ik} ,\;\forall k \ne j\). Consider now an external observer. The observer does not observe the decision maker’s utility \(\tilde{u}_{ij}\), but he observes just some attributes of the alternatives as faced by the decision maker, labeled \(v_{ij},\;\forall j\). Since there are aspects of the utility that the observer cannot catch, \(v_{ij} \ne \tilde{u}_{ij}\). Utility is then decomposed as \(\tilde{u}_{ij} = v_{ij} + \tilde{x}_{ij}\), where \(\tilde{x}_{ij}\) captures the factors that affect utility but are not included in \(v_{ij}\). The observer does not know \(\tilde{x}_{ij}\) and therefore treats these terms as random variables”. This setting is typical of several operations management applications in which decisions must be taken in advance with only limited knowledge of some quantities, such as project management, supply chain optimization, service network design, logistics, and transportation (see, e.g., [3, 4, 13, 14]).

In this paper, we consider a decision process evolving over multiple stages (e.g., over a discrete time horizon) in which a decision maker must solve several random utility models consecutively, i.e., he needs to select, at each stage, one alternative from a finite set of mutually exclusive choices. Each alternative is associated with a level of utility that depends on stochastic variables with unknown probability distribution, and the decision maker wants to maximize the expected value of the total utility of the overall decision process. However, the decision process cannot be decomposed by stage because the decisions are nested, i.e., the utility of an alternative at each stage is affected by the utilities of the alternatives selected at the subsequent stages.

In the special case in which only a single stage exists (i.e., when the decision maker has only a static set of alternatives to choose from), it is well known that the choice probability reduces to a Multinomial Logit (MNL) model under the assumption that the random utilities are independent and identically distributed (i.i.d.) with a common *Gumbel* distribution (see [1, 2, 5, 12]). In the past, some contributions have shown that the assumption of a Gumbel distribution for the random utilities is too restrictive when the number of alternatives becomes large. In fact, an MNL model can still be derived under the mild assumption that the common distribution of the i.i.d. random utilities has an asymptotically exponential behavior in its right tail [9, 10, 22]. The effectiveness of this asymptotic approximation has been demonstrated in several applications to location, routing, loading, packing, and other logistics operations [17,18,19,20,21].
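As a numerical illustration of the single-stage result (a sketch, not code from the cited works), the following Python snippet assumes i.i.d. Gumbel errors with scale \(1/\beta\), i.e. \(\Pr\{\tilde{x}_{ij}\le x\}=\exp(-e^{-\beta x})\), and compares Monte Carlo choice shares with the closed-form MNL probabilities \(e^{\beta v_{ij}}/\sum_k e^{\beta v_{ik}}\). The utilities `v` and the value of `beta` are hypothetical.

```python
import math
import random


def gumbel(rng, beta):
    """Gumbel (maximum) variate with location 0 and scale 1/beta, via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u)) / beta


def mnl_probabilities(v, beta):
    """Closed-form MNL: p_j = exp(beta * v_j) / sum_k exp(beta * v_k)."""
    m = max(v)  # subtracting the max avoids overflow without changing the ratios
    w = [math.exp(beta * (vj - m)) for vj in v]
    s = sum(w)
    return [wj / s for wj in w]


def simulated_choice_shares(v, beta, n_draws=100_000, seed=1):
    """Add i.i.d. Gumbel noise to each v_j and count how often each alternative is the maximum."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        u = [vj + gumbel(rng, beta) for vj in v]
        counts[max(range(len(v)), key=u.__getitem__)] += 1
    return [c / n_draws for c in counts]


v = [1.0, 0.5, 0.0]  # hypothetical deterministic utilities v_ij
beta = 2.0           # hypothetical scale parameter
print(mnl_probabilities(v, beta))
print(simulated_choice_shares(v, beta))
```

The two printed vectors should agree up to Monte Carlo error, which shrinks roughly as \(1/\sqrt{n}\) in the number of draws.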

The main contribution of this paper is to generalize the above theory to a multi-stage dynamic stochastic decision process in which decisions are nested. We show that, using results from extreme value theory, the probability distribution of the total maximum utility can be asymptotically approximated and its expected value can be found. Moreover, we show that the probability of choosing each alternative follows a Nested Multinomial Logit model. A similar result was obtained in [11], but there the authors considered a static multi-level nested location problem with a known Gumbel distribution for the random utilities.

The rest of the paper is organized as follows. In Sect. 2, we formally define the process under study and the necessary notation. Section 3 shows how to derive an asymptotic approximation for the probability distribution of the random utilities in the multi-stage dynamic stochastic decision process, while Sect. 4 shows how the choice probability can be modeled as a Nested Multinomial Logit. Finally, conclusions are drawn in Sect. 5.

## Problem formulation

Let us consider a multi-stage dynamic random utility model described by the following notation:

- \(t=0, \ldots ,T\): stages;
- \(N_t= \{1, \ldots , n_{t}\}\): set of choice alternatives at stage \(t\);
- \(N_0=\{0\}\): initial start of the decision process, containing a single alternative;
- \(L_j(t)\): set of scenarios for alternative \(j\) at stage \(t\);
- \(l_j(t)=|L_j(t)|\): number of scenarios for alternative \(j\) at stage \(t\);
- \(L(t)= \cup _{j\in N_t} L_j(t)\): total set of scenarios of the decision process at stage \(t\);
- \(l(t)=|L(t)|= \sum _{j\in N_t} l_j(t)\): total number of scenarios of the decision process at stage \(t\);
- \(v_{ij}(t)\): deterministic utility of alternative \(j\) at stage \(t\) when the decision process starts from alternative \(i\) at stage \(t-1\);
- \(\tilde{\theta }_j^l(t)\): random oscillation of the utility of alternative \(j\) at stage \(t\) under scenario \(l\in L_j(t)\).

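Let us assume that the \(\tilde{\theta }_j^l(t)\) are independent and identically distributed (i.i.d.) stochastic variables in \(j\), \(l\), and \(t\), with the following common *unknown* probability distribution

\[F(x) = \Pr\{\tilde{\theta}_j^l(t) \le x\}, \quad j\in N_t,\ l\in L_j(t),\ t=1,\ldots,T. \tag{1}\]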

The general structure underlying the multi-stage dynamic stochastic decision process we want to study is represented in Fig. 1. This type of decision structure, together with its optimization perspective, arises in several practical applications. A straightforward example is the Critical Path Method (CPM) for project scheduling problems [8]. In these problems, a set of tasks (each with its own duration) must be performed to complete a project as soon as possible. Since precedence constraints exist among the tasks, the decision maker wants to minimize the make-span (i.e., the completion time of the last task) while satisfying the precedence constraints. The CPM yields an optimal plan by focusing on the concept of *critical path*, i.e., the sequence of tasks that are most critical for the entire project. The method works in two phases: in the first, tasks become nodes of a network clustered into ranks according to the precedence constraints; in the second, the longest path through this network is found. In the case of stochastic and time-dependent task durations, the decision process reduces to the one presented in this paper, in which tasks are the alternatives, ranks are the stages, and a longest path is a sequence of decisions across the stages whose length gives the minimum make-span.
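The second CPM phase can be illustrated, in the deterministic case, by a backward pass over the ranks that replaces the expected maximum of the stochastic model with a plain maximum. The following Python sketch is not taken from [8]; the layout `v[t][i][j]` (weight of moving from node \(i\) of rank \(t\) to node \(j\) of rank \(t+1\), here the duration of task \(j\)) and the use of `-inf` for a missing precedence arc are assumptions made for illustration.

```python
def longest_path_by_stages(v):
    """Deterministic backward pass over ranks (stages).

    v[t][i][j]: weight of moving from node i of rank t to node j of rank t+1
    (here: the duration of task j); float("-inf") marks a missing precedence arc.
    Returns U, with U[t][i] the longest-path length from (t, i) to the last rank,
    and succ, with succ[t][i] the successor chosen on that path.
    """
    T = len(v)
    U = [None] * (T + 1)
    U[T] = [0.0] * len(v[T - 1][0])  # nothing remains after the last rank
    succ = [None] * T
    for t in range(T - 1, -1, -1):
        U[t], succ[t] = [], []
        for row in v[t]:
            values = [row[j] + U[t + 1][j] for j in range(len(row))]
            best = max(range(len(values)), key=values.__getitem__)
            U[t].append(values[best])
            succ[t].append(best)
    return U, succ


NO_ARC = float("-inf")
# Hypothetical project: a start node, tasks a (3) and b (2) in rank 1, tasks c (4) and d (6)
# in rank 2; a precedes only c, while b precedes both c and d.
v = [
    [[3.0, 2.0]],
    [[4.0, NO_ARC], [4.0, 6.0]],
]
U, succ = longest_path_by_stages(v)
print(U[0][0], succ)  # prints: 8.0 [[1], [0, 1]]
```

Here the critical path is start, b, d with length 8; the stochastic model of this paper replaces each `max` by the expected maximum of random utilities.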

Let \(\tilde{v}_{ij}(t)\) be the random utility of alternative \(j\) at stage \(t\) when the decision process starts from alternative \(i\) at stage \(t-1\), \(t=1, \ldots ,T\). We assume that the decision process is efficiency-based so that, for any alternative \(j\in N_{t}\), \(t=1, \ldots ,T\), the scenario \(l\in L_j(t)\) that maximizes the random choice utility is considered. The random utility \(\tilde{v}_{ij}(t)\) is then

\[\tilde{v}_{ij}(t) = v_{ij}(t) + \tilde{\theta}_j(t) + U_j(t), \quad i\in N_{t-1},\ j\in N_t,\ t=1,\ldots,T, \tag{2}\]

where \(\tilde{\theta }_j(t)\) is defined as the maximum of the random utility oscillations \(\tilde{\theta }_j^l(t)\) over all scenarios \(l\in L_j(t)\), i.e.

\[\tilde{\theta}_j(t) = \max_{l\in L_j(t)} \tilde{\theta}_j^l(t), \quad j\in N_t,\ t=1,\ldots,T, \tag{3}\]

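and \(U_i(t)\) is the expected utility of alternative \(i\) at stage \(t\), i.e., the expected maximum utility over the alternatives of stage \(t+1\) once \(i\) has been selected:

\[U_i(t) = \mathrm{IE}_{\tilde{\theta}}\left[\max_{j\in N_{t+1}} \tilde{v}_{ij}(t+1)\right], \quad i\in N_t,\ t=0,\ldots,T-1, \tag{4}\]

with \(U_j(T)=0\), \(j\in N_T\), since no further stage follows the last one.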

Equation (2) shows that alternative \(j\) at stage \(t\) is evaluated not only by its own utility \(v_{ij}(t)+\tilde{\theta }_j(t)\) but also by the expected utility \(U_j(t)\) of the alternatives selected at the subsequent stages. In this way, the utilities become nested over the stages.

Note that \(\tilde{\theta }_j(t)\) is itself a random variable, whose probability distribution

\[F_j(x,t) = \Pr\{\tilde{\theta}_j(t) \le x\}, \quad j\in N_t,\ t=1,\ldots,T, \tag{5}\]

is unknown, since \(F(x)\) is unknown.

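Given the definition in (3), \(\tilde{\theta }_j(t)\le x\Longleftrightarrow \tilde{\theta }_j^l(t)\le x,\ \forall l\in L_j(t)\). Since the \(\tilde{\theta }_j^l(t)\) are independent, using (1), (5) becomes

\[F_j(x,t) = \prod_{l\in L_j(t)} \Pr\{\tilde{\theta}_j^l(t) \le x\} = F(x)^{l_j(t)}, \tag{6}\]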

where \(l_j(t)\) is the total number of scenarios for alternative *j* at stage *t*.
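Identity (6), i.e. that the best of \(l_j(t)\) i.i.d. scenarios has distribution \(F(x)^{l_j(t)}\), can be checked by simulation. This minimal sketch assumes, purely for illustration, that \(F\) is the standard logistic distribution; the values of `l_j` and `x` are hypothetical.

```python
import math
import random


def logistic_cdf(x):
    """Standard logistic CDF, assumed here as the (otherwise unknown) F."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_sample(rng):
    """Inverse-CDF draw from the standard logistic distribution."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return math.log(u / (1.0 - u))


def empirical_cdf_of_max(sample, l, x, n_rep=20_000, seed=0):
    """Monte Carlo estimate of Pr{max of l i.i.d. draws <= x}, i.e. of F_j(x, t) in (5)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rep) if max(sample(rng) for _ in range(l)) <= x)
    return hits / n_rep


l_j, x = 5, 1.5  # hypothetical number of scenarios and evaluation point
print(empirical_cdf_of_max(logistic_sample, l_j, x), logistic_cdf(x) ** l_j)
```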

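Now, by defining

\[\tilde{v}_i(t) = \max_{j\in N_{t+1}} \tilde{v}_{ij}(t+1), \quad i\in N_t,\ t=0,\ldots,T-1, \tag{7}\]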

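Eq. (4) becomes

\[U_i(t) = \mathrm{IE}_{\tilde{\theta}}\left[\tilde{v}_i(t)\right], \quad i\in N_t,\ t=0,\ldots,T-1, \tag{8}\]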

and the maximum utility \(U\) of the whole multi-stage dynamic stochastic decision process is

\[U = U_0(0) = \mathrm{IE}_{\tilde{\theta}}\left[\tilde{v}_0(0)\right]. \tag{9}\]

However, the calculation of \(U_0(0)\) requires computing \(\mathrm {IE}_{\tilde{\theta }}\left[ \tilde{v}_{0}(0)\right] \), which in turn requires knowing the probability distribution of \(\tilde{v}_{0}(0)\) or, because of the nested structure of the utilities, of \(\{\tilde{v}_{i}(t), i\in N_{t}, t=0, \ldots ,T-1\}\). Let us denote the probability distribution of \(\tilde{v}_{i}(t)\) by

\[G_i(x,t) = \Pr\{\tilde{v}_i(t) \le x\}, \quad i\in N_t,\ t=0,\ldots,T-1, \tag{10}\]

which is still unknown, since the \(\{\tilde{\theta }_j^l(t)\}\) have an unknown probability distribution. Nevertheless, the asymptotic approximation of \(G_i(x,t)\), i.e., an approximation valid when the total number \(l(t+1)\) of scenarios of the decision process at stage \((t+1)\) becomes large, is derived in the next section.

## The asymptotic approximation of \(G_{i}(x,t)\)

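Let us assume that the probability distribution \(F(x)\) of the \(\{\tilde{\theta }_j^l(t)\}\) in (1) is asymptotically exponential in its right tail, i.e., that there exists a constant \(\beta>0\) such that

\[\lim_{y\rightarrow+\infty} \frac{1-F(x+y)}{1-F(y)} = e^{-\beta x}. \tag{11}\]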

Using results from the asymptotic theory of extreme values [6], we show that under assumption (11) the distribution \(G_i(x,t)\) converges to a Gumbel function as the total number \(l(t+1)\) of scenarios of the decision process at stage \((t+1)\) becomes large.
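Condition (11) can be probed numerically by evaluating the tail ratio for increasing \(y\). In this sketch (an illustration; both distributions are our choice), the standard logistic distribution satisfies (11) with \(\beta=1\), since \(1-F(y)=1/(1+e^{y})\), whereas the standard normal does not: its tail ratio tends to 0 for every \(x>0\).

```python
import math


def logistic_sf(y):
    """1 - F(y) for the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(y))


def normal_sf(y):
    """1 - Phi(y) for the standard normal distribution."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))


def tail_ratio(sf, x, y):
    """(1 - F(x + y)) / (1 - F(y)): the quantity whose limit as y -> +inf appears in (11)."""
    return sf(x + y) / sf(y)


x = 0.7
for y in (5.0, 10.0, 20.0):
    # logistic ratio approaches exp(-x) (beta = 1); normal ratio collapses towards 0
    print(y, tail_ratio(logistic_sf, x, y), math.exp(-x), tail_ratio(normal_sf, x, y))
```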

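First note that, because of (5), (6), (2), and (7), Eq. (10) becomes

\[
\begin{aligned}
G_i(x,t) &= \Pr\Big\{\max_{j\in N_{t+1}} \tilde{v}_{ij}(t+1) \le x\Big\}
= \prod_{j\in N_{t+1}} \Pr\big\{\tilde{\theta}_j(t+1) \le x - v_{ij}(t+1) - U_j(t+1)\big\}\\
&= \prod_{j\in N_{t+1}} F\big(x - v_{ij}(t+1) - U_j(t+1)\big)^{l_j(t+1)}.
\end{aligned}
\tag{12}
\]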

Moreover, note that the origin of the utility scale can be set arbitrarily, i.e., the choice probabilities are unaffected by a shift in the utility scale and any additive constant in the utilities can be ignored. Let us choose this constant as the root \(a_{l(t+1)}\) of the equation

\[1 - F\big(a_{l(t+1)}\big) = \frac{1}{l(t+1)}, \tag{13}\]

where \(l(t+1)\) is the total number of scenarios of the decision process at stage \(t+1\).
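The root \(a_{l(t+1)}\) of (13) does not require a closed form: a bisection on the survival function \(1-F\) suffices. This sketch assumes a continuous, non-increasing survival function and a bracketing interval \([-50, 50]\) chosen for illustration; for the standard logistic distribution the root is \(\ln(l-1)\), which the example prints as a check.

```python
import math


def normalizing_constant(sf, l, lo=-50.0, hi=50.0, tol=1e-12):
    """Root a_l of 1 - F(a) = 1/l, see (13), found by bisection.

    Assumes sf(y) = 1 - F(y) is continuous and non-increasing, l >= 2,
    and that the root lies in [lo, hi].
    """
    target = 1.0 / l
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sf(mid) > target:
            lo = mid  # tail mass still too large: the root is further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logistic_sf(y):
    return 1.0 / (1.0 + math.exp(y))


for l in (10, 100, 1000):
    # bisection result next to the closed form ln(l - 1) valid for the logistic case
    print(l, normalizing_constant(logistic_sf, l), math.log(l - 1))
```

The printed values illustrate that \(a_{l(t+1)}\) grows without bound, logarithmically in \(l(t+1)\) for this choice of \(F\).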

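By replacing \(\tilde{v}_{ij}(t+1)\) with \(\tilde{v}_{ij}(t+1) - a_{l(t+1)}\) in (12) one has

\[G_i(x,t|l(t+1)) = \prod_{j\in N_{t+1}} F\big(x - v_{ij}(t+1) - U_j(t+1) + a_{l(t+1)}\big)^{l_j(t+1)}, \tag{14}\]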

where the notation \(G_i(x,t|l(t+1))\) underlines the dependence of \(G_i(x,t)\) on \(l(t+1)\).

Let us consider the ratio

\[\alpha_j(t+1) = \frac{l_j(t+1)}{l(t+1)}, \quad j\in N_{t+1}, \tag{15}\]

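and assume that this ratio remains constant for each pair \((j,t+1)\) as \(l(t+1)=1,2,\ldots \) increases. Then, Eq. (14) can be written as

\[G_i(x,t|l(t+1)) = \prod_{j\in N_{t+1}} F\big(x - v_{ij}(t+1) - U_j(t+1) + a_{l(t+1)}\big)^{\alpha_j(t+1)\, l(t+1)}. \tag{16}\]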

Now, let us assume that \(l(t+1)\) is large enough to use \(\lim _{l(t+1)\rightarrow +\infty }G_i(x,t|l(t+1))\) as an approximation of \(G_i(x,t)\). Then, the following theorem holds

### Theorem 1

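Under condition (11), the probability distribution \(G_i(x,t)\) becomes the following Gumbel function

\[G_i(x,t) = \exp\big(-A_i(t)\, e^{-\beta x}\big), \tag{17}\]

where

\[A_i(t) = \sum_{j\in N_{t+1}} \alpha_j(t+1)\, e^{\beta\left(v_{ij}(t+1) + U_j(t+1)\right)} \tag{18}\]

is the accessibility in the sense of Hansen [7] of alternative \(i\) at stage \(t\) to the overall set of alternatives at stage \((t+1)\).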

### Proof

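Since \(\lim _{l(t+1)\rightarrow +\infty }1/l(t+1)=0\), from (13) we have \(\lim _{l(t+1)\rightarrow +\infty }\left[1-F(a_{l(t+1)})\right]=0\). This means that \(\lim _{l(t+1)\rightarrow +\infty }a_{l(t+1)}=+\infty \). From (11), where \(a_{l(t+1)}\) plays the role of \(y\), one obtains

\[\lim_{l(t+1)\rightarrow+\infty} \frac{1-F\big(x + a_{l(t+1)}\big)}{1-F\big(a_{l(t+1)}\big)} = e^{-\beta x}\]

and therefore, by (13), for large \(l(t+1)\),

\[F\big(x + a_{l(t+1)}\big) \simeq 1 - \frac{e^{-\beta x}}{l(t+1)}. \tag{21}\]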

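By substituting (21) into (19), after multiplying numerator and denominator by \(\alpha _j(t+1)\), one has

\[G_i(x,t) = \lim_{l(t+1)\rightarrow+\infty} \prod_{j\in N_{t+1}} \left[1 - \frac{\alpha_j(t+1)\, e^{-\beta\left(x - v_{ij}(t+1) - U_j(t+1)\right)}}{\alpha_j(t+1)\, l(t+1)}\right]^{\alpha_j(t+1)\, l(t+1)} \tag{22}\]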

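and, recalling that \(\lim _{y\rightarrow +\infty }(1+ \frac{x}{y})^y=e^x\) and using (18), (22) becomes

\[G_i(x,t) = \prod_{j\in N_{t+1}} \exp\Big(-\alpha_j(t+1)\, e^{-\beta\left(x - v_{ij}(t+1) - U_j(t+1)\right)}\Big) = \exp\big(-A_i(t)\, e^{-\beta x}\big). \tag{23}\]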

The probability distribution derived in (23) is a Gumbel distribution. \(\square \)
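Theorem 1 can be checked without simulation for one hypothetical transition by evaluating the exact shifted distribution (16) and its Gumbel limit (17)-(18) on a grid. This sketch assumes, for illustration only, that \(F\) is the standard logistic distribution, which satisfies (11) with \(\beta=1\) and for which the root of (13) is \(a_{l}=\ln(l-1)\); the values `w` (standing for \(v_{ij}(t+1)+U_j(t+1)\)) and `alpha` are hypothetical.

```python
import math


def logistic_cdf(x):
    """Standard logistic CDF (assumed F), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def exact_G(x, w, alpha, l):
    """Shifted exact distribution (16): prod_j F(x - w_j + a_l) ** (alpha_j * l).

    w_j stands for v_ij(t+1) + U_j(t+1); for the logistic F the root of (13) is a_l = ln(l - 1).
    """
    a_l = math.log(l - 1)
    return math.prod(logistic_cdf(x - wj + a_l) ** (aj * l) for wj, aj in zip(w, alpha))


def gumbel_G(x, w, alpha, beta=1.0):
    """Limit (17)-(18): exp(-A_i(t) e^{-beta x}), A_i(t) = sum_j alpha_j e^{beta w_j}."""
    A = sum(aj * math.exp(beta * wj) for wj, aj in zip(w, alpha))
    return math.exp(-A * math.exp(-beta * x))


def sup_error(w, alpha, l, grid):
    """Largest absolute gap between (16) and its Gumbel limit on a grid of x values."""
    return max(abs(exact_G(x, w, alpha, l) - gumbel_G(x, w, alpha)) for x in grid)


w = [0.4, 0.0, -0.3]     # hypothetical v_ij(t+1) + U_j(t+1)
alpha = [0.5, 0.3, 0.2]  # hypothetical scenario shares alpha_j(t+1)
grid = [k / 10 for k in range(-30, 61)]
for l in (10, 100, 1000, 10_000):
    print(l, sup_error(w, alpha, l, grid))
```

The printed gap should shrink as \(l(t+1)\) grows, which is what the limit in Theorem 1 asserts for this particular \(F\).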

### Expected value calculation

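Having now an explicit form for \(G_{i}(x,t)\), we can calculate \(\mathrm {IE}_{\tilde{\theta }}\left[ \tilde{v}_{i}(t)\right] \) in (8) as follows

\[\mathrm{IE}_{\tilde{\theta}}\left[\tilde{v}_i(t)\right] = \int_{-\infty}^{+\infty} x\, dG_i(x,t) = \int_{-\infty}^{+\infty} x\, \beta A_i(t)\, e^{-\beta x} \exp\big(-A_i(t)\, e^{-\beta x}\big)\, dx. \tag{24}\]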

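By substituting \(z=A_{i}(t)e^{-\beta x}\), one gets

\[\mathrm{IE}_{\tilde{\theta}}\left[\tilde{v}_i(t)\right] = \frac{1}{\beta}\int_{0}^{+\infty} \big(\ln A_i(t) - \ln z\big)\, e^{-z}\, dz = \frac{1}{\beta}\big(\ln A_i(t) + \gamma\big), \tag{25}\]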

where \(\gamma =-\int _{0}^{+\infty }e^{-z}\ln z\;dz\simeq 0.5772\) is the Euler constant.
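A quick numerical check of (25), under hypothetical values of \(A_i(t)\) and \(\beta\): integrating \(x\) against the density of the Gumbel function (17) with a trapezoidal rule should reproduce \((\ln A_i(t)+\gamma)/\beta\). The integration window and step size are assumptions suited to moderate \(\beta\) (very large \(\beta\) would overflow `math.exp` at the left end of the window).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def gumbel_density(x, A, beta):
    """g(x) = dG/dx for the Gumbel function G(x) = exp(-A e^{-beta x}) of (17)."""
    z = A * math.exp(-beta * x)
    return beta * z * math.exp(-z)


def mean_by_quadrature(A, beta, lo=-20.0, hi=40.0, n=60_000):
    """Composite trapezoidal approximation of the integral (24) of x g(x) dx over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (lo * gumbel_density(lo, A, beta) + hi * gumbel_density(hi, A, beta))
    for k in range(1, n):
        x = lo + k * h
        total += x * gumbel_density(x, A, beta)
    return total * h


def mean_closed_form(A, beta):
    """Right-hand side of (25): (ln A + gamma) / beta."""
    return (math.log(A) + EULER_GAMMA) / beta


A, beta = 3.0, 2.0  # hypothetical accessibility A_i(t) and tail parameter beta
print(mean_by_quadrature(A, beta), mean_closed_form(A, beta))
```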

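Because of (25), and disregarding the constant \(\gamma /\beta \), the maximum utility \(U\) of the whole multi-stage dynamic stochastic decision process in (9) becomes

\[U = U_0(0) = \frac{1}{\beta}\ln A_0(0) = \frac{1}{\beta}\ln \sum_{j\in N_1} \alpha_j(1)\, e^{\beta\left(v_{0j}(1) + U_j(1)\right)}. \tag{26}\]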

## A Nested Multinomial Logit model for the choice probability

The choice probability \(p_{ij}(t+1)\) that a decision maker who has selected alternative \(i\) at stage \(t\) selects alternative \(j\) at stage \((t+1)\) can be determined as follows. The decision maker chooses alternative \(j\) at stage \(t+1\) if and only if that alternative has the largest utility among all alternatives at that stage, i.e.

\[p_{ij}(t+1) = \Pr\Big\{\tilde{v}_{ij}(t+1) \ge \max_{k\in N_{t+1},\,k\ne j} \tilde{v}_{ik}(t+1)\Big\}. \tag{27}\]

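Then

\[p_{ij}(t+1) = \Pr\Big\{\tilde{\theta}_k(t+1) \le \tilde{\theta}_j(t+1) + v_{ij}(t+1) + U_j(t+1) - v_{ik}(t+1) - U_k(t+1),\ \forall k\in N_{t+1},\, k\ne j\Big\}. \tag{28}\]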

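By using (5), one gets

\[
\begin{aligned}
&\Pr\Big\{\tilde{v}_{ij}(t+1) \ge \tilde{v}_{ik}(t+1),\ \forall k\in N_{t+1},\, k\ne j \,\Big|\, \tilde{\theta}_j(t+1)=x\Big\}\\
&\quad= \Pr\Big\{\tilde{\theta}_k(t+1) \le x + v_{ij}(t+1) + U_j(t+1) - v_{ik}(t+1) - U_k(t+1),\ \forall k\in N_{t+1},\, k\ne j\Big\}
\end{aligned}
\tag{29}
\]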

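and, since the \(\tilde{\theta }_k(t+1)\), \(k\in N_{t+1}\), are independent,

\[
\begin{aligned}
&\Pr\Big\{\tilde{v}_{ij}(t+1) \ge \tilde{v}_{ik}(t+1),\ \forall k\in N_{t+1},\, k\ne j \,\Big|\, \tilde{\theta}_j(t+1)=x\Big\}\\
&\quad= \prod_{k\in N_{t+1},\,k\ne j} F_k\big(x + v_{ij}(t+1) + U_j(t+1) - v_{ik}(t+1) - U_k(t+1),\, t+1\big).
\end{aligned}
\tag{30}
\]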

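Now, from the Total Probability Theorem, Eq. (28) becomes

\[p_{ij}(t+1) = \int_{-\infty}^{+\infty} \prod_{k\in N_{t+1},\,k\ne j} F_k\big(x + v_{ij}(t+1) + U_j(t+1) - v_{ik}(t+1) - U_k(t+1),\, t+1\big)\, dF_j(x,t+1). \tag{31}\]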

The following theorem holds

### Theorem 2

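The choice probability \(p_{ij}(t+1)\) that a decision maker who has selected alternative \(i\) at stage \(t\) selects alternative \(j\) at stage \(t+1\) is given by

\[p_{ij}(t+1) = \frac{\alpha_j(t+1)\, e^{\beta\left(v_{ij}(t+1) + U_j(t+1)\right)}}{\sum_{k\in N_{t+1}} \alpha_k(t+1)\, e^{\beta\left(v_{ik}(t+1) + U_k(t+1)\right)}} = \frac{\alpha_j(t+1)\, e^{\beta\left(v_{ij}(t+1) + U_j(t+1)\right)}}{A_i(t)}, \tag{32}\]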

which is a Nested Multinomial Logit model.
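In the special case where \(F\) itself is a Gumbel distribution with scale \(1/\beta\) (an assumption made only for this check), \(F(x)^{l_j}\) is again a Gumbel distribution shifted by \(\ln(l_j)/\beta\), so the right-hand side of Theorem 2 holds exactly for any \(l(t+1)\) and can be compared with a brute-force simulation of (2)-(3). The values `w` (standing for \(v_{ij}(t+1)+U_j(t+1)\)), `l_counts` and `beta` are hypothetical.

```python
import math
import random


def gumbel(rng, beta):
    """Gumbel variate with location 0 and scale 1/beta (assumed F for this check)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u)) / beta


def nested_logit_p(w, l_counts, beta):
    """Right-hand side of Theorem 2, with alpha_j = l_j / l and w_j = v_ij(t+1) + U_j(t+1)."""
    l_total = sum(l_counts)
    terms = [(lj / l_total) * math.exp(beta * wj) for wj, lj in zip(w, l_counts)]
    A = sum(terms)  # A_i(t) as in (18)
    return [term / A for term in terms]


def simulated_p(w, l_counts, beta, n_rep=40_000, seed=7):
    """Draw every scenario oscillation, keep the best scenario of each alternative as in (3),
    add w_j as in (2), and record which alternative has the largest utility."""
    rng = random.Random(seed)
    counts = [0] * len(w)
    for _ in range(n_rep):
        utilities = [wj + max(gumbel(rng, beta) for _ in range(lj)) for wj, lj in zip(w, l_counts)]
        counts[max(range(len(w)), key=utilities.__getitem__)] += 1
    return [c / n_rep for c in counts]


w = [0.0, 0.3, 0.6]   # hypothetical v_ij(t+1) + U_j(t+1)
l_counts = [4, 2, 1]  # hypothetical scenario counts l_j(t+1)
beta = 1.5
print(nested_logit_p(w, l_counts, beta))
print(simulated_p(w, l_counts, beta))
```

Note how the scenario shares \(\alpha_j(t+1)\) act as weights: an alternative with more scenarios is favored even when its deterministic utility is lower.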

### Proof

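By using (6) and (15), from (31) one obtains

\[p_{ij}(t+1) = \int_{-\infty}^{+\infty} \prod_{k\in N_{t+1},\,k\ne j} F\big(x + v_{ij}(t+1) + U_j(t+1) - v_{ik}(t+1) - U_k(t+1)\big)^{\alpha_k(t+1)\, l(t+1)}\, dF(x)^{\alpha_j(t+1)\, l(t+1)}. \tag{33}\]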

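As per Theorem 1, by comparing (19) and (23), one can show that, when \(l(t+1) \longrightarrow +\infty \),

\[F(x)^{\alpha_k(t+1)\, l(t+1)} \simeq \exp\Big(-\alpha_k(t+1)\, e^{-\beta\left(x - a_{l(t+1)}\right)}\Big), \quad k\in N_{t+1}. \tag{34}\]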

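Finally, by setting \(\delta = e^{\beta a_{l(t+1)}}\) and \(z=e^{-\beta x}\), from (18) and (34), Eq. (33) becomes

\[p_{ij}(t+1) = \int_{0}^{+\infty} \alpha_j(t+1)\, \delta\, \exp\Big(-\delta z \sum_{k\in N_{t+1}} \alpha_k(t+1)\, e^{\beta\left(v_{ik}(t+1) + U_k(t+1) - v_{ij}(t+1) - U_j(t+1)\right)}\Big)\, dz = \frac{\alpha_j(t+1)\, e^{\beta\left(v_{ij}(t+1) + U_j(t+1)\right)}}{A_i(t)}. \tag{35}\]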

\(\square \)
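Taken together, (18), (25)-(26) and Theorem 2 suggest a simple backward pass over the stages: compute \(A_i(t)\) from the already known \(U_j(t+1)\), set \(U_i(t)=\ln A_i(t)/\beta\) (dropping \(\gamma/\beta\) as in (26)), and read off \(p_{ij}(t+1)\). The following Python sketch is our illustration, not the authors' code; the data layout `v[t][i][j]` \(=v_{ij}(t+1)\), `alpha[t][j]` \(=\alpha_j(t+1)>0\) and the boundary condition \(U_j(T)=0\) are assumptions. Log-sum-exp is used so that large values of \(\beta(v_{ij}+U_j)\) do not overflow.

```python
import math


def log_sum_exp(xs):
    """Numerically stable ln(sum_k exp(x_k))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def nested_logit_recursion(v, alpha, beta):
    """Backward recursion for U_i(t) = ln A_i(t) / beta and p_ij(t+1) = term_j / A_i(t).

    v[t][i][j]  : deterministic utility v_ij(t+1) of moving from i in N_t to j in N_{t+1}
    alpha[t][j] : scenario share alpha_j(t+1) = l_j(t+1) / l(t+1), assumed > 0
    Returns U (U[t][i] = U_i(t), with U[T] = 0) and P (P[t][i][j] = p_ij(t+1)).
    """
    T = len(v)
    U = [None] * (T + 1)
    U[T] = [0.0] * len(alpha[T - 1])  # boundary condition: no stage after T
    P = [None] * T
    for t in range(T - 1, -1, -1):
        U[t], P[t] = [], []
        for row in v[t]:
            # log of each term alpha_j e^{beta (v_ij + U_j)} of A_i(t) in (18)
            logs = [beta * (vij + U[t + 1][j]) + math.log(alpha[t][j]) for j, vij in enumerate(row)]
            log_A = log_sum_exp(logs)
            U[t].append(log_A / beta)                          # U_i(t), gamma/beta dropped
            P[t].append([math.exp(x - log_A) for x in logs])  # nested logit p_ij(t+1)
    return U, P


v = [
    [[1.0, 0.5]],                        # stage 0 -> stage 1 (N_0 = {0}), hypothetical
    [[0.2, 0.0, 0.4], [0.1, 0.6, 0.3]],  # stage 1 -> stage 2, hypothetical
]
alpha = [[0.5, 0.5], [0.2, 0.3, 0.5]]    # hypothetical scenario shares
U, P = nested_logit_recursion(v, alpha, beta=2.0)
print(U[0][0])  # approximate maximum utility U of (26)
print(P[0][0])  # choice probabilities p_0j(1)
```

With a single alternative per stage and \(\alpha=1\), \(U_0(0)\) reduces to the sum of the deterministic utilities; as \(\beta\) grows, \(\ln A_i(t)/\beta\) approaches \(\max_j\,[v_{ij}(t+1)+U_j(t+1)]\), i.e. the deterministic longest-path recursion of the CPM example.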

## Conclusions

In this paper, we have considered a multi-stage dynamic decision process in which decisions are nested. The process is tackled as a random utility model in which decision utilities depend on i.i.d. stochastic variables with unknown probability distribution, and the decision maker aims at maximizing the expected value of the total utility of the process.

Using results from extreme value theory, we have derived the asymptotic approximation of the probability distribution of the total utility and calculated its expected value. Interestingly, the resulting expected value is proportional to the logarithm of the accessibility in the sense of Hansen [7], i.e., the accessibility of the decision maker to the overall set of alternatives at the different stages at the start of the decision process. Moreover, we have shown that the probability of choosing each alternative follows a Nested Multinomial Logit model.

In the near future, the theoretical results of this paper are expected to be applied and experimentally validated in different operational settings. The main advantages of this approach over other ways of dealing with multi-stage dynamic problems under uncertainty (e.g., Stochastic Programming and Robust Optimization) are the computational tractability of the deterministic approximation and the very mild assumptions needed on the probability distribution of the stochastic variables involved.

## References

1. Ben-Akiva, M., Lerman, S.R.: Disaggregate travel and mobility choice models and measures of accessibility. In: Hensher, D., Stopher, P. (eds.) Behavioral Travel Modeling. Croom Helm, London (1979)
2. Ben-Akiva, M., Lerman, S.R.: Discrete Choice Analysis: Theory and Application to Travel Demand, vol. 9. MIT Press, Cambridge (1985)
3. Beraldi, P., Bruni, M.E., Manerba, D., Mansini, R.: A stochastic programming approach for the traveling purchaser problem. IMA J. Manag. Math. **28**(1), 41–63 (2017)
4. Crainic, T.G., Gobbato, L., Perboli, G., Rei, W.: Logistics capacity planning: a stochastic bin packing formulation and a progressive hedging meta-heuristic. Eur. J. Oper. Res. **253**(2), 404–417 (2016)
5. Domencich, T., McFadden, D.: Urban Travel Dynamics: A Behavioral Analysis. North Holland, Amsterdam (1975)
6. Galambos, J.: The Asymptotic Theory of Extreme Order Statistics. Wiley, New York (1978)
7. Hansen, W.: How accessibility shapes land use. J. Am. Inst. Plan. **25**, 73–76 (1959)
8. Kelley, J.E.: Critical-path planning and scheduling: mathematical basis. Oper. Res. **9**(3), 296–320 (1961)
9. Leonardi, G.: The structure of random utility models in the light of the asymptotic theory of extremes. In: Florian, M. (ed.) Transportation Planning Models, pp. 107–133. Elsevier, Amsterdam (1984)
10. Leonardi, G.: Asymptotic approximations of the assignment model with stochastic heterogeneity in the matching utilities. Environ. Plan. A **17**, 1303–1314 (1985)
11. Leonardi, G., Tadei, R.: Random utility demand models and service location. Reg. Sci. Urb. Econ. **14**, 399–431 (1984)
12. Luce, R.D.: Individual Choice Behavior: A Theoretical Analysis. Wiley, New York (1959)
13. Manerba, D., Perboli, G.: New solution approaches for the capacitated supplier selection problem with total quantity discount and activation costs under demand uncertainty. Comput. Oper. Res. **101**, 29–42 (2019)
14. Manerba, D., Mansini, R., Perboli, G.: The capacitated supplier selection problem with total quantity discount policy and activation costs under uncertainty. Int. J. Prod. Econ. **198**, 119–132 (2018)
15. Marley, A.A.J.: Random utility models and their applications: recent developments. Math. Soc. Sci. **43**(3), 289–302 (2002)
16. Marschak, J.: Binary choice constraints on random utility indications. In: Arrow, K. (ed.) Stanford Symposium on Mathematical Methods in the Social Sciences, pp. 312–329. Stanford University Press, Stanford (1960)
17. Perboli, G., Tadei, R., Baldi, M.M.: The stochastic generalized bin packing problem. Discrete Appl. Math. **160**, 1291–1297 (2012)
18. Perboli, G., Tadei, R., Gobbato, L.: The multi-handler knapsack problem under uncertainty. Eur. J. Oper. Res. **236**(3), 1000–1007 (2014)
19. Tadei, R., Ricciardi, N., Perboli, G.: The stochastic p-median problem with unknown cost probability distribution. Oper. Res. Lett. **37**, 135–141 (2009)
20. Tadei, R., Perboli, G., Baldi, M.M.: The capacitated transshipment location problem with stochastic handling costs at the facilities. Int. Trans. Oper. Res. **19**(6), 789–807 (2012)
21. Tadei, R., Perboli, G., Perfetti, F.: The multi-path traveling salesman problem with stochastic travel costs. EURO J. Transp. Logist. **6**(1), 2–23 (2017). https://doi.org/10.1007/s13676-014-0056-2
22. Tadei, R., Perboli, G., Manerba, D.: A recent approach to derive the multinomial logit model for choice probability. In: Daniele, P., Scrimali, L. (eds.) New Trends in Emerging Complex Real Life Problems, AIRO Springer Series (ODS2018, Taormina, Italy, Sept 10–13, 2018), vol. 1 (2018). https://doi.org/10.1007/978-3-030-00473-6_50
23. Train, K.E.: Discrete Choice Methods with Simulation. Cambridge University Press, Cambridge (2003)

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Tadei, R., Perboli, G. & Manerba, D. The multi-stage dynamic stochastic decision process with unknown distribution of the random utilities.
*Optim Lett* **14**, 1207–1218 (2020). https://doi.org/10.1007/s11590-019-01412-1

### Keywords

- Multi-stage dynamic decision process
- Stochastic utilities
- Extreme values theory
- Asymptotic approximation
- Nested Multinomial Logit model