A framework for sensitivity analysis of decision trees
 4.7k Downloads
 4 Citations
Abstract
In the paper, we consider sequential decision problems with uncertainty, represented as decision trees. Sensitivity analysis is always a crucial element of decision making and in decision trees it often focuses on probabilities. In the stochastic model considered, the user often has only limited information about the true values of probabilities. We develop a framework for performing sensitivity analysis of optimal strategies accounting for this distributional uncertainty. We design this robust optimization approach in an intuitive and not overly technical way, to make it simple to apply in daily managerial practice. The proposed framework allows for (1) analysis of the stability of the expectedvaluemaximizing strategy and (2) identification of strategies which are robust with respect to pessimistic/optimistic/modefavoring perturbations of probabilities. We verify the properties of our approach in two cases: (a) probabilities in a tree are the primitives of the model and can be modified independently; (b) probabilities in a tree reflect some underlying, structural probabilities, and are interrelated. We provide a free software tool implementing the methods described.
Keywords
Decision trees Decision optimization Decision sensitivity1 Introduction
Sequentiality and uncertainty are inherent in managerial practice. The former means that managers have to consider multistaged strategies, encompassing several actions following one another, rather than only a single action; the latter—that a company’s payoffs depend not only on managers’ actions but also on exogenous events (states of the world), which may often be perceived as random from the perspective of the decision maker. The actions and reactions are usually intertwined, further complicating the picture. Decision trees are used as a model that helps in discovering, understanding, and communicating the structure of such decision problems—see Clemen and Reilly (2001) and Waters (2011).
The decision makers are often uncertain about the exact parameters of such trees. In one line of literature, the payoffs are defined imprecisely as intervals, e.g. see Barker and Wilson (2012) and Cao (2014). Another approach that we focus on in the present paper is to assume that a decision maker cannot uniquely assign the probabilities to the possible events (Huntley and Troffaes 2012; Jaffray 2007; Walley 1991). This was dubbed ambiguity by Borgonovo and Marinacci (2015); however, other terms are sometimes used (e.g. secondorder uncertainty). Importantly, these probabilities are often uncertain in a nonstochastic way, precluding the assignment of any probability distribution (hence, secondorder); this was confirmed by our survey among managers taking an MBA course (detailed description of the survey and its results are available at http://bogumilkaminski.pl/pub/mbasurvey.pdf). Such a scenario is a natural setting for adapting ideas from the literature on distributionally robust optimization, see e.g. the work of Delage and Yinyu (2010) and references therein.
In what follows, we assume that if the decision maker knew the probabilities, then she would be willing to base her decision on the expected value principle. Due to nonstochastic distributional uncertainty there is no single expected value; hence, a need for a sensitivity analysis (SA) arises, to learn how the output of the decision making process changes when the input is varied—see Saltelli et al. (2009). The importance of a thorough SA is well known and discussed by numerous publications, e.g. Borgonovo and Tarantola (2012). Kouvelis and Yu (2013) point out that uncertainty is a basic structural feature of any business environment and hence should be taken into account in optimal decision making. In particular, uncertainty cannot be replaced by a deterministic model—the optimal solution of a deterministic model is often very different from the optimal solution of a model where uncertainty is present. Kouvelis and Yu (2013) further show that in sequential problems the impact of these uncertainties on decision optimality is even greater than in onetime decisions.
The case when multiple probability distributions can be considered in a decision tree has been previously studied in the literature. Høyland and Wallace (2001) consider assigning a probability density function to each node within a tree for sequential decision making. However, they note that it might be difficult for a user to decide, firstly, what probability density function should be assigned to a particular node and, secondly, how those probabilities should be correlated. Huntley and Troffaes (2012) present several ideas for choice functions, i.e. criteria the decision maker may use to select a subset of strategies. For example, the maximality criterion suggests selecting a strategy X that is not uniformly worse than some other strategy Y (uniformly worse meaning that Y offers a greater expected value for all feasible probability distributions). Unfortunately, this criterion may lead to multiple strategies being selected, possibly confusing the decision maker. The solution could be to proceed the other way round: to determine how rich the family of probability distributions may be, in order for the base case strategy to remain optimal; a concept of admissible interval (Bhattacharjya and Shachter 2012).
The approach we propose is most suitable when a decision problem is solved once but the optimal strategy is then applied in numerous individual cases. Firstorder uncertainty can be addressed by calculating the expected value; however, the distributional uncertainty cannot be averagedout, see the discussion above, as well as in BenTal and Nemirovski (2000), Høyland and Wallace (2001), Huntley and Troffaes (2012) and Kouvelis and Yu (2013). For example, let us consider a bank designing a multistep debt recovery process. The debt recovery process can be defined and solved for a single debtor, yet the policy developed will be used in numerous cases. Since there may be multiple potentially long paths, the histories of initial implementations will provide only limited information for improving the estimates of the probabilities. Another type of problem that is also addressed in our paper is the situation where many different problems are solved and the long term outcome is what matters, e.g. a capital investor devising financing plans for multiple startups. This scenario precludes learning the probabilities from past implementations of the decision. In summary, the setting we propose is valid when the expected value is a natural policy that should be used to select an optimal decision, but it is not reasonable to assume that at the moment of making the decision the decision maker may collect enough data to quantitatively assess the distributional uncertainty of the probabilities in the decision tree.
The contribution of the paper is threefold: (1) a conceptual framework for sensitivity analysis of decision trees; (2) a methodology for performing SA when values in several nodes change simultaneously, and (3) a software implementation that enables practical application of the concepts discussed in the paper. In the following three paragraphs, the contribution is presented in more detail.
Firstly, a single conceptual framework for decision tree sensitivity analysis is created. The framework allows us to conduct threshold proximity SA of decision trees (Nielsen and Jensen 2003), including alternative approaches to that of maximality alone. In particular, the framework is also able to cover the \(\varGamma \)maximin criterion (Huntley and Troffaes 2012), under its appropriate parameterization. In doing so, we keep the SA setup simple, e.g. consider only trivial families of probability distribution perturbations, which do not require numerous additional metaparameters, making it easier to apply by practitioners.
Secondly, the SA methodology proposed in the paper may be conducted simultaneously for multiple parameters in a tree (i.e. the variation in combination, see French (2003)), while the standard approach in the literature is to change parameters one at a time and then inspect the results using a tornado diagram, e.g. Briggs et al. (2006), Briggs et al. (2012), Howard (1988), and Lee et al. (2009). We also show how the results of such SA relate to one another for various types of trees. In particular, we consider nonseparable trees, i.e. trees in which probabilities in various parts of the tree are interrelated, as, for instance, to represent the same state of the world. As we show in the paper, the interrelations between the parameters both complicate the analytical approach and can lead to nonintuitive results.
Thirdly, we provide an open source software package that calculates all the concepts defined in the paper—Chondro. The Chondro web page can be accessed at https://github.com/pszufe/chondro/. The software can work on the dictionary representation of decision trees (see the documentation on the software’s home page) as well as being able to open decision trees from SilverDecisions, a software for visual construction of decision trees, available at http://silverdecisions.pl/, which also has decision and value sensitivity analysis functionality.
The paper is organized as follows. In Sect. 2, we introduce a formal model of a decision tree and introduce an important distinction between two types of trees: separable and nonseparable. In separable trees, the probabilities in various chance nodes can be changed independently; in nonseparable trees, there are constraints defining the relationships between these probabilities. We then present our methods of SA focusing on separable trees in Sect. 3. We discuss how the nonseparable case differs in Sect. 4. Section 5 concludes the paper. The proofs of all the remarks have been placed in the Appendix.
2 A model of a decision tree
A decision tree is constructed using a directed graph \(G=(V,E)\), \(E\subset V^2\), with set of nodes V (we only consider finite V) split into three disjoint sets \(V={\mathcal {D}}\cup {\mathcal {C}}\cup {\mathcal {T}}\) of decision, chance, and terminal nodes, respectively. For each edge \(e\in E\) we let \(e_1\in V\) denote its first element (parent node) and let \(e_2\in V\) denote its second element (child node). In further discussion we use the following definition: a directed graph is weakly connected if and only if it is possible to reach any node from any other by traversing edges in any direction (irrespectively of their orientation).
A decision tree is equipped with two functions: one denoting payoffs, \(y:E\rightarrow {\mathbb R}\), and the other denoting probabilities, \(p:\{e\in E: e_1\in {\mathcal {C}}\}\rightarrow [0,1]\). With this formalism we make the following assumptions: payoffs are defined for all edges and may follow both actions and reactions; probabilities are defined only for edges stemming from chance nodes. We allow zero probabilities in the general definition of a decision tree, which simplifies the technicalities in subsequent sections. In Fig. 1, we have \(y({\textsf {d}},{\textsf {c}})=10\), \(y({\textsf {d}},{\textsf {t3}})=20\), \(y({\textsf {c}},\textsf {t1})=20\), \(y({\textsf {c}},\textsf {t2})=0\) and \(p({\textsf {c}},\textsf {t1})=75\%\), \(p({\textsf {c}},\textsf {t2})=25\%\).
 C1)
there exists \(r\in V\) (root) such that \(\forall v\in V\setminus \{r\}\) there exists a unique path from r to v, written as \(r\leadsto v\);
 C2)
all and only terminal nodes have no children, i.e. \(\forall v\in V:v\in {\mathcal {T}} \Leftrightarrow \lnot \exists e\in E:e_1=v;\)
 C3)
\(p(\cdot )\) is correctly defined, i.e. \(\forall v\in {\mathcal {C}}:\sum _{e\in E, e_1=v}p(e)=1\);

\(V'=\{u\in V:v=u\text { or } v\text { lies on }r(DT)\leadsto u\}\),

\(E'=E\cap (V'^2)\),

\(y'\) and \(p'\) are y and p, respectively, both restricted to \(E'\).
The decision maker is allowed to select actions in decision nodes. A strategy (policy) in a tree DT is defined as a decision function \(d:\{e\in E: e_1\in {\mathcal {D}}\}\rightarrow \{0,1\}\), such that \(\forall {v\in {\mathcal {D}}}:\, \sum _{e\in E, e_1=v}d(e)=1\). Under this definition the decision maker can only use pure strategies, i.e. explicitly select actions rather than select probabilities and randomize actual actions. A strategy unanimously prescribes an action in every decision node, different strategies (different functions \(d(\cdot )\)) can, however, be considered as equally good.
For future use, we will designate an almost reachable set (of vertices) a set of vertices of a maximal, weakly connected subgraph of graph \((V,E^{**})\) containing r(DT), where \(E^{**}=\{e\in E:e_1\in {\mathcal {C}}\vee d(e)=1\}\). We will categorize two strategies \(d_1\), \(d_2\) as strongly identical if their almost reachable sets are identical. If two strategies are strongly identical they are identical. Conversely—identical strategies on G are strongly identical on a subgraph of G where all subtrees starting from edges where \(p(e)=0\) are removed. Thus, for a proper DT we have \(E^*=E^{**}\), and so the reachable set and almost reachable set coincide for every single strategy. In such a situation two strategies are strongly identical, if they are identical.
Before we discuss the sensitivity analysis methods, we need to introduce the concept of separability proposed by Jeantet and Spanjaard (2009). A decision tree is separable if changing the probabilities in one chance node does not automatically require changing any probabilities in any other chance node (in other words, condition C3 given in the definition of a decision tree in Sect. 2, is sufficient for the probabilities in the tree to be correctly specified). Formally, the set of all allowed probability distributions across all chance nodes is equal to the Cartesian product of the possible probability distributions in every chance node, see Jeantet and Spanjaard (2009).
Often, the probabilities across two or more chance nodes may be interrelated, e.g. entire subtrees may be repeated in the decision tree (coalescence), and two different chance nodes may represent the same uncertain event. In Fig. 2, a host is wondering who will come to her party, and the presence of Alice and Bob is independent. Changing the probability of Bob showing up requires an analogous change in the other chance node, i.e. chance nodes \({\textsf {c2}}\) and \(\textsf {c3}\) are interrelated. For future use, observe that Alice and Bob are amiable but dislike each other and the party is only going to be fun if exactly one is present.
Probabilities in different chance nodes may also be interrelated, if they are derived (e.g., using Bayes’ formula) from a common, exogenous parameter. In this case, changing this parameter requires recalculating all the derived probabilities.
If at least two chance nodes are interrelated, we will characterize the entire decision tree as nonseparable. Mathematically, nonseparability denotes that not all functions p are allowed; e.g. in Fig. 2, all such p where \(p({\textsf {c2}},\textsf {t1})\ne p(\textsf {c3},{\textsf {t3}})\) are forbidden.
When analyzing nonseparable trees, we consider a space of assessed probabilities which are separable (using terminology presented in Eidsvik et al. (2015)); they need not be directly represented in the tree. The inferred probabilities are used in the tree, and they are derived from the assessed probabilities via formulas, possibly linking more than one assessed probability in a given edge. In what follows, we assume that inferred probabilities are continuous in assessed probabilities, which is true when using Bayes’ formula.
As a side note, observe that we could alternatively use a space of assessed (general) parameters, i.e. numbers not necessarily restricted to [0, 1] intervals, etc. This change would introduce a qualitative difference between separable and nonseparable cases and would require redefining the approach to sensitivity analysis in a way that rendered the two cases less compatible, which is undesirable.
3 Sensitivity analysis in separable trees
In this section, we propose several approaches to SA in separable trees. Even though this case may be of limited use in practice, as it requires all the uncertainties to be different along every branch of the tree, it makes it easier to define the ideas first, before proceeding to nonseparable trees in the next section. Below, in the first subsection, we focus on the threshold SA, where we analyze how stable a Poptimal strategy is. Then, we focus on the scenario SA, and define an optimal strategy for various scenarios of probability perturbation (the range of perturbations for which the currently optimal strategy remains optimal can still be calculated). The notions defined here are then discussed for nonseparable trees in Sect. 4.
The decision maker may consider the amount of ambiguity regarding different probabilities as different, and for that reason we introduce an additional function \(s:{\mathcal {C}}\rightarrow [0,1]\), representing whether a given node should be subject to sensitivity analysis. In the simplest approach, the decision maker could choose the values of s(c) from the set \(\{0,1\}\), then \(s(c)=1\) denotes that c should be subject to sensitivity analysis, as the probabilities of events stemming out of this node are not given precisely, and \(s(c)=0\) denotes that probabilities are given precisely. Values between 0 and 1 could also be used to denote various degrees of ambiguity (and the formulas below account for this possibility). The choice of the value is subjective and left to the judgment of the decision maker. For instance consider three chance nodes c1, c2 and c3. Chance node c1 represents a coin toss, so the decision maker sets \(s(\textsf {c1})=0\) as she assumes that the probabilities are known. Then the decision maker feels that she is twice as certain about the value of probabilities in c2 than c3, so she sets \(s({\textsf {c2}})=0.5\) and \(s(\textsf {c3})=1\).
3.1 Stability analysis
Remark 1
Take \(0\le \varepsilon _1\le \varepsilon _2\), a separable decision tree with some sensitivity function \(s(\cdot )\), and a Poptimal strategy, d. If d is \(\varepsilon _2\)stable, it is also \(\varepsilon _1\)stable. If d is \(\varepsilon _1\)stable for \(\varepsilon _1\ge 1/{\widetilde{s}}\), it is also \(\varepsilon _2\)stable.
The interpretation of I(DT, d, s) is straightforward and should be intuitive even for a nontechnical decision maker: if none of the initial probabilities assessed imprecisely change by more than I(DT, d, s), then the strategy remains optimal. Thus, the larger the I(DT, d, s), the more confident the decision maker may feel about the original Poptimal strategy, as a larger deviation is allowed with no consequences for the recommended course of actions.
Based on the properties of I(DT, d, s) stated in Remark 1, we can numerically approximate I(DT, d, s) using a bisection in \([0,1/{\widetilde{s}}\,]\). For instance, in the simple case where \({\widetilde{s}}=1\) we check the stability for \(\varepsilon =\frac{1}{2}\); if d is stable, we check \(\varepsilon =\frac{3}{4}\), if d is not, we check \(\varepsilon =\frac{1}{4}\), etc. Verifying stability for a given \(\varepsilon \) in separable trees can be done via backward induction. Intuitively, we need to try to modify probabilities (where allowed, i.e. \(s(\cdot )>0\)) in such a way that d is not picked as optimal when solving the tree. That requires worsening the expected payoffs for chance nodes on the optimal path (for the almost reachable set) and improving the payoffs for chance nodes off the optimal path, both in backward induction. Changing payoffs in a single node is done via reallocating probabilities between edges stemming out of this node (cf. Eq. 1) and can be done, for example, using a greedy algorithm or linear programming, as convenient.
A unique Poptimal strategy will have a nontrivial region of stability as indicated by the following remark.
Remark 2
Take a separable (not necessarily proper) decision tree, DT, with some sensitivity function, \(s(\cdot )\). Assume all Poptimal strategies are strongly identical (have the same almost reachable set). Then, for any Poptimal strategy, d, we have \(I(DT,d,s)>0\).
The stability index is specific for the decision problem as a whole in the following sense.
Remark 3
Take a separable proper decision tree, DT, with some sensitivity function, \(s(\cdot )\). For any two Poptimal strategies, \(d_1\) and \(d_2\), we have \(I(DT,d_1,s)=I(DT,d_2,s)\).
Remark 3 will often hold trivially in proper trees in the following sense. If there exist two, nonidentical Poptimal strategies, \(d_1\) and \(d_2\), and there is a nondegenerate chance node being in the reachable set of only one of them, then \(I(DT,d_1,s)=0=I(DT,d_2,s)\). (A chance node is called nondegenerate, if the expected payoff, cf. equation 1, calculated in this node has a nonzero derivative with respect to probabilities \(p(\cdot )\)). If there is no such nondegenerate chance node for any pair of nonidentical Poptimal strategies, then the stability index will be equal and greater than zero. For proper trees, we can then simply let I(DT, s) denote the stability index, meaning it is valid for any Poptimal strategy.
The stability index of two strongly identical strategies is also equal for improper trees (not necessarily for two identical strategies: they may start differing when part of a tree starts being reachable after perturbing probabilities). Using Remark 2, we also see that if all Poptimal strategies are strongly identical, then this (unique) index will be greater than 0.
Generally, for improper trees there can exist two Poptimal strategies with different stability indices. For example, if in Fig. 1 we set \(p({\textsf {c}}, \textsf {t1})=0\), \(p({\textsf {c}}, \textsf {t2})=1\) and \(y({\textsf {d}}, {\textsf {t3}})=10\) and allow perturbation of the probabilities in the chance node c, then both strategies, involving \(d({\textsf {d}},{\textsf {t3}})=1\) (lower branch of the tree) and \(d({\textsf {d}},{\textsf {c}})=1\) (upper branch), are Poptimal, but the stability index of the first is equal to 0 and that of the second is equal to 1.
The managers we surveyed expressed an interest in seeing which strategy is optimal when the perturbation of probabilities is unfavorable and they had strong interest in the most likely outcome of a decision. Regarding the former, observe that unfavorable perturbation may mean different perturbations in a single chance node, depending on which actions are selected in subsequent (farther from the root) nodes. Regarding the latter, we find that simply deleting all edges except for the most likely ones is too extreme and seek to embed this modefavoring approach into the general framework of maximizing the expected value with modified probabilities. We present our approach in the following subsection.
3.2 Perturbation approach
We now want to introduce a modetending perturbation of probabilities, i.e. putting more weight on the most probable events or increasing the contrast between the assigned probabilities—an approach explicitly required by the surveyed managers (in a way, representing being even more certain about the initial assignment of probabilities). Several approaches could be considered here, therefore it is worthwhile explaining why we adopted a specific one by beginning with a discussion of other possibilities. Defining the worst(best)casetending can be looked at as finding, for a given node, the set of new probabilities (\({\mathbf {x}}=(x_1,\ldots ,x_n)\), probabilities of respective edges stemming from the node), within a ball of radius \(\varepsilon \) centered at original probabilities (\({{\mathbf {p}}}=(p_1,\ldots ,p_n)\)) that minimizes (maximizes) the average payoff of respective subtrees weighted with \({\mathbf {x}}\). One natural idea would be to utilize an analogous approach but, instead, to maximize the average \({{\mathbf {p}}}\) weighted with \({\mathbf {x}}\), which would enforce putting even more emphasis on likely events (we would tend to increase those components of \({\mathbf {x}}\) which correspond to the large components of \({{\mathbf {p}}}\)). The disadvantage of this approach is that reversing it (minimizing the average) leads not to assigning equal probabilities to all the events (within respective chance nodes, which we would consider as natural), but to selecting the leastlikely events, which is an odd scenario.
Remark 4
Using \(\gamma \) is more convenient than using \(\theta \): \(\gamma =0\) implies equating all probabilities in \({\mathbf {x}}\) (and corresponds to \(\theta =0\)), \(\gamma =1\) implies using original probabilities \({\mathbf {x}}={{\mathbf {p}}}\) (corresponds to \(\theta =D_{KL}({{\mathbf {p}}}{\mathbf {u}})\)), and for \(\gamma \rightarrow +\infty \) the probabilities in \({\mathbf {x}}\) concentrate in a mode (modes) of \({{\mathbf {p}}}\) (all \(x_i\) corresponding to probabilities \(p_i\) less than \(\max p_i\) tend to 0, and all the remaining ones tend to equal positive values). Simple algebraic manipulations show that \(dx_i/d\gamma _{\gamma =1}>0\) if and only if \(p_i>\prod _{j=1}^np_j^{p_j}\), i.e. a geometric mean of \(p_i\) weighted by \(p_i\). In short, if \(p_i\) is large, it gets larger; and if it is small, it gets smaller.
We can now define \(P_{\text {mode},\varepsilon }\)optimality. For a given \(\varepsilon \ge 0\) we transform in each chance node the probabilities according to Eq. (9) for as large \(\gamma \) as possible while \(DT,DT'_s\le \varepsilon \). Such \(\gamma \) can be found by means of any onedimensional rootfinding algorithm, since increasing \(\gamma \) increases \(\max _ix_ip_i\). Observe that in each chance node \(\gamma \) is selected independently. For these perturbed probabilities, we select the expected payoff maximizing decision, denoting it as \(P_{\text {mode},\varepsilon }\)optimal. Repeating the calculations for various \(\varepsilon \) provides an approximate split of the \([0,1/{\widetilde{s}}\,]\) interval into subintervals in which various decisions are \(P_{\text {mode},\varepsilon }\)optimal.
The notions of stability and \(P_{\cdot ,\varepsilon }\)optimality can be linked.
Remark 5
Take a separable tree, DT, with a sensitivity function, \(s(\cdot )\). If d is Poptimal, then it is also \(P_{\min ,\varepsilon }\)optimal, \(P_{\max ,\varepsilon }\)optimal, and \(P_{\text {mode},\varepsilon }\)optimal for any \(\varepsilon \le I(DT, d, s)\).
The relation between stability and \(P_{\cdot ,\varepsilon }\)optimality is illustrated in Fig. 5. The left part presents an exemplary tree with three strategies \(d_1\), \(d_2\), and \(d_3\) (setting, respectively, \(d(\textsf {d1},\textsf {c1})=1\), \(d(\textsf {d1},{\textsf {c2}})=1\), and \(d(\textsf {d1},\textsf {c3})=1\)) and baseline probabilities. \(d_1\) is Poptimal. The right part presents the impact of modifying probabilities on the expected payoff of these three strategies. The horizontal axis presents the deviation in the probability of selecting the upper edge (leading to t1, t3, and t5, respectively); the probability for the lower edge changes residually. We independently set the individual deviations for \(d_1\), \(d_2\), and \(d_3\), but we decide jointly on the range of feasible deviations (the width of a shaded region). Expected payoffs are denoted with solid, dashed, and dotted lines for \(d_1\), \(d_2\), and \(d_3\), respectively. If we allow the probabilities to vary in the range of \(\varepsilon _1\approx 3.63\%\), then \(d_1\) remains Poptimal, while beyond this range it may start losing to \(d_2\) (when \(p(\textsf {c1},\textsf {t1})\) and \(p({\textsf {c2}},{\textsf {t3}})\) are decreased, marked with a thin horizontal line). If the decision maker is confident that the initial probabilities are imprecise by no more than 3.63%, then \(d_1\) is definitely a good choice.
If the probabilities may vary by more than 3.63%, then \(d_1\) may not be Poptimal. Then, the decision maker may prefer to make a safe choice, i.e. select a strategy that offers the greatest expected payoff in the case of the most unfavorable perturbation. Obviously, for perturbations within \(\varepsilon _1\), \(d_1\) is such a strategy (cf. Remark 5). As Fig. 5 shows, for deviations smaller than \(\varepsilon _3\approx 26.67\%\), \(d_1\) remains \(P_{\min ,\varepsilon }\)optimal. Only when we allow a larger deviation, may the possible expected payoff of \(d_1\) be worse than the worst possible expected payoff of \(d_3\); hence, \(d_1\) ceases to be the safest choice.
One more remark is due. The results of the stability analysis and \(P_{\cdot ,\varepsilon }\)optimality depend strongly on how the decision problem is structured. For instance, we can split a single edge stemming out of a chance node into two edges and assign them half of the original probability. Nothing has changed in terms of the base case representation of the problem. However, a given \(\varepsilon \) now effectively allows for twice as large a variation in the probability of the event (before split), and so the results of all the analyses may change. On the one hand, this may be perceived as a disadvantage of the proposed methods but, on the other hand, we would argue that selecting a particular way of representing the problem apparently provides insight into how a decision maker perceives the distinct uncertainties, events, and imprecise probabilities. It is therefore not surprising that changes in perception should be reflected in changes in the results of SA.
4 Sensitivity analysis in nonseparable trees
In nonseparable trees, the probabilities in various chance nodes cannot be perturbed independently, which represents a challenge for the algorithms presented above. Backward induction would not yield the correct results, e.g. in Fig. 2, when assessing \(P_{\min ,\varepsilon }\)optimality (assuming this tree is a part of some decision), backward induction would increase \(p({\textsf {c2}},\textsf {t1})\) (i.e. Bob present) in \({\textsf {c2}}\) and at the same time increase \(p(\textsf {c3},\textsf {t4})\) (i.e. Bob not present) in \(\textsf {c3}\).
As mentioned in the last part of Sect. 2, this could be modeled in terms of some additional restrictions on the \(p(\cdot )\) function but we find it more intuitive to assume that the probabilities reflected in the tree are derived from some more primitive probabilities, assessed probabilities (denoted with capital P), which themselves are separable and represent discrete distributions. In the case illustrated in Fig. 2, this assumption would imply using two assessed probabilities: \(P(\text {Alice present})\) and \(P(\text {Alice not present})\) (P(A) and (\(P(\lnot A)\) in short, used to define \(p(\cdot )\) in \(\textsf {c1}\)) and \(P(\text {Bob present})\) and \(P(\text {Bob not present})\) (P(B) and \(P(\lnot B)\), similarly used to define \(p(\cdot )\) in \({\textsf {c2}}\) and \(\textsf {c3}\)).
We now suggest defining \(s(\cdot )\) and calculating \(\varepsilon \)deviations in the space of assessed probabilities. This approach requires redefining the notions introduced in Sect. 3 into the space of assessed probabilities. Hence, we require that assessed values are indeed probabilities, rather than arbitrary parameters and that they represent discrete distributions (in a sense, virtual chance nodes). For example, for the distribution function \(F_A\), representing the fact of Alice being present or not, we have two assessed probabilities P(A) and \(P(\lnot A)\). For concrete values of \(s(F_A)\) and \(\varepsilon \), we have the constraint that neither of these probabilities in SA can diverge from the initial values by more than \(s(F_A)\varepsilon \).
A new problem arises in the nonseparable case, since the expected payoff is in general no longer convex in the space of assessed probabilities. In our example from Fig. 2 it amounts to 15 for \(P(A)=\frac{1}{2}\), \(P(B)=\frac{1}{2}\), and to 20 for \(P(A)=1\), \(P(B)=0\) and \(P(A)=0\), \(P(B)=1\), and to 10 for \(P(A)=1\), \(P(B)=1\) and \(P(A)=0\), \(P(B)=0\). Thus, if we want to find a pessimistic or an optimistic evaluation of a given strategy, d, and \(\varepsilon \), we have to use an algorithm that takes into account that there might be multiple local minima of the expected payoff. In our implementation, we use a simple grid search over assessed probabilities but for large trees a more efficient algorithm might be needed (e.g. a genetic algorithm). In consequence, looking for a \(P_{\min ,\varepsilon }\)optimal or a \(P_{\max ,\varepsilon }\)optimal strategy is more difficult than in the separable case: an exhaustive search over all strategies in a decision tree is required and for a single considered strategy a global optimization has to be performed. Determining the \(P_{\text {mode},\varepsilon }\)optimal strategy remains straightforward: it suffices to perturb assessed probabilities using the softmax rule and to calculate the optimal strategy in the modified tree.
Let us examine how the proposed methodology works in a more complicated, nonseparable case. Consider an investor owning a plot of land, possibly (a priori probability amounting to 70%) hiding shale gas layers. The plot can be sold immediately (800, all prices in $’000). The investor can build a gas extraction unit for a cost of 300. If gas is found, the profit will amount to 2,500 (if not, there will be no profit, and no possibility of selling the land). Geological tests can be performed for a cost of 50, and will produce either a positive or a negative signal. The sensitivity amounts to 90%, and the specificity amounts to 70%. The installation can be built after the test or the land may be sold for 1,000 (600) after a positive (negative) test result.
As mentioned in Sect. 2, representing this problem from the decision maker’s perspective requires transforming the above probabilities. Observe that the probabilities, as given in the text above, are not logically interrelated, so they can be modified without forcing other probabilities to be changed also. Thus, they form the assessed probabilities, namely: \(P(\text {gas})\), \(\text {sensitivity}\), and \(\text {specificity}\). All three probabilities represent binary distributions; in order to simplify the notation further, we propose performing a sensitivity analysis of these probabilities (however, it should be remembered that the complements of these probabilities are also assessed probabilities). We keep in mind that in general we perform the sensitivity analysis on discrete distributions—this would be important, if we had more than two possible outcomes in a distribution represented by assessed probabilities.
It is Poptimal to perform the test and build the gas extraction system only when the result is positive (sell the land otherwise). On the bottom Fig. 7 we can see the stability and perturbation analysis assuming that sensitivity and specificity values are known exactly (i.e. \(s(\text {specificity})=0\) and \(s(\text {sensitivity})=0\)). The dark shaded regions illustrate the boundary for stability \(\varepsilon _1\approx 3.42\%\). For the mode \(P_{\text {mode},\varepsilon }\)optimality and \(P_{\max ,\varepsilon }\)optimality perturbation the same epsilon value is valid; for larger \(\varepsilon \) the optimal strategy changes to dig (the right side of the dark area). The \(P_{\min ,\varepsilon }\)optimality perturbation does not change the optimal strategy up to \(\varepsilon _2\approx 39.59\%\); for larger \(\varepsilon \) the optimal strategy is to sell (the left side of the lightgray area).
Now let us assume that the investor does not know the exact values of sensitivity and specificity for the existence of gas test, although this uncertainty is quite low. Specifically, we assume \(s(P(\text {gas}))=1\), \(s(\text {specificity})=0.1\), and \(s(\text {sensitivity})=0.1\). The stability of the base optimal strategy (test: sell if negative, dig if positive) is \(2.90\%\). It is natural that the stability has decreased in comparison to the previous scenario as we allow sensitivity and specificity to be perturbed and they do not affect the immediately dig strategy. \(P_{\text {mode},\varepsilon }\) and \(P_{\max ,\varepsilon }\) perturbations in the range \(\varepsilon \in [0\%, 4.16\%[\) do not change the Poptimal strategy while for the \(\varepsilon \in ]4.16\%,100\%]\) the optimal strategy is to immediately dig. Again, this result (wider interval for base optimal strategy) might have been expected, since a favorable perturbation of sensitivity and specificity increases their values and thus makes the base optimal strategy more attractive. Finally, for the \(P_{\min ,\varepsilon }\) perturbation in the range \(\varepsilon \in [0\%, 37.14\%[\), the optimal strategy does not change while for \(\varepsilon \in ]37.14\%, 100\%]\) the optimal strategy is to immediately sell. Similarly to previous perturbation methods, the decrease of the interval width for the base decision follows the fact that sensitivity and specificity do not affect the immediately sell strategy.
5 Concluding remarks
In the paper, we presented a framework for performing SA in decision trees when some probabilities are not known precisely. In this framework, we tried to encompass what managers declared to be of interest when analyzing decision problems with uncertainty: verifying the impact of modifying probabilities on the decision remaining optimal, thinking in terms of unfavorable/favorable perturbations, or thinking in terms of most likely outcomes. All these approaches can be unified in a single model, calculated in the software we provide, and illustrated for sample cases (see Fig. 5). We found that it is crucial whether the probabilities in the tree can be set independently between various chance nodes, i.e. whether a tree is separable. If not, then a more complicated approach needs to be taken to define the model, and additionally more complex algorithms to perform SA need to be used.
Our approach to SA allows the decision maker to examine the advantages and disadvantages of the available decision alternatives from several angles. Figure 5 nicely illustrates how various approaches to SA can yield different answers. As with all the decision support tools—it is the decision maker who needs to make the final decision and is responsible for it.
As mentioned in the introduction, the methods we suggest can be linked to ideas discussed, e.g. by Huntley and Troffaes (2012): \(\varGamma \)maximin, maximality/Eadmissibility, and interval dominance. Hence, \(P_{\min }\)optimality is directly equivalent to \(\varGamma \)maximin. Still, the difference is that rather than treating the set of probability distributions as given exogenously, we build it endogenously instead, verifying how large it can be (in terms of Eq. (2), around the baseline probabilities) for the basecase Poptimal strategy to be \(P_{\min }\)optimal.
The stability index defines a set of probability distributions in which the Poptimal strategy is the only maximal and the only Eadmissible one. Again, in our approach we do not use maximality/Eadmissibility to make a choice for a given set of probabilities but instead define the strength of the Poptimal strategy by looking at how imprecise the original probabilities can be for this strategy to remain the only maximal/Eadmissible one. In our approach, the difference between maximality and Eadmissibility is inconsequential.
The ideas presented in the present paper also relate to the informationgap theory proposed and developed by BenHaim (2001), where one measures for each strategy how much uncertainty is allowed (i.e., how large a deviation of model parameters is allowed) for the considered strategies to definitely offer a payoff greater than some assumed minimal level (i.e. the robustness), or in another approach: how much uncertainty is needed to make it possible for the strategies to offer some assumed desired outcome (i.e. the opportuneness). Both approaches, ours and BenHaim’s, are local, i.e., there is some baseline value of uncertain parameters from which the deviations are considered (and not simply a family of possible parameterizations is considered). Also, in both approaches no probability distributions are assigned to the deviations. Lastly, we consider both the unfavorable and favorable deviations, as analogs of robustness and opportuneness, respectively. Nevertheless, there are important differences. Firstly, we apply our ideas specifically to decision trees; hence, the contribution of the present paper also lies in how the ideas are implemented in that particular context. Secondly, the line of thinking in the informationgap approach goes from the desired outcome (e.g., minimal required outcome) to the amount of uncertainty (guaranteeing this threshold is exceeded), while we treat the amount of ambiguity related to parameters as the starting point and proceed towards the recommended strategy (and, e.g., the minimal guaranteed outcome). We find treating the ambiguity as primitive and the resulting satisfaction as an outcome to be more intuitive and to follow the causeeffect path. Thirdly, we also present our own additional extensions to the SA (e.g., mode favoring deviations).
There are some limitations to the present study and pertinent ideas for further research. We used a very simple definition of the distance between two trees, cf. Eq. (2). This metric allows multiple probabilities to differ simultaneously from their baseline values, not aggregating individual differences to reflect that overall the set of probabilities changed substantially (as, for example, the sum of absolute deviations would do). We would maintain that as long as the probabilities in various chance nodes are unrelated (i.e. we consider separable trees), this is a desired feature. The decision maker can express the degree of imprecision (and possibly differentiate it between various nodes with \(s(\cdot )\)) but this imprecision can simultaneously affect several chance nodes: being more incorrect in one chance node does not increase the precision of knowing the true probabilities in some other chance node. Moreover, such a definition is simple and intuitive to understand for the decision makers.
We only calculate the stability index of the optimal strategy. At first glance, it may be of interest to know how stable the secondoptimal strategy is, especially if the stability index of the optimal strategy is small. Should the stability index of the second best one (somehow defined) be large, we might be tempted to select it, since it looks robust, even if only secondoptimal. For example, assume the stability index for the firstoptimal \(d_1\) equals 0.02 (i.e. 2 pp), and for the second optimal \(d_2\) it amounts to as much as 0.3 (i.e. 30 pp). The true interpretation, however, would be the following. If baseline probabilities sufficiently approximate the true ones (within 2 pp), then \(d_1\) is sure to maximize the expected payoff (be optimal). If the imprecision is larger than 2 pp (but smaller than 30 pp), then \(d_1\) might not be optimal (\(d_2\) might be); yet \(d_1\) might still be optimal for deviations larger than 2 pp! These stability indices guarantee that if the deviation is within 30 pp, then either \(d_1\) or \(d_2\) is optimal, but there is still no reason to favor \(d_2\) over \(d_1\). This is also related to the fact that in our research we focused on decision sensitivity (Nielsen and Jensen 2003), and not value sensitivity, i.e. we analyze when the optimal strategy changes with varying input, not by how much the payoff is reduced.
If the decision maker is concerned with a possibly greater negative impact of perturbations on one strategy and wants to select a safe (even if not optimal) strategy, then \(P_{\min ,\varepsilon }\)optimality is the appropriate concept. Figure 5 nicely shows that even if the optimal strategy has a relatively small stability index, it is still very safe for large deviations, i.e., it offers the highest guaranteed (worstcase) expected value. Observe that putting greater weight on less favorable outcomes (for each strategy separately) may also be interesting when no ambiguity is present. It may be taken to represent risk aversion somewhat similarly to rankdependent utility models, in which we reweight the probabilities, overweighting the extremely unfavorable and underweighting the extremely favorable ones—see Quiggin (1982). In the case of \(P_{\min ,\varepsilon }\)optimality, for sufficiently small \(\varepsilon \) we only reweight two single outcomes (the most unfavorable and the most favorable one) in a given chance node, but this reweighting builds up over multiple chance nodes in a nontrivial way (e.g. perturbed probabilities in one chance node being multiplied by perturbed probabilities in another chance node). Another approach to model risk aversion would be to transform payoffs to von Neumann–Morgenstern utilities (we would have to attribute the utilities only up to the edges just before the terminal nodes or account for the fact that the marginal utility is diminishing as payoffs aggregate along the paths in the tree).
Footnotes
 1.
Using function d defined for a larger tree DT also for subtrees, e.g. \({DT(e_2)}\), causes no problems.
Notes
Acknowledgements
We thank the participants of the CEMBA and MBASGH courses whom we surveyed. The figures of the trees in the paper were done with SilverDecisions software. Hence, we also express our gratitude to the members of our SilverDecisions team: Michał Wasiluk (developer) and Marcin Czupryna, Timothy Harrell and Anna Wiertlewska (testing and documentation). The SilverDecisions development process has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant agreement No. 645860 as a part of ROUTETOPA project http://routetopa.eu/ to support decision processes visualization within the Social Platform for Open Data (http://spod.routetopa.eu/).
References
 Barker K, Wilson K (2012) Decision trees with single and multiple intervalvalued objectives. Decis Anal 9(4):348–358CrossRefGoogle Scholar
 BenHaim Y (2001) Informationgap theory: decisions under severe uncertainty. Academic Press, LondonGoogle Scholar
 BenTal A, Nemirovski A (2000) Robust solutions of linear programming problems contaminated with uncertain data. Math Program 88(3):411–424CrossRefGoogle Scholar
 BenTal A, Ghaoui LE, Nemirovski A (2009) Robust optimization. Princeton University Press, PrincetonCrossRefGoogle Scholar
 Bhattacharjya D, Shachter RD (2012) Sensitivity analysis in decision circuits. In: Proceedings of the 24th conference on uncertainty in artificial intelligence (UAI), pp 34–42Google Scholar
 Borgonovo E, Marinacci M (2015) Decision analysis under ambiguity. Eur J Oper Res 244:823–836CrossRefGoogle Scholar
 Borgonovo E, Tarantola S (2012) Advances in sensitivity analysis. Reliab Eng Syst Saf 107:1–2 (special issue on advances in sensitivity analysis)CrossRefGoogle Scholar
 Briggs A, Claxton K, Sculpher M (2006) Decision modelling for health economic evaluation (handbooks in health economic evaluation). Oxford University Press, OxfordGoogle Scholar
 Briggs A, Weinstein M, Fenwick E, Karnon J, Sculpher M, Paltiel A (2012) The ISPORSMDM modeling good research practices task force model parameter estimation and uncertainty analysis a report of the isporsmdm modeling good research practices task force working group6. Med Decis Mak 32(5):722–732CrossRefGoogle Scholar
 Cao Y (2014) Reducing intervalvalued decision trees to conventional ones: comments on decision trees with single and multiple intervalvalued objectives. Decis Anal. doi: 10.1287/deca.2014.0294
 Clemen RT, Reilly T (2001) Making hard decisions with decisiontools®. SouthWestern Cengage Learning, MasonGoogle Scholar
 Delage E, Yinyu Y (2010) Distributionally robust optimization under moment uncertainty with application to datadriven problems. Oper Res 58:595–612CrossRefGoogle Scholar
 Eidsvik J, Mukerji T, Bhattacharjya D (2015) Value of information in the Earth sciences. Integrating spatial modeling and decision analysis. Cambridge University Press, CambridgeCrossRefGoogle Scholar
 French S (2003) Modelling, making inferences and making decisions: the roles of sensitivity analysis. Sociedad de Estadística e Investigación Operutiva Top 11(2):229–251Google Scholar
 Howard RA (1988) Decision analysis: practice and promise. Manag Sci 34(6):679–695CrossRefGoogle Scholar
 Høyland K, Wallace SW (2001) Generating scenario trees for multistage decision problems. Manag Sci 47(2):295–307CrossRefGoogle Scholar
 Huntley N, Troffaes MC (2012) Normal form backward induction for decision trees with coherent lower previsions. Ann Oper Res 195(1):111–134CrossRefGoogle Scholar
 Jaffray JY (2007) Rational decision making with imprecise probabilities. In: 1st International symposium of imprecise probability: theory and applications, pp 1–6Google Scholar
 Jeantet G, Spanjaard O (2009) Optimizing the Hurwicz criterion in decision trees with imprecise probabilities. In: Rossi F, Tsoukias A (eds) Algorithmic decision theory, lecture notes in computer science, vol 5783. Springer, Berlin, Heidelberg, pp 340–352, doi: 10.1007/9783642044281_30
 Kouvelis P, Yu G (2013) Robust discrete optimization and its applications, vol 14. Springer Science & Business Media, BerlinGoogle Scholar
 Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22(1):79–86CrossRefGoogle Scholar
 Lee A, Joynt G, Ho A, Keitz S, McGinn T, Wyer P (2009) The EBM teaching scripts working group tips for teachers of evidencebased medicine: making sense of decision analysis using a decision tree. J Gen Intern Med 24(5):642–648CrossRefGoogle Scholar
 Machina M (1987) Choice under uncertainty: problems solved and unsolved. J Econ Perspect 1(1):121–154CrossRefGoogle Scholar
 Nielsen TD, Jensen FV (2003) Sensitivity analysis in influence diagrams. IEEE Trans Syst Man Cybern Part A Syst Hum 33(1):223–234CrossRefGoogle Scholar
 Quiggin J (1982) A theory of anticipated utility. J Econ Behav Organ 3:323–343CrossRefGoogle Scholar
 Saltelli A, Chan K, Scott EM (2009) Sensitivity analysis. Wiley, New YorkGoogle Scholar
 Tierney L (1996) Introduction to general statespace markov chain theory. In: Gilks WR, Richardson S, Spiegelhalter DJ (eds) Markov chain Monte Carlo in practice. Chapman and Hall, London, pp 59–74Google Scholar
 Walley P (1991) Statistical reasoning with imprecise probabilities. Chapman and Hall, LondonCrossRefGoogle Scholar
 Waters D (2011) Quantitative methods for business, Fifth edn. Pearson Education Limited, LondonGoogle Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.