In explorable uncertainty, we are given, instead of precise data points, only rough information in the form of uncertainty intervals. Precise data points can be revealed by queries. Since querying data points comes at a cost, the goal is to extract sufficient information to solve the problem at hand while minimizing the total query cost. In the following, we formally introduce the model of explorable uncertainty and highlight important concepts for solving problems under explorable uncertainty.
3.1 The Model
We are given a ground set \(\mathcal {I}\) of uncertainty intervals. Each interval \(I_i \in \mathcal {I}\) is associated with a precise value \(w_i \in I_i\) that is initially unknown. The precise value of an uncertainty interval \(I_i\) can be extracted by using a query. Intuitively, querying the interval \(I_i = (L_i, U_i)\) replaces the open interval \((L_i, U_i)\) with the singleton \([w_i]\). We call \(L_i\) and \(U_i\) the lower and upper limits of \(I_i\). How to obtain the upper and lower limits of the uncertainty intervals is problem-specific and depends on the application. As an example, consider the distances between mobile agents. While the agents change their positions and the precise distance between two agents might not always be known, last known locations as well as maximum movement speeds can be used to compute an uncertainty interval that is guaranteed to contain the precise distance. In the following, we abstract from the process of obtaining the uncertainty intervals and assume they are part of the input. If \(I_i = [w_i]\), we define \(L_i = U_i = w_i\). A query to an interval \(I_i\) comes with a query cost of \(c_i\). For the remainder of this chapter, we only consider uniform query costs, i.e., \(c_i = 1\) for all \(I_i \in \mathcal {I}\).
We can define various optimization problems based on the ground set of uncertainty intervals. For each problem, the goal is to extract sufficient information to solve the problem for a fixed but initially unknown realization of precise values, while minimizing the total query cost. In the case of uniform query costs, the total cost is just the number of queried intervals. A query set \(Q \subseteq \mathcal {I}\) is feasible if querying Q extracts sufficient information to optimally solve the problem at hand. Thus, a query set Q is only feasible if querying Q allows us to compute a solution for the underlying optimization problem that is guaranteed to be optimal for all possible precise values of intervals in \(\mathcal {I}\setminus Q\). We further discuss this assumption at the end of the chapter. We analyze the performance of algorithms in terms of their competitive ratio. Let \(\mathcal {J}\) denote the set of all instances for a problem under explorable uncertainty, let alg(J) for \(J \in \mathcal {J}\) denote the query cost needed by an algorithm alg to solve instance J, and let opt(J) denote the optimal query cost for solving J. That is, for a fixed instance J with fixed precise values, opt(J) denotes the minimum query cost necessary to solve J. Then, the competitive ratio of alg is defined as
$$\displaystyle \begin{aligned}\max_{J\in\mathcal{J}} \frac{\mathrm{alg}(J)}{\mathrm{opt}(J)}.\end{aligned}$$
In the following, we introduce two example problems under explorable uncertainty.
3.1.1 Example: Minimum and Selection Problems
In the minimum problem, the goal is to determine, for a given set of uncertainty intervals \(\mathcal {I}\), an interval \(I_i \in \mathcal {I}\) with minimum precise value, i.e., \(I_i = \arg \min _{I_j \in \mathcal {I}} w_j\). Note that this problem does not necessarily involve computing the actual precise value of that interval.
As an example, recall the scenario given in Sect. 2, where a company has to select the “best” out of a pool of possible sub-contractors, facility locations, transportation means, etc. without having all the information to determine it. This scenario can be modeled as a minimum problem: the possible choices can be modeled by the index set {1, …, n}. For each possible choice i ∈{1, …, n}, we have an initial estimation of its quality (based, e.g., on publicly available information, past experiences, and already known basic conditions) that can be modeled by the uncertainty interval \(I_i\). A precise estimation for a possible choice can be obtained, e.g., by executing measurements, lab tests, or customer interviews. Then, the process of obtaining a precise estimation for a possible choice can be modeled by a query. As the described operations typically come at a high cost, the goal is to make the best possible choice while minimizing this extra cost. This corresponds to the minimum problem.
Since the precise values are initially unknown, it might not be possible to find the interval of minimum precise value without executing queries. For example, in Fig. 1, we are given a set of two uncertainty intervals with the task of determining the interval with minimum precise value. Since those intervals overlap, both of them could possibly be of minimum precise value. To solve the problem, an algorithm has to execute at least one query.
The example of Fig. 1 also shows that no algorithm is better than 2-competitive for the minimum problem, as Kahan (1991) observed already in his seminal paper. By definition, for an algorithm to be better than 2-competitive, the ratio between alg(J) and opt(J) has to be strictly smaller than 2 for every instance J. In the example, we consider two instances with the same intervals that differ only in the precise values (crosses vs. circles). Since an algorithm has no knowledge of the precise values, both instances look the same to the algorithm, and thus, a deterministic algorithm will make the same first query for both instances. We argue that each possible first query will lead to a ratio of 2 for at least one of the instances, which implies that no deterministic algorithm is better than 2-competitive. In such a worst-case analysis, we may assume that different precise values are revealed for different algorithms. (In general, the precise values are independent of the query order.) If an algorithm queries \(I_1\) first, then, in the worst case, the green circle is revealed as the precise value of \(I_1\). After querying \(I_1\), it is still unknown which interval has minimum value, which forces the algorithm to also query \(I_2\). If the query to \(I_2\) again reveals the green circle as the precise value of \(I_2\), an optimal query set could determine that \(I_1\) has minimum precise value by only querying \(I_2\). Thus, the cost of the algorithm is twice the cost of the optimal query set. Vice versa, if an algorithm queries \(I_2\) first, then, in the worst case, the red crosses are revealed as precise values, and the algorithm queries \(\{I_1, I_2\}\), while the optimal query set queries only \(I_1\). Hence, for any algorithm’s choice on this instance (either query \(I_1\) first or \(I_2\)), there is a realization of precise values on which the algorithm requires two queries, whereas an optimal query set with one query exists. This implies that no deterministic algorithm (an algorithm that makes the same decisions when given the same input) can be better than 2-competitive.
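The adversary argument above can be checked mechanically. The following sketch uses hypothetical interval limits and precise values in the spirit of Fig. 1: the “green” realization lets an optimal query set finish by querying only \(I_2\), the “red” one by querying only \(I_1\), while either first query of a deterministic algorithm is answered so that a second query is forced.

```python
# A minimal sketch of the 2-competitiveness lower bound for the minimum
# problem with two overlapping intervals; the concrete numbers are
# illustrative stand-ins for the "green circle" / "red cross" values.

I1, I2 = (1.0, 4.0), (2.0, 6.0)   # open uncertainty intervals (L_i, U_i)
green = {1: 3.0, 2: 5.0}          # w_2 >= U_1: querying only I_2 suffices
red   = {1: 1.5, 2: 3.0}          # w_1 <= L_2: querying only I_1 suffices

def solved(queried, w):
    """True if the interval of minimum precise value is identified.
    An unqueried interval stays an open interval; a queried one is [w_i]."""
    lo = {1: I1[0], 2: I2[0]}
    hi = {1: I1[1], 2: I2[1]}
    for i in queried:
        lo[i] = hi[i] = w[i]
    # some interval's upper limit must lie weakly below the other's lower limit
    return hi[1] <= lo[2] or hi[2] <= lo[1]

def adversary(first_query):
    """Reveal the realization that forces a second query after first_query."""
    return green if first_query == 1 else red

for first in (1, 2):
    w = adversary(first)
    alg_cost = 1 if solved({first}, w) else 2       # must query the other too
    opt_cost = min(len(Q) for Q in ({1}, {2}, {1, 2}) if solved(Q, w))
    assert alg_cost / opt_cost == 2.0               # ratio 2 for either choice
```

Either first query thus ends with two queries against an optimal query set of size one, which is exactly the ratio of 2 claimed in the text.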
In a more general variant of the minimum problem, we are given a family \(\mathcal {S}\) of (possibly non-disjoint) subsets of \(\mathcal {I}\), and the goal is to determine the member of minimum precise value for each subset \(S_j \in \mathcal {S}\). Consider the example given in Bampis et al. (2021) concerning a multi-national medical company. The company relies on certain products for its operation in each country, e.g., a chemical ingredient or a medicine. However, due to different approval mechanisms, the concrete products that are accessible differ for each country. The task is to find the best approved product for each country. The product quality can be determined by extensive tests in a lab (queries) and, since the quality is independent of the country, each product has to be tested at most once. The set of products available in one country corresponds to a set in \(\mathcal {S}\), and the problem of identifying the best product in each country is the minimum problem in multiple sets.
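The feasibility condition for the minimum problem in multiple sets can be stated operationally: a query set is feasible if, in every set, some member's (possibly collapsed) interval lies weakly below all others. The sketch below, with hypothetical product names, interval limits, and precise values, checks feasibility and finds a smallest feasible query set by brute force.

```python
# A sketch of feasibility checking for the minimum problem in multiple sets.
# All names and numbers below are hypothetical.
from itertools import chain, combinations

intervals = {                          # product -> open interval (L_i, U_i)
    "A": (1, 4), "B": (3, 6), "C": (5, 8),
}
w = {"A": 2, "B": 5, "C": 7}           # precise values, hidden from the algorithm
countries = [{"A", "B"}, {"B", "C"}]   # approved products per country

def is_feasible(Q):
    """True if querying Q identifies a minimum in every set."""
    lo = {i: (w[i] if i in Q else intervals[i][0]) for i in intervals}
    hi = {i: (w[i] if i in Q else intervals[i][1]) for i in intervals}

    def provable_min(S):
        # some member's upper limit is at most every other member's lower limit
        return any(all(hi[i] <= lo[j] for j in S if j != i) for i in S)

    return all(provable_min(S) for S in countries)

# brute-force optimal query set: enumerate subsets by increasing size
names = list(intervals)
subsets = chain.from_iterable(combinations(names, r) for r in range(len(names) + 1))
opt = min((set(Q) for Q in subsets if is_feasible(set(Q))), key=len)
print(opt)  # -> {'B'}: one query resolves both sets in this instance
```

Note that product B is queried only once even though it appears in both sets, matching the observation that each product has to be tested at most once.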
In a similar way, we can model other selection problems, e.g., finding the kth smallest element and sorting.
3.1.2 Example: Minimum Spanning Tree Problem
In the minimum spanning tree (MST) problem, we are given a weighted, undirected, connected graph G = (V, E), with nodes V and edges E, where each edge e ∈ E has an associated weight \(w_e \geq 0\). The task is to find a spanning tree of minimum total weight. A spanning tree is a connected acyclic graph whose edges span all the vertices. See Fig. 2 for an example graph. The MST problem has various applications, e.g., in the design of distribution networks: nodes can be used to model storage facilities, manufacturers, and transportation systems, while the edges and their weights can model the cost of establishing a direct connection between two such points of interest, where a direct connection could, for example, be a road, a pipeline, or an Ethernet connection. To establish connections between all points of interest in a cost-minimal way, we have to compute a minimum spanning tree.
In the MST problem with uncertainty, the precise edge weights \(w_e\) are unknown. Each edge e ∈ E is associated with an uncertainty interval \(I_e \in \mathcal {I}\), and \(w_e\) is guaranteed to be in the given interval \(I_e\). The task is to find an MST in the uncertainty graph G for an a priori unknown realization of edge weights. Note that this problem does not necessarily involve computing the actual MST weight. In the application given above, uncertainty could arise from an unknown existing infrastructure or unclear environmental and political factors. For example, the exact existing underground infrastructure might be unknown and potentially decrease the cost of building a connection, and the building of a pipeline could lead to conflicts with environmental protection groups or nearby residents that might increase the cost. These dynamic changes in the cost can be modeled by uncertainty intervals. Such uncertainties can then be resolved by inspecting the existing infrastructure or surveying residents and other potential stakeholders; both actions can be modeled by queries. Since the described actions to resolve the uncertainty can be cost-intensive, the goal is to find an MST while minimizing the query cost.
It is well known that edges that have unique minimum weight in a cut of the graph are part of any MST. Furthermore, edges that have unique maximum weight on a cycle are part of no MST. Thus, to solve the MST problem under explorable uncertainty, we have to analyze the behavior of intervals and queries in terms of their interplay on cycles and in cuts. A simple cycle with three edges (triangle) already gives both lower bound examples and insights about the structure of a feasible query set. Consider the example of Fig. 3. It is clear that edge h is part of every MST, but we cannot decide which of the two edges f and g is in the MST without querying at least one of them. Similar to the lower bound example for the minimum problem, querying f first, in the worst case, reveals the green circles as precise weights, while querying g first reveals the red crosses. This forces any deterministic algorithm to query two elements, while the optimal query set contains just one. Thus, as was observed in Erlebach et al. (2008), no such algorithm can achieve a competitive ratio smaller than 2.
3.2 Mandatory Queries
A key aspect of several algorithms for problems under explorable uncertainty is the identification of mandatory queries. An interval \(I_i \in \mathcal {I}\) is mandatory for the problem instance if each feasible query set has to query \(I_i\), i.e., \(I_i \in Q\) for all feasible query sets Q. The identification of mandatory queries is important since an algorithm can query such intervals without ever worsening its competitive ratio. In a sense, mandatory queries allow an algorithm to extract new information “for free.” The revealed precise values might in turn allow the identification of further mandatory queries and, thus, lead to chains of mandatory queries. While it is possible to achieve theoretical worst-case guarantees without exploiting mandatory elements, empirical results indicate that the performance of algorithms significantly improves when the algorithm prioritizes the identification and querying of mandatory intervals (Erlebach et al. 2020; Focke et al. 2020).
When characterizing mandatory queries, we distinguish between characterizations based on the unknown precise values and characterizations that are only based on the uncertainty intervals. While the latter only uses information that can be accessed by an algorithm and, therefore, can actually be used to identify mandatory queries, the former is still helpful to analyze algorithms and will be useful in the following sections. We continue by characterizing mandatory queries for the two example problems.
3.2.1 Identifying Mandatory Queries for the Minimum Problem
Consider the minimum problem in multiple sets as introduced in the previous section. For a set \(S \in \mathcal {S}\), we call an interval \(I_i \in S\) the precise minimum of S if \(I_i\) has minimum precise value over all elements of S. The following lemma allows us to identify mandatory queries based on the precise values of the intervals.
Lemma 1 (Erlebach et al. (2020))
An interval \(I_i\) is mandatory for the minimum problem if and only if (a) \(I_i\) is a precise minimum of a set S and contains \(w_j\) of another interval \(I_j \in S \setminus \{I_i\}\) (in particular, if \(I_j \subseteq I_i\)), or (b) \(I_i\) is not a precise minimum of a set S with \(I_i \in S\) but contains the value of the precise minimum of S.
A common proof technique to show that an interval \(I_i\) is mandatory is to consider the query set \(\mathcal {I}\setminus \{I_i\}\). Showing that querying every element except \(I_i\) does not solve the problem implies that \(I_i\) is mandatory. Vice versa, if querying \(\mathcal {I}\setminus \{I_i\}\) solves the problem, then \(I_i\) is not mandatory. The following proof, which was given in Erlebach et al. (2020), uses this proof technique to show Lemma 1.
Proof
If \(I_i\) is the precise minimum of S and contains \(w_j\) of another interval \(I_j \in S\), then S cannot be solved even if we query all intervals in \(S \setminus \{I_i\}\), as we cannot prove \(w_i \leq w_j\) or \(w_j \leq w_i\). If \(I_i\) is not a precise minimum of a set S with \(I_i \in S\) and contains the precise minimum value \(w^*\), then S cannot be solved even if we query all intervals in \(S \setminus \{I_i\}\), as we cannot prove that \(w^* \leq w_i\).
If \(I_i\) is the precise minimum of a set S, but \(w_j \notin I_i\) for every \(I_j \in S \setminus \{I_i\}\), then \(S \setminus \{I_i\}\) is a feasible query set for S. If \(I_i\) is not a precise minimum of a set S and does not contain the precise minimum value of S, then again \(S \setminus \{I_i\}\) is a feasible query set for S. If every set S that contains \(I_i\) falls into one of these two cases, then querying all intervals except \(I_i\) is a feasible query set for the whole instance. □
As stated, Lemma 1 only enables us to identify mandatory intervals given full knowledge of the precise values, but it also implies criteria for identifying known mandatory intervals, i.e., intervals that are known to be mandatory given only the intervals and the precise values revealed by previous queries. We call an interval leftmost in a set S if it is an interval with minimum lower limit in S. The following corollary follows from Lemma 1 and gives a characterization of known mandatory intervals.
Corollary 1 (Erlebach et al. (2020))
If the leftmost interval \(I_l\) in a set S contains the precise value of another interval in S, then \(I_l\) is mandatory. In particular, if \(I_l\) is leftmost in S and \(I_j \subseteq I_l\) for some \(I_j \in S \setminus \{I_l\}\), then \(I_l\) is mandatory.
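The containment case of Corollary 1 can be applied directly to the interval data. The following sketch, with hypothetical intervals, flags the leftmost interval of each set as mandatory whenever it fully contains another member's interval; revealed precise values could be handled the same way by treating them as singleton intervals (with strict containment at the boundaries of open intervals).

```python
# A sketch of the containment case of Corollary 1: using only the interval
# data, the leftmost interval of a set is known to be mandatory as soon as
# it contains another member's interval entirely (I_j ⊆ I_l).

def known_mandatory(sets, intervals):
    """sets: iterable of sets of interval ids; intervals: id -> (L, U).
    Returns the ids certified as mandatory by the containment criterion."""
    mandatory = set()
    for S in sets:
        l = min(S, key=lambda i: intervals[i][0])   # leftmost: min lower limit
        L_l, U_l = intervals[l]
        for j in S:
            if j != l:
                L_j, U_j = intervals[j]
                if L_l <= L_j and U_j <= U_l:       # I_j ⊆ I_l
                    mandatory.add(l)
    return mandatory

intervals = {1: (0, 10), 2: (2, 5), 3: (4, 12)}
print(known_mandatory([{1, 2, 3}], intervals))  # -> {1}: I_2 ⊆ I_1
```

Intuitively, no matter which value in \(I_2\) is revealed, it lies inside \(I_1\), so by Lemma 1 the interval \(I_1\) must be queried in every feasible query set.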
3.2.2 Identifying Mandatory Queries for the Minimum Spanning Tree Problem
Mandatory queries for the MST problem can be characterized by using a structural property given by Megow et al. (2017). Let the lower limit tree \(T_L \subseteq E\) be an MST for values \(w^L\) with \(w^L_e = L_e + \epsilon \) for an infinitesimally small 𝜖 > 0. Similarly, let the upper limit tree \(T_U\) be an MST for values \(w^U\) with \(w^U_e = U_e - \epsilon \). Using the lower and upper limit trees, the following lemma allows us to identify mandatory queries based only on the intervals.
Lemma 2 (Megow et al. (2017))
Any edge in \(T_L \setminus T_U\) is mandatory.
Thus, we may repeatedly query edges in \(T_L \setminus T_U\) until \(T_L = T_U\), and this will not worsen the competitive ratio. By this preprocessing, we may assume \(T_L = T_U\). A characterization of the mandatory queries based on the full knowledge of the precise values is given by Erlebach and Hoffmann (2014).
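The limit trees can be computed with any standard MST algorithm. The sketch below uses Kruskal's algorithm with a union-find structure and a hypothetical triangle instance; it assumes all interval limits are distinct, so the infinitesimal ε tie-breaking in \(w^L\) and \(w^U\) never comes into play.

```python
# A sketch of Lemma 2: build the lower and upper limit trees T_L and T_U
# with Kruskal's algorithm and report edges in T_L \ T_U as mandatory.
# Assumes distinct limits, so the ε tie-breaking is irrelevant.

def kruskal(n, edges, weight):
    """edges: list of (u, v, interval); weight: interval -> sort key.
    Returns the set of chosen edge indices."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = set()
    for idx in sorted(range(len(edges)), key=lambda k: weight(edges[k][2])):
        u, v, _ = edges[idx]
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.add(idx)
    return tree

# hypothetical triangle: f and g overlap, h is cheap in any realization
edges = [(0, 1, (1, 5)),    # f
         (0, 2, (2, 4)),    # g
         (1, 2, (0, 0.5))]  # h
T_L = kruskal(3, edges, weight=lambda I: I[0])   # MST for w^L_e = L_e + ε
T_U = kruskal(3, edges, weight=lambda I: I[1])   # MST for w^U_e = U_e - ε
print(T_L - T_U)   # -> {0}: edge f is mandatory by Lemma 2
```

Here \(T_L\) contains f (lower limit 1 beats g's lower limit 2) while \(T_U\) contains g (upper limit 4 beats f's upper limit 5), so f lies in \(T_L \setminus T_U\) and is mandatory.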
3.3 Methods and Results
While the identification and querying of mandatory elements improve the performance of algorithms empirically and will be a key ingredient in the following sections, they are not sufficient to solve our two example problems. Therefore, we consider the witness set algorithm, one of the most important frameworks in explorable uncertainty. The witness set algorithm was introduced by Bruce et al. (2005) and relies on the identification of witness sets. A set \(W \subseteq \mathcal {I}\) is a witness set if each feasible query set has to query at least one member of W, i.e., if W ∩ Q ≠ ∅ for all feasible query sets Q. Note that witness sets W with |W| = 1 are exactly the mandatory queries. Algorithm 1 formulates the witness set algorithm in a problem-independent way. The algorithm essentially just queries witness sets until the problem is solved. Similar to mandatory queries, we distinguish between witness sets that can be identified based on the uncertainty intervals alone and witness sets that can only be identified based on knowledge of the precise values. The algorithm can only use the former kind.
Algorithm 1: Abstract formulation of the witness set algorithm
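One plausible rendering of this abstract scheme is the following sketch. The two callbacks, here called `find_witness_set` and `query`, are placeholders for the problem-specific parts: the former returns a witness set identifiable from the current knowledge (or `None` once the problem is solved), and the latter reveals a precise value.

```python
# A problem-independent sketch of the witness set algorithm: repeatedly
# identify a witness set and query all of its members until no further
# witness set can be identified. The callback names are placeholders.

def witness_set_algorithm(find_witness_set, query):
    queried = set()
    while True:
        W = find_witness_set(queried)   # witness set w.r.t. current knowledge
        if W is None:                   # no more witness sets: problem solved
            return queried
        for i in W - queried:
            query(i)                    # reveal w_i, shrinking I_i to [w_i]
            queried.add(i)

# demo with witness sets drawn from a fixed hypothetical list
pending = [{1, 2}, {3}]
log = []
result = witness_set_algorithm(
    lambda queried: pending.pop(0) if pending else None,
    log.append)
print(result)   # -> {1, 2, 3}
```

Members that were already queried are skipped, which matches the observation in the proof of Lemma 3 that repeated queries reveal no additional information.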
The competitive ratio of the witness set algorithm depends on the size of the queried witness sets as formulated in the following lemma.
Lemma 3 (Bruce et al. (2005))
If |W|≤ ρ holds for all witness sets W that are queried by the witness set algorithm, then the algorithm is ρ-competitive.
Proof
Since querying elements multiple times does not reveal additional information, we can assume that all queried witness sets are pairwise disjoint. Let \(W_1, \ldots, W_k\) denote those witness sets. Then, by definition of witness sets and since the sets are pairwise disjoint, the optimal query set contains at least k elements. By assumption, \(|W_j| \leq \rho\) holds for all j ∈{1, …, k}. Thus, the algorithm queries at most ρ ⋅ k elements, and the competitive ratio is at most \(\frac {\rho \cdot k}{k} = \rho \). □
In order to apply (and analyze) the witness set algorithm to a concrete problem, one has to characterize witness sets, bound the size of the witness sets, and show that the problem is solved once the characterization does not admit any more witness sets. In the following, we apply the algorithm to the two example problems.
3.3.1 Witness Set Algorithm for the Minimum Problem
For the minimum problem, we can identify witness sets of size one, i.e., mandatory queries, by using Corollary 1. Furthermore, we can identify witness sets of size two using the following lemma that was first (implicitly) shown by Kahan (1991).
Lemma 4 (Kahan (1991))
A set \(\{I_i, I_j\} \subseteq \mathcal {I}\) is a witness set if there exists an \(S \in \mathcal {S}\) with \(\{I_i, I_j\} \subseteq S\), \(I_i \cap I_j \neq \emptyset \), and either \(I_i\) or \(I_j\) leftmost in S.
Similar to the proof of the mandatory characterization, the lemma can be shown by considering the query set \(Q = \mathcal {I} \setminus \{I_i,I_j\}\). After querying Q, both \(I_i\) and \(I_j\) could still be of minimum precise value in S. Thus, the problem is not solved yet, and at least one of \(I_i\) and \(I_j\) needs to be queried. This is a common proof strategy for showing that a subset of \(\mathcal {I}\) is a witness set.
The witness set algorithm for the minimum problem repeatedly identifies and queries witness sets of size at most two by applying Corollary 1 and Lemma 4 until they cannot be applied anymore. If Lemma 4 cannot be applied anymore, then the leftmost interval \(I_i\) of each set S is not overlapped by any \(I_j \in S \setminus \{I_i\}\). This implies that the leftmost intervals are the precise minima of their sets. Consequently, the problem is then solved, which implies the following theorem. The theorem was first (implicitly) shown by Kahan (1991) for a single set and translates to multiple sets.
Theorem 1 (Kahan (1991))
The witness set algorithm is 2-competitive for the minimum problem. This competitive ratio is best possible for deterministic algorithms.
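For a single set, the algorithm reduces to a simple loop in the spirit of Kahan's scheme: as long as the leftmost interval overlaps some other interval, the two form a witness set by Lemma 4 and both are queried. The sketch below uses hypothetical intervals and precise values and assumes, for simplicity, that all precise values are distinct; it omits the Corollary 1 prioritization, which the text notes helps empirically but is not needed for the 2-competitive guarantee.

```python
# A sketch of the witness set algorithm for the minimum problem in a single
# set: query witness pairs {leftmost, overlapping} (Lemma 4) until the
# leftmost interval is provably the minimum. Assumes distinct precise values.

def solve_minimum(intervals, values):
    """intervals: {name: (L, U)}; values: {name: w} (the hidden oracle).
    Returns (name of the minimum, number of queries)."""
    current = dict(intervals)      # shrinks to singletons as we query
    cost = 0
    while True:
        l = min(current, key=lambda i: current[i][0])   # leftmost interval
        U_l = current[l][1]
        overlap = next((j for j in current
                        if j != l and current[j][0] < U_l), None)
        if overlap is None:        # leftmost is provably the minimum: done
            return l, cost
        for i in (l, overlap):     # query the witness set {l, overlap}
            if current[i][0] != current[i][1]:          # not yet queried
                current[i] = (values[i], values[i])
                cost += 1

intervals = {"A": (1, 5), "B": (2, 6), "C": (7, 9)}
values = {"A": 4, "B": 3, "C": 8}
print(solve_minimum(intervals, values))   # -> ('B', 2)
```

In this instance the optimal query set also has size two (each of A and B contains the other's value, so by Lemma 1 both are mandatory), so the algorithm is optimal here; in the worst case it pays at most twice the optimum, as Theorem 1 states.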
3.3.2 Witness Set Algorithm for the Minimum Spanning Tree Problem
For the minimum spanning tree problem, we can identify witness sets of size one by using Lemma 2. Furthermore, we can identify witness sets of size two by using the following lemma that was shown in Erlebach et al. (2008) and Megow et al. (2017). Recall that \(T_L\) and \(T_U\) are the lower and upper limit trees of the instance. Let \(f_1, \ldots, f_l\) denote the edges in \(E \setminus T_L\) ordered by non-decreasing lower limit, and let \(C_i\) be the unique cycle in \(T_L \cup \{f_i\}\).
Lemma 5 (Erlebach et al. (2008))
Let \(i \in \{1, \ldots, l\}\) be the smallest index such that \(I_{f_i} \cap I_e \neq \emptyset \) holds for some \(e \in C_i \setminus \{f_i\}\). Then, \(\{f_i, e\}\) is a witness set.
The witness set algorithm for the MST problem repeatedly identifies and queries witness sets of size at most two by applying Lemmas 2 and 5 until they cannot be applied anymore. If Lemma 5 cannot be applied anymore, then each \(f_i\) does not overlap with any \(e \in C_i \setminus \{f_i\}\). This implies that each \(f_i\) is maximal in \(C_i\) and therefore not part of any MST. Thus, \(T_L\) is known to be an MST, and the problem is solved. This implies the following theorem.
Theorem 2 (Erlebach et al. (2008))
The witness set algorithm is 2-competitive for the MST problem. This competitive ratio is best possible for deterministic algorithms.
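The loop of Lemma 5 can be sketched compactly for small graphs. The following code, with a hypothetical triangle instance in the spirit of Fig. 3, rebuilds \(T_L\) each round, takes the first non-tree edge \(f_i\) (by lower limit) whose interval overlaps an interval on its cycle \(C_i\), and queries that witness pair. It assumes distinct limits (so the ε tie-breaking never matters) and omits the Lemma 2 preprocessing for brevity.

```python
# A sketch of the witness set algorithm for the MST problem under
# uncertainty on a small graph. Instance data is hypothetical.

def kruskal(n, items, key):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for e in sorted(items, key=key):
        ru, rv = find(e[0]), find(e[1])
        if ru != rv:
            parent[ru] = rv
            tree.append(e)
    return tree

def tree_path(tree, s, t):
    """Edges on the unique s-t path in the tree (DFS; fine for small graphs)."""
    adj = {}
    for e in tree:
        adj.setdefault(e[0], []).append(e)
        adj.setdefault(e[1], []).append(e)
    stack, prev = [s], {s: None}
    while stack:
        u = stack.pop()
        for e in adj.get(u, []):
            v = e[0] if e[1] == u else e[1]
            if v not in prev:
                prev[v] = (u, e)
                stack.append(v)
    path, u = [], t
    while prev[u] is not None:
        u, e = prev[u]
        path.append(e)
    return path

def mst_witness_algorithm(n, edges, values):
    """edges: list of [u, v, (L, U)] (intervals shrink as queries happen);
    values: {edge index: w}. Returns (T_L as sorted edge indices, #queries)."""
    cost = 0
    while True:
        idx = {id(e): i for i, e in enumerate(edges)}
        T_L = kruskal(n, edges, key=lambda e: e[2][0])
        rest = sorted((e for e in edges if e not in T_L), key=lambda e: e[2][0])
        witness = None
        for f in rest:                               # f_1, ..., f_l in order
            for e in tree_path(T_L, f[0], f[1]):     # cycle C_i minus f_i
                if f[2][0] < e[2][1] and e[2][0] < f[2][1]:   # intervals overlap
                    witness = (f, e)
                    break
            if witness:
                break
        if witness is None:                          # T_L is a verified MST
            return sorted(idx[id(e)] for e in T_L), cost
        for e in witness:                            # query the witness set
            if e[2][0] != e[2][1]:                   # not yet queried
                w = values[idx[id(e)]]
                e[2] = (w, w)
                cost += 1

edges = [[0, 1, (1, 5)],    # f
         [0, 2, (2, 6)],    # g
         [1, 2, (0, 0.5)]]  # h
values = {0: 4, 1: 3, 2: 0.25}
result = mst_witness_algorithm(3, edges, values)
print(result)   # -> ([1, 2], 2): {g, h} is the verified MST after two queries
```

In this run, the non-tree edge g overlaps f on its cycle, so \(\{g, f\}\) is queried; afterwards no overlap remains, \(T_L = \{g, h\}\) is a verified MST, and h is never queried.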