What's the Over/Under? Probabilistic Bounds on Information Leakage

Quantitative information flow (QIF) is concerned with measuring how much of a secret is leaked to an adversary who observes the result of a computation that uses it. Prior work has shown that QIF techniques based on abstract interpretation with probabilistic polyhedra can be used to analyze the worst-case leakage of a query, on-line, to determine whether that query can be safely answered. While this approach can provide precise estimates, it does not scale well. This paper shows how to solve the scalability problem by augmenting the baseline technique with sampling and symbolic execution. We prove that our approach never underestimates a query's leakage (it is sound), and detailed experimental results show that we can match the precision of the baseline technique but with orders of magnitude better performance.


Introduction
As more sensitive data is created, collected, and analyzed, we face the problem of how to productively use this data while preserving privacy. One approach to this problem is to analyze a query f in order to quantify how much information about secret input s is leaked by the output f (s). More precisely, we can consider a querier to have some prior belief of the secret's possible values. The belief can be modeled as a probability distribution [9], i.e., a function δ from each possible value of s to its probability. When a querier observes output o = f (s), he revises his belief, using Bayesian inference, to produce a posterior distribution δ ′ . If the posterior could reveal too much about the secret, then the query should be rejected. One common definition of "too much" is Bayes Vulnerability, which is the probability of the adversary guessing the secret in one try [44]. Formally, Various works [5,19,25,26] propose rejecting f if there exists an output that makes the vulnerability of the posterior exceed a fixed threshold K. In particular, for all possible values i of s (i.e., δ(i) > 0), if the output o = f (i) could induce a posterior δ ′ with V (δ ′ ) > K, then the query is rejected.
One way to implement this approach is to estimate f (δ)-the distribution of f 's outputs when the inputs are distributed according to δ-by viewing f as a program in a probabilistic programming language (PPL) [18]. Unfortunately, as discussed in Section 9, most PPLs are approximate in a manner that could easily result in underestimating the vulnerability, leading to an unsafe security decision. Techniques designed specifically to quantify information leakage often assume only uniform priors, cannot compute vulnerability (favoring, for example, Shannon entropy), and/or cannot maintain assumed knowledge between queries.
Mardziel et al. [26] propose a sound analysis technique based on abstract interpretation [11]. In particular, they estimate a program's probability distribution using an abstract domain called a probabilistic polyhedron (PP), which pairs a standard numeric abstract domain, such as convex polyhedra [12], with some additional ornaments, which include lower and upper bounds on the size of the support of the distribution, and bounds on the probability of each possible secret value. Using PP can yield a precise, yet safe, estimate of the vulnerability, and allows the posterior PP (which is not necessarily uniform) to be used as a prior for the next query. Unfortunately, PPs can be very inefficient. Defining intervals [10] as the PP's numeric domain can dramatically improve performance, but only with an unacceptable loss of precision.
In this paper we present a new approach that ensures a better balance of both precision and performance in vulnerability computation, augmenting PP with two new techniques. In both cases we begin by analyzing a query using the fast interval-based analysis. Our first technique is then to use sampling to augment the result. In particular, we execute the query using possible secret values i sampled from the posterior δ ′ derived from a particular output o i . If the analysis were perfectly accurate, executing f (i) would produce o i . But since intervals are overapproximate, sometimes it will not. With many sampled outcomes, we can construct a Beta distribution to estimate the size of the support of the posterior, up to some level of confidence. We can use this estimate to boost the lower bound of the abstraction, and thus improve the precision of the estimated vulnerability.
Our second technique is of a similar flavor, but uses symbolic reasoning to magnify the impact of a successful sample. In particular, we execute a query result-consistent sample concolically [41], thus maintaining a symbolic formula (called the path condition) that characterizes the set of variable valuations that would cause execution to follow the observed path. We then count the number of possible solutions and use the count to boost the lower bound of the support (with 100% confidence).
Sampling and concolic execution can be combined for even greater precision.
We have formalized and proved our techniques are sound (Sections 3-6) and implemented and evaluated them (Sections 7 and 8). Using a privacy-sensitive ship planning scenario (Section 2) we find that our techniques provide similar precision to convex polyhedra while providing orders-of-magnitude better performance. As far as we are aware (Section 9), our approach constitutes the best balance of precision and performance proposed to date for estimating query vulnerability. Our implementation freely available at https://github.com/GaloisInc/TAMBA.

Overview
To provide an overview of our approach, we will describe the application of our techniques to a scenario that involves a coalition of ships from various nations operating in a shared region. Suppose a natural disaster has impacted some islands in the region. Some number of individuals need to be evacuated from the islands, and it falls to a regional disaster response coordinator to determine how to accomplish this. While the coalition wants to collaborate to achieve these humanitarian aims, we assume that each nation also wants to protect their sensitive data-namely ship locations and capacity.
More formally, we assume the use of the data model shown in Figure 1, which considers a set of ships, their coalition affiliation, the evacuation capacity of the ship, and its position, given in terms of latitude and longitude. 1 We sometimes refer to the latter two as a location L, with L.x as the longitude and L.y as the latitude. We will often index properties by ship ID, writing Capacity(z) for the capacity associated with ship ID z, or Location(z) for the location.
The evacuation problem is defined as follows Given a target location L and number of people to evacuate N , compute a set of nearby ships S such that z∈S Capacity(z) ≥ N .
Our goal is to solve this problem in a way that minimizes the vulnerability to the coordinator of private information, i.e., the ship locations and their exact capacity. We assume that this coordinator initially has no knowledge of the positions or capabilities of the ships other than that they fall within certain expected ranges. If all members of the coalition share all of their data with the coordinator, then a solution is easy to compute, but it affords no privacy. Figure 2 gives an algorithm the response coordinator can follow that does not require each member to share all of their data. Instead, it iteratively performs queries AtLeast and Nearby. These queries do not reveal precise values about ship locations or capacity, but rather admit ranges of possibilities. The algorithm works by maintaining upper and lower bounds on the capacity of each ship i in the array berths. Each ship's bounds are updated based on the results of queries about its capacity and location. These queries aim to be privacy preserving, doing a sort of binary search to narrow in on the capacity of each ship in the operating area. The procedure completes once is_solution determines the minimum required capacity is reached.

Computing vulnerability with abstract interpretation
Using this procedure, what is revealed about the private variables (location and capacity)? Consider a single Nearby(z, l, d) query. At the start, the coordinator is assumed to know only that z is somewhere within the operating region. If the query returns true, the coordinator now knows that s is within d units of l (using Manhattan distance). This makes Location(z) more vulnerable because the adversary has less uncertainty about it.
Mardziel et al. [26] proposed a static analysis for analyzing queries such as Nearby(z, l, d) to estimate the worst-case vulnerability of private data. If the worst-case vulnerability is too great, the query can be rejected. A key element of their approach is to perform abstract interpretation over the query using an abstract domain called a probabilistic polyhedron. An element P of this domain represents the set of possible distributions over the query's state. This state includes both the hidden secrets and the visible query results. The abstract interpretation is sound in the sense that the true distribution δ is contained in the set of distributions represented by the computed probabilistic polyhedron P .
A probabilistic polyhedron P is a tuple comprising a shape and three ornaments. The shape C is an element of a standard numeric domain-e.g., intervals [10], octagons [30], or convex polyhedra [12]-which overapproximates the set of possible values in the support of the distribution. The ornaments p ∈ [0, 1], m ∈ R, and s ∈ Z are pairs which store upper and lower bounds on the probability per point, the total mass, and number of support points in the distribution, respectively. (Distributions represented by P are not necessarily normalized, so the mass m is not always 1.) Figure 3(a) gives an example probabilistic polyhedron that represents the posterior of a Nearby query that returns true. In particular, if Nearby(z,L 1 ,D) is true then Location(z) is somewhere within the depicted diamond around L 1 . Using convex polyhedra or octagons for the shape domain would permit representing this diamond exactly; using intervals would overapproximate it as the depicted 9x9 bounding box. The ornaments would be the same in any case: the size s of the support is 41 possible (x,y) points, the probability p per point is 0.01, and the total mass is 0.41, i.e., p · s. In general, each ornament is a pair of a lower and upper bound (e.g., s min and s max ), and m might be a more accurate estimate than p · s. In this case shown in the figure, the bounds are tight.
Mardziel et al's procedure works by computing the posterior P for each possible query output o, and from that posterior determining the vulnerability. This is easy to do. The upper bound p max of p maximizes the probability of any given point. Dividing this by the lower bound m min of the probability mass m normalizes this probability for the worst case. For P shown in Figure 3(a), the bounds of p and m are tight, so the vulnerability is simply 0.01/0.41 = 0.024.

Improving precision with sampling and concolic execution
In Figure 3(a), the parameters s, p, and m are precise. However, as additional operations are performed, these quantities can accumulate imprecision. For example, suppose we are using intervals for the shape domain, and we wish to analyze the query Nearby(z, L 1 , 4) ∨ Nearby(z, L 2 , 4) (for some nearby point L 2 ). The result is produced by analyzing the two queries separately and then combining them with an abstract join; this is shown in the top row of Figure 3(b). Unfortunately, the result is very imprecise. The bottom row of Figure 3(b) illustrates the result we would get by using convex polyhedra as our shape domain. When using intervals (top row), the vulnerability is estimated as 0.036, whereas the precise answer (bottom row) is actually 0.026. Unfortunately, obtaining this precise answer is far more expensive than obtaining the imprecise one.
This paper presents two techniques that can allow us to use the less precise interval domain but then recover lost precision in a relatively cheap post-processing step. The effect of our techniques is shown in the middle-right of Figure 3(b). Both techniques aim to obtain better lower bounds for s. This allows us to update lower bounds on the probability mass m since m min is at least s min · p min (each point has at least probability p min and there are at least s min of them). A larger m means a smaller vulnerability.
The first technique we explore is sampling, depicted to the right of the arrow in Figure 3(b). Sampling chooses random points and evaluates the query on them to determine whether they are in the support of the posterior distribution for a particular query result. By tracking the ratio of points that produce the expected output, we can produce an estimate of s, whose confidence increases as we include more samples. This approach is depicted in the figure,     can safely boost the lower bound of s. This approach is depicted to the left of the arrow in Figure 3(b). The depicted shapes represent discovered path condition's disjuncts, whose size sums to 63. This is a better lower bound on s and improves the vulnerability estimate to 0.032. These techniques can be used together to further increase precision. In particular, we can first perform concolic execution, and then sample from the area not covered by this underapproximation. Importantly, Section 8 shows that using our techniques with the interval-based analysis yields an orders of magnitude performance improvement over using polyhedra-based analysis alone, while achieving similar levels of precision, with high confidence.

Preliminaries: Syntax and Semantics
This section presents the core language-syntax and semantics-in which we formalize our approach to computing vulnerability. We also review probabilistic polyhedra [26], which is the baseline analysis technique that we augment.

Core Language and Semantics
The programming language we use for queries is given in Figure 4. The language is essentially standard, apart from pif q then S 1 else S 2 , which implements probabilistic choice: S 1 is executed with probability q, and S 2 with probability 1 − q. We limit the form of expressions E so that they can be approximated by standard numeric abstract domains such as convex polyhedra [12]. Such domains require linear forms; e.g., there is no division operator and multiplication of two variables is disallowed. 2 We define the semantics of a program in terms of its effect on (discrete) distributions of states. States σ are partial maps from variables to integers; we write domain(σ) for the set of variables over which σ is defined. Distributions δ are maps from states to nonnegative real numbers, interpreted as probabilities (in range [0, 1]). The denotational semantics considers a program as a relation between distributions. In particular, the semantics of statement S, written [[S]], is a function of the form Dist → Dist; we write [[S]]δ = δ ′ to say that the semantics of S maps input distribution δ to output distribution δ ′ . Distributions are not necessarily normalized; we write ∥δ∥ as the probability mass of δ (which is between 0 and 1). We writeσ to denote the point distribution that gives σ probability 1, and all other states 0.
The semantics is standard and not crucial in order to understand our techniques. In Appendix A we provide the semantics in full, see Clarkson et al. [9] or Mardziel et al [26] for detailed explanations.

Probabilistic polyhedra
To compute vulnerability for a program S we must compute (an approximation of) its output distribution. One way to do that would be to use sampling: Choose states σ at random from the input distribution δ, "run" the program using that input state, and collect the frequencies of output states σ ′ into a distribution δ ′ . While using sampling in this manner is simple and appealing, it could be both expensive and imprecise. In particular, depending on the size of the input and output space, it may take many samples to arrive at a proper approximation of the output distribution.
Probabilistic polyhedra [26] can address both problems. This abstract domain combines a standard domain C for representing numeric program states with additional ornaments that all together can safely represent S's output distribution.
Probabilistic polyhedra work for any numeric domain; in this paper we use both convex polyhedra [12] and intervals [10]. For concreteness, we present the defintion using convex polyhedra. We use the meta-variables β, β 1 , β 2 , etc. to denote linear inequalities.
. . , β m }, interpreted conjunctively, over variables V . We write C for the set of all convex polyhedra. A polyhedron C represents a set of states, denoted γ C (C), as follows, where σ |= β indicates that the state σ satisfies the inequality β.
Probabilistic polyhedra extend this standard representation of sets of program states to sets of distributions over program states.

Definition 2.
A probabilistic polyhedron P is a tuple (C, s min , s max , p min , p max , m min , m max ). We write P for the set of probabilistic polyhedra. The quantities s min and s max are lower and upper bounds on the number of support points in the concrete distribution(s) P represents. A support point of a distribution is one which has non-zero probability. The quantities p min and p max are lower and upper bounds on the probability mass per support point. The m min and m max components give bounds on the total probability mass (i.e., the sum of the probabilities of all support points). Thus P represents the set of distributions γ P (P) defined below.
We will write domain(P) def = domain(C) to denote the set of variables used in the probabilistic polyhedron.
Note the set γ P (P) is a singleton exactly when s min = s max = #(C) and p min = p max , and m min = m max , where #(C) denotes the number of discrete points in convex polyhedron C. In such a case γ P (P) contains only the uniform distribution where each state in γ C (C) has probability p min . In general, however, the concretization of a probabilistic polyhedron will have an infinite number of distributions, with per-point probabilities varied somewhere in the range p min and p max . Distributions represented by a probabilistic polyhedron are not necessarily normalized. In general, there is a relationship between p min , s min , and m min , in that m min ≥ p min · s min (and m max ≤ p max · s max ), and the combination of the three can yield more information than any two in isolation.
The abstract semantics of S is written ⟨⟨S⟩⟩ P = P ′ , and indicates that abstractly interpreting S where the distribution of input states are approximated by P will produce P ′ , which approximates the distribution of output states. Following standard abstract interpretation terminology, ℘Dist (sets of distributions) is the concrete domain, P is the abstract domain, and γ P : P → ℘Dist is the concretization function for P. We do not present the abstract semantics here; details can be found in Mardziel et al. [26]. Importantly, this abstract semantics is sound: Proof. See Theorem 6 in Mardziel et. al [26].
Consider the example from Section 2.2. We assume the adversary has no prior information about the location of ship s. So, δ 1 above is simply the uniform distribution over all possible locations. The statement S is the query issued by the adversary, Nearby(z, L 1 , 4) ∨ Nearby(z, L 2 , 4). 3 If we assume that the result of the query is true then the adversary learns that the location of s is within (Manhattan) distance 4 of L 1 or L 2 . This posterior belief (δ 2 ) is represented by the overlapping diamonds on the bottom-right of Figure 3(b). The abstract interpretation produces a sound (interval) overapproximation (P 2 ) of the posterior belief. This is modeled by the rectangle which surrounds the overlapping diamonds. This rectangle is the "join" of two overlapping boxes, which each correspond to one of the Nearby calls in the disjuncts of S.

Computing Vulnerability: Basic procedure
The key goal of this paper is to quantify the risk to secret information of running a query over that information. This section explains the basic approach by which we can use probabilistic polyhedra to compute vulnerability, i.e., the probability of the most probable point of the posterior distribution. Improvements on this basic approach are given in the next two sections.
Our convention will be to use C 1 , s min 1 , s max 1 , etc. for the components associated with probabilistic polyhedron P 1 . In the program S of interest, we assume that secret variables are in the set T , so input states are written σ T , and we assume there is a single output variable r. We assume that the adversary's initial uncertainty about the possible values of the secrets T is captured by the probabilistic polyhedron P 0 (such that domain(P 0 ) ⊇ T ).
Computing vulnerability occurs according to the following procedure.
1. Perform abstract interpretation: ⟨⟨S⟩⟩ P 0 = P 2. Given a concrete output value of interest, o, perform abstract conditioning to define P r=o The vulnerability V is the probability of the most likely state(s). When a probabilistic polyhedron represents one or more true distributions (i.e., the probabilities all sum to 1), the most probable state's probability is bounded by p max . However, the abstract semantics does not always normalize the probabilistic polyhedron as it computes, so we need to scale p max according to the total probability mass. To ensure that our estimate is on the safe side, we scale p max using the minimum probability mass: V = p max m min . In Figure 3(b), the sound approximation in the top-right has V ≤ 0.02 0.55 = 0.036 and the most precise approximation in the bottom-right has V ≤ 0.02 0.77 = 0.026.

Improving precision with sampling
We can improve the precision of the basic procedure using sampling. First we introduce some notational convenience: = P T revised polyhedron with confidence ω 4 We write P ∧ B and not P | B because P need not be normalized. P T is equivalent to step 2, above, but projected onto the set of secret variables T . P T + is the improved (via sampling) polyhedron.
After computing P T with the basic procedure from the previous section we take the following additional steps: 1. Set counters α and β to zero. 2. Do the following N times (for some N , see below): (a) Randomly select an input state σ T ∈ γ C (C T ). At this point we can compute the vulnerability as in the basic procedure, but using P T + instead of P T . Consider the example of Section 2.2. In Figure 3 There are several things to notice about this procedure. First, observe that in step 2b we "run" the program using the point distributionσ as an input; in the case that S is deterministic (has no pif statements) the output distribution will also be a point distribution. However, for programs with pif statements there are multiple possible outputs depending on which branch is taken by a pif. We consider all of these outputs so that we can confidently determine whether the input state σ could ever cause S to produce result o. If so, then σ should be considered part of P T + . If not, then we can safely rule it out (i.e., it is part of the overapproximation).
Second, we only update the size parameters of P T + ; we make no changes to p min T + and p max T + . This is because our sampling procedure only determines whether it is possible for an input state to produce the expected output. The probability that an input state produces an output state is already captured (soundly) by p T so we do not change that. This is useful because the approximation of p T does not degrade with the use of the interval domain in the way the approximation of the size degrades (as illustrated in Figure 3(b)). Using sampling is an attempt to regain the precision lost on the size component (only).
Finally, the confidence we have that sampling has accurately assessed which input states are in the support is orthogonal to the probability of any given state. In particular, P T is an abstraction of a distribution δ T , which is a mathematical object. Confidence ω is a measure of how likely it is that our abstraction (or, at least, the size part of it) is accurate.
We prove (in Appendix A.1) that our sampling procedure is sound:

Improving precision with concolic execution
Another approach to improving the precision of a probabilistic polyhedron P is to use concolic execution. The idea here is to "magnify" the impact of a single sample to soundly increase s min by considering its execution symbolically. More precisely, we concretely execute a program using a particular secret value, but maintain symbolic constraints about how that value is used. This is referred to as concolic execution [41]. We use the collected constraints to identify all points that would induce the same execution path, which we can include as part of s min .
We begin by defining the semantics of concolic execution, and then show how it can be used to increase s min soundly.

(Probabilistic) Concolic Execution
Concolic execution is expressed as rewrite rules defining a judgment ⟨Π, S⟩ −→ p π ⟨Π ′ , S ′ ⟩. Here, Π is pair consisting of a concrete state σ and symbolic state ζ. The latter maps variables x ∈ Var to symbolic expressions E which extend expressions E with symbolic variables α. This judgment indicates that under input state Π the statement S reduces to statement S ′ and output state Π ′ with probability p, with path condition π. The path condition is a conjunction of boolean symbolic expressions B (which are just boolean expressions B but altered to use symbolic expressions E instead of expressions E) that record which branch is taken during execution. For brevity, we omit π in a rule when it is true.
The rules for the concolic semantics are given in Figure 5. Most of these are standard, and deterministic (the probability annotation p is 1). Path conditions are recorded for if and while, depending on the branch taken. The semantics of pif q then S 1 else S 2 is non-deterministic: the result is that of S 1 with probability q, and S 2 with probability 1 − q. We write ζ(B) to substitute free variables x ∈ B We define a complete run of the concolic semantics with the judgment ⟨Π, S⟩ ⇓ p π Π ′ , which has two rules: A complete run's probability is thus the product of the probability of each individual step taken. The run's path condition is the conjunction of the conditions of each step. The path condition π for a complete run is a conjunction of the (symbolic) boolean guards evaluated during an execution. π can be converted to disjunctive normal form (DNF), and given the restrictions of the language the result is essentially a set of convex polyhedra over symbolic variables α.

Improving precision
Using concolic execution, we can improve our estimate of the size of a probabilistic polyhedron as follows: 1. Randomly select an input state σ T ∈ γ C (C T ) (recall that C T is the polyhedron describing the possible valuations of secrets T ). 2. Set Π = (σ T , ζ T ) where ζ T maps each variable x ∈ T to a fresh symbolic variable α x . Perform a complete concolic run ⟨Π, S⟩ ⇓ p π (σ ′ , ζ ′ ). Make sure that σ ′ (r) = o, i.e., the expected output. If not, select a new σ T and retry. Give up after some number of failures N . For our example shown in Figure 3(b), we might obtain a path condition |Loc(z).x − L 1 .x| + |Loc(z).y − L 1 .y| ≤ 4 that captures the left diamond of the disjunctive query. 3. After a successful concolic run, convert path condition π to DNF, where each conjunctive clause is a polyhedron C i . Also convert uses of disequality (≤ and ≥) to be strict (< and >).
4. Let C = C T ⊓ ( i C i ); that is, it is the join of each of the polyhedra in DN F (π) "intersected" with the original constraints. This captures all of the points that could possibly lead to the observed outcome along the concolically executed path. Compute n = #(C). Let P T + = P T except define s min T + = n if s min T < n and m min T + = p min T ·n if m min T < p min T ·n. (Leave them as is, otherwise.) For our example, n = 41, the size of the left diamond. We do not update s min T since 41 < 55, the probabilistic polyhedron's lower bound (but see below).

Combining Sampling with Concolic Execution
Sampling can be used to further augment the results of concolic execution. The key insight is that the presence of a sound under-approximation generated by the concolic execution means that it is unnecessary to sample from the underapproximating region. Here is the algorithm: 1. Let C = C 0 ⊓ ( i C i ) be the under-approximating region. 2. Perform sampling per the algorithm in Section 5, but with two changes: if a sampled state σ T ∈ γ C (C), ignore it -When done sampling, compute s min T + = p L · (#(C T ) − #(C)) + #(C) and s max T + = p U · (#(C T ) − #(C)) + #(C). This differs from Section 5 in not including the count from concolic region C in the computation. This is because, since we ignored samples σ T ∈ γ C (C), the credible interval [p L , p U ] bounds the likelihood that any given point in C T \ C is in the support of the true distribution.
For our example, concolic execution indicated there are at least 41 points that satisfy the query. With this in hand, and using the same samples as shown in Section 5, we can refine s ∈ [74, 80] and m ∈ [0.74, 0.160] (the credible interval is formed over only those samples which satisfy the query but fall outside the underapproximation returned by concolic execution). We improve the vulnerability estimate to V ≤ 0.02 0.0.74 = 0.027. These bounds (and vulnerability estimate) are better than those of sampling alone (s ∈ [72, 81] with V ≤ 0.028).
The statement of soundness and its proof can be found in Appendix A.
We have implemented our approach as an extension of Mardziel et al. [26], which is written in OCaml. This baseline implements numeric domains C via an OCaml interface to the Parma Polyhedra Library [4]. The counting procedure #(C) is implemented by LattE [14]. Support for arbitrary precision and exact arithmetic (e.g., for manipulating m min , p min , etc.) is provided by the mlgmp OCaml interface to the GNU Multi Precision Arithmetic library. Rather than maintaining a single probabilistic polyhedron P , the implementation maintains a powerset of polyhedra [3], i.e., a finite disjunction. Doing so results in a more precise handling of join points in the control flow, at a somewhat higher performance cost.
We have implemented our extensions to this baseline for the case that domain C is the interval numeric domain [10]. Of course, the theory fully applies to any numeric abstract domain. We use Gibbs sampling, which we implemented ourselves. We delegate the calculation of the beta distribution and its corresponding credible interval to the cephes OCaml library, which in turns uses the GNU Scientific Library. It is straightforward to lift the various operations we have described to the powerset domain. All of our code is available at https://github.com/GaloisInc/TAMBA.

Experiments
To evaluate the benefits of our techniques, we applied them to queries based on the evacuation problem outlined in Section 2. We found that while the baseline technique can yield precise answers when computing vulnerability, our new techniques can achieve close to the same level of precision far more efficiently.

Experimental Setup
For our experiments we analyzed queries similar to Nearby(s, l, d) from Figure 2. We generalize the Nearby query to accept a set of locations L-the query returns true if s is within d units of any one of the islands having location l ∈ L. In our experiments we fix d = 100. We consider the secrecy of the location of s, Location(s). We also analyze the execution of the resource allocation algorithm of Figure 2 directly; we discuss this in Section 8.3.
We measure the time it takes to compute the vulnerability (i.e., the probability of the most probable point) following each query. In our experiments, we consider a single ship s and set its coordinates so that it is always in range of some island in L, so that the concrete query result returns true (i.e. Nearby(s, L, 100) = true). We measure the vulnerability following this query result starting from a prior belief that the coordinates of s are uniformly distributed with 0 ≤ Location(s).x ≤ 1000 and 0 ≤ Location(s).y ≤ 1000.
In our experiments, we varied several experimental parameters: analysis method (either P, I, CE, S, or CE+S), query complexity c; AI precision level p; and number of samples n. We describe each in turn.

Analysis method
We compared five techniques for computing vulnerability: P: Abstract interpretation (AI) with convex polyhedra for domain C (Section 4), I: AI with intervals for C (Section 4), S: AI with intervals augmented with sampling (Section 5), CE: AI with intervals augmented with concolic execution (Section 6), and CE+S: AI with intervals augmented with both techniques (Section 6.3) The first two techniques are due to Mardziel et al. [26], where the former uses convex polyhedra and the latter uses intervals (aka boxes) for the underlying polygons. In our experiments we tend to focus on P since I's precision is unacceptably poor (e.g., often vulnerability = 1).
Query complexity. We consider queries with different L; we say we are increasing the complexity of the query as L gets larger. Let c = |L|; we consider 1 ≤ c ≤ 5, where larger L include the same locations as smaller ones. We set each location to be at least 2 · d Manhattan distance units away from any other island (so diamonds like those in Figure 3(a) never overlap).
Precision. The precision parameter p bounds the size of the powerset abstract domain at all points during abstract interpretation. This has the effect of forcing joins when the powerset grows larger than the specified precision. As p grows larger, the results of abstract interpretation are likely to become more precise (i.e. vulnerability gets closer to the true value). We considered p values of 1, 2, 4, 8, 16, 32, and 64.
Samples taken. For the latter three analysis methods, we varied the number of samples taken n. For analysis CE, n is interpreted as the number of samples to try per polyhedron before giving up trying to find a "valid sample." 5 For analysis S, n is the number of samples, distributed proportionally across all the polyhedra in the powerset. For analysis CE+S, n is the combination of the two. We considered sample size values of 1, 000 − 50, 000 in increments of 1, 000. We always compute an interval with ω =99.9% confidence (which will be wider when fewer samples are used).
System description. We ran experiments varying all possible parameters. For each run, we measured the total execution time (wall clock) in seconds to analyze the query and compute vulnerability. All experiments were carried out on a MacBook Air with OSX version 10.11.6, a 1.7GHz Intel Core i7, and 8GB of RAM. We ran a single trial for each configuration of parameters. Only wall-clock time varies across trials; informally, we observed time variations to be small. Figure 6(a)-(c) measure vulnerability (y-axis) as a function of time (x-axis) for each analysis. 6 These three figures characterize three interesting "zones" in the space of complexity and precision. The results for method I are not shown in any of the figures. This is because I always produces a vulnerability of 1. The refinement methods (CE, S, and CE+S) are all over the interval domain, and should be considered as "improving" the vulnerability of I.  In Figure 6(a) we fix c = 1 and p = 1. In this configuration, baseline analysis P can compute the true vulnerability in ∼ 0.95 seconds. Analysis CE is also able to compute the true vulnerability, but in ∼ 0.19 seconds. Analysis S is able to compute a vulnerability to within ∼ 5 · e −6 of optimal in ∼ 0.15 seconds. These data points support two key observations. First, even a very modest number of samples improves vulnerability significantly over just analyzing with intervals. Second, concolic execution is only slightly slower and can achieve the optimal vulnerability. Of course, concolic execution is not a panacea. As we will see, a feature of this configuration is that no joins take place during abstract interpretation. This is critical to the precision of the concolic execution.

Results
In Figure 6(b) we fix c = 2 and p = 4. In contrast the the configuration of Figure 6(a), the values for c and p in this configuration are not sufficient to prevent all joins during abstract interpretaion. This has the effect of taking polygons that represent individual paths through the program and joining them into a single polygon representing many paths. We can see that this is the case because baseline analysis P is now achieving a better vulnerability than CE. However, one pattern from the previous configuration persists: all three refinement methods (CE, S, CE+S) can achieve vulnerability within ∼ 1 · e −5 of P, but in 1 4 the time. In contrast to the previous configuration, analysis CE+S is now able to make a modest improvement over CE (since it does not achieve the optimal).
In Figure 6(c) we fix c = 5 and p = 32. This configuration magnifies the effects we saw in Figure 6(b). Similarly, in this configuration there are joins happening, but the query is much more complex and the analysis is much more precise. In this figure, we label the X axis as a log scale over time. This is because analysis P took over two minutes to complete, in contrast the longest-running refinement method, which took less than 6 seconds. The relationship between the refinement analyses is similar to the previous configuration. The key observation here is that, again, all three refinement analyses achieve within ∼ 3 · e −5 of P, but this time in 4% of the time (as opposed to 1 4 in the previous configuration). Figure 6(d) makes more explicit the relationship between refinements (CE, S, CE+S) and P. We fix n = 50, 000 (the maximum) here, and p = 64 (the maximum). We can see that as query complexity goes up, P gets exponentially slower, while CE, S, and CE+S slow at a much lower rate, while retaining (per the previous graphs) similar precision.

Evacuation Problem
We conclude this section by briefly discussing an analysis of an execution of the resource allocation algorithm of Figure 2. In our experiment, we set the number of ships to be three, where two were in range d = 300 of the evacuation site, and their sum-total berths (500) were sufficient to satisfy demand at the site (also 500). For our analysis refinements we set n = 1000. Running the algorithm, a total of seven pairs of Nearby and Capacity queries were issued. In the end, the algorithm selects two ships to handle the evacuation. Table 1 shows the time to execute the algorithm using the different analysis methods, along with the computed vulnerability-this latter number represents the coordinator's view of the most likely nine-tuple of the private data of the three ships involved (x coordinate, y coordinate, and capacity for each). We can see that, as expected, our refinement analyses are far more efficient than baseline P, and far more precise than baseline I. The CE methods are precise but slower than S. This is because of the need to count the number of points in the DNF of the concolic path conditions, which is expensive.

Related Work
Quantifying Information Flow. There is a rich research literature on techniques that aim to quantify information that a program may release, or has released, and then use that quantification as a basis for policy. One question is what measure of information release should be used. Past work largely considers information theoretic measures, including Bayes vulnerability [44] and Bayes risk [7], Shannon entropy [42], and guessing entropy [27]. The g-vulnerability framework [1] was recently introduced to express measures having richer operational interpretations, and subsumes other measures.
Our work focuses on Bayes Vulnerability, which is related to min entropy. Vulnerability is appealing operationally: As Smith [44] explains, it estimates the risk of the secret being guessed in one try. While challenging to compute, this approach provides meaningful results for non-uniform priors. Work that has focused on other, easier-to-compute metrics, such as Shannon entropy and channel capacity, require deterministic programs and priors that conform to uniform distributions [2,21,23,24,28,33]. Like Mardziel et al. [26], we are able to compute the worst-case vulnerability, i.e., due to a particular output, rather than a static estimate, i.e., as an expectation over all possible outputs. Köpf and Basin [22] originally proposed this idea, and Mardziel et al. were the first to implement it, followed by several others [5,19,25].
Köpf and Rybalchenko [23] (KR) also use sampling and concolic execution to statically quantify information leakage. But their approach is quite different from ours. KR uses sampling of a query's inputs in lieu of considering (as we do) all possible outputs, and uses concolic execution with each sample to ultimately compute Shannon entropy, by underapproximation, within a confidence interval. This approach benefits from not having to enumerate outputs, but also requires expensive model counting for each sample. By contrast, we use sampling and concolic execution from the posterior computed by abstract interpretation, using the results to boost the lower bound on the size/probability mass of the abstraction. Our use of sampling is especially efficient, and the use of concolic execution is completely sound (i.e., it retains 100% confidence in the result). As with the above work, KR requires deterministic programs and uniform priors.
Probabilistic Programming Langauges. A probabilistic program is essentially a lifting of a normal program operating on single values to a program operating on distributions of values. As a result, the program represents a joint distribution over its variables [18]. As discussed in this paper, quantifying the information released by a query can be done by writing the query in a probabilistic programming language (PPL) and representing the uncertain secret inputs as distributions. Quantifying release generally corresponds to either the maximum likelihood estimation (MLE) problem or the maximum a-posteriori probability (MAP) problem. Not all PPLs support computation of MLE and MAP, but several do.
PPLs based on partial sampling [17,35] or full enumeration [38] of the state space are unsuitable in our setting: they are either too inefficient or too imprecise. PPLs based on algebraic decision diagrams [8], graphical models [29], and factor graphs [6,31,37] translate programs into convenient structures and take advantage of efficient algorithms for their manipulation or inference, in some cases supporting MAP or MLE queries (e.g. [34,36]). PSI [15] supports exact inference via computation of precise symbolic representations of posterior distributions, and has been used for dynamic policy enforcement [25]. Guarnieri et al. [19] use probabilistic logic programming as the basis for inference; it scales well but only for a class of queries with certain structural limits, and which do not involve numeric relationships.
Our implementation for probabilistic computation and inference differs from the above work in two main ways. Firstly, we are capable of sound approximation and hence can trade off precision for performance, while maintaining soundness in terms of a strong security policy. Even when using sampling, we are able to provide precise confidence measures. The second difference is our compositional representation of probability distributions, which is based on numerical abstractions: intervals [10], octagons [30], and polyhedra [12]. The posterior can be easily used as the prior for the next query, whereas prior work would have to repeatedly analyze the composition of past queries.
A few other works have also focused on abstract interpretation, or related techniques, for reasoning about probabilistic programs. Monniaux [32] defines an abstract domain for distributions. Smith [45] describes probabilistic abstract interpretation for verification of quantitative program properties. Cousot [13] unifies these and other probabilistic program analysis tools. However, these do not deal with sound distribution conditioning, which is crucial for beliefbased information flow analysis. Work by Sankaranarayanan et al [40] uses a combination of techniques from program analysis to reason about distributions (including abstract interpretation), but the representation does not support efficient retrieval of the maximal probability, needed to compute vulnerability.

Conclusions
Quantitative information flow is concerned with measuring the knowledge about secret data that is gained by observing the answer to a query. This paper has presented a combination of static analysis using probabilistic abstract interpretation, sampling, and underapproximation via concolic execution to compute high-confidence upper bounds on information flow more precisely and efficiently than past work. Experimental results show dramatic improvements in overall precision and/or performance compared to abstract interpretation alone. As next steps, we plan to integrate static analysis and sampling more closely so as to avoid precision loss at decision points in programs. We also look to extend programs to be able to store random choices in variables, to thereby implement more advanced probabilistic structures.

A.1 Proofs
Here we restate the soundness theorems for our techniques, and include their proofs.

Theorem 2 (Sampling is Sound).
If δ 0 ∈ γ P (P 0 ), ⟨⟨S⟩⟩ P 0 = P , and [[S]]δ 0 = δ then Proof. Suppose we have some δ 0 ∈ γ P (P 0 ) whereby [[S]]δ 0 = δ. We want to prove that δ T ∈ γ P (P T + ). Per Definition 2, this means we must show that Our proof goes as follows. First, we know that δ T ∈ γ P (C T ) by Theorem 1, Lemma 15 and Lemma 7 of Mardziel et al. By Definition 2, this means (1) and (4)  To prove (2), we argue as follows. Let p = |support(δ T )| , which represents the probability that a randomly selected point from C T is in support(δ T ). From the computed credible interval over the Beta distribution, we have that p ∈ [p L , p U ] with confidence ω. As such, which is the desired result.
To prove (3), first consider that if m min T + = m min T then the first half of (3) follows from the first half of (c). Otherwise, we have that m min T + = p min T · s min T + . Then we can reason the first half of (3) holds using the following reasoning: s min T + ≤ |support(δ T )| by (2) p min T + · s min T + ≤ p min T + · |support(δ T )| as p min T + nonneg. m min T + ≤ p min T + · |support(δ T )| by def. = Σ σ∈support(δ T ) p min T + ≤ Σ σ∈support(δ T ) δ T (σ) by (4) = ∥δ T ∥ We can prove the soundness of m max T + (the other half of (3)) with similar reasoning.

Theorem 3 (Concolic Execution is Sound).
If δ 0 ∈ γ P (P 0 ), ⟨⟨S⟩⟩ P 0 = P , and [[S]]δ 0 = δ then Proof. Our proof is quite similar to that of Theorem 2. Once again we proceed to show the four elements of Definition 2, where (1) and (4) hold by construction. To prove (2) we are only concerned with the inequality s min T ≤ |support(δ T )|, since s max T + = s max T . We know by the soundness of P T that s min T ≤ |support(δ T )| From the definition of concolic execution we have {σ | σ ∈ C T ∧ σ |= π} ⊆ {σ | δ T (σ) > 0}. Notice that this is just saying that the concolic execution is a valid under-approximation for the support. From this we know that: Given (2), our proof of (3) proceeds similarly to Theorem 2.

Theorem 4 (Concolic and Sampling Composition is Sound).
If δ 0 ∈ γ P (P 0 ), ⟨⟨S⟩⟩ P 0 = P , and [[S]]δ 0 = δ then = P T sampling-and-concolically revised with confidence ω Proof. Our proof is quite similar to that of Theorem 2. Once again we proceed to show the four elements of Definition 2, where (1) and (4) hold by construction. To prove (2), consider the set |support(δ T )|. For any state σ T ∈ support(δ T ) it must be the case that either σ T ∈ C or σ T ∈ C T \ C. This is because C ⊆ C T . Thus, we have |support(δ T )| = |support(C T \ C)| + |support(C)| Additionally, since C represents a sound under-approximation of the support, we know that |support(C)| = #(C) So, p = |support(C T \C)|

B Query code
The following is the query code of the example developed in Section 2.2. Here, s_x and s_y represent a ship's secret location. The variables l1_x, l1_y, l2_x, l2_y, and d are inputs to the query. The first pair represents position L 1 , the second pair represents the position L 2 , and the last is the distance threshold, set to 4.
We assume for the example that L 1 and L 2 have the same y coordinate, and their x coordinates differ by 6 units. We express the query in the language of Figure 4 basically as follows: d_l1 := |s_x -l1_x| + |s_y -l1_y|; d_l2 := |s_x -l2_x| + |s_y -l2_y|; if (d_l1 <= d || d_l2 <= d) then out := true // assume this result else out := false The variable out is the result of the query. We simplify the code by assuming the absolute value function is built-in; we can implement this with a simple conditional. We run this query probabilistically under the assumption that s_x and s_y are uniformly distributed within the range given in Figure 1. We then condition the output on the assumption that out = true. When using intervals as the baseline of probabilistic polyhedra, this produces the result given in the upper right of Figure 3(b); when using convex polyhedra, the result is shown in the lower right of the figure. The use of sampling and concolic execution to augment the former is shown via arrows between the two.