Region-based approximation of probability distributions (for visibility between imprecise points among obstacles)

Let $p$ and $q$ be two imprecise points, given as probability density functions on $\mathbb R^2$, and let $\cal R$ be a set of $n$ line segments (obstacles) in $\mathbb R^2$. We study the problem of approximating the probability that $p$ and $q$ can see each other; that is, that the segment connecting $p$ and $q$ does not cross any segment of $\cal R$. To solve this problem, we approximate each density function by a weighted set of polygons; a novel approach to dealing with probability density functions in computational geometry.


Introduction
Data imprecision is an important obstacle to the application of geometric algorithms to real-world problems. In the computational geometry literature, various models to deal with data imprecision have been suggested. Most generally, in this paper we describe the location of each point by a probability distribution µ i (for instance by a Gaussian distribution). This model is often not worked with directly because of the computational difficulties arisen from its generality.
These difficulties can often be addressed by approximating the distributions by point sets. For instance, for tracking uncertain objects a particle filter uses a discrete set of locations to model uncertainty [17]. Löffler and Phillips [13] and Jørgenson et al. [11] discuss several geometric problems on points with probability distributions, and show how to solve them using discrete point sets (or indecisive points) that have guaranteed error bounds. More specifically, a 2-dimensional point set P is an ε-quantization of an xy-monotone function F (such as a cumulative probability density function), if for every point q in the plane the fraction of P dominated by q differs from F (q) by at most ε.
Imprecise points appear naturally in many applications. They play an important role in databases [8,2,7,5,16,1,6], machine learning [4], and sensor networks [18], where a limited number of probes from a certain data set are gathered, each potentially representing the true location of a data point. Alternatively, data points may be obtained using imprecise measurements or are the result of inexact earlier computations.
Even though a point set may be a provably good approximation of a probability distribution, this is not good enough in all applications. Consider, for example, a situation where we wish to model visibility between imprecise points among obstacles. When both points are given by a probability distribution, naturally there is a probability that the two points see each other. However, when we discretise the distributions, the choice of points may greatly influence the resulting probability, as illustrated in Figure 1.  Instead, we may approximate distributions by regions. The concept of describing an imprecise point by a region or shape was first introduced by Guibas et al. [9], motivated by finite coordinate precision, and later studied extensively in a variety of settings [10,3,14,15,12].
In this work we show how to use region-based approximation of point distributions to solve algorithmic problems on (general) imprecise points. In Section 2 we discuss several ways to do this. In Section 3, we focus on a geometric problem for which previous point-based methods do not work well: visibility computations between imprecise points.

Region-based approximation
Assume that an imprecise point p is given by a probability distribution µ. We wish to describe µ by a set of weighted regions M that provide an additive ε-approximation of the distribution: for any point in the plane, the sum of the weights of the regions containing q differs from µ(q) by at most ε.
One approach to attack this problem is to consider the isolines of the probability density function f (·) at ε levels. These are the curves where f (·) is exactly kε, for some integer k, and they separate the plane into regions where f (·) has a value between iε and (i + 1)ε. Note that if we could take the regions formed by the isolines, and give each of them a weight of ε, they would form a valid ε-approximation of f . However, the isolines are not generally polygonal. Instead, we note that if we take any polygons that stay between two consecutive isolines, and use these as polygons with weight ε, they are guaranteed to form a 2ε-approximation.
Lemma 2.1. Let f be a function with isolines of f at ε levels. Let M be a set of polygons such that exactly one polygon from M lies between any two consecutive isolines. If every polygon in M has weight ε then M forms an additive 2ε-approximation of f .
Proof. Let P be a polygon that lies between the ith and the (i + 1)th isolines, and Q a polygon that lies between the (i + 1)th and (i + 2)th isolines. Then the region between P and Q lies completely between the ith and the (i + 2)th isolines, hence, we know that for any point p inside P but outside Q, we have iε ≤ f (p) ≤ (i + 2)ε. The lemma follows. Figure 2(a) illustrates this.
Of course, the complexity of the polygons depends on the specific distribution. In the following we focus on Gaussian distributions, because they are natural and likely to occur in applications.
Proof. The isolines of a bivariate Gaussian distribution are concentric circles that subdivide R 2 into annuli, and we wish to compute a polygon that stays within each annulus. We observe that the complexity of such a polygon depends only on the relative width of its annulus; that is, given an annulus with inner radius r and outer radius r , we can fit a regular π/ arccos r r -gon. Refer to Figure 2 The probability density function with standard deviation σ is given by the equation The number of annuli depends on the height of the peak of the function we wish to approximate, which is at 1 2πσ 2 , so we need k = 1 2πσ 2 ε isolines. If we solve f (x, 0) = iε for x, we get so the ith annulus has relative width r r = log(2πσ 2 iε) log(2πσ 2 (i + 1)ε) .
Hence, the total complexity of all polygons is k i=1 π/ arccos log(2πσ 2 iε) log(2πσ 2 (i + 1)ε) , which we rather coarsely bound by k times the maximum of these terms, .
As the argument of the arccos approaches 1, the value approaches 0 as the square of the argument, leading to a O( 1 σ 2 ε ) growth rate. The lemma follows.
Alternatively, we may subdivide space into grid cells and give each cell a weight depending on the value of f . The advantage of a grid-based approach is that the subdivision of the plane does not depend on the actual distributions, and that squares are particularly nice polygons. A problem with this approach is that the resolution of the grid depends on the steepest part of f : when the value of f varies by more than ε in a cell, the approximation is not valid. Instead, we may also choose to compute a non-uniform grid, for example based on a quadtree. If we use a quadtree to subdivide R 2 until no cell is crossed by more than one isoline, and we weigh a cell crossed by the ith isoline by iε, we again obtain a 2ε-approximation. Figure 2(b) illustrates this. The orange region is the set of lines intersecting P1 and P2 through s1, s2, s3, s4. Right: Partition L * in dual space. The orange cell corresponds to all lines in the primary space intersecting the same four segments s1, s2, s3, s4. Right: The "hour-glass" shape H * in the dual space that corresponds to a set H of all lines in the primary space that intersect the obstacle.

Visibility between two regions
Consider a set of obstacles R in the plane. We assume that the obstacles are disjoint simple convex polygons with m vertices in total. For two imprecise points with probability distributions µ 1 and µ 2 we approximate them with two sets of weighted regions M 1 and M 2 , each consisting of convex polygons. For every pair of polygons P 1 ⊂ M 1 and P 2 ⊂ M 2 , we compute the probability that a point p 1 chosen uniformly at random from P 1 can see a point p 2 chosen uniformly at random from P 2 . We say that two points can "see" each other if and only if the straight line segment connecting them does not intersect any obstacle from R. The probability of two points p 1 = (x 1 , y 1 ) ∈ P 1 and p 2 = (x 2 , y 2 ) ∈ P 2 seeing each other can be computed by the formula: where v(x 1 , y 1 , x 2 , y 2 ) is 1 if the points see each other, and 0 otherwise.
To compute prob we consider a dual space L where a point with coordinates (α, β) corresponds to a line y = αx − β in the primary space. We construct a region L * in the dual space that corresponds to the set of lines that stab both P 1 and P 2 . This region can be partitioned into cells, each corresponding to a set of lines that cross the same four segments of P 1 and P 2 (refer to Figure 3). The following follows from the fact that each vertex of L * corresponds to a line in primary space through two vertices of P 1 and P 2 .
Lemma 3.1. Given two convex polygons P 1 and P 2 of total size n, the complexity of partition L * in the dual space that corresponds to a set of lines that stab P 1 and P 2 is O(n 2 ).
Proof. Each vertex of L * corresponds to a line in the primary space that goes through a pair of vertices of P 1 and P 2 . f For each obstacle h ⊂ R we construct a region H * in the dual space, that corresponds to the set of lines that intersect h. H * has an "hour-glass" shape (refer to Figure 4). We now compute the First consider the case that P 1 , P 2 and the obstacles are disjoint. We can assume that all obstacles lie in the convex hull of P 1 and P 2 . Then a pair of points from P 1 and P 2 see each other exactly if the line through the points does not intersect an obstacle. Thus, we only need to identify the cells in L not intersecting any of the regions H * , and integrate over these cells. Details on evaluating the integral for one cell is given in Section 4. Overall this case can be handled in O((m + n) 2 ) time. Next, consider the case that P 1 and P 2 are disjoint but might intersect obstacles. Now we need to consider the length of each line segment from the last obstacle in P 1 to the boundary and from the boundary of P 2 to the first obstacle. We can annotate the cells of L with this information by a traversal of L. Between neighboring cells this information can be updated in constant time. Thus, this case can be handled with the same asymptotic running time as the previous case. As a third case, consider P 1 overlapping P 2 but with no obstacles in the overlap area. The computations needed remain the same as in the case of non-overlapping P 1 and P 2 . provided we actually evaluated this integral, we should now be able to compute the value in O(n 2 ) time.
Finally, we consider the general case, in which obstacles might also lie in the overlap of P 1 and P 2 . In the cells of L that correspond to the overlap of P 1 and P 2 we now need to consider the sum of the lengths of each line segment between boundaries of obstacles. If we simply traverse L, maintaining the ordered list of intersected obstacle boundaries, then computing the sum of lengths in one cell requires O(m) time, leading to a total running time of O(m(m + n) 2 ). Instead, we investigate the structure of the problem a little more closely.
Lemma 3.2. Let P be a polygon, possibly with holes or multiple components, of total complexity n. Let S be the space of all maximal line segments, that is, segments which lie in the (closed) interior of P but which are not contained in larger line segments that also lie in the interior of P . Then S has complexity O(n 2 ).
Proof. Line segments have four degrees of freedom, but the condition that they must be locally maximal removes two of them, so S is intrinsically two-dimensional. We may project S onto the set L of all lines (by extending each segment to a line), but this way we may map multiple segments onto the same line. However, we only map finitely many segments to a line. We can visualize this as a finite set of "copies" of (patches of) L above each other. Then, as we move (translate or rotate) our segment through P , it may split into two segments when we hit a vertex; this corresponds to one of the copies of L splitting into two copies. The "seams" along which the copies of patches of L are sewn together in S are one-dimensional curves, which correspond to the segment in P rotating around (and touching) a vertex. The endpoints of these seams are points which correspond to segments in P that connect two vertices. Figure 5 illustrates P , L and S for a small example.
Clearly, there can be at most O(n 2 ) segments that connect two vertices in P , thus, there are only O(n 2 ) vertices in S. This does not immediately give the bound, though, since S is not planar. However, each vertex in S (corresponding to a pair of vertices in P ) can be incident to at most two seams in S: one that corresponds to a segment rotating around either vertex in P . So, the total number of seams can also be at most O(n 2 ). Since a seam always connects exactly three patches, the total complexity of S is O(n 2 ).
If we apply Lemma 3.2 to our setting, then P is the imprecise point with obstacles as holes, of total complexity (n + m). We arrive at the following intermediate result.
In the next section, we show how to compute the probability for a given combinatorial configuration.
Lemma 3.3. Given two polygons P 1 and P 2 of total size n and obstacles of total complexity m, we can compute the probability that a pair of points drawn uniformly at random from P 1 × P 2 can see each other in O((m + n) 2 ) time, assuming we can compute the necessary information within each cell.

Computing the probability for a fixed combinatorial configuration
For simplicity of presentation, we assume that P 1 and P 2 are separable by a vertical line, and P 1 and P 2 are disjoint from R. This will allow us to write the solution in a more concise way without loss of generality.
The numerator of (2) can be written as a sum of integrals over all cells of partition L * \ ∪ h H * in the dual: In Appendix A we give a detailed case-by-case closed-form evaluation of the integrals. Since we integrate over constant-size subproblems, we obtain: Theorem 4.1. Given two convex polygons P 1 and P 2 of total size n and a set of obstacles of total size m, we can compute the probability that a point p 1 chosen uniformly at random in P 1 sees a point p 2 chosen uniformly at random in P 2 in O((m + n) 2 ) time.

Main result
Combining Theorems 2.2 and 4.1, our main result follows: Theorem 5.1. Given two imprecise points, modelled as Gaussian distributions µ 1 and µ 2 with standard deviations σ 1 and σ 2 , and n obstacles, we can ε-approximate the probability that p and q see each Proof. According to Theorem 2.2, we need to solve O(σ −2

A Integral
Here we'll show how to calculate the following integral for a cell C of the partition L * of in the dual space: Suppose lines corresponding to C intersect four segments s 1 , s 2 , s 3 , and s 4 that belong to the lines with the following formulas: Then, the limits of integration can be expressed as: After solving the inner two integrals we get: For some i and j: Denote I ij to be: where cell C is split into vertical splines C v , with each C v bounded by left and right vertical segments with α-coordinates equal to α 1 and α 2 , and bottom and top segments defined by formulas β = A 1 α + B 1 and β = A 2 α + B 2 . Then, Denote F ij (α) to be an indefinite integral with additive constant equal to 0: In the general case, when b i = 0, b j = 0, and a i /b i = a j /b j , If segment i is pointing towards segment j (line drawn through i intersects j), and one of the corners of the integration spline corresponds to the line going through i, then the following equalities hold where α corresponds to the corner of the spline. In that case, If segment j is pointing towards segment i (line drawn through j intersects i), and one of the corners of the integration spline corresponds to the line going through j, then the following equalities hold where α corresponds to the corner of the spline. In that case, In case when b i = 0, b j = 0, but lines are parallel (a i /b i = a j /b j ), If segment j is pointing towards segment i: where α corresponds to the corner of the spline. In that case, If b i = 0, and b j = 0 (segment j is vertical), then If segment i is pointing towards segment j: where α corresponds to the corner of the spline. In that case, If b i = 0 and b j = 0 (both segments are vertical), then