IWOCA 2016: Combinatorial Algorithms, pp. 449-460

# Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9843)

## Abstract

The probability that two spatial objects establish some kind of mutual connection often depends on their proximity. To formalize this concept, we define the notion of a probabilistic neighborhood: Let P be a set of n points in $$\mathbb {R}^d$$, $$q \in \mathbb {R}^d$$ a query point, $${\text {dist}}$$ a distance metric, and $$f : \mathbb {R}^+ \rightarrow [0,1]$$ a monotonically decreasing function. Then, the probabilistic neighborhood $$N(q,f)$$ of q with respect to f is a random subset of P, and each point $$p \in P$$ belongs to $$N(q,f)$$ with probability $$f({\text {dist}}(p,q))$$. Possible applications include query sampling and the simulation of probabilistic spreading phenomena, as well as other scenarios where the probability of a connection between two entities decreases with their distance. We present a fast, sublinear-time query algorithm to sample probabilistic neighborhoods from planar point sets. For certain distributions of planar P, we prove that our algorithm answers a query in $$O((|N(q,f)| + \sqrt{n})\log n)$$ time with high probability. In experiments this yields a speedup over pairwise distance probing of at least one order of magnitude, even for rather small data sets with $$n=10^5$$ and also for other point distributions not covered by the theoretical results.
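To make the definition concrete, the following is a minimal sketch of the pairwise distance probing baseline mentioned above. It is not the sublinear-time algorithm of the paper. It visits every point of P and keeps it independently with probability $$f({\text {dist}}(p,q))$$, so a query costs time linear in n. The function names, the Euclidean metric, and the exponential decay chosen for f are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_neighborhood(points, q, f, rng=random):
    """Naive baseline: probe every point once (linear time per query).

    Each point p is kept independently with probability f(dist(p, q)),
    using the Euclidean distance as an assumed metric.
    """
    return [p for p in points if rng.random() < f(math.dist(p, q))]


def exp_decay(distance):
    """Hypothetical monotonically decreasing f mapping into (0, 1]."""
    return math.exp(-distance)


# Illustrative planar point set; the seed only makes the run repeatable.
rng = random.Random(42)
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(1000)]
neighborhood = sample_neighborhood(points, (0.0, 0.0), exp_decay, rng)
```

Because every point is probed, the cost does not depend on how small the sampled neighborhood is; this is the gap that the stated $$O((|N(q,f)| + \sqrt{n})\log n)$$ query bound addresses.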
