Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently

  • Moritz von Looz
  • Henning Meyerhenke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9843)


The probability that two spatial objects establish some kind of mutual connection often depends on their proximity. To formalize this concept, we define the notion of a probabilistic neighborhood: Let P be a set of n points in \(\mathbb {R}^d\), \(q \in \mathbb {R}^d\) a query point, \({\text {dist}}\) a distance metric, and \(f : \mathbb {R}^+ \rightarrow [0,1]\) a monotonically decreasing function. Then, the probabilistic neighborhood N(q,f) of q with respect to f is a random subset of P and each point \(p \in P\) belongs to N(q,f) with probability \(f({\text {dist}}(p,q))\). Possible applications include query sampling and the simulation of probabilistic spreading phenomena, as well as other scenarios where the probability of a connection between two entities decreases with their distance. We present a fast, sublinear-time query algorithm to sample probabilistic neighborhoods from planar point sets. For certain distributions of planar P, we prove that our algorithm answers a query in \(O((|N(q,f)| + \sqrt{n})\log n)\) time with high probability. In experiments this yields a speedup over pairwise distance probing of at least one order of magnitude, even for rather small data sets with \(n=10^5\) and also for other point distributions not covered by the theoretical results.
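The definition of N(q,f) translates directly into the pairwise distance probing baseline mentioned above: evaluate f(dist(p,q)) for every point and keep p with that probability. The following Python sketch is one straightforward reading of that baseline, not the authors' implementation. It assumes planar points, Euclidean distance, and that points are included independently of one another; the function names `probabilistic_neighborhood_naive` and `expected_size` are hypothetical. Each query costs Θ(n) time no matter how small |N(q,f)| is, which is the cost the sublinear algorithm is designed to avoid. By linearity of expectation, the expected neighborhood size is the sum of f(dist(p,q)) over all p.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance in the plane (assumed metric for this sketch)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def probabilistic_neighborhood_naive(
    points: Sequence[Point],
    q: Point,
    f: Callable[[float], float],
    dist: Callable[[Point, Point], float] = euclidean,
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Hypothetical baseline: probe every point, Theta(n) time per query."""
    rng = rng or random.Random()
    neighborhood: List[Point] = []
    for p in points:
        # Assumption: each point is an independent Bernoulli trial
        # with success probability f(dist(p, q)) in [0, 1].
        if rng.random() < f(dist(p, q)):
            neighborhood.append(p)
    return neighborhood


def expected_size(
    points: Sequence[Point],
    q: Point,
    f: Callable[[float], float],
    dist: Callable[[Point, Point], float] = euclidean,
) -> float:
    """E|N(q, f)| = sum over p of f(dist(p, q)), by linearity of expectation."""
    return sum(f(dist(p, q)) for p in points)
```

With a step function such as f(d) = 1 for d ≤ 1 and 0 otherwise, the query degenerates to an ordinary disk range query, which makes a convenient sanity check; a smoothly decaying f such as exp(-d) yields a genuinely random subset.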



This work is partially supported by German Research Foundation (DFG) grant ME 3619/3-1 within the Priority Programme 1736 Algorithms for Big Data. The authors thank Mark Ortmann for helpful discussions.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute of Theoretical Informatics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
