Voronoi-Diagram Based Partitioning for Distance Join Query Processing in SpatialHadoop

  • Francisco García-García
  • Antonio Corral
  • Luis Iribarne
  • Michael Vassilakopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11163)


SpatialHadoop is an extended MapReduce framework that supports global indexing techniques, which partition spatial data across several machines and improve query processing performance compared to traditional Hadoop systems. SpatialHadoop supports several spatial operations efficiently (e.g., k Nearest Neighbor search and spatial intersection join). Distance Join Queries (DJQs), such as the k Nearest Neighbors Join Query (kNNJQ) and the k Closest Pairs Query (kCPQ), are important and common operations used in numerous spatial applications. DJQs are costly because they combine joins with distance-based search, so performing them efficiently is a challenging task. In this paper, a new partitioning technique based on Voronoi Diagrams is designed and implemented in SpatialHadoop. A new kNNJQ MapReduce algorithm and an improved kCPQ MapReduce algorithm, both using the new partitioning mechanism, are also developed for SpatialHadoop. Finally, the results of an extensive set of experiments are presented, demonstrating that the new partitioning technique and the new DJQ MapReduce algorithms are efficient, scalable and robust in SpatialHadoop.
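To make the terms in the abstract concrete, the following is a minimal single-machine sketch in Python, not the paper's MapReduce algorithms. It assumes 2D points given as tuples and a pivot list that is already chosen (how pivots are selected is not described here). The function names `voronoi_partition`, `knn_join` and `k_closest_pairs` are hypothetical; the two query functions are brute-force definitions of kNNJQ and kCPQ, shown only to pin down what each query returns.

```python
import heapq
import math
from collections import defaultdict


def voronoi_partition(points, pivots):
    """Assign each point to the Voronoi cell of its nearest pivot.

    Assumption: pivots are supplied by the caller; ties go to the
    pivot with the lowest index.
    """
    cells = defaultdict(list)
    for p in points:
        nearest = min(range(len(pivots)), key=lambda i: math.dist(p, pivots[i]))
        cells[nearest].append(p)
    return dict(cells)


def knn_join(P, Q, k):
    """Brute-force kNNJQ: for every p in P, its k nearest points in Q."""
    return {p: heapq.nsmallest(k, Q, key=lambda q: math.dist(p, q)) for p in P}


def k_closest_pairs(P, Q, k):
    """Brute-force kCPQ: the k pairs (p, q) with the smallest distance."""
    pairs = ((math.dist(p, q), p, q) for p in P for q in Q)
    return heapq.nsmallest(k, pairs)


if __name__ == "__main__":
    points = [(0, 0), (1, 1), (9, 9), (10, 10)]
    pivots = [(0, 0), (10, 10)]
    print(voronoi_partition(points, pivots))
    print(knn_join([(0, 0)], [(1, 0), (3, 0), (2, 0)], 2))
    print(k_closest_pairs([(0, 0), (5, 5)], [(0, 1), (5, 7)], 1))
```

In a distributed setting each cell would typically be handled by a separate task. The brute-force functions cost O(|P|·|Q|) distance computations, which is the kind of cost a partitioning scheme is meant to reduce.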


Keywords: Data partitioning, k Nearest Neighbors Join, k Closest Pairs, SpatialHadoop, MapReduce


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Francisco García-García (1)
  • Antonio Corral (1, corresponding author)
  • Luis Iribarne (1)
  • Michael Vassilakopoulos (2)
  1. Department of Informatics, University of Almeria, Almeria, Spain
  2. Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece
