Voronoi-Diagram Based Partitioning for Distance Join Query Processing in SpatialHadoop

  • Conference paper
Model and Data Engineering (MEDI 2018)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11163)

Abstract

SpatialHadoop is an extended MapReduce framework that supports global indexing techniques, which partition spatial data across several machines and improve query processing performance compared to traditional Hadoop systems. SpatialHadoop supports several spatial operations efficiently (e.g., k Nearest Neighbor search and spatial intersection join). Distance Join Queries (DJQs), such as the k Nearest Neighbors Join Query (kNNJQ) and the k Closest Pairs Query (kCPQ), are important and common operations in numerous spatial applications. DJQs are costly because they combine joins with distance-based search, so performing them efficiently is a challenging task. In this paper, a new partitioning technique based on Voronoi Diagrams is designed and implemented in SpatialHadoop. A new kNNJQ MapReduce algorithm and an improved kCPQ MapReduce algorithm, both using the new partitioning mechanism, are also developed for SpatialHadoop. Finally, the results of an extensive set of experiments are presented, demonstrating that the new partitioning technique and the new DJQ MapReduce algorithms are efficient, scalable and robust in SpatialHadoop.
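To make the terms in the abstract concrete, the following is a minimal single-machine sketch in Python. It is not the paper's MapReduce algorithms or SpatialHadoop code: it only shows what a kNNJQ and a kCPQ return by brute force, and what assigning points to Voronoi cells means (each point goes to its nearest pivot). The function names, the use of Euclidean distance on 2D points, and the choice of pivots are all assumptions made for illustration.

```python
import heapq
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two 2D points (assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def voronoi_partition(points, pivots):
    """Group points by the index of their nearest pivot.

    Each group is the set of points falling in that pivot's Voronoi cell.
    Ties go to the pivot with the lowest index.
    """
    cells = defaultdict(list)
    for p in points:
        nearest = min(range(len(pivots)), key=lambda i: dist(p, pivots[i]))
        cells[nearest].append(p)
    return dict(cells)


def knn_join(P, Q, k):
    """Brute-force kNNJQ: for every p in P, its k nearest points in Q."""
    return {p: heapq.nsmallest(k, Q, key=lambda q: dist(p, q)) for p in P}


def k_closest_pairs(P, Q, k):
    """Brute-force kCPQ: the k pairs (p, q) with the smallest distances."""
    pairs = ((dist(p, q), p, q) for p in P for q in Q)
    return heapq.nsmallest(k, pairs)


if __name__ == "__main__":
    P = [(0, 0), (10, 10)]
    Q = [(1, 0), (9, 10), (5, 5)]
    print(voronoi_partition(Q, pivots=P))
    print(knn_join(P, Q, k=1))
    print(k_closest_pairs(P, Q, k=2))
```

Both brute-force queries compare every point of P with every point of Q, which is the cost that makes DJQs expensive. A partitioning scheme aims to let each worker handle one cell and examine only nearby cells; how the paper achieves this is described in the full text, not in this sketch.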

Work funded by the MINECO research project [TIN2017-83964-R].

Notes

  1. Available at http://spatialhadoop.cs.umn.edu/datasets.html.

  2. Available at https://github.com/aseldawy/spatialhadoop2.

Author information

Corresponding author

Correspondence to Antonio Corral.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

García-García, F., Corral, A., Iribarne, L., Vassilakopoulos, M. (2018). Voronoi-Diagram Based Partitioning for Distance Join Query Processing in SpatialHadoop. In: Abdelwahed, E., Bellatreche, L., Golfarelli, M., Méry, D., Ordonez, C. (eds) Model and Data Engineering. MEDI 2018. Lecture Notes in Computer Science, vol 11163. Springer, Cham. https://doi.org/10.1007/978-3-030-00856-7_16

  • DOI: https://doi.org/10.1007/978-3-030-00856-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00855-0

  • Online ISBN: 978-3-030-00856-7

  • eBook Packages: Computer Science (R0)
