Abstract
SpatialHadoop is an extended MapReduce framework that supports global indexing techniques, which partition spatial data across several machines and improve query processing performance compared to traditional Hadoop systems. SpatialHadoop efficiently supports several spatial operations, such as k Nearest Neighbor search and spatial intersection join. Distance Join Queries (DJQs), such as the k Nearest Neighbors Join Query (kNNJQ) and the k Closest Pairs Query (kCPQ), are important and common operations in numerous spatial applications. DJQs are costly because they combine joins with distance-based search, so performing them efficiently is a challenging task. In this paper, a new partitioning technique based on Voronoi Diagrams is designed and implemented in SpatialHadoop. Using this partitioning mechanism, a new kNNJQ MapReduce algorithm and an improved kCPQ MapReduce algorithm are also developed for SpatialHadoop. Finally, the results of an extensive set of experiments are presented, demonstrating that the new partitioning technique and the new DJQ MapReduce algorithms are efficient, scalable and robust in SpatialHadoop.
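The abstract does not spell out how the partitioning or the join algorithms work. As a rough, single-process illustration of the general idea behind Voronoi-based partitioning for distance joins, the sketch below assumes 2-D points, a hand-picked set of distinct pivots (pivot selection is an assumption, not the paper's method) and k >= 1. Each point of S is assigned to the Voronoi cell of its nearest pivot, and per-cell lower bounds (cell radius via the triangle inequality, and distance to the bisector between pivots) let a kNN join skip whole cells. The function names `voronoi_partition`, `knn_join` and `k_closest_pairs` are hypothetical; this is neither SpatialHadoop's API nor the authors' MapReduce algorithms. In a MapReduce setting the cell id would plausibly serve as the partition key. The kCPQ function uses a simple swapped-in reduction to the kNN join, not the paper's improved kCPQ algorithm.

```python
import heapq
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_pivot(p, pivots):
    """Index of the pivot whose Voronoi cell contains p (ties -> lowest index)."""
    return min(range(len(pivots)), key=lambda j: dist(p, pivots[j]))


def voronoi_partition(points, pivots):
    """Hypothetical 'map'-like step: group every point by its Voronoi cell id."""
    cells = defaultdict(list)
    for p in points:
        cells[nearest_pivot(p, pivots)].append(p)
    return cells


def knn_join(R, S, pivots, k):
    """For every q in R (in order), return (q, [(d, s), ...]) with its k nearest
    points s in S. Assumes k >= 1 and pairwise distinct pivots."""
    cells = voronoi_partition(S, pivots)
    radius = {i: max(dist(s, pivots[i]) for s in pts) for i, pts in cells.items()}
    result = []
    for q in R:
        dq = [dist(q, p) for p in pivots]
        j = nearest_pivot(q, pivots)  # the cell that q itself falls into
        bounds = []
        for i in cells:
            # Lower bound on d(q, s) for any s in cell i, the max of two valid
            # bounds: triangle inequality with the cell radius, and the distance
            # from q to the bisector of pivots i and j (skipped when i == j).
            lb = max(0.0, dq[i] - radius[i])
            if i != j:
                lb = max(lb, (dq[i] ** 2 - dq[j] ** 2) / (2 * dist(pivots[i], pivots[j])))
            bounds.append((lb, i))
        bounds.sort()  # visit the most promising cells first
        heap = []  # max-heap via negated distances, holds the best k so far
        for lb, i in bounds:
            if len(heap) == k and lb > -heap[0][0]:
                break  # every unvisited cell is at least lb away: no improvement
            for s in cells[i]:
                d = dist(q, s)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, s))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, s))
        result.append((q, sorted((-nd, s) for nd, s in heap)))
    return result


def k_closest_pairs(R, S, pivots, k):
    """k closest (d, r, s) pairs. Reduction to the kNN join: the partner s of
    any of the k closest pairs is among r's k nearest neighbours."""
    candidates = ((d, r, s) for r, nn in knn_join(R, S, pivots, k) for d, s in nn)
    return heapq.nsmallest(k, candidates)
```

Because the cells are visited in increasing order of their lower bound, the loop can stop at the first cell whose bound already exceeds the current k-th distance. With well-spread pivots most cells are pruned, which is the kind of saving a distributed Voronoi-based partitioning aims to exploit across machines.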
Work funded by the MINECO research project [TIN2017-83964-R].
Notes
1. Available at http://spatialhadoop.cs.umn.edu/datasets.html.
2. Available at https://github.com/aseldawy/spatialhadoop2.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
García-García, F., Corral, A., Iribarne, L., Vassilakopoulos, M. (2018). Voronoi-Diagram Based Partitioning for Distance Join Query Processing in SpatialHadoop. In: Abdelwahed, E., Bellatreche, L., Golfarelli, M., Méry, D., Ordonez, C. (eds) Model and Data Engineering. MEDI 2018. Lecture Notes in Computer Science, vol 11163. Springer, Cham. https://doi.org/10.1007/978-3-030-00856-7_16
DOI: https://doi.org/10.1007/978-3-030-00856-7_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00855-0
Online ISBN: 978-3-030-00856-7
eBook Packages: Computer Science, Computer Science (R0)