Enhancing SpatialHadoop with Closest Pair Queries

  • Francisco García-García
  • Antonio Corral
  • Luis Iribarne
  • Michael Vassilakopoulos
  • Yannis Manolopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9809)

Abstract

Given two datasets P and Q, the K Closest Pair Query (KCPQ) finds the K closest pairs of objects from \(P \times Q\). It is an operation widely adopted by many spatial and GIS applications. As a combination of the K Nearest Neighbor (KNN) and the spatial join queries, KCPQ is an expensive operation. Given the increasing volume of spatial data, it is difficult to perform a KCPQ on a centralized machine efficiently. For this reason, this paper addresses the problem of computing the KCPQ on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently, and proposes a novel algorithm in SpatialHadoop to perform efficient parallel KCPQ on large-scale spatial datasets. We have evaluated the performance of the algorithm in several situations with big synthetic and real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal.
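To make the query definition concrete, the following is a minimal brute-force sketch of a KCPQ on a single machine. It is not the algorithm proposed in the paper; the function name `kcpq_brute_force`, the use of Euclidean distance and the sample points are assumptions chosen only for illustration.

```python
import heapq
import math
from itertools import product


def kcpq_brute_force(P, Q, k):
    """Return the k closest (distance, p, q) triples over P x Q, nearest first.

    Illustrative only: every pair in the Cartesian product is examined,
    so the cost grows with |P| * |Q|.
    """
    # Assumption: objects are points given as coordinate tuples and the
    # distance is Euclidean (math.dist requires Python 3.8 or later).
    pairs = ((math.dist(p, q), p, q) for p, q in product(P, Q))
    # nsmallest keeps only k candidates in memory while scanning all pairs.
    return heapq.nsmallest(k, pairs)


P = [(0.0, 0.0), (5.0, 5.0)]
Q = [(1.0, 0.0), (5.0, 7.0), (9.0, 9.0)]
result = kcpq_brute_force(P, Q, 2)
for distance, p, q in result:
    print(distance, p, q)
```

Because this sketch examines all of \(P \times Q\), its running time is quadratic in the input size, which is the cost that motivates a parallel, partition-aware approach for large datasets.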

Keywords

  • Closest pair queries
  • Spatial data processing
  • SpatialHadoop
  • MapReduce

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Francisco García-García (1)
  • Antonio Corral (1)
  • Luis Iribarne (1)
  • Michael Vassilakopoulos (2)
  • Yannis Manolopoulos (3)

  1. Department of Informatics, University of Almeria, Almeria, Spain
  2. Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece
  3. Department of Informatics, Aristotle University, Thessaloniki, Greece
