RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study

  • Francisco García-García
  • Antonio Corral
  • Luis Iribarne
  • Michael Vassilakopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10563)

Abstract

The Reverse k-Nearest Neighbor (RkNN) problem, i.e. finding all objects in a dataset that have a given query point among their k nearest neighbors, has received increasing attention in recent years. RkNN queries are of particular interest in a wide range of applications, such as decision support systems, resource allocation, profile-based marketing and location-based services. With the increasing volume of spatial data, it is difficult to perform RkNN queries efficiently in spatial data-intensive applications because of limited computational capability and storage resources. In this paper, we investigate how to design and implement distributed RkNN query algorithms using shared-nothing spatial cloud infrastructures such as SpatialHadoop and LocationSpark. SpatialHadoop is a framework that inherently supports spatial indexing on top of Hadoop to perform spatial queries efficiently. LocationSpark is a recent spatial data processing system built on top of Spark. We have evaluated the performance of the distributed RkNN query algorithms on both SpatialHadoop and LocationSpark with large real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal in both distributed spatial data management systems, showing the performance advantages of LocationSpark.
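To make the RkNN definition above concrete, the following is a minimal single-machine sketch in Python. It is a brute-force illustration of the query semantics only, not the distributed algorithms evaluated in the paper. The function name `rknn_brute_force`, the sample points and the tie-handling rule (a point qualifies when fewer than k other dataset points are strictly closer to it than the query) are assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def rknn_brute_force(points, query, k):
    """Return the points that have `query` among their k nearest neighbors.

    Hypothetical helper for illustration: O(n^2) distance computations,
    no spatial index, no partitioning.
    """
    result = []
    for i, p in enumerate(points):
        d_query = dist(p, query)
        # Count dataset points (other than p itself) strictly closer to p
        # than the query point is.
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and dist(p, other) < d_query
        )
        # The query ranks among p's k nearest neighbors when fewer than k
        # dataset points are closer to p than the query.
        if closer < k:
            result.append(p)
    return result


# Assumed sample data: two small clusters and a query near the first one.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
query = (2.0, 0.0)

print(rknn_brute_force(points, query, 1))
print(rknn_brute_force(points, query, 2))
```

The quadratic cost of this approach illustrates why large spatial datasets call for indexing and distribution, as the abstract argues.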

Keywords

Spatial data processing · RNNQ · SpatialHadoop · LocationSpark

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Francisco García-García (1)
  • Antonio Corral (1), email author
  • Luis Iribarne (1)
  • Michael Vassilakopoulos (2)
  1. Department of Informatics, University of Almeria, Almeria, Spain
  2. Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece
