Abstract
Given two datasets of points (called Query and Training), the Group K Nearest-Neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been studied in recent years, and several performance-improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when the Query dataset fits in memory while the Training dataset belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes, and several other minor coding and algorithmic improvements. Moreover, we adapt this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework dedicated to spatial processing), using a novel two-level partitioning method. Using real-world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a behind-the-scenes analysis of the algorithm’s performance, using metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop version, the SpatialHadoop version, and the version from our previous work, showing that the improved versions clearly outperform the earlier one, with the SpatialHadoop version being faster than its Hadoop counterpart.
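To make the query definition concrete, the following is a minimal brute-force sketch of GKNN on small in-memory data. It is not the MapReduce algorithm described in this paper: it assumes Euclidean distance and points given as coordinate tuples, and the function name `gknn_brute_force` and the sample data are hypothetical, chosen only for illustration.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k Training points with the smallest sum of
    Euclidean distances to every Query point (illustrative only)."""

    def sum_of_distances(p):
        # Sum of distances from candidate point p to all Query points.
        return sum(math.dist(p, q) for q in query)

    # Keep the k candidates with the smallest distance sums, best first.
    return heapq.nsmallest(k, training, key=sum_of_distances)


# Hypothetical sample data: two Query points, four Training points.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (-3.0, 0.0)]

result = gknn_brute_force(query, training, 2)
print(result)  # [(1.0, 0.0), (1.0, 1.0)]
```

This sketch evaluates the distance sum for every Training point, which costs time proportional to the product of the two dataset sizes. That cost is what pruning heuristics and distributed processing aim to reduce when the Training dataset is very large.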
Notes
A “local” phase is one that is executed only on the Master Node machine (Name Node).
Phase 1 Reducer only performs a simple summation.
grad f(x, y) or \(\nabla f(x,y)=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\).
Acknowledgements
The work of Francisco García-García, Michael Vassilakopoulos, Antonio Corral and Luis Iribarne was funded by the MINECO research project [TIN2017-83964-R].
Cite this article
Moutafis, P., García-García, F., Mavrommatis, G. et al. Algorithms for processing the group K nearest-neighbor query on distributed frameworks. Distrib Parallel Databases 39, 733–784 (2021). https://doi.org/10.1007/s10619-020-07317-8
DOI: https://doi.org/10.1007/s10619-020-07317-8