Abstract
k Nearest Neighbor (kNN) search is one of the simplest non-parametric learning approaches, mainly used for classification and regression. Given a distance metric, kNN identifies the k nearest neighbors of a given node. A more challenging task is to identify the k nearest neighbors of all nodes simultaneously, known as All kNN (AkNN) search. Similarly, Continuous All kNN (CAkNN) search answers an AkNN search in real time on streaming data. Although such techniques find immediate application in computational intelligence tasks, among others, they have not yet been efficiently optimized. We study specialized scalable solutions for AkNN and CAkNN processing, as demanded by the volume, velocity, and variety of data in the Big Data era. We present an algorithm, coined Proximity, which does not require any additional infrastructure or specialized hardware; its efficiency is mainly attributed to our smart search space sharing technique. Its implementation is based on a novel data structure, coined k+-heap. Proximity, being parameter-free, performs efficiently in the face of high-velocity and skewed data. In our analytical studies, we found that Proximity provides better time complexity than existing approaches and is very well suited to large-scale scenarios.
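To make the difference between kNN and AkNN concrete, the following is a minimal sketch of a naive baseline, assuming 2D points and Euclidean distance. It is not the Proximity algorithm and does not use the k+-heap; the names `knn` and `all_knn` are hypothetical and chosen only for illustration. The sketch answers an AkNN query by running one independent kNN search per node, which is the quadratic work that shared-search-space techniques aim to avoid.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # assumption: nodes are points in the plane


def knn(query: int, points: Sequence[Point], k: int) -> List[int]:
    """Return the indices of the k points nearest to points[query]."""
    # Pair every other point with its Euclidean distance to the query.
    candidates = (
        (math.dist(points[query], p), i)
        for i, p in enumerate(points)
        if i != query
    )
    # nsmallest keeps only the k best candidates; ties break on index.
    return [i for _, i in heapq.nsmallest(k, candidates)]


def all_knn(points: Sequence[Point], k: int) -> Dict[int, List[int]]:
    """Naive AkNN: one independent kNN search for every node."""
    return {q: knn(q, points, k) for q in range(len(points))}


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
result = all_knn(points, k=2)
print(result)
```

Under the same assumptions, a CAkNN answer could be approximated by calling `all_knn` again whenever the point positions change, although recomputing from scratch at every update is the cost a continuous algorithm is meant to reduce.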
Notes
- 1. Using other geometric shapes (e.g., hexagons, Voronoi polygons, grid-rectangles, etc.) for space partitioning is outside the scope of this paper.
- 2. The location of a user can be determined either by fine-grain means (e.g., AGPS) or by coarse-grain means (e.g., fingerprint-based geo-location [36]).
References
Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data, ser. SIGMOD ‘95. New York, USA: ACM, pp. 71–79 (1995)
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)
Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992)
Hart, P.E.: The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 14, 515–516 (1968)
Shang, W., Huang, H., Zhu, H., Lin, Y., Wang, Z., Qu, Y.: An Improved kNN Algorithm—Fuzzy kNN. Computational Intell. Secur., Lect. Notes Comput. Sci. 3801, 741–746 (2005)
Callahan, P.B.: Optimal parallel all-nearest-neighbors using the well-separated pair decomposition. In: Proceedings of the 1993 IEEE 34th annual foundations of computer science: IEEE Computer Society, pp. 332–340. Washington, DC (1993)
Clarkson, K.L.: Fast algorithms for the all nearest neighbors problem. In: Annual IEEE symposium on foundations of computer science, pp. 226–232 (1983)
Gabow, H.N., Bentley, J.L., Tarjan, R.E.: Scaling and related techniques for geometry problems. In: Proceedings of the sixteenth annual ACM symposium on theory of computing, ser. STOC ‘84. New York ACM, pp. 135–143 (1984)
Lai, T.H., Sheng, M.-J.: Constructing euclidean minimum spanning trees and all nearest neighbors on reconfigurable meshes. IEEE Trans. Parallel Distrib. Syst. 7(8), 806–817 (1996)
Wang, Y.-R., Horng, S.-J., Wu, C.-H.: Efficient algorithms for the all nearest neighbor and closest pair problems on the linear array with a reconfigurable pipelined bus system. IEEE Trans. Parallel Distrib. Syst. 16, 193–206 (2005)
Chen, Y., Patel, J.: Efficient evaluation of all-nearest-neighbor queries. In: IEEE 23rd international conference on data engineering (ICDE 2007), pp. 1056–1065 (2007)
Zhang, J., Mamoulis, N., Papadias, D., Tao, Y.: All-nearest-neighbors queries. In: International conference on spatial databases, scientific and statistical database management, vol. 0, p. 297 (2004)
Deb, K.: Multi-Objective optimization using evolutionary algorithms. Wiley, New York (2002)
Mao, J., Jain, K.: Artificial neural networks for feature extraction and multivariate data projection. IEEE Trans. Neural Netw. 6(2), 296–317 (1995)
Hansen, P., Mladenovic, N.: Variable neighborhood search. In: Glover, F.W., Kochenberger, G.A. (eds.) Handbook of Metaheuristics, pp. 145–184. Kluwer, Netherlands (2003)
Zhang, Q., Li, H.: MOEA/D: A multi-objective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation (2007)
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation (2002)
Federal Communications Commission: Enhanced 911 website (Jan 2014). [Online]. Available: http://www.fcc.gov/pshs/services/911-services/enhanced911/
Department of transportation: Intelligent transportation systems new generation 911 website (Jan 2014). [Online]. Available: http://www.its.dot.gov/NG911/
Rayzit website (Jan 2014). [Online]. Available: http://www.rayzit.com
Waze website (Jan 2014). [Online]. Available: http://www.waze.com/
Hoffer, J., Ramesh, V., Topi, H.: Modern database management (2013)
Smart metering entity website (Jan 2014). [Online]. Available: http://www.smi-ieso.ca/mdmr
Popular Science: Inside Google’s quest to popularize self-driving cars, article (Jan 2014). [Online]. Available: http://www.popsci.com/cars/article/2013-09/google-self-driving-car
Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using mapreduce. Proc. VLDB Endow. 5(10), 1016–1027 (2012)
Zhang, C., Li, F., Jestes, J.: Efficient parallel knn joins for large data in mapreduce. In: Proceedings of the 15th international conference on extending database technology, ser. EDBT ‘12. New York ACM, pp. 38–49 (2012)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: OSDI 2004, pp. 137–150 (2004)
Boehm, C., Krebs, F.: The k-nearest neighbour join: Turbo charging the kdd process. Knowl. Inf. Syst. 6(6), 728–749 (2004)
Seiffert, U., Schleif, F.-M., Zühlke, D.: Recent trends in computational intelligence in life sciences. In: ESANN (2011)
Thomas, S., Jin, Y.: Reconstructing biological gene regulatory networks: where optimization meets big data, Evolutionary Intelligence, pp. 1–19 (2013)
Pedrycz, W.: Granular computing: Analysis and design of intelligent systems. CRC Press (2013)
Le, Q.V., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., Ng, A.: Building high-level features using large scale unsupervised learning. In: International conference on machine learning (2012)
Hall, L.O., Chawla, N., Bowyer, K.W.: Decision tree learning on very large data sets. In: IEEE international conference on system, man and cybernetics (SMC), pp. 187–222 (1998)
Patil, D.V., Bichkar, R.S.: A hybrid evolutionary approach to construct optimal decision trees with large data sets. In: IEEE international conference on industrial technology, pp. 429–433 (2006)
Lu, Y.-L., Fahn, C.-S.: Hierarchical artificial neural networks for recognizing high similar large data sets. In: International conference on machine learning and cybernetics, vol. 7, pp. 1930–1935 (2007)
Geolocation API website (Jan 2014). [Online]. Available: http://code.google.com/apis/gears/api_geolocation.html
Vaidya, P.M.: An O(n log n) algorithm for the all-nearest-neighbors problem. Discrete Comput. Geom. 4, 101–115 (1989)
Xia, C., Lu, H., Ooi, B.C., Hu, J.: Gorder: an efficient method for kNN join processing. In: Proceedings of the 13th international conference on very large data bases, vol. 30, ser. VLDB ‘04. VLDB Endowment, pp. 756–767 (2004)
Yao, B., Li, F., Kumar, P.: K nearest neighbor queries and knn-joins in large relational databases (almost) for free. In: Data engineering (ICDE), 2010 IEEE 26th international conference on, pp. 4–15 (2010)
Yu, C., Cui, B., Wang, S., Su, J.: Efficient index-based knn join processing for high-dimensional data. Inf. Softw. Technol. 49(4), 332–344 (2007)
Yu, X., Pu, K.Q., Koudas, N.: Monitoring k-nearest neighbor queries over moving objects. In: Proceedings of the 21st international conference on data engineering, ser. ICDE ‘05. IEEE Computer Society, Washington, DC, pp. 631–642 (2005)
Mouratidis, K., Papadias, D., Hadjieleftheriou, M.: Conceptual partitioning: an efficient method for continuous nearest neighbor monitoring. In: Proceedings of the ACM SIGMOD international conference on management of data, ser. SIGMOD ‘05. New York: ACM, pp. 634–645 (2005)
Chatzimilioudis, G., Zeinalipour-Yazti, D., Lee, W.-C., Dikaiakos, M.D.: Continuous all k-nearest neighbor querying in smartphone networks. In: 13th international conference on mobile data management (MDM’12) (2012)
Rappaport, T.: Wireless communications: principles and practice, 2nd edn. Prentice Hall PTR, Upper Saddle River, NJ (2001)
Universal mobile telephone system world website (Jan 2014). [Online]. Available: http://www.umtsworld.com/technology/capacity.htm
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Chatzimilioudis, G., Konstantinidis, A., Zeinalipour-Yazti, D. (2015). Nearest Neighbor Queries on Big Data. In: Pedrycz, W., Chen, SM. (eds) Information Granularity, Big Data, and Computational Intelligence. Studies in Big Data, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-08254-7_1
DOI: https://doi.org/10.1007/978-3-319-08254-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08253-0
Online ISBN: 978-3-319-08254-7
eBook Packages: Engineering, Engineering (R0)