Abstract

k Nearest Neighbor (kNN) search is one of the simplest non-parametric learning approaches, used mainly for classification and regression. Given a distance metric, kNN identifies the k nearest neighbors of a given node. A new and challenging kNN task is to identify the k nearest neighbors of all nodes simultaneously, known as All kNN (AkNN) search. Similarly, Continuous All kNN (CAkNN) search answers an AkNN query in real time over streaming data. Although such techniques find immediate application in computational intelligence tasks, among others, they have not been efficiently optimized to date. We study specialized scalable solutions for AkNN and CAkNN processing, as demanded by the volume, velocity, and variety of data in the Big Data era. We present an algorithm, coined Proximity, which requires no additional infrastructure or specialized hardware and whose efficiency is mainly attributed to our smart search space sharing technique. Its implementation is based on a novel data structure, coined k+-heap. Proximity, being parameter-free, performs efficiently in the face of high-velocity and skewed data. In our analytical studies, we found that Proximity provides better time complexity than existing approaches and is well suited to large-scale scenarios.
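
To make the AkNN task concrete, the following is a minimal brute-force sketch in Python. It is not the Proximity algorithm and does not use the k+-heap; it is only the naive quadratic baseline that scalable approaches aim to improve on. The function name `all_knn`, the Euclidean metric, and the sample points are assumptions chosen for illustration.

```python
import heapq
import math


def all_knn(points, k):
    """Naive AkNN baseline (assumed helper, not from the chapter).

    For every point, return the indices of its k nearest other points
    under Euclidean distance. Cost is O(n^2) distance computations.
    """
    result = []
    for i, p in enumerate(points):
        # Distance from p to every other point; the point itself is excluded.
        candidates = (
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        # Keep the k smallest (distance, index) pairs; ties break by index.
        nearest = heapq.nsmallest(k, candidates)
        result.append([j for _, j in nearest])
    return result


# Hypothetical sample data: four nodes in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors = all_knn(points, k=2)
print(neighbors)  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

Each inner list holds the neighbor indices of one node, nearest first. A continuous (CAkNN) variant would have to recompute or incrementally maintain these lists as node positions stream in, which is where the quadratic cost of this baseline becomes prohibitive.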

Notes

  1. Using other geometric shapes (e.g., hexagons, Voronoi polygons, grid-rectangles, etc.) for space partitioning is outside the scope of this paper.

  2. The location of a user can be determined either by fine-grain means (e.g., AGPS) or by coarse-grain means (e.g., fingerprint-based geo-location [36]).

References

  1. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data, ser. SIGMOD ’95. New York, USA: ACM, pp. 71–79 (1995)

  2. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)

  3. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992)

  4. Hart, P.E.: The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 18, 515–516 (1968)

  5. Shang, W., Huang, H., Zhu, H., Lin, Y., Wang, Z., Qu, Y.: An Improved kNN Algorithm—Fuzzy kNN. Computational Intell. Secur., Lect. Notes Comput. Sci. 3801, 741–746 (2005)

  6. Callahan, P.B.: Optimal parallel all-nearest-neighbors using the well-separated pair decomposition. In: Proceedings of the 1993 IEEE 34th annual foundations of computer science: IEEE Computer Society, pp. 332–340. Washington, DC (1993)

  7. Clarkson, K.L.: Fast algorithms for the all nearest neighbors problem. Foundations of Computer Science, Annual IEEE Symposium on, vol. 83, pp. 226–232 (1983)

  8. Gabow, H.N., Bentley, J.L., Tarjan, R.E.: Scaling and related techniques for geometry problems. In: Proceedings of the sixteenth annual ACM symposium on theory of computing, ser. STOC ’84. New York: ACM, pp. 135–143 (1984)

  9. Lai, T.H., Sheng, M.-J.: Constructing euclidean minimum spanning trees and all nearest neighbors on reconfigurable meshes. IEEE Trans. Parallel Distrib. Syst. 7(8), 806–817 (1996)

  10. Wang, Y.-R., Horng, S.-J., Wu, C.-H.: Efficient algorithms for the all nearest neighbor and closest pair problems on the linear array with a reconfigurable pipelined bus system. IEEE Trans. Parallel Distrib. Syst. 16, 193–206 (2005)

  11. Chen, Y., Patel, J.: Efficient evaluation of all-nearest-neighbor queries. In: IEEE 23rd International Conference on Data Engineering (ICDE 2007), pp. 1056–1065 (2007)

  12. Zhang, J., Mamoulis, N., Papadias, D., Tao, Y.: All-nearest-neighbors queries. In: International conference on spatial databases, scientific and statistical database management, vol. 0, p. 297 (2004)

  13. Deb, K.: Multi-Objective optimization using evolutionary algorithms. Wiley, New York (2002)

  14. Mao, J., Jain, A.K.: Artificial neural networks for feature extraction and multivariate data projection. IEEE Trans. Neural Netw. 6(2), 296–317 (1995)

  15. Hansen, P., Mladenovic, N.: Variable neighborhood search. In: Glover, F.W., Kochenberger, G.A. (eds.) Handbook of Metaheuristics, pp. 145–184. Kluwer, Netherlands (2003)

  16. Zhang, Q., Li, H.: MOEA/D: A multi-objective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. (2007)

  17. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. (2002)

  18. Federal Communications Commission: Enhanced 911 website (Jan 2014). [Online]. Available: http://www.fcc.gov/pshs/services/911-services/enhanced911/

  19. Department of Transportation: Intelligent transportation systems new generation 911 website (Jan 2014). [Online]. Available: http://www.its.dot.gov/NG911/

  20. Rayzit website (Jan 2014). [Online]. Available: http://www.rayzit.com

  21. Waze website (Jan 2014). [Online]. Available: http://www.waze.com/

  22. Hoffer, J., Ramesh, V., Topi, H.: Modern database management (2013)

  23. Smart metering entity website (Jan 2014). [Online]. Available: http://www.smi-ieso.ca/mdmr

  24. Popular Science: Inside Google’s quest to popularize self-driving cars, article (Jan 2014). [Online]. Available: http://www.popsci.com/cars/article/2013-09/google-self-driving-car

  25. Lu, W., Shen, Y., Chen, S., Ooi, B.C.: Efficient processing of k nearest neighbor joins using mapreduce. Proc. VLDB Endow. 5(10), 1016–1027 (2012)

  26. Zhang, C., Li, F., Jestes, J.: Efficient parallel knn joins for large data in mapreduce. In: Proceedings of the 15th international conference on extending database technology, ser. EDBT ’12. New York: ACM, pp. 38–49 (2012)

  27. Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. OSDI 2004, 137–150 (2004)

  28. Boehm, C., Krebs, F.: The k-nearest neighbour join: Turbo charging the kdd process. Knowl. Inf. Syst. 6(6), 728–749 (2004)

  29. Seiffert, U., Schleif, F.-M., Zühlke, D.: Recent trends in computational intelligence in life sciences. In: ESANN (2011)

  30. Thomas, S., Jin, Y.: Reconstructing biological gene regulatory networks: where optimization meets big data. Evol. Intell., pp. 1–19 (2013)

  31. Pedrycz, W.: Granular computing: Analysis and design of intelligent systems. CRC Press (2013)

  32. Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., Ng, A.: Building high-level features using large scale unsupervised learning. In: International conference on machine learning (2012)

  33. Hall, L.O., Chawla, N., Bowyer, K.W.: Decision tree learning on very large data sets. In: IEEE international conference on system, man and cybernetics (SMC), pp. 187–222 (1998)

  34. Patil, D.V., Bichkar, R.S.: A hybrid evolutionary approach to construct optimal decision trees with large data sets. In: IEEE international conference on industrial technology, pp. 429–433 (2006)

  35. Lu, Y.-L., Fahn, C.-S.: Hierarchical artificial neural networks for recognizing high similar large data sets. In: International conference on machine learning and cybernetics, vol. 7, pp. 1930–1935 (2007)

  36. Geolocation API website (Jan 2014). [Online]. Available: http://code.google.com/apis/gears/api_geolocation.html

  37. Vaidya, P.M.: An O(n log n) algorithm for the all-nearest-neighbors problem. Discrete Comput. Geom. 4, 101–115 (1989)

  38. Xia, C., Lu, H., Ooi, B.C., Hu, J.: Gorder: an efficient method for knn join processing. In: Proceedings of the 30th international conference on Very large data bases, vol. 30, ser. VLDB ’04. VLDB Endowment, pp. 756–767 (2004)

  39. Yao, B., Li, F., Kumar, P.: K nearest neighbor queries and knn-joins in large relational databases (almost) for free. In: Data engineering (ICDE), 2010 IEEE 26th international conference on, pp. 4–15 (2010)

  40. Yu, C., Cui, B., Wang, S., Su, J.: Efficient index-based knn join processing for high-dimensional data. Inf. Softw. Technol. 49(4), 332–344 (2007)

  41. Yu, X., Pu, K.Q., Koudas, N.: Monitoring k-nearest neighbor queries over moving objects. In: Proceedings of the 21st international conference on data engineering, ser. ICDE ’05. IEEE Computer Society, Washington, DC, pp. 631–642 (2005)

  42. Mouratidis, K., Papadias, D., Hadjieleftheriou, M.: Conceptual partitioning: an efficient method for continuous nearest neighbor monitoring. In: Proceedings of the ACM SIGMOD international conference on management of data, ser. SIGMOD ’05. New York: ACM, pp. 634–645 (2005)

  43. Chatzimilioudis, G., Zeinalipour-Yazti, D., Lee, W.-C., Dikaiakos, M.D.: Continuous all k-nearest neighbor querying in smartphone networks. In: 13th international conference on mobile data management (MDM ’12) (2012)

  44. Rappaport, T.: Wireless communications: principles and practice, 2nd edn. Prentice Hall PTR, Upper Saddle River, NJ (2001)

  45. Universal mobile telephone system world website (Jan 2014). [Online]. Available: http://www.umtsworld.com/technology/capacity.htm

Author information

Correspondence to Georgios Chatzimilioudis.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Chatzimilioudis, G., Konstantinidis, A., Zeinalipour-Yazti, D. (2015). Nearest Neighbor Queries on Big Data. In: Pedrycz, W., Chen, SM. (eds) Information Granularity, Big Data, and Computational Intelligence. Studies in Big Data, vol 8. Springer, Cham. https://doi.org/10.1007/978-3-319-08254-7_1

  • DOI: https://doi.org/10.1007/978-3-319-08254-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08253-0

  • Online ISBN: 978-3-319-08254-7

  • eBook Packages: Engineering, Engineering (R0)
