Skip to main content
Log in

Efficient Metric All-k-Nearest-Neighbor Search on Datasets Without Any Index

  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

An all-k-nearest-neighbor (AkNN) query finds k nearest neighbors for each query object. This problem arises naturally in many areas, such as GIS (geographic information system), multimedia retrieval, and recommender systems. To support various data types and flexible distance metrics involved in real applications, we study AkNN retrieval in metric spaces, namely, metric AkNN (MAkNN) search. Consider that the underlying indexes on the query set and the object set may not exist, which is natural in many scenarios. For example, the query set and the object set could be the results of other queries, and thus, the underlying indexes cannot be built in advance. To support MAkNN search on datasets without any underlying index, we propose an efficient disk-based algorithm, termed as Partition-Based MAkNN Algorithm (PMA), which follows a partition-search framework and employs a series of pruning rules for accelerating the search. In addition, we extend our techniques to tackle an interesting variant of MAkNN queries, i.e., metric self-AkNN (MSAkNN) search, where the query set is identical to the object set. Extensive experiments using both real and synthetic datasets demonstrate the effectiveness of our pruning rules and the efficiency of the proposed algorithms, compared with state-of-the-art MAkNN and MSAkNN algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Chen H L, Chang Y I. All-nearest-neighbors finding based on the Hilbert curve. Expert Systems with Applications, 2011, 38(6): 7462-7475.

    Article  Google Scholar 

  2. Zhang J, Mamoulis N, Papadias D, Tao Y F. Allnearest-neighbors queries in spatial databases. In Proc. the 16th International Conference on Scientific and Statistical Database Management, June 2004, pp.297-306.

  3. Böhm C, Krebs F. The k-nearest neighbour join: Turbo charging the KDD process. Knowledge and Information Systems, 2004, 6(6): 728-749.

    Article  Google Scholar 

  4. Chen Y, Patel J M. Efficient evaluation of all-nearestneighbor queries. In Proc. the 23rd International Conference on Data Engineering, Apr. 2007, pp.1056-1065.

  5. Emrich T, Graf F, Kriegel H P, Schubert M, Thoma M. Optimizing all-nearest-neighbor queries with trigonometric pruning. In Proc. the 22nd International Conference on Scientific and Statistical Database Management, June 2010, pp.501-518.

  6. Xia C Y, Lu H J, Ooi B C, Hu J. GORDER: An efficient method for KNN join processing. In Proc. the VLDB Conference, Oct. 2004, pp.756-767.

  7. Kybic J, Vnučko I. Approximate all nearest neighbor search for high dimensional entropy estimation for image registration. Signal Processing, 2012, 92(5): 1302-1316.

    Article  Google Scholar 

  8. Lu W, Shen Y Y, Chen S, Ooi B C. Efficient processing of k nearest neighbor joins using MapReduce. Proceedings of the PVLDB Endowment, 2012, 5(10): 1016-1027.

    Article  Google Scholar 

  9. Sankaranarayanan J, Samet H, Varshney A. A fast all nearest neighbor algorithm for applications involving large point-clouds. Computers & Graphics, 2007, 31(2): 157-174.

    Article  Google Scholar 

  10. Nakano K, Olariu S. An optimal algorithm for the anglerestricted all nearest neighbor problem on the reconfigurable mesh, with applications. IEEE Transactions on Parallel and Distributed Systems, 1997, 8(9): 983-990.

    Article  Google Scholar 

  11. Wang Y R, Horng S J, Wu C H. Efficient algorithms for the all nearest neighbor and closest pair problems on the linear array with a reconfigurable pipelined bus system. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(3): 193-206.

    Article  Google Scholar 

  12. Wang Y, Metwally A, Parthasarathy S. Scalable all-pairs similarity search in metric spaces. In Proc. the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2013, pp.829-837.

  13. 13] Jain A K, Murthy M N, Flynn P J. Data clustering: A review. ACM Computing Surveys, 1999, 31(3): 264-323.

  14. Nanopoulos A, Theodoridis Y, Manolopoulos Y. C2P: Clustering based on closest pairs. In Proc. the 27th International Conference on Very Large Data Bases, Jan. 2001, pp.331-340.

  15. Aggarwal C C, Yu P S. Outlier detection for high dimensional data. In Proc. the 2001 ACM SIGMOD International Conference on Management of Data, June 2001, pp.37-46.

  16. Álvarez M, Pan A, Raposo J, Bellas F, Cacheda F. Using clustering and edit distance techniques for automatic web data extraction. In Lecture Notes in Computer Science 4831, Benatallah B, Casati F, Georgakopoulos D, Bartolini C, Sadiq W, Godart C (eds.), Springer-Verlag, 2007, pp.212-224.

  17. Lowe D G. Object recognition from local scale-invariant features. In Proc. the 7th IEEE International Conference on Computer Vision, Sept. 1999, pp.1150-1157.

  18. Jacox E H, Samet H. Spatial join techniques. ACM Transaction on Database Systems, 2007, 32(1): Article No. 7.

  19. Yang C, Yu X H, Liu Y. Continuous KNN join processing for real-time recommendation. In Proc. the IEEE International Conference on Data Mining, Dec. 2014, pp.640-649.

  20. Yao B, Li F F, Kumar P. K nearest neighbor queries and KNN-joins in large relational databases (almost) for free. In Proc. the 26th IEEE International Conference on Data Engineering, Mar. 2010, pp.4-15.

  21. Yu C, Cui B,Wang S G, Su JW. Efficient index-based KNN join processing for high-dimensional data. Information and Software Technology, 2007, 49(4): 332-344.

    Article  Google Scholar 

  22. Yu C, Zhang R, Huang Y C, Xiong H. High-dimensional KNN joins with incremental updates. Geoinformatica, 2010, 14(1): 55-82.

    Article  Google Scholar 

  23. 23] Zhang C, Li F, Jestes J. Efficient parallel KNN joins for large data in MapReduce. In Proc. the 15th International Conference on Extending Database Technology, Jan. 2012, pp.38-49.

  24. Clarkson K L. Nearest neighbor queries in metric spaces. Discrete & Computational Geometry, 1999, 22(1): 63-93.

    Article  MathSciNet  MATH  Google Scholar 

  25. Figueroa K. An efficient algorithm to all k nearest neighbor problem in metric spaces [M.S. Thesis]. Universidad Michoacana, 2000. (in Spanish)

  26. Paredes R, Chávez E, Figueroa K, Navarro G. Practical construction of k-nearest neighbor graphs in metric spaces. In Lecture Notes in Computer Science 4007, Álvarez C, Serna M (eds.), Springer-Verlag, 2006, pp.85-97.

  27. Chen L, Gao Y J, Chen G, Zhang H D. Metric all-knearest-neighbor search. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 98-112.

    Article  Google Scholar 

  28. 28] Park Y, Park S, Lee S G, Jung W. Scalable k-nearest neighbor graph construction based on greedy filtering. In Proc. the 22nd International Conference on World Wide Web, May 2013, pp.227-228.

  29. Emrich T, Kriegel H P, Kröger P, Niedermayer J, Renz M, Züfle A. Reverse-k-nearest-neighbor join processing. In Lecture Notes in Computer Science 8098, Nascimento M A, Sellis T, Cheng R, Sander J, Zheng Y, Kriegel H P, Renz M, Sengstock C (eds.), Springer-Verlag, 2013, pp.277-294.

  30. Song G, Rochas J, Huet F, Magoulés F. Solutions for processing k nearest neighbor joins for massive data on MapReduce. In Proc. the Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Mar. 2015, pp.279-287.

  31. Jang M, Shin Y S, Chang J W. A grid-based k-nearest neighbor join for large scale datasets on MapReduce. In Proc. the 17th IEEE International Conference on High Performance Computing and Communications (HPCC), the 7th IEEE International Symposium on Cyberspace Safety and Security (CSS), the 12th IEEE International Conference on Embedded Software and Systems (ICESS), Aug. 2015, pp.888-891.

  32. Miranda N, Chávez E, Piccoli M F, Reyes N. (Very) Fast (all) k-nearest neighbors in metric and non metric spaces without indexing. In Lecture Notes in Computer Science 8199, Brisaboa N, Pedreira O, Zezula P (eds.), Springer-Verlag, 2013, pp.300-311.

  33. Chávez E, Navarro G, Baeza-Yates R, Marroquín J L. Searching in metric spaces. ACM Computing Surveys, 2001, 33(3): 273-321.

    Article  Google Scholar 

  34. Chen L, Gao Y J, Li X H, Jensen C S, Chen G. Efficient metric indexing for similarity search. In Proc. the 31st IEEE International Conference on Data Engineering, Apr. 2015, pp.591-602.

  35. Achtert E, Kriegel H P, Kröger P, Renz M, Züfle A. Reverse k-nearest neighbor search in dynamic and general metric databases. In Proc. the 12th International Conference on Extending Database Technology: Advances in Database Technology, Jan. 2009, pp.886-897.

  36. Tao Y F, Yiu M L, Mamoulis N. Reverse nearest neighbor search in metric spaces. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(9): 1239-1252.

    Article  Google Scholar 

  37. Jacox E H, Samet H. Metric space similarity joins. ACM Transactions on Database Systems, 2008, 33(2): Article No. 7.

  38. Silva Y N, Pearson S. Exploiting database similarity joins for metric spaces. Proceedings of the VLDB Endowment, 2012, 5(12): 1922-1925.

    Article  Google Scholar 

  39. Chen L, Lian X. Efficient processing of metric skyline queries. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(3): 351-365.

    Article  Google Scholar 

  40. Tiakas E, Valkanas G, Papadopoulos A N, Manolopoulos Y, Gunopulos D. Metric-based top-k dominating queries. In Proc. the 17th International Conference on Extending Database Technology, Mar. 2014, pp.415-426.

  41. Bustos B, Navarro G, Chávez E. Pivot selection techniques for proximity searching in metric spaces. Pattern Recognition Letters, 2003, 24(14): 2357-2366.

    Article  MATH  Google Scholar 

  42. Mao R, Xu H L, Wu W B, Li J Q, Li Y, Lu M H. Overcoming the challenge of variety: Big data abstraction, the next evolution of data management for AAL communication systems. IEEE Communications Magazine, 2015, 53(1): 42-47.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yun-Jun Gao.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, HD., Xing, ZH., Chen, L. et al. Efficient Metric All-k-Nearest-Neighbor Search on Datasets Without Any Index. J. Comput. Sci. Technol. 31, 1194–1211 (2016). https://doi.org/10.1007/s11390-016-1692-9

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-016-1692-9

Keywords

Navigation