Abstract
An all-k-nearest-neighbor (AkNN) query finds k nearest neighbors for each query object. This problem arises naturally in many areas, such as GIS (geographic information system), multimedia retrieval, and recommender systems. To support various data types and flexible distance metrics involved in real applications, we study AkNN retrieval in metric spaces, namely, metric AkNN (MAkNN) search. Consider that the underlying indexes on the query set and the object set may not exist, which is natural in many scenarios. For example, the query set and the object set could be the results of other queries, and thus, the underlying indexes cannot be built in advance. To support MAkNN search on datasets without any underlying index, we propose an efficient disk-based algorithm, termed as Partition-Based MAkNN Algorithm (PMA), which follows a partition-search framework and employs a series of pruning rules for accelerating the search. In addition, we extend our techniques to tackle an interesting variant of MAkNN queries, i.e., metric self-AkNN (MSAkNN) search, where the query set is identical to the object set. Extensive experiments using both real and synthetic datasets demonstrate the effectiveness of our pruning rules and the efficiency of the proposed algorithms, compared with state-of-the-art MAkNN and MSAkNN algorithms.
Similar content being viewed by others
References
Chen H L, Chang Y I. All-nearest-neighbors finding based on the Hilbert curve. Expert Systems with Applications, 2011, 38(6): 7462-7475.
Zhang J, Mamoulis N, Papadias D, Tao Y F. Allnearest-neighbors queries in spatial databases. In Proc. the 16th International Conference on Scientific and Statistical Database Management, June 2004, pp.297-306.
Böhm C, Krebs F. The k-nearest neighbour join: Turbo charging the KDD process. Knowledge and Information Systems, 2004, 6(6): 728-749.
Chen Y, Patel J M. Efficient evaluation of all-nearestneighbor queries. In Proc. the 23rd International Conference on Data Engineering, Apr. 2007, pp.1056-1065.
Emrich T, Graf F, Kriegel H P, Schubert M, Thoma M. Optimizing all-nearest-neighbor queries with trigonometric pruning. In Proc. the 22nd International Conference on Scientific and Statistical Database Management, June 2010, pp.501-518.
Xia C Y, Lu H J, Ooi B C, Hu J. GORDER: An efficient method for KNN join processing. In Proc. the VLDB Conference, Oct. 2004, pp.756-767.
Kybic J, Vnučko I. Approximate all nearest neighbor search for high dimensional entropy estimation for image registration. Signal Processing, 2012, 92(5): 1302-1316.
Lu W, Shen Y Y, Chen S, Ooi B C. Efficient processing of k nearest neighbor joins using MapReduce. Proceedings of the PVLDB Endowment, 2012, 5(10): 1016-1027.
Sankaranarayanan J, Samet H, Varshney A. A fast all nearest neighbor algorithm for applications involving large point-clouds. Computers & Graphics, 2007, 31(2): 157-174.
Nakano K, Olariu S. An optimal algorithm for the anglerestricted all nearest neighbor problem on the reconfigurable mesh, with applications. IEEE Transactions on Parallel and Distributed Systems, 1997, 8(9): 983-990.
Wang Y R, Horng S J, Wu C H. Efficient algorithms for the all nearest neighbor and closest pair problems on the linear array with a reconfigurable pipelined bus system. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(3): 193-206.
Wang Y, Metwally A, Parthasarathy S. Scalable all-pairs similarity search in metric spaces. In Proc. the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2013, pp.829-837.
13] Jain A K, Murthy M N, Flynn P J. Data clustering: A review. ACM Computing Surveys, 1999, 31(3): 264-323.
Nanopoulos A, Theodoridis Y, Manolopoulos Y. C2P: Clustering based on closest pairs. In Proc. the 27th International Conference on Very Large Data Bases, Jan. 2001, pp.331-340.
Aggarwal C C, Yu P S. Outlier detection for high dimensional data. In Proc. the 2001 ACM SIGMOD International Conference on Management of Data, June 2001, pp.37-46.
Álvarez M, Pan A, Raposo J, Bellas F, Cacheda F. Using clustering and edit distance techniques for automatic web data extraction. In Lecture Notes in Computer Science 4831, Benatallah B, Casati F, Georgakopoulos D, Bartolini C, Sadiq W, Godart C (eds.), Springer-Verlag, 2007, pp.212-224.
Lowe D G. Object recognition from local scale-invariant features. In Proc. the 7th IEEE International Conference on Computer Vision, Sept. 1999, pp.1150-1157.
Jacox E H, Samet H. Spatial join techniques. ACM Transaction on Database Systems, 2007, 32(1): Article No. 7.
Yang C, Yu X H, Liu Y. Continuous KNN join processing for real-time recommendation. In Proc. the IEEE International Conference on Data Mining, Dec. 2014, pp.640-649.
Yao B, Li F F, Kumar P. K nearest neighbor queries and KNN-joins in large relational databases (almost) for free. In Proc. the 26th IEEE International Conference on Data Engineering, Mar. 2010, pp.4-15.
Yu C, Cui B,Wang S G, Su JW. Efficient index-based KNN join processing for high-dimensional data. Information and Software Technology, 2007, 49(4): 332-344.
Yu C, Zhang R, Huang Y C, Xiong H. High-dimensional KNN joins with incremental updates. Geoinformatica, 2010, 14(1): 55-82.
23] Zhang C, Li F, Jestes J. Efficient parallel KNN joins for large data in MapReduce. In Proc. the 15th International Conference on Extending Database Technology, Jan. 2012, pp.38-49.
Clarkson K L. Nearest neighbor queries in metric spaces. Discrete & Computational Geometry, 1999, 22(1): 63-93.
Figueroa K. An efficient algorithm to all k nearest neighbor problem in metric spaces [M.S. Thesis]. Universidad Michoacana, 2000. (in Spanish)
Paredes R, Chávez E, Figueroa K, Navarro G. Practical construction of k-nearest neighbor graphs in metric spaces. In Lecture Notes in Computer Science 4007, Álvarez C, Serna M (eds.), Springer-Verlag, 2006, pp.85-97.
Chen L, Gao Y J, Chen G, Zhang H D. Metric all-knearest-neighbor search. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 98-112.
28] Park Y, Park S, Lee S G, Jung W. Scalable k-nearest neighbor graph construction based on greedy filtering. In Proc. the 22nd International Conference on World Wide Web, May 2013, pp.227-228.
Emrich T, Kriegel H P, Kröger P, Niedermayer J, Renz M, Züfle A. Reverse-k-nearest-neighbor join processing. In Lecture Notes in Computer Science 8098, Nascimento M A, Sellis T, Cheng R, Sander J, Zheng Y, Kriegel H P, Renz M, Sengstock C (eds.), Springer-Verlag, 2013, pp.277-294.
Song G, Rochas J, Huet F, Magoulés F. Solutions for processing k nearest neighbor joins for massive data on MapReduce. In Proc. the Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Mar. 2015, pp.279-287.
Jang M, Shin Y S, Chang J W. A grid-based k-nearest neighbor join for large scale datasets on MapReduce. In Proc. the 17th IEEE International Conference on High Performance Computing and Communications (HPCC), the 7th IEEE International Symposium on Cyberspace Safety and Security (CSS), the 12th IEEE International Conference on Embedded Software and Systems (ICESS), Aug. 2015, pp.888-891.
Miranda N, Chávez E, Piccoli M F, Reyes N. (Very) Fast (all) k-nearest neighbors in metric and non metric spaces without indexing. In Lecture Notes in Computer Science 8199, Brisaboa N, Pedreira O, Zezula P (eds.), Springer-Verlag, 2013, pp.300-311.
Chávez E, Navarro G, Baeza-Yates R, Marroquín J L. Searching in metric spaces. ACM Computing Surveys, 2001, 33(3): 273-321.
Chen L, Gao Y J, Li X H, Jensen C S, Chen G. Efficient metric indexing for similarity search. In Proc. the 31st IEEE International Conference on Data Engineering, Apr. 2015, pp.591-602.
Achtert E, Kriegel H P, Kröger P, Renz M, Züfle A. Reverse k-nearest neighbor search in dynamic and general metric databases. In Proc. the 12th International Conference on Extending Database Technology: Advances in Database Technology, Jan. 2009, pp.886-897.
Tao Y F, Yiu M L, Mamoulis N. Reverse nearest neighbor search in metric spaces. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(9): 1239-1252.
Jacox E H, Samet H. Metric space similarity joins. ACM Transactions on Database Systems, 2008, 33(2): Article No. 7.
Silva Y N, Pearson S. Exploiting database similarity joins for metric spaces. Proceedings of the VLDB Endowment, 2012, 5(12): 1922-1925.
Chen L, Lian X. Efficient processing of metric skyline queries. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(3): 351-365.
Tiakas E, Valkanas G, Papadopoulos A N, Manolopoulos Y, Gunopulos D. Metric-based top-k dominating queries. In Proc. the 17th International Conference on Extending Database Technology, Mar. 2014, pp.415-426.
Bustos B, Navarro G, Chávez E. Pivot selection techniques for proximity searching in metric spaces. Pattern Recognition Letters, 2003, 24(14): 2357-2366.
Mao R, Xu H L, Wu W B, Li J Q, Li Y, Lu M H. Overcoming the challenge of variety: Big data abstraction, the next evolution of data management for AAL communication systems. IEEE Communications Magazine, 2015, 53(1): 42-47.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zhang, HD., Xing, ZH., Chen, L. et al. Efficient Metric All-k-Nearest-Neighbor Search on Datasets Without Any Index. J. Comput. Sci. Technol. 31, 1194–1211 (2016). https://doi.org/10.1007/s11390-016-1692-9
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-016-1692-9