Abstract
Furthest neighbor search in high-dimensional space is widely used in applications such as recommendation systems. Because of the “curse of dimensionality”, exact search is costly, so \(c\)-approximate furthest neighbor (\(c\)-AFN) search is used instead, trading some result accuracy for efficiency. However, most current external-memory techniques are suitable only for low-dimensional space.
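To make the \(c\)-AFN guarantee concrete, here is a minimal brute-force checker. It assumes Euclidean distance and the usual definition: for a query q with exact furthest neighbor o*, a point o is a valid \(c\)-AFN answer if dist(o, q) ≥ dist(o*, q) / c. The function names `exact_furthest` and `is_c_afn` are hypothetical and serve only as illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def exact_furthest(data: Sequence[Point], q: Point) -> Point:
    """Brute-force (exact) furthest neighbor of q under Euclidean distance."""
    return max(data, key=lambda o: math.dist(o, q))


def is_c_afn(candidate: Point, data: Sequence[Point], q: Point, c: float) -> bool:
    """True if candidate is a c-approximate furthest neighbor of q, i.e.
    dist(candidate, q) >= dist(o*, q) / c, where o* is the exact furthest point.
    c = 1 corresponds to exact furthest neighbor search."""
    if c < 1:
        raise ValueError("approximation ratio c must be at least 1")
    best = math.dist(exact_furthest(data, q), q)
    return math.dist(candidate, q) >= best / c


data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 0.0)
print(is_c_afn((3.0, 4.0), data, q, c=2.0))  # True: 5 >= 10 / 2
print(is_c_afn((0.0, 0.0), data, q, c=2.0))  # False: 0 < 10 / 2
```

A larger c accepts more points as valid answers, which is what gives approximate methods room to touch fewer points (and, on disk, fewer pages) than exact search.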
In this paper, we propose a novel algorithm called reverse incremental LSH (RI-LSH), based on Indyk’s LSH scheme, that solves the problem with a theoretical guarantee. Unlike previous hashing-based methods, RI-LSH is designed for external memory and achieves good performance in terms of I/O cost. We provide a rigorous theoretical analysis proving that RI-LSH returns a \(c\)-AFN result with constant probability. Our comprehensive experimental results show that, compared with other \(c\)-AFN methods that have theoretical guarantees, our algorithm achieves better I/O efficiency.
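The abstract does not spell out how RI-LSH works internally. As a hedged illustration of the general idea behind projection-based furthest neighbor search, the sketch below uses random Gaussian projections (the 2-stable case of the p-stable LSH of Datar et al.). For each projection it keeps the points whose projected values lie farthest from the query's projected value, then verifies the pooled candidates with exact distances. This is a generic in-memory sketch, not the paper's RI-LSH: it has no bucket layout, no I/O handling, and no proven guarantee. All names (`gaussian_projections`, `approx_furthest`, `per_projection`) are hypothetical.

```python
import heapq
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def gaussian_projections(dim: int, m: int, seed: int = 0) -> List[List[float]]:
    """m random directions with i.i.d. N(0, 1) entries (2-stable projections)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def approx_furthest(data: Sequence[Point], q: Point,
                    projections: List[List[float]], per_projection: int) -> int:
    """Return the index of an approximate furthest neighbor of q.

    For each projection, keep the per_projection points whose projected
    values lie farthest from the query's projected value, then pick the
    candidate with the largest exact Euclidean distance to q.
    """
    if not data or not projections or per_projection < 1:
        raise ValueError("need non-empty data, at least one projection, per_projection >= 1")
    candidates = set()
    for a in projections:
        qa = dot(a, q)
        # Points far from q along direction a are likely far from q overall.
        far = heapq.nlargest(per_projection, range(len(data)),
                             key=lambda i: abs(dot(a, data[i]) - qa))
        candidates.update(far)
    # Verification step: exact distances only for the pooled candidates.
    return max(candidates, key=lambda i: math.dist(data[i], q))


data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
q = (0.0, 0.0)
proj = gaussian_projections(dim=2, m=4)
idx = approx_furthest(data, q, proj, per_projection=2)
print(idx, data[idx])
```

Raising `m` or `per_projection` verifies more candidates and makes a good answer more likely at the cost of more distance computations, mirroring the accuracy/efficiency trade-off described above.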
Keywords
- Locality-sensitive hashing
- Furthest neighbour search
- Similarity search
References
Agarwal, P.K., Matoušek, J., Suri, S.: Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Comput. Geom. 1(4), 189–201 (1992)
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: ACM SIGMOD Record, vol. 19, pp. 322–331. ACM (1990)
Bentley, J.L.: Multidimensional binary search trees in database applications. IEEE Trans. Softw. Eng. 4, 333–340 (1979)
Bespamyatnikh, S.: Dynamic algorithms for approximate neighbor searching. In: CCCG, pp. 252–257 (1996)
Curtin, R.R., et al.: MLPACK: a scalable C++ machine learning library. J. Mach. Learn. Res. 14, 801–805 (2013)
Curtin, R.R., Gardner, A.B.: Fast approximate furthest neighbors with data-dependent hashing. arXiv preprint arXiv:1605.09784 (2016)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp. 253–262. ACM (2004)
Gan, J., Feng, J., Fang, Q., Ng, W.: Locality-sensitive hashing scheme based on dynamic collision counting. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 541–552. ACM (2012)
Huang, Q., Feng, J., Fang, Q., Ng, W.: Two efficient hashing schemes for high-dimensional furthest neighbor search. IEEE Trans. Knowl. Data Eng. 29(12), 2772–2785 (2017)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 604–613. ACM (1998)
Pagh, R., Silvestri, F., Sivertsen, J., Skala, M.: Approximate furthest neighbor with application to annulus query. Inf. Syst. 64, 152–162 (2017)
Said, A., Fields, B., Jain, B.J., Albayrak, S.: User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pp. 1399–1408. ACM (2013)
Said, A., Kille, B., Jain, B.J., Albayrak, S.: Increasing diversity through furthest neighbor-based recommendation. In: Proceedings of the WSDM 2012 (2012)
Vasiloglou, N., Gray, A.G., Anderson, D.V.: Scalable semidefinite manifold learning. In: 2008 IEEE Workshop on Machine Learning for Signal Processing, pp. 368–373. IEEE (2008)
Yao, B., Li, F., Kumar, P.: Reverse furthest neighbors in spatial databases. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 664–675. IEEE (2009)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, W., Wang, H., Zhang, Y., Qin, L., Zhang, W. (2020). I/O Efficient Algorithm for c-Approximate Furthest Neighbor Search in High-Dimensional Space. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12114. Springer, Cham. https://doi.org/10.1007/978-3-030-59419-0_14
DOI: https://doi.org/10.1007/978-3-030-59419-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59418-3
Online ISBN: 978-3-030-59419-0
eBook Packages: Computer Science, Computer Science (R0)