Abstract
Various index structures have recently been proposed to facilitate high-dimensional KNN queries. Among them, the techniques of approximate vector representation and one-dimensional (1D) transformation can break the curse of dimensionality. Building on these two techniques, this paper proposes a novel high-dimensional index called the Bit-code and Distance based index (BD). BD relies on a special partitioning strategy that is optimized for high-dimensional data. Using the definitions of the bit code and the transformation function, a high-dimensional vector is first represented approximately and then transformed into a 1D value, which serves as the key managed by a B+-tree. A new KNN search algorithm is also proposed that exploits the bit code and the distance to prune the search space more effectively. Extensive experiments on both synthetic and real data demonstrate that BD outperforms existing index structures for KNN search in high-dimensional spaces.
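The abstract does not give the definitions of the bit code or the transformation function, so the following is only a minimal sketch of the general idea under stated assumptions, not the BD index itself. It assumes a single reference point `ref`, a bit code with one bit per dimension (set when the coordinate is at or above the reference's), and a hypothetical key layout `bit_code * C + distance`. A sorted Python list stands in for the B+-tree, and all function names are illustrative.

```python
import bisect
import math

C = 1e6  # assumed spacing between bit-code groups; must exceed any distance


def bit_code(vec, ref):
    """Assumed bit code: bit i is 1 when vec[i] >= ref[i]."""
    code = 0
    for i, (v, r) in enumerate(zip(vec, ref)):
        if v >= r:
            code |= 1 << i
    return code


def orthant_min_dist(query, ref, code):
    """Lower bound on the distance from query to any point with this bit code."""
    total = 0.0
    for i, (q, r) in enumerate(zip(query, ref)):
        bit = (code >> i) & 1
        if bit != (1 if q >= r else 0):
            # The query lies on the other side of ref in this dimension.
            total += (q - r) ** 2
    return math.sqrt(total)


def build_index(points, ref):
    """Sorted (key, point) list standing in for the B+-tree leaf level."""
    entries = sorted((bit_code(p, ref) * C + math.dist(p, ref), p) for p in points)
    keys = [k for k, _ in entries]
    return keys, entries


def range_search(index, query, ref, radius):
    """Return (distance, point) pairs within radius of query, nearest first."""
    keys, entries = index
    dq = math.dist(query, ref)
    codes = sorted({int(k // C) for k in keys})
    hits = []
    for code in codes:
        if orthant_min_dist(query, ref, code) > radius:
            continue  # the whole bit-code group is pruned
        # Triangle inequality: results have |dist(p, ref) - dq| <= radius.
        lo = bisect.bisect_left(keys, code * C + max(0.0, dq - radius))
        hi = bisect.bisect_right(keys, code * C + dq + radius)
        for _, p in entries[lo:hi]:
            d = math.dist(p, query)
            if d <= radius:
                hits.append((d, p))
    return sorted(hits)
```

A KNN query could be answered on top of this by repeating `range_search` with a growing radius until at least k points are found. Packing the bit code into a floating-point key as done here only works for a small number of dimensions; a real index would need a different key encoding.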
Additional information
Project (No. [2005]555) supported by the Hi-Tech Research and Development Program (863) of China
Cite this article
Liang, Jj., Feng, Yc. Indexing the bit-code and distance for fast KNN search in high-dimensional spaces. J. Zhejiang Univ. - Sci. A 8, 857–863 (2007). https://doi.org/10.1631/jzus.2007.A0857
DOI: https://doi.org/10.1631/jzus.2007.A0857