Indexing the bit-code and distance for fast KNN search in high-dimensional spaces

Liang, Jun-jie; Feng, Yu-cai

doi:10.1631/jzus.2007.A0857

Published: 01 May 2007

Volume 8, pages 857–863 (2007)
Journal of Zhejiang University-SCIENCE A

Liang Jun-jie¹,² & Feng Yu-cai¹

Abstract

Various index structures have recently been proposed to support high-dimensional KNN queries; among them, approximate vector representation and one-dimensional (1D) transformation are two techniques that can break the curse of dimensionality. Combining these two techniques, a novel high-dimensional index called the Bit-code and Distance based index (BD) is proposed. BD is built on a special partitioning strategy optimized for high-dimensional data. Using the definitions of the bit code and the transformation function, a high-dimensional vector is first approximated by its bit code and then transformed into a 1D value, which serves as the key in a B⁺-tree. A new KNN search algorithm is also proposed that exploits both the bit code and the distance to prune the search space more effectively. Extensive experiments on both synthetic and real data demonstrate that BD outperforms existing index structures for KNN search in high-dimensional spaces.
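The abstract does not give the paper's actual definitions of the bit code, the transformation function, or the partitioning strategy, so the following is only a toy sketch of the general idea under our own assumptions. It assumes data in [0,1]^d with each dimension split once at 0.5 (one bit per dimension, which is impractical for large d), uses the centre of each bit-code cell as a reference point in the style of the iDistance method of Yu et al. (2001), keys each point by (bit code, distance to cell centre), and uses a sorted Python list as a stand-in for the B⁺-tree leaf level. All names (`bit_code`, `cell_center`, `cell_mindist`, `BDIndexSketch`) are hypothetical. The search is exact: whole cells are pruned with a bit-code lower bound, and points inside a cell are pruned with the triangle inequality on the stored distance.

```python
import bisect
import heapq
import math
import random

SPLIT = 0.5  # assumption: every dimension of [0,1]^d is split once, at 0.5


def bit_code(v):
    """Hypothetical bit code: bit j is 1 if v[j] lies in the upper half of dimension j."""
    code = 0
    for j, x in enumerate(v):
        if x >= SPLIT:
            code |= 1 << j
    return code


def cell_center(code, d):
    """Centre of the cell addressed by a bit code, used here as the reference point."""
    return [0.75 if (code >> j) & 1 else 0.25 for j in range(d)]


def cell_mindist(q, code):
    """Lower bound on the distance from q to any point inside the cell of `code`."""
    s = 0.0
    for j, x in enumerate(q):
        lo, hi = (SPLIT, 1.0) if (code >> j) & 1 else (0.0, SPLIT)
        gap = lo - x if x < lo else (x - hi if x > hi else 0.0)
        s += gap * gap
    return math.sqrt(s)


class BDIndexSketch:
    """Toy 1D index: keys (bit code, distance to cell centre, id) kept in one sorted list."""

    def __init__(self, points):
        self.points = points
        self.d = len(points[0])
        self.keys = sorted(
            (bit_code(p), math.dist(p, cell_center(bit_code(p), self.d)), i)
            for i, p in enumerate(points))
        self.codes = sorted({key[0] for key in self.keys})

    def knn(self, q, k):
        heap = []  # max-heap of (-distance, id) holding the best k found so far

        def kth():
            return -heap[0][0] if len(heap) == k else math.inf

        # Visit non-empty cells in increasing order of their bit-code lower bound.
        for lb, code in sorted((cell_mindist(q, c), c) for c in self.codes):
            if lb >= kth():
                break  # every remaining cell is at least this far from q
            lo = bisect.bisect_left(self.keys, (code, -math.inf))
            hi = bisect.bisect_left(self.keys, (code + 1, -math.inf))
            dq = math.dist(q, cell_center(code, self.d))
            right = bisect.bisect_left(self.keys, (code, dq), lo, hi)
            left = right - 1
            # Scan outward from dq; |d(p,O) - d(q,O)| <= d(p,q) by the triangle inequality.
            while left >= lo or right < hi:
                gl = dq - self.keys[left][1] if left >= lo else math.inf
                gr = self.keys[right][1] - dq if right < hi else math.inf
                if min(gl, gr) >= kth():
                    break  # no remaining point in this cell can improve the result
                if gl <= gr:
                    idx = self.keys[left][2]
                    left -= 1
                else:
                    idx = self.keys[right][2]
                    right += 1
                dist = math.dist(q, self.points[idx])
                if len(heap) < k:
                    heapq.heappush(heap, (-dist, idx))
                elif dist < kth():
                    heapq.heapreplace(heap, (-dist, idx))
        return sorted((-nd, i) for nd, i in heap)


if __name__ == "__main__":
    random.seed(7)
    data = [[random.random() for _ in range(8)] for _ in range(2000)]
    index = BDIndexSketch(data)
    q = [random.random() for _ in range(8)]
    result = index.knn(q, 5)
    brute = sorted((math.dist(q, p), i) for i, p in enumerate(data))[:5]
    assert result == brute  # same answer as a linear scan
    print(result)
```

The design choice worth noting is that the two pruning rules are independent: the bit code alone gives a cheap cell-level bound, and the single stored distance gives a point-level bound without reading the full vector. How BD actually combines them, and how its partitioning avoids the 2^d cells of this toy scheme, is described in the full paper rather than here.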

References

Berchtold, S., Böhm, C., Kriegel, H.P., 1998. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. Proc. ACM SIGMOD Int. Conf. on Management of Data, p.142–153.
Beyer, K., Goldstein, J., Ramakrishnan, R., 1999. When is “Nearest Neighbor” Meaningful? Proc. 7th Int. Conf. on Database Theory, p.1–11.
Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J., 2001. Searching in metric spaces. ACM Computing Surveys, 33(3):273–321. [doi:10.1145/502807.502808]
Cui, B., Shen, H.T., Shen, J., Tan, K.L., 2005. Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases. Proc. 16th Australian Database Conference, p.165–174.
Fonseca, M.J., Jorge, J.A., 2003. Indexing High-dimensional Data for Content-based Retrieval in Large Databases. Proc. 8th International Conference on Database Systems for Advanced Applications, p.267–274.
Guha, S., Rastogi, R., Shim, K., 1998. Cure: An Efficient Clustering Algorithm for Large Databases. Proc. ACM SIGMOD Int. Conf. on Management of Data, p.73–84.
Hinneburg, A., Aggarwal, C.C., Keim, D.A., 2000. What is the Nearest Neighbor in High-dimensional Spaces? Proc. 26th Int. Conf. on Very Large Data Bases, p.506–515.
Hjaltason, G.R., Samet, H., 2003. Index-driven similarity search in metric spaces. ACM Trans. on Database Syst., 28(4):517–580. [doi:10.1145/958942.958948]
Jin, H., Ooi, B.C., Shen, H.T., Yu, C., Zhou, A.Y., 2003. An Adaptive and Efficient Dimensionality Reduction Algorithm for High-dimensional Indexing. Proc. Int. Conf. on Data Eng., p.87–98.
Jolliffe, I.T., 1986. Principal Component Analysis. Springer-Verlag, New York.
Weber, R., Schek, H.J., Blott, S., 1998. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-dimensional Spaces. Proc. 24th Int. Conf. on Very Large Data Bases, p.194–205.
Yu, C., Ooi, B.C., Tan, K.L., Jagadish, H.V., 2001. Indexing the Distance: An Efficient Method to KNN Processing. Proc. 27th Int. Conf. on Very Large Data Bases, p.421–430.
Yu, C., Bressan, S., Ooi, B.C., Tan, K.L., 2004. Querying high-dimensional data in single dimensional space. Int. J. Very Large Data Bases, 13(2):105–119.
Zhang, R., Ooi, B.C., Tan, K.L., 2004. Making the Pyramid Technique Robust to Query Types and Workloads. Proc. Int. Conf. on Data Eng., p.313–324. [doi:10.1109/ICDE.2004.1320007]

Author information

Authors and Affiliations

College of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Liang Jun-jie & Feng Yu-cai
Faculty of Mathematics & Computer Science, Hubei University, Wuhan, 430062, China
Liang Jun-jie

Additional information

Project (No. [2005]555) supported by the Hi-Tech Research and Development Program (863) of China

About this article

Cite this article

Liang, Jj., Feng, Yc. Indexing the bit-code and distance for fast KNN search in high-dimensional spaces. J. Zhejiang Univ. - Sci. A 8, 857–863 (2007). https://doi.org/10.1631/jzus.2007.A0857

Received: 22 August 2006
Accepted: 05 December 2006
Published: 01 May 2007
Issue Date: May 2007
DOI: https://doi.org/10.1631/jzus.2007.A0857

Key words

CLC number

TP311

Similar content being viewed by others

A Comprehensive Survey of Clustering Algorithms

Density-Based Clustering Based on Hierarchical Density Estimates

Various dimension reduction techniques for high dimensional data analysis: a review