Indexing the bit-code and distance for fast KNN search in high-dimensional spaces

Journal of Zhejiang University-SCIENCE A

Abstract

Various index structures have recently been proposed to speed up high-dimensional KNN queries. Among them, approximate vector representation and one-dimensional (1D) transformation are two techniques that can help break the curse of dimensionality. Building on both techniques, this paper proposes a novel high-dimensional index called the Bit-code and Distance based index (BD). BD relies on a partitioning strategy optimized for high-dimensional data. Using the definitions of a bit code and a transformation function, each high-dimensional vector is first approximated by its bit code and then mapped to a 1D key, which is managed by a B+-tree. A new KNN search algorithm is also proposed that exploits the bit code and the distance to prune the search space more effectively. Extensive experiments on both synthetic and real data demonstrate that BD outperforms existing index structures for KNN search in high-dimensional spaces.
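
The abstract does not give BD's exact bit-code definition, transformation function, or partitioning strategy, so the following Python code is only a minimal sketch of how a bit code and a distance-based 1D key can work together, under these assumptions: points are partitioned by their nearest reference point; the 1D key is `i * c + dist(p, ref_i)` (the iDistance-style transformation); the bit code stores one bit per dimension saying whether a coordinate is at least the reference point's coordinate; a sorted list searched with `bisect` stands in for the B+-tree; and KNN search expands a radius in fixed steps. The names `BDIndexSketch`, `bit_code`, `bit_lower_bound`, `range_search`, and `step` are hypothetical.

```python
import bisect
import math
import random

# Hypothetical sketch of a bit-code + distance index; the paper's exact
# BD definitions are not given in the abstract, so these are assumptions.


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bit_code(p, ref):
    """Assumed bit code: bit j is 1 iff p[j] >= ref[j]."""
    return tuple(1 if x >= r else 0 for x, r in zip(p, ref))


def bit_lower_bound(q, q_code, p_code, ref):
    """Lower bound on dist(q, p) computed from the bit codes alone.

    Where the bits differ, q and p lie on opposite sides of ref[j],
    so |q[j] - p[j]| >= |q[j] - ref[j]|.
    """
    s = sum((qj - rj) ** 2
            for qj, rj, qb, pb in zip(q, ref, q_code, p_code) if qb != pb)
    return math.sqrt(s)


class BDIndexSketch:
    """Hypothetical index with key = i * c + dist(p, ref_i)."""

    def __init__(self, points, refs):
        self.points = points
        self.refs = refs
        self.radius = [0.0] * len(refs)  # max distance from ref_i to its points
        assigned = []
        for idx, p in enumerate(points):
            # Assign each point to its nearest reference point (partition).
            i = min(range(len(refs)), key=lambda j: euclid(p, refs[j]))
            d = euclid(p, refs[i])
            self.radius[i] = max(self.radius[i], d)
            assigned.append((i, d, idx))
        # Constant c keeps the key intervals of different partitions disjoint.
        self.c = max(self.radius) + 1.0
        # A sorted list searched with bisect stands in for the B+-tree.
        self.entries = sorted(
            (i * self.c + d, idx, bit_code(points[idx], refs[i]))
            for i, d, idx in assigned)
        self.keys = [e[0] for e in self.entries]

    def range_search(self, q, r):
        """Return (distance, index) for every point within distance r of q."""
        found = []
        for i, ref in enumerate(self.refs):
            dq = euclid(q, ref)
            if dq - r > self.radius[i]:
                continue  # the query ball misses this partition
            # Triangle inequality: |dist(p, ref) - dq| <= dist(p, q) <= r.
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.radius[i], dq + r)
            q_code = bit_code(q, ref)
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            for _, idx, p_code in self.entries[start:end]:
                if bit_lower_bound(q, q_code, p_code, ref) > r:
                    continue  # pruned by the bit code; full vector not read
                d = euclid(q, self.points[idx])
                if d <= r:
                    found.append((d, idx))
        return found

    def knn(self, q, k, step=0.1):
        """Expand the search radius until k points lie inside it."""
        # Once r reaches r_max, every partition is fully covered.
        r_max = max(euclid(q, ref) + rad
                    for ref, rad in zip(self.refs, self.radius))
        r = step
        while True:
            found = self.range_search(q, r)
            if len(found) >= k or r >= r_max:
                return [idx for _, idx in sorted(found)[:k]]
            r += step


random.seed(7)
data = [[random.random() for _ in range(8)] for _ in range(500)]
refs = [[random.random() for _ in range(8)] for _ in range(4)]
index = BDIndexSketch(data, refs)
query = [0.5] * 8
print(index.knn(query, 5))
```

Because each partition's keys occupy a disjoint interval, a partition costs one contiguous range scan on the 1D key, and the bit-code bound then discards candidates before their full vectors are compared. Once at least k points fall inside radius r, every unreported point is farther than r, so the k closest found points are the exact answer. The fixed-step radius expansion is a simple choice for this sketch; the paper's own search order may differ.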

Additional information

Project (No. [2005]555) supported by the Hi-Tech Research and Development Program (863) of China

About this article

Cite this article

Liang, Jj., Feng, Yc. Indexing the bit-code and distance for fast KNN search in high-dimensional spaces. J. Zhejiang Univ. - Sci. A 8, 857–863 (2007). https://doi.org/10.1631/jzus.2007.A0857

  • DOI: https://doi.org/10.1631/jzus.2007.A0857
