
An adaptive and dynamic dimensionality reduction method for high-dimensional indexing

  • Regular Paper
The VLDB Journal

Abstract

The “dimensionality curse” is a well-known phenomenon for any multi-dimensional index attempting to scale up to high dimensions. One common approach to overcoming the performance degradation that accompanies increasing dimensionality is to reduce the dimensionality of the original dataset before constructing the index. However, identifying the correlations among the dimensions and effectively reducing them are challenging tasks. In this paper, we present an adaptive Multi-level Mahalanobis-based Dimensionality Reduction (MMDR) technique for high-dimensional indexing. Our MMDR technique has four notable features compared to existing methods. First, it discovers elliptical clusters for more effective dimensionality reduction by using only the low-dimensional subspaces. Second, data points in the different axis systems are indexed using a single B+-tree. Third, our technique is highly scalable in terms of data size and dimension. Finally, it is also dynamic and adaptive to insertions. An extensive performance study was conducted using both real and synthetic datasets, and the results show that our technique not only achieves higher precision, but also enables queries to be processed efficiently.
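
To illustrate why a Mahalanobis-based distance is suited to finding elliptical clusters, the following minimal sketch compares it with Euclidean distance on a small, elongated two-dimensional cluster. It is not the MMDR algorithm itself; the function names, the synthetic `cluster`, and the two probe points are assumptions made for illustration, and the example is restricted to two dimensions so that the covariance inverse can be written in closed form.

```python
import math


def mean_and_covariance(points):
    """Return the centroid and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis(point, centroid, cov):
    """Mahalanobis distance sqrt(d^T S^-1 d), using the closed-form 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - centroid[0]
    dy = point[1] - centroid[1]
    # S^-1 = (1 / det) * [[d, -b], [-b, a]]
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


def euclidean(point, centroid):
    """Plain Euclidean distance from point to centroid."""
    return math.hypot(point[0] - centroid[0], point[1] - centroid[1])


# Hypothetical cluster, strongly elongated along the line y = x.
cluster = [(t + e, t - e) for t in range(-5, 6) for e in (-0.5, 0.0, 0.5)]
centroid, cov = mean_and_covariance(cluster)

along = (3.0, 3.0)    # lies on the cluster's major axis
across = (3.0, -3.0)  # same Euclidean distance, but perpendicular to the axis

for name, p in (("along", along), ("across", across)):
    print(name, round(euclidean(p, centroid), 3), round(mahalanobis(p, centroid, cov), 3))
```

Both probe points are equally far from the centroid in Euclidean terms, yet the Mahalanobis distance of the point lying along the major axis is much smaller than that of the point lying across it. A distance that accounts for the cluster's covariance in this way can therefore follow an elongated, arbitrarily oriented cluster where a spherical (Euclidean) neighbourhood cannot.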




Correspondence to Heng Tao Shen.


About this article

Cite this article

Shen, H.T., Zhou, X. & Zhou, A. An adaptive and dynamic dimensionality reduction method for high-dimensional indexing. The VLDB Journal 16, 219–234 (2007). https://doi.org/10.1007/s00778-005-0167-3
