Fast Search in Large-Scale Image Database Using Vector Quantization

  • Hangjun Ye
  • Guangyou Xu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2728)


Practical content-based image retrieval systems require efficient indexing schemes for fast searches. Researchers have proposed many methods that use space partitioning and data partitioning for exact similarity searches. However, traditional indexing methods perform poorly at high dimensionality and degrade to simple sequential scans, the so-called “curse of dimensionality”. Recently, several filtering approaches based on vector approximation (VA) have been proposed and have shown promising performance. Existing VA-based methods, however, assume that the data are independently distributed across dimensions and use a scalar quantizer to partition each dimension of the data space. In real databases, images come from different categories and are often clustered. This paper proposes a novel indexing method that uses a vector quantizer to partition the data space. The method assumes a Gaussian mixture distribution and estimates it with the Expectation-Maximization (EM) method. Experiments on a large database of 275,465 images demonstrated a remarkable improvement in retrieval efficiency.
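The approach outlined in the abstract can be illustrated with a small sketch. The abstract does not give the codebook design or the distance bounds used for filtering, so everything below is an assumption made for illustration: a spherical-covariance Gaussian mixture fitted by EM, hard assignment of each vector to its nearest component mean, and a triangle-inequality lower bound per cell. The function names (`fit_spherical_gmm`, `build_index`, `nearest`) are hypothetical and are not taken from the paper.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_spherical_gmm(data, k, iters=30, seed=0):
    """EM for a mixture of k spherical Gaussians; returns the component means.

    Assumption: spherical covariances keep the sketch short. The paper may
    use a richer covariance model.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(p) for p in rng.sample(data, k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for p in data:
            logs = [math.log(weights[j]) - 0.5 * dim * math.log(variances[j])
                    - 0.5 * sqdist(p, means[j]) / variances[j]
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mean, variance and weight of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave a (nearly) empty component unchanged
            means[j] = [sum(r[j] * p[d] for r, p in zip(resp, data)) / nj
                        for d in range(dim)]
            var = sum(r[j] * sqdist(p, means[j])
                      for r, p in zip(resp, data)) / (nj * dim)
            variances[j] = max(var, 1e-6)
            weights[j] = max(nj / len(data), 1e-12)
    return means


def build_index(data, means):
    """Assign each vector to its nearest mean and record each cell's radius."""
    cells = [[] for _ in means]
    for i, p in enumerate(data):
        j = min(range(len(means)), key=lambda c: sqdist(p, means[c]))
        cells[j].append(i)
    radii = [max((math.sqrt(sqdist(data[i], means[j])) for i in cell),
                 default=0.0)
             for j, cell in enumerate(cells)]
    return cells, radii


def nearest(query, data, means, cells, radii):
    """Exact 1-NN search: filter cells by lower bound, then refine."""
    # Triangle inequality: every member of cell j is at least
    # d(query, mean_j) - radius_j away from the query.
    bounds = sorted((max(0.0, math.sqrt(sqdist(query, m)) - r), j)
                    for j, (m, r) in enumerate(zip(means, radii)))
    best_i, best_d, scanned = -1, math.inf, 0
    for lower, j in bounds:
        if lower >= best_d:
            break  # no remaining cell can contain a closer vector
        for i in cells[j]:
            scanned += 1
            d = math.sqrt(sqdist(query, data[i]))
            if d < best_d:
                best_i, best_d = i, d
    return best_i, best_d, scanned


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic clustered data standing in for image feature vectors.
    data = [[rng.gauss(c, 0.5) for _ in range(8)]
            for c in (0.0, 4.0, 8.0) for _ in range(200)]
    means = fit_spherical_gmm(data, k=3)
    cells, radii = build_index(data, means)
    idx, dist, scanned = nearest(data[0], data, means, cells, radii)
    print(f"nearest index={idx}, vectors scanned={scanned} of {len(data)}")
```

On clustered data, most cells are pruned by their lower bound, so only a fraction of the vectors is compared against the query. That is the intuition behind replacing per-dimension scalar quantization with a quantizer that follows the cluster structure. The actual bounds and approximation format in the paper may differ from this sketch.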


Keywords: Gaussian Mixture Model; Near Neighbor; Vector Quantization; Relevance Feedback; Indexing Scheme
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Hangjun Ye
    • 1
  • Guangyou Xu
    • 1
  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
