Fast Search in Large-Scale Image Database Using Vector Quantization
Practical content-based image retrieval systems require efficient indexing schemes for fast searches. Researchers have proposed many methods that use space and data partitioning for exact similarity searches. However, traditional indexing methods perform poorly and degrade to simple sequential scans at high dimensionality, the so-called “curse of dimensionality”. Recently, several filtering approaches based on vector approximation (VA) have been proposed and have shown promising performance. Existing VA-based methods, however, assume that the data are independently distributed and use a scalar quantizer to partition each dimension of the data space. In real databases, images come from different categories and are often clustered. In this paper, a novel indexing method using vector quantization is proposed. The approach introduces a vector quantizer to partition the data space. It assumes a Gaussian mixture distribution and estimates this distribution with the Expectation-Maximization (EM) method. Experiments on a large database of 275,465 images demonstrated a remarkable improvement in retrieval efficiency.
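To make the idea of partitioning the data space with a vector quantizer concrete, the following is a minimal sketch and not the paper's implementation. It rests on several assumptions: the codebook is trained with plain k-means rather than the EM-fitted Gaussian mixture described above, the filtering rule (a triangle-inequality lower bound from each codeword and its cell radius) is a generic one chosen for illustration, and the names `build_codebook` and `knn_search` are hypothetical.

```python
import numpy as np


def build_codebook(data, n_codewords, n_iters=20, seed=0):
    """Plain k-means (Lloyd) codebook; an assumed stand-in for an EM-fitted mixture."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), n_codewords, replace=False)].copy()
    for _ in range(n_iters):
        # Assign every vector to its nearest codeword.
        dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each codeword to the mean of its cell (empty cells stay put).
        for j in range(n_codewords):
            members = data[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    # Final assignment and per-cell radius (largest member-to-codeword distance).
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    radii = np.zeros(n_codewords)
    for j in range(n_codewords):
        if np.any(labels == j):
            radii[j] = dists[labels == j, j].max()
    return codebook, labels, radii


def knn_search(query, data, codebook, labels, radii, k=1):
    """Exact k-NN: visit cells by ascending lower bound and prune the rest."""
    # Triangle inequality: every vector in cell j is at least this far from the query.
    lower = np.maximum(np.linalg.norm(codebook - query, axis=1) - radii, 0.0)
    best_idx = np.empty(0, dtype=int)
    best_dist = np.empty(0)
    cells_visited = 0
    for j in np.argsort(lower):
        # No vector in this or any later cell can beat the current k-th best.
        if len(best_dist) == k and lower[j] >= best_dist[-1]:
            break
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        cells_visited += 1
        d = np.linalg.norm(data[members] - query, axis=1)
        idx = np.concatenate([best_idx, members])
        dist = np.concatenate([best_dist, d])
        order = np.argsort(dist)[:k]
        best_idx, best_dist = idx[order], dist[order]
    return best_idx, best_dist, cells_visited
```

On clustered data, a query typically falls near one codeword, so most cells are pruned by the lower bound and only a small fraction of the vectors is compared exactly. The returned distances match those of a sequential scan; the count of visited cells gives a rough sense of how much work the filter saves.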
Keywords: Gaussian Mixture Model; Near Neighbor; Vector Quantization; Relevance Feedback; Indexing Scheme