Efficient and Flexible Bitmap Indexing for Complex Similarity Queries

  • Guang-Ho Cha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2973)

Abstract

In this paper, we propose a novel indexing method for complex similarity queries in high-dimensional image and video databases. To give the method the flexibility to handle multiple features and multiple query objects, we treat every dimension independently. Its efficiency comes from a specialized bitmap index that represents all objects in a database as a set of bitmaps. The percentage of data accessed by our method is inversely proportional to the overall dimensionality, so performance does not deteriorate as dimensionality increases. To demonstrate the efficacy of our method, we conducted extensive experiments on real image and video datasets and obtained a remarkable speed-up over the linear scan.
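The abstract does not give the index layout, so the following is only a minimal sketch of the general idea of a per-dimension bitmap index, not the paper's actual structure. It assumes feature values normalized to [0, 1), equal-width bins, and Python integers used as bitsets. The names `build_bitmaps`, `range_candidates`, `bits_to_ids`, and the parameters `bins` and `radius_bins` are hypothetical, introduced here for illustration.

```python
from typing import List, Optional, Sequence


def build_bitmaps(data: Sequence[Sequence[float]], bins: int) -> List[List[int]]:
    """Build bitmaps[d][b]: an int whose bit i is set when object i
    falls into bin b of dimension d. Assumes values lie in [0, 1)."""
    dims = len(data[0])
    bitmaps = [[0] * bins for _ in range(dims)]
    for i, vec in enumerate(data):
        for d, x in enumerate(vec):
            b = min(int(x * bins), bins - 1)  # clamp x == 1.0 into the last bin
            bitmaps[d][b] |= 1 << i
    return bitmaps


def bits_to_ids(mask: int) -> List[int]:
    """Decode a bitset into a sorted list of object ids."""
    ids, i = [], 0
    while mask:
        if mask & 1:
            ids.append(i)
        mask >>= 1
        i += 1
    return ids


def range_candidates(
    bitmaps: List[List[int]],
    query: Sequence[float],
    bins: int,
    radius_bins: int = 0,
    dims: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return ids of objects whose bin is within radius_bins of the query's
    bin in every selected dimension. Each dimension is handled on its own:
    OR the bitmaps of the nearby bins, then AND the per-dimension masks."""
    selected = range(len(bitmaps)) if dims is None else dims
    result: Optional[int] = None
    for d in selected:
        qb = min(int(query[d] * bins), bins - 1)
        lo, hi = max(0, qb - radius_bins), min(bins - 1, qb + radius_bins)
        mask = 0
        for b in range(lo, hi + 1):
            mask |= bitmaps[d][b]
        result = mask if result is None else result & mask
    return bits_to_ids(result or 0)


# Four 2-dimensional objects, four bins per dimension (illustrative data).
data = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.12, 0.95]]
bitmaps = build_bitmaps(data, bins=4)
print(range_candidates(bitmaps, [0.10, 0.20], bins=4))                 # same bin in both dimensions
print(range_candidates(bitmaps, [0.10, 0.20], bins=4, radius_bins=1))  # adjacent bins allowed
print(range_candidates(bitmaps, [0.10, 0.20], bins=4, dims=[0]))       # first dimension only
```

Because each dimension has its own bitmaps, a query can restrict itself to any subset of dimensions (the `dims` argument) without rebuilding the index. In this sketch the result is only a candidate set; ranking candidates by exact distance would be a separate refinement step.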

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Guang-Ho Cha (1)
  1. Department of Multimedia Science, Sookmyung Women’s University, Seoul, South Korea
