Indexing Incomplete Databases

  • Guadalupe Canahuate
  • Michael Gibas
  • Hakan Ferhatosmanoglu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3896)

Abstract

Incomplete databases, that is, databases with missing data, arise in many research domains, and techniques are needed to access them efficiently. We first show that known indexing techniques for multi-dimensional data search break down in performance when the indexed attributes contain missing data. This paper uses two widely employed indexing techniques, bitmaps and quantization, to answer queries correctly and efficiently in the presence of missing data. Query execution and interval evaluation are formalized for both indexing structures according to whether missing data is considered a query match. The performance of bitmap indexes and quantization-based indexes is evaluated and compared over a variety of analysis parameters on real and synthetic data sets. Insights are provided into the conditions under which each technique should be used.
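
To make the two query semantics concrete, the following is a minimal sketch of a range query over a bitmap index that reserves one extra bitmap for missing values. It is an illustration only and not necessarily the encoding used in the paper: the equality encoding, the use of Python integers as bit vectors, and the names `build_bitmaps`, `range_query`, `rows`, and `missing_is_match` are all assumptions made for this example.

```python
from typing import List, Optional


def build_bitmaps(values: List[Optional[int]], cardinality: int) -> List[int]:
    """Build one bitmap per attribute value 1..cardinality.

    Slot 0 is an extra bitmap marking rows whose value is missing (None).
    Each bitmap is a Python int; bit i is set when row i belongs to it.
    This layout is an assumption for illustration.
    """
    bitmaps = [0] * (cardinality + 1)
    for row, value in enumerate(values):
        slot = 0 if value is None else value
        bitmaps[slot] |= 1 << row
    return bitmaps


def range_query(bitmaps: List[int], lo: int, hi: int,
                missing_is_match: bool) -> int:
    """Return a bitmap of rows whose value lies in [lo, hi].

    When missing_is_match is True, rows with missing data are also
    reported as matches; otherwise they are excluded.
    """
    result = 0
    for value in range(lo, hi + 1):
        result |= bitmaps[value]      # OR together the bitmaps in the interval
    if missing_is_match:
        result |= bitmaps[0]          # include the missing-value bitmap
    return result


def rows(bitmap: int) -> List[int]:
    """List the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical column with two missing entries (rows 1 and 4).
column = [3, None, 1, 5, None, 2]
index = build_bitmaps(column, cardinality=5)

print(rows(range_query(index, 2, 3, missing_is_match=False)))
print(rows(range_query(index, 2, 3, missing_is_match=True)))
```

In this sketch the only difference between the two semantics is whether the missing-value bitmap is ORed into the result, so the choice does not require rebuilding the index.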

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Guadalupe Canahuate (1)
  • Michael Gibas (1)
  • Hakan Ferhatosmanoglu (1)
  1. Department of Computer Science and Engineering, The Ohio State University
