Breaking the Curse of Cardinality on Bitmap Indexes

  • Kesheng Wu
  • Kurt Stockinger
  • Arie Shoshani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5069)

Abstract

Bitmap indexes are known to be efficient for ad-hoc range queries that are common in data warehousing and scientific applications. However, they suffer from the curse of cardinality, that is, their efficiency deteriorates as attribute cardinalities increase. A number of strategies have been proposed, but none of them addresses the problem adequately. In this paper, we propose a novel binned bitmap index that greatly reduces the cost of answering queries, and therefore breaks the curse of cardinality. The key idea is to augment the binned index with an Order-preserving Bin-based Clustering (OrBiC) structure. This data structure significantly reduces the I/O operations needed to resolve records that cannot be resolved with the bitmaps alone. To further improve the proposed index structure, we also present a strategy to create single-valued bins for frequent values. This strategy reduces index sizes and improves query processing speed. Overall, the binned indexes with OrBiC greatly improve query processing speed, and are 3 to 25 times faster than the best available indexes for high-cardinality data.
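As a rough illustration of the idea summarized above, the following Python sketch builds a binned bitmap index and answers a range query with it. It is not the authors' implementation. The function names (`build_binned_index`, `range_query`, `rows`), the use of Python integers as bitmaps, and the in-memory per-bin lists standing in for the OrBiC structure are all assumptions made for this example. In particular, the sketch assumes that "order-preserving" means the values within a bin keep their original row order. Bins that lie entirely inside the query range are answered from the bitmaps alone; only the edge bins need a candidate check, and that check reads just the values clustered under those bins rather than the whole column.

```python
from bisect import bisect_right


def build_binned_index(values, boundaries):
    """Build one bitmap per bin plus a per-bin clustering of the raw values.

    boundaries is a sorted list of bin lower edges; bin i covers
    [boundaries[i], boundaries[i + 1]) and the last bin is open-ended.
    (Hypothetical layout chosen for this sketch.)
    """
    nbins = len(boundaries)
    bitmaps = [0] * nbins                    # bit r set => row r falls in this bin
    clustered = [[] for _ in range(nbins)]   # stand-in for OrBiC: (row, value) per bin
    for row, v in enumerate(values):
        b = bisect_right(boundaries, v) - 1
        if b < 0:
            raise ValueError(f"value {v} is below the first bin boundary")
        bitmaps[b] |= 1 << row
        clustered[b].append((row, v))        # rows arrive in order, so order is preserved
    return bitmaps, clustered


def range_query(lo, hi, boundaries, bitmaps, clustered):
    """Return a bitmap of the rows whose value v satisfies lo <= v < hi."""
    nbins = len(boundaries)
    result = 0
    for b in range(nbins):
        bin_lo = boundaries[b]
        bin_hi = boundaries[b + 1] if b + 1 < nbins else float("inf")
        if bin_hi <= lo or bin_lo >= hi:
            continue                         # bin does not overlap the query range
        if lo <= bin_lo and bin_hi <= hi:
            result |= bitmaps[b]             # bin fully covered: the bitmap suffices
        else:
            # Edge bin: candidate check against this bin's clustered values only.
            for row, v in clustered[b]:
                if lo <= v < hi:
                    result |= 1 << row
    return result


def rows(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


values = [5, 12, 27, 33, 18, 41, 9, 25]
boundaries = [0, 10, 20, 30, 40]
bitmaps, clustered = build_binned_index(values, boundaries)
print(rows(range_query(15, 35, boundaries, bitmaps, clustered)))
```

In this toy query, the bin [20, 30) is resolved from its bitmap, while the bins [10, 20) and [30, 40) are edge bins whose values are checked individually. In a disk-based setting, keeping each bin's values contiguous is what would turn the candidate check into a few sequential reads instead of scattered accesses across the base data.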



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Kesheng Wu (1)
  • Kurt Stockinger (1)
  • Arie Shoshani (1)
  1. Lawrence Berkeley National Lab, University of California, Berkeley, USA
