Minimizing Index Size by Reordering Rows and Columns

  • Elaheh Pourabbas
  • Arie Shoshani
  • Kesheng Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7338)

Abstract

Sizes of compressed bitmap indexes and compressed data are significantly affected by the order of data records. Finding the ordering of rows and columns that minimizes the index size is known to be NP-hard. Instead of seeking the exact global optimum, we develop accurate statistical formulas that compute approximate solutions. Since the widely used bitmap indexes are compressed with variants of run-length encoding (RLE), our work concentrates on computing the sizes of bitmap indexes compressed with basic RLE. The resulting formulas could be used to choose which indexes to build and which to use. In this paper, we use the formulas to develop strategies for reordering the rows and columns of a data table. We present empirical measurements showing that our formulas are accurate for a wide range of data. Our analysis confirms that the heuristic of sorting columns with low cardinalities first is indeed effective in reducing index sizes. We extend this strategy by showing that columns with the same cardinality should be ordered from high skewness to low skewness.
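
The effect of column order can be illustrated with a small sketch. It is not the statistical formulas of the paper; it is a toy experiment under stated assumptions: an equality-encoded bitmap index (one bitmap per distinct value), rows sorted lexicographically by a chosen column order, and the number of runs used as a stand-in for the size of a basic RLE encoding. The helper names `count_runs`, `bitmap_index_runs`, and `total_runs`, as well as the two-column table, are hypothetical.

```python
from itertools import groupby

def count_runs(bits):
    """Count maximal blocks of equal bits; a proxy for basic RLE size."""
    return sum(1 for _ in groupby(bits))

def bitmap_index_runs(rows, col):
    """Total runs over all bitmaps of an equality-encoded index on one column
    (assumption: one bitmap per distinct value of the column)."""
    values = sorted({r[col] for r in rows})
    return sum(count_runs([r[col] == v for r in rows]) for v in values)

def total_runs(rows, column_order):
    """Sort rows lexicographically by column_order, then sum runs over all columns."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in column_order))
    return sum(bitmap_index_runs(ordered, c) for c in range(len(rows[0])))

# Hypothetical table: column 0 has cardinality 2, column 1 has cardinality 4,
# and every combination of values appears exactly once.
rows = [(a, b) for a in range(2) for b in range(4)]

low_first = total_runs(rows, (0, 1))   # sort by the low-cardinality column first
high_first = total_runs(rows, (1, 0))  # sort by the high-cardinality column first
print(low_first, high_first)           # prints "22 26"
```

On this toy table, sorting by the low-cardinality column first yields fewer runs (22 versus 26), which is consistent with the heuristic the abstract describes. A table this small says nothing about skewed or correlated data, which is where the paper's formulas are meant to apply.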

Keywords

Data Table, Index Size, Gray Code, Zipf Distribution, Bitmap Index
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Elaheh Pourabbas (1)
  • Arie Shoshani (2)
  • Kesheng Wu (2)
  1. National Research Council, Roma, Italy
  2. Lawrence Berkeley National Laboratory, Berkeley, USA
