The VLDB Journal, Volume 25, Issue 3, pp 339–354

Hybrid query optimization for hard-to-compress bit-vectors

Regular Paper

Abstract

Bit-vectors are widely used for indexing and summarizing data due to their efficient processing in modern computers. Sparse bit-vectors can be further compressed to reduce their space requirement. Special compression schemes based on run-length encoders have been designed to avoid explicit decompression and minimize the decoding overhead during query execution. Moreover, highly compressed bit-vectors can exhibit a faster query time than the non-compressed ones. However, for hard-to-compress bit-vectors, compression does not speed up queries and can add considerable overhead. In these cases, bit-vectors are often stored verbatim (non-compressed). On the other hand, queries are answered by executing a cascade of bit-wise operations involving indexed bit-vectors and intermediate results. Often, even when the original bit-vectors are hard to compress, the intermediate results become sparse. It could be feasible to improve query performance by compressing these bit-vectors as the query is executed. In this scenario, it would be necessary to operate verbatim and compressed bit-vectors together. In this paper, we propose a hybrid framework where compressed and verbatim bitmaps can coexist and design algorithms to execute queries under this hybrid model. Our query optimizer is able to decide at run time when to compress or decompress a bit-vector. Our heuristics show that the applications using higher-density bitmaps can benefit from using this hybrid model, improving both their query time and memory utilization.
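The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the run-length format, the class names `Verbatim` and `Compressed`, the helper `hybrid_and`, and the density `threshold` are all assumptions made for illustration. The sketch shows an AND between a verbatim and a run-length-encoded bit-vector that skips zero runs without decompressing the compressed operand, followed by a run-time decision to compress the intermediate result if it has become sparse.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Verbatim:
    bits: int  # bit i of this integer is bit i of the vector
    n: int     # number of bits in the vector (assumed > 0)


@dataclass
class Compressed:
    runs: List[Tuple[int, int]]  # (bit value, run length), hypothetical format
    n: int


def compress(v: Verbatim) -> Compressed:
    """Run-length encode a verbatim bit-vector."""
    runs: List[Tuple[int, int]] = []
    for i in range(v.n):
        b = (v.bits >> i) & 1
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return Compressed(runs, v.n)


def decompress(c: Compressed) -> Verbatim:
    """Expand runs back into a verbatim bit-vector."""
    bits, pos = 0, 0
    for b, length in c.runs:
        if b:
            bits |= ((1 << length) - 1) << pos
        pos += length
    return Verbatim(bits, c.n)


def and_mixed(v: Verbatim, c: Compressed) -> Verbatim:
    """AND a verbatim and a compressed vector by walking the runs.

    Zero runs contribute nothing and are skipped; one runs copy the
    matching slice of the verbatim operand.
    """
    out, pos = 0, 0
    for b, length in c.runs:
        if b:
            out |= v.bits & (((1 << length) - 1) << pos)
        pos += length
    return Verbatim(out, v.n)


def density(v: Verbatim) -> float:
    """Fraction of set bits."""
    return bin(v.bits).count("1") / v.n


def hybrid_and(a: Union[Verbatim, Compressed],
               b: Union[Verbatim, Compressed],
               threshold: float = 0.05) -> Union[Verbatim, Compressed]:
    """AND two vectors of either kind, then choose the result's form.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if isinstance(a, Verbatim) and isinstance(b, Verbatim):
        result = Verbatim(a.bits & b.bits, a.n)
    elif isinstance(a, Verbatim):
        result = and_mixed(a, b)
    elif isinstance(b, Verbatim):
        result = and_mixed(b, a)
    else:
        # Simplification: expand one side, then reuse the mixed path.
        result = and_mixed(decompress(a), b)
    # Run-time decision: compress only when the result is sparse.
    return compress(result) if density(result) < threshold else result
```

For example, `hybrid_and(Verbatim(0b10110110, 8), compress(Verbatim(0b00001111, 8)), threshold=0.3)` returns a `Compressed` result, because only two of the eight result bits are set. A real optimizer would presumably weigh the cost of compression against the expected savings in later operations rather than rely on a single fixed threshold.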

Keywords

Bit-vector index, Bitmap index, Bit-sliced index, Query optimization, Top-k preference queries

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. 4016 Seamans Center for the Engineering Arts and Sciences, The University of Iowa, Iowa City, USA
