The VLDB Journal, Volume 25, Issue 3, pp 291–316

Bitwise dimensional co-clustering for analytical workloads

Regular Paper

Abstract

Analytical workloads in data warehouses often include heavy joins, where queries involve multiple fact tables in addition to the typical star patterns, dimensional grouping, and selections. In this paper, we propose a new processing and storage framework called bitwise dimensional co-clustering (BDCC) that avoids replication and thus keeps updates fast, yet accelerates all these foreign key joins, efficiently supports grouping, and pushes down most dimensional selections. The core idea of BDCC is to cluster each table on a mix of dimensions, each possibly derived from attributes imported over an incoming foreign key, thereby creating foreign-key-connected tables with partially shared clusterings. These shared clusterings are later used to accelerate any join between two tables that have some dimension in common; they also allow selections to be pushed down and propagated (reducing I/O) and accelerate aggregation and ordering operations. Besides the general framework, we describe an algorithm that derives such a physically co-clustered database automatically, and we describe query processing and query optimization techniques that can easily be fitted into existing relational engines. We present an experimental evaluation on the TPC-H benchmark in the Vectorwise system, showing that co-clustering can significantly enhance its already high performance and at the same time significantly reduce the memory consumption of the system.
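The abstract does not spell out how a "mix of dimensions" becomes a single clustering key. The following is a minimal sketch of one plausible reading, assuming each dimension value has already been mapped to a small integer bin number and that the bits of those bin numbers are interleaved round-robin, most significant bit first. The function names `interleave_bits` and `extract_bin`, the bit widths, and the interleaving order are all illustrative assumptions, not the paper's definitions.

```python
def interleave_bits(bins, widths):
    """Build one clustering key from several dimension bin numbers.

    Assumption (not from the abstract): bits are taken round-robin,
    starting with the most significant bit of each dimension.
    """
    key = 0
    for level in range(max(widths)):  # level 0 = most significant bit
        for b, w in zip(bins, widths):
            if level < w:
                bit = (b >> (w - 1 - level)) & 1
                key = (key << 1) | bit
    return key


def extract_bin(key, widths, dim):
    """Recover the bin number of one dimension from an interleaved key."""
    pos = sum(widths)  # number of key bits not yet consumed
    value = 0
    for level in range(max(widths)):
        for d, w in enumerate(widths):
            if level < w:
                pos -= 1
                if d == dim:
                    value = (value << 1) | ((key >> pos) & 1)
    return value


# Two hypothetical dimensions with 2 bits each: bins 0b10 and 0b01.
key = interleave_bits((0b10, 0b01), (2, 2))
print(bin(key))                      # 0b1001
print(extract_bin(key, (2, 2), 0))   # 2
print(extract_bin(key, (2, 2), 1))   # 1
```

Under these assumptions, two tables whose keys both contain the bits of a common dimension can be partitioned on that dimension's bin number, which is the property the abstract relies on when it speaks of joins between tables that "have some dimension in common".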

Keywords

OLAP, Data warehouse, Clustering, Indexing, Storage, Database design, Query processing, Sandwich operators

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Technische Universität Ilmenau, Ilmenau, Germany
  2. CWI, Amsterdam, The Netherlands
