Workload-Independent Data-Driven Vertical Partitioning

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 767)

Abstract

Vertical partitioning is a well-explored area of automatic physical database design. The classic approach derives an optimal vertical partitioning scheme for a given database and workload, where the workload describes the queries, their frequencies, and the attributes they involve.

In this paper we identify a novel class of vertical partitioning algorithms. The algorithms of this class do not rely on knowledge of the workload, but instead use properties contained in the data itself. We propose one such algorithm, which uses a logical scheme represented by functional dependencies derived from the stored data. To discover functional dependencies we use TANE, a popular functional dependency extraction algorithm. We evaluate our algorithm using an industrial DBMS (PostgreSQL) on a number of workloads. We compare the performance of an unpartitioned configuration with that of partitions produced by our algorithm and by several state-of-the-art workload-aware algorithms.
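
As an illustration of what it means for a functional dependency to be derived from stored data, the following minimal sketch checks dependencies of the form A -> B on a small in-memory table. It is a naive pairwise check written for this illustration only: it is not TANE and not the partitioning algorithm proposed in the paper, and the table contents, attribute names, and function names are all hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts, lhs is a tuple of attribute names,
    rhs is a single attribute name (all hypothetical, for illustration).
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = row[rhs]
        # The dependency is violated if the same lhs value maps to two rhs values.
        if seen.setdefault(key, value) != value:
            return False
    return True


def single_attribute_fds(rows, attributes):
    """Naively list every dependency A -> B that holds, by checking all ordered pairs."""
    return [
        (a, b)
        for a in attributes
        for b in attributes
        if a != b and fd_holds(rows, (a,), b)
    ]


# Hypothetical sample data.
rows = [
    {"emp_id": 1, "dept": "db", "dept_city": "SPb"},
    {"emp_id": 2, "dept": "db", "dept_city": "SPb"},
    {"emp_id": 3, "dept": "ml", "dept_city": "Prague"},
]
fds = single_attribute_fds(rows, ["emp_id", "dept", "dept_city"])
```

On this sample, `dept -> dept_city` holds while `dept -> emp_id` does not. Dependencies discovered this way hold only for the data currently stored, which is why they are a property of the data rather than of the workload.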

Keywords

Physical design, Vertical partitioning, Functional dependency

Notes

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments on this work. This work is partially supported by the Russian Foundation for Basic Research, grant 16-57-48001.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Nikita Bobrov (1)
  • George Chernishev (1, 2)
  • Boris Novikov (1, 2)

  1. Saint Petersburg University, Saint Petersburg, Russia
  2. JetBrains Research, Prague, Czech Republic
