A Study of Several Matrix-Clustering Vertical Partitioning Algorithms in a Disk-Based Environment

  • Viacheslav Galaktionov
  • George Chernishev
  • Kirill Smirnov
  • Boris Novikov
  • Dmitry A. Grigoriev
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 706)

Abstract

In this paper we continue our efforts to evaluate matrix clustering algorithms. In our previous study we presented a test environment and the results of preliminary experiments with the “separate” strategy for vertical partitioning. This strategy assigns a separate vertical partition to every cluster found by the algorithm, including the inter-submatrix attribute group. In this paper we introduce two other strategies: the “replicate” strategy, which replicates inter-submatrix attributes to every cluster, and the “retain” strategy, which assigns inter-submatrix attributes to their original clusters. We experimentally evaluate all three strategies in a disk-based environment using the standard TPC-H workload and the PostgreSQL DBMS. We start with a study of record reconstruction methods in the PostgreSQL DBMS. Then we apply the partitioning strategies to three matrix clustering algorithms and evaluate both the query performance and the storage overhead of the resulting partitions. Finally, we compare the resulting partitioning schemes with the ideal partitioning scenario.
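
The three strategies differ only in where the inter-submatrix attributes end up, which a small sketch may make easier to see. The code below is an illustration written for this summary, not the implementation evaluated in the paper. The function name `build_partitions`, the attribute names, and the `home` mapping (our assumed way of recording the "original" cluster of each inter-submatrix attribute) are all hypothetical.

```python
from typing import Dict, List, Optional


def build_partitions(
    clusters: List[List[str]],
    inter: List[str],
    strategy: str,
    home: Optional[Dict[str, int]] = None,
) -> List[List[str]]:
    """Return vertical partitions (lists of attribute names).

    clusters -- attribute groups found by a matrix clustering algorithm
    inter    -- inter-submatrix attributes (belong to no single cluster)
    home     -- ASSUMPTION: maps each inter-submatrix attribute to the index
                of its "original" cluster; only used by the "retain" strategy
    """
    # Copy the clusters so that the caller's lists are never modified.
    parts = [list(c) for c in clusters]

    if strategy == "separate":
        # Inter-submatrix attributes form one extra partition of their own.
        if inter:
            parts.append(list(inter))
    elif strategy == "replicate":
        # Every cluster receives a copy of every inter-submatrix attribute.
        for p in parts:
            p.extend(inter)
    elif strategy == "retain":
        # Each inter-submatrix attribute goes back to its original cluster.
        if home is None:
            raise ValueError("the 'retain' strategy needs a home mapping")
        for attr in inter:
            parts[home[attr]].append(attr)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return parts


# Hypothetical input: two clusters and one inter-submatrix attribute "x".
clusters = [["a", "b"], ["c", "d"]]
inter = ["x"]
home = {"x": 1}

separate = build_partitions(clusters, inter, "separate")
replicate = build_partitions(clusters, inter, "replicate")
retain = build_partitions(clusters, inter, "retain", home)
```

In this toy input, "separate" yields three partitions, "replicate" stores `x` twice (the source of the storage overhead mentioned above), and "retain" keeps the number of partitions equal to the number of clusters.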

Keywords

Database tuning, Vertical partitioning, Experimentation, Matrix clustering, Fragmentation, TPC-H, PostgreSQL

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Viacheslav Galaktionov (1)
  • George Chernishev (1, 2)
  • Kirill Smirnov (1)
  • Boris Novikov (1)
  • Dmitry A. Grigoriev (1)

  1. Saint-Petersburg State University, Saint-Petersburg, Russia
  2. JetBrains Research, Saint-Petersburg, Russia
