
A Study of Several Matrix-Clustering Vertical Partitioning Algorithms in a Disk-Based Environment

Conference paper in Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2016)

Abstract

In this paper we continue our evaluation of matrix clustering algorithms for vertical partitioning. In our previous study we presented a test environment and the results of preliminary experiments with the “separate” strategy, which assigns a separate vertical partition to every cluster found by the algorithm, including the group of inter-submatrix attributes. Here we introduce two further strategies: the “replicate” strategy, which replicates inter-submatrix attributes into every cluster, and the “retain” strategy, which assigns inter-submatrix attributes to their original clusters. We experimentally evaluate all three strategies in a disk-based environment using the standard TPC-H workload and the PostgreSQL DBMS. We start with a study of record reconstruction methods in PostgreSQL. We then apply the partitioning strategies to three matrix clustering algorithms and evaluate both the query performance and the storage overhead of the resulting partitions. Finally, we compare the resulting partitioning schemes with the ideal partitioning scenario.
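The three strategies differ only in where the inter-submatrix attributes end up. The following is a minimal Python sketch, assuming a matrix clustering algorithm has already produced a list of attribute clusters plus a set of inter-submatrix attributes. The function names, the `origin` map (attribute to index of its original cluster), the byte widths, the repeated row-key width, and the greedy "partitions read" count are hypothetical illustrations, not the paper's implementation or cost model.

```python
from typing import Dict, List, Set

Partition = Set[str]


def separate(clusters: List[Partition], inter: Partition) -> List[Partition]:
    """'separate': one partition per cluster plus one extra partition
    holding the whole inter-submatrix attribute group."""
    parts = [set(c) for c in clusters]
    if inter:
        parts.append(set(inter))
    return parts


def replicate(clusters: List[Partition], inter: Partition) -> List[Partition]:
    """'replicate': copy every inter-submatrix attribute into every cluster."""
    return [set(c) | set(inter) for c in clusters]


def retain(clusters: List[Partition], inter: Partition,
           origin: Dict[str, int]) -> List[Partition]:
    """'retain': put each inter-submatrix attribute back into its original
    cluster. `origin` (attribute -> cluster index) is a hypothetical input."""
    parts = [set(c) for c in clusters]
    for attr in inter:
        parts[origin[attr]].add(attr)
    return parts


def storage_overhead(parts: List[Partition], widths: Dict[str, int],
                     key_width: int = 0) -> float:
    """Partitioned row size divided by unpartitioned row size.
    `key_width` models a row identifier repeated in every partition for
    record reconstruction (an assumption; tuple headers are ignored)."""
    base = sum(widths.values())
    total = sum(sum(widths[a] for a in p) + key_width for p in parts)
    return total / base


def partitions_read(parts: List[Partition], query: Partition) -> List[int]:
    """Greedily pick partitions until all attributes of `query` are covered.
    Exact when partitions are disjoint; only a heuristic under 'replicate'."""
    need = set(query)
    chosen: List[int] = []
    while need:
        best = max(range(len(parts)), key=lambda i: len(parts[i] & need))
        if not parts[best] & need:
            raise ValueError("query uses an attribute stored in no partition")
        chosen.append(best)
        need -= parts[best]
    return chosen


if __name__ == "__main__":
    # Toy table: two clusters {a, b}, {c, d} and one inter-submatrix attribute e.
    clusters = [{"a", "b"}, {"c", "d"}]
    inter = {"e"}
    widths = {"a": 4, "b": 4, "c": 8, "d": 8, "e": 4}
    for name, parts in [("separate", separate(clusters, inter)),
                        ("replicate", replicate(clusters, inter)),
                        ("retain", retain(clusters, inter, {"e": 0}))]:
        # Prints: separate 1.0 2 / replicate 1.143 1 / retain 1.0 2
        print(name, round(storage_overhead(parts, widths), 3),
              len(partitions_read(parts, {"c", "e"})))
```

In this toy case, replication lets a query that combines `e` with the second cluster read a single partition, at the price of storing `e` twice; "retain" keeps storage minimal but forces that query to read two partitions. Actual disk overhead also depends on per-tuple headers and page layout, which this sketch ignores.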


References

  1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: VLDB 1994 Conference Proceedings, pp. 487–499 (1994)

  2. Agrawal, S., Narasayya, V., Yang, B.: Integrating vertical and horizontal partitioning into automated physical database design. In: SIGMOD 2004 Conference Proceedings, pp. 359–370 (2004). doi:10.1145/1007568.1007609

  3. Apers, P.M.G.: Data allocation in distributed database systems. ACM Trans. Database Syst. 13, 263–304 (1988). doi:10.1145/44498.45063

  4. Bellatreche, L.: Optimization and tuning in data warehouses. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 1995–2003. Springer, New York (2009). doi:10.1007/978-0-387-39940-9_259

  5. Bellatreche, L., Boukhalfa, K., Richard, P.: Data partitioning in data warehouses: hardness study, heuristics and Oracle validation. In: DAWAK 2008 Conference Proceedings, pp. 87–96 (2008). doi:10.1007/978-3-540-85836-2_9

  6. Bhat, M.V., Haupt, A.: An efficient clustering algorithm. IEEE Trans. Syst. Man Cybern. 6(1), 61–64 (1976). doi:10.1109/TSMC.1976.5408399

  7. Bouakkaz, M., Ouinten, Y., Ziani, B.: Vertical fragmentation of data warehouses using the FP-Max algorithm. In: IIT 2012 Conference Proceedings, pp. 273–276 (2012)

  8. Chaudhuri, S., Weikum, G.: Self-management technology in databases. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems, pp. 2550–2555. Springer, New York (2009). doi:10.1007/978-0-387-39940-9_334

  9. Cheng, C.: Algorithms for vertical partitioning in database physical design. Omega 22(3), 291–303 (1994). doi:10.1016/0305-0483(94)90042-6

  10. Cheng, C.-H.: A branch and bound clustering algorithm. IEEE Trans. Syst. Man Cybern. 25(5), 895–898 (1995). doi:10.1109/21.376504

  11. Cheng, C.-H., Lee, W.-K., Wong, K.-F.: A genetic algorithm-based clustering approach for database partitioning. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 32(3), 215–230 (2002). doi:10.1109/TSMCC.2002.804444

  12. Cheng, C.-H., Motwani, J.: An examination of cluster identification-based algorithms for vertical partitions. Int. J. Bus. Inf. Syst. 4(6), 622–638 (2009). doi:10.1504/IJBIS.2009.026695

  13. Chernishev, G.: Towards self-management in a distributed column-store system. In: Morzy, T., Valduriez, P., Bellatreche, L. (eds.) ADBIS 2015. CCIS, vol. 539, pp. 97–107. Springer, Cham (2015). doi:10.1007/978-3-319-23201-0_12

  14. Chernishev, G.: Vertical partitioning in relational DBMS. Talk at the Moscow ACM SIGMOD chapter meeting. Slides and video: http://synthesis.ipi.ac.ru/sigmod/seminar/s20150430

  15. Chu, W., Ieong, I.: A transaction-based approach to vertical partitioning for relational database systems. IEEE Trans. Softw. Eng. 19(8), 804–812 (1993)

  16. Du, J., Barker, K., Alhajj, R.: Attraction - a global affinity measure for database vertical partitioning. In: IADIS ICWI 2003 Conference Proceedings, pp. 538–548 (2003)

  17. Galaktionov, V., Chernishev, G., Novikov, B., Grigoriev, D.: Matrix clustering algorithms for vertical partitioning problem: an initial performance study. In: Selected Papers of the XVIII International Conference on Data Analytics & Management in Data Intensive Domains (DAMDID/RCDL 2016), Ershovo, Moscow Region, Russia, CEUR Workshop Proceedings, vol. 1752, pp. 24–31 (2016)

  18. Gorla, N., Boe, W.J.: Database operating efficiency in fragmented databases in mainframe, mini, and micro system environments. Data Knowl. Eng. 5(1), 1–19 (1990). doi:10.1016/0169-023X(90)90030-H

  19. Gorla, N., Yan, B.P.W.: Vertical fragmentation in databases using data-mining technique. In: Erickson, J. (ed.) Database Technologies: Concepts, Methodologies, Tools, & Applications, pp. 2543–2563. IGI Global (2009)

  20. Hammer, M., Niamir, B.: A heuristic approach to attribute partitioning. In: SIGMOD 1979 Conference Proceedings, pp. 93–101 (1979). doi:10.1145/582095.582110

  21. Harinarayan, V., Rajaraman, A., Ullman, J.D.: Implementing data cubes efficiently. ACM SIGMOD Record 25(2), 205–216 (1996). doi:10.1145/235968.233333

  22. Hoffer, J.A., Severance, D.G.: The use of cluster analysis in physical data base design. In: VLDB 1975 Conference Proceedings, pp. 69–86 (1975). doi:10.1145/1282480.1282486

  23. Jindal, A., Dittrich, J.: Relax and let the database do the partitioning online. In: BIRTE 2011 Workshop Proceedings, pp. 65–80 (2012). doi:10.1007/978-3-642-33500-6_5

  24. King, J.R.: Machine-component grouping in production flow analysis: an approach using a rank order clustering algorithm. Int. J. Prod. Res. 18(2), 213–232 (1980)

  25. Kusiak, A., Chow, W.: An efficient cluster identification algorithm. IEEE Trans. Syst. Man Cybern. 17(4), 696–699 (1987)

  26. LeFevre, J., Sankaranarayanan, J., Hacigumus, H., Tatemura, J., Polyzotis, N., Carey, M.J.: MISO: souping up big data query processing with a multistore system. In: SIGMOD 2014 Conference Proceedings, pp. 1591–1602 (2014). doi:10.1145/2588555.2588568

  27. Li, L., Gruenwald, L.: Self-managing online partitioner for databases (SMOPD): a vertical database partitioning system with a fully automatic online approach. In: IDEAS 2013 Conference Proceedings, pp. 168–173 (2013). doi:10.1145/2513591.2513649

  28. Lin, X., Orlowska, M., Zhang, Y.: A graph based cluster approach for vertical partitioning in database design. Data Knowl. Eng. 11(2), 151–169 (1993)

  29. McCormick, W., Schweitzer, P., White, W.: Problem decomposition and data reorganization by a clustering technique. Oper. Res. 20(5), 993–1009 (1972)

  30. Navathe, S., Ceri, S., Wiederhold, G., Dou, J.: Vertical partitioning algorithms for database design. ACM Trans. Database Syst. 9, 680–710 (1984)

  31. Navathe, S.B., Ra, M.: Vertical partitioning for database design: a graphical algorithm. In: SIGMOD 1989 Conference Proceedings, pp. 440–450 (1989). doi:10.1145/67544.66966

  32. Papadomanolakis, S., Ailamaki, A.: An integer linear programming approach to database design. In: ICDE 2007 Workshop Proceedings, pp. 442–449 (2007)

  33. Rodríguez, L., Li, X.: A dynamic vertical partitioning approach for distributed database system. In: IEEE SMC Conference Proceedings, pp. 1853–1858 (2011). doi:10.1109/ICSMC.2011.6083941

  34. Rodríguez, L., Li, X.: A support-based vertical partitioning method for database design. In: CCE 2011 Conference Proceedings, pp. 1–6 (2011). doi:10.1109/ICEEE.2011.6106682

  35. Rodríguez, L., Li, X., Mejía-Alvarez, P.: An active system for dynamic vertical partitioning of relational databases. In: MICAI 2011 Conference Proceedings, pp. 273–284 (2011). doi:10.1007/978-3-642-25330-0_24

  36. Sacca, D., Wiederhold, G.: Database partitioning in a cluster of processors. ACM Trans. Database Syst. 10, 29–56 (1985). doi:10.1145/3148.3161

  37. Slagle, J.R., Chang, C.L., Heller, S.R.: A clustering and data-reorganizing algorithm. IEEE Trans. Syst. Man Cybern. 5(1), 125–128 (1975)

  38. Son, J.H., Kim, M.H.: α-Partitioning algorithm: vertical partitioning based on the fuzzy graph. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds.) DEXA 2001. LNCS, vol. 2113, pp. 537–546. Springer, Heidelberg (2001). doi:10.1007/3-540-44759-8_53

Author information

Correspondence to George Chernishev.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Galaktionov, V., Chernishev, G., Smirnov, K., Novikov, B., Grigoriev, D.A. (2017). A Study of Several Matrix-Clustering Vertical Partitioning Algorithms in a Disk-Based Environment. In: Kalinichenko, L., Kuznetsov, S., Manolopoulos, Y. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2016. Communications in Computer and Information Science, vol 706. Springer, Cham. https://doi.org/10.1007/978-3-319-57135-5_12

  • DOI: https://doi.org/10.1007/978-3-319-57135-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57134-8

  • Online ISBN: 978-3-319-57135-5

  • eBook Packages: Computer Science, Computer Science (R0)