Very Large Workloads Based Approach to Efficiently Partition Data Warehouses

Part of the Studies in Computational Intelligence book series (SCI, volume 488)

Abstract

Horizontal Partitioning (HP) is an optimization technique widely used to improve the physical design of data warehouses. However, selecting a partitioning schema is an NP-complete problem, and many approaches have been proposed to solve it. Nonetheless, the overwhelming majority of these works do not take into account the size of the workload, which can be very large. A huge workload increases the running time of HP selection algorithms and may degrade the quality of the final solution. In this paper, we propose a new approach based on classification and election to select an HP schema for large-sized workloads. We conducted an experimental study on the APB-1 benchmark to test the effectiveness and scalability of our approach.
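
The abstract does not detail the classification and election steps. The sketch below is only one possible reading, under stated assumptions: each query is encoded as a 0/1 vector over the selection predicates it uses, the classification step is a plain k-means, and the election step keeps the query closest to the mean of each class. The encoding, the function names (kmeans_labels, elect_representatives) and the toy workload are all hypothetical and are not taken from the chapter.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two predicate-usage vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans_labels(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Assign each query to one of k classes (assumed classification step)."""
    # Deterministic initialisation (an assumption): first k queries as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a class is empty
                centroids[c] = mean(members)
    return labels


def elect_representatives(points: List[Vector], labels: List[int], k: int) -> List[int]:
    """Elect, per class, the index of the query closest to the class mean."""
    elected = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if not members:
            continue
        center = mean([points[i] for i in members])
        elected.append(min(members, key=lambda i: squared_distance(points[i], center)))
    return elected


# Hypothetical workload: 6 queries over 4 selection predicates (1 = predicate used).
workload = [
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
labels = kmeans_labels(workload, k=2)
elected = elect_representatives(workload, labels, k=2)
print(labels, elected)
```

Under this reading, an HP selection algorithm would then run on the elected queries only (here two out of six), which is how a reduced workload could shorten selection time. Whether the chapter elects queries, predicates, or something else is not stated in the abstract.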

Keywords

Horizontal Partitioning, Large Workloads, Classification

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  1. ESI School, Algiers, Algeria
  2. USTHB University, Algiers, Algeria
