Taming Size and Cardinality of OLAP Data Cubes over Big Data

  • Alfredo Cuzzocrea
  • Rim Moussa
  • Achref Laabidi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10365)

Abstract

In this paper, we present three authoritative application scenarios of TPC-H*d, a suitable transformation of the TPC-H benchmark. The three application scenarios are (i) OLAP cube calculus on top of a columnar relational DBMS, (ii) parallel OLAP data cube processing, and (iii) virtual OLAP data cube design. We assess the effectiveness and efficiency of our proposal using open source systems: the Mondrian ROLAP server and its OLAP4j driver; MySQL, a row-oriented relational database management system; and MonetDB, a column-oriented relational database management system.

Keywords

Multidimensional databases, Data warehousing, Schema evolution, Logical OLAP design

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. DIA Department, University of Trieste and ICAR-CNR, Trieste, Italy
  2. LaTICE, ENI-Carthage, University of Carthage, Carthage, Tunisia
  3. ENI-Carthage, University of Carthage, Carthage, Tunisia