SINGLE vs. MapReduce vs. Relational: Predicting Query Execution Time

  • Maryam Abbasi
  • Pedro Martins
  • José Cecílio
  • João Costa
  • Pedro Furtado
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 928)

Abstract

Over the past decades, several new concepts have emerged to organize and query data in large Data Warehouse (DW) systems, all with the same primary objective: optimizing processing speed. More recently, with the rise of the Big Data concept, storage costs have dropped significantly and performance (random access) has increased, particularly with modern SSDs. This paper introduces and tests a storage alternative that goes against current data normalization premises, on the assumption that storage space is no longer a concern. By de-normalizing the entire data schema (transparently to the user), a new system concept called SINGLE is proposed, in which query execution time must be entirely predictable, independently of query complexity. The proposed data model also allows easy partitioning and distributed processing, enabling parallel execution and boosting performance, as happens in MapReduce. The TPC-H benchmark is used to evaluate storage space and query performance. Results show predictable performance when compared with approaches based on a normalized relational schema and with MapReduce-oriented approaches.
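The intuition behind de-normalizing the whole schema can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tables, column names, and the helper functions `denormalize` and `scan_sum` are hypothetical, chosen only to show that once every foreign key is resolved ahead of time, a query of any complexity becomes a single pass over the same set of rows.

```python
# Illustrative sketch only: tables, columns, and helpers are hypothetical,
# not taken from the paper.

# Normalized schema: fact rows reference dimension rows by key.
customers = {1: {"c_name": "Alice", "c_nation": "PT"},
             2: {"c_name": "Bob", "c_nation": "ES"}}
orders = {10: {"o_custkey": 1, "o_year": 2017},
          11: {"o_custkey": 2, "o_year": 2018}}
lineitems = [
    {"l_orderkey": 10, "l_price": 100.0},
    {"l_orderkey": 10, "l_price": 50.0},
    {"l_orderkey": 11, "l_price": 70.0},
]


def denormalize(lineitems, orders, customers):
    """Resolve every foreign key once, producing one wide row per fact."""
    wide = []
    for item in lineitems:
        order = orders[item["l_orderkey"]]
        customer = customers[order["o_custkey"]]
        wide.append({**item, **order, **customer})
    return wide


def scan_sum(wide, predicate, column):
    """Answer a query in a single pass; every row is visited exactly once."""
    rows_read = 0
    total = 0.0
    for row in wide:
        rows_read += 1
        if predicate(row):
            total += row[column]
    return total, rows_read


single = denormalize(lineitems, orders, customers)

# A simple and a more complex predicate both cost one scan of the same rows.
simple = scan_sum(single, lambda r: r["o_year"] == 2017, "l_price")
complex_ = scan_sum(
    single,
    lambda r: r["c_nation"] == "PT" and r["o_year"] == 2017 and r["l_price"] > 60,
    "l_price",
)
print(simple)    # (150.0, 3)
print(complex_)  # (100.0, 3)
```

Both queries read the same three rows regardless of how many dimension attributes the predicate touches, which is why execution time in such a layout is bounded by the scan rather than by join cost. The price is that dimension attributes are repeated in every fact row, which is the storage trade-off the abstract refers to.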

Keywords

Predictable, Query execution, Data warehouse, MapReduce, Normalization, De-normalization, Distributed, Relational

Notes

Acknowledgements

This work is financed by national funds through FCT - Fundação para a Ciência e Tecnologia, I.P., under the project UID/Multi/04016/2016. Furthermore, we would like to thank the Instituto Politécnico de Viseu and CI&DETS for their support.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Sciences, University of Coimbra, Coimbra, Portugal
  2. Polytechnic Institute of Viseu, Viseu, Portugal
  3. Polytechnic Institute of Coimbra, Coimbra, Portugal
