An Innovative Lambda-Architecture-Based Data Warehouse Maintenance Framework for Effective and Efficient Near-Real-Time OLAP over Big Data

  • Alfredo Cuzzocrea
  • Rim Moussa
  • Gianni Vercelli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10968)

Abstract

To speed up query processing in Data Warehouse Systems, auxiliary summaries, such as materialized views and calculated attributes, are built on top of the data warehouse relations. As maintenance transactions change the data warehouse, summary data become stale unless they are refreshed, and refreshing them is expensive. The challenge gets even worse in near real-time environments, all the more so given emerging Big Data features. In this paper, inspired by the well-known Lambda architecture, we introduce a novel approach for effectively and efficiently supporting data warehouse maintenance processes in near real-time OLAP scenarios, making use of so-called big summary data. We assess it through an empirical study based on the popular TPC-H benchmark, which stresses the complexity of such OLAP scenarios.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Alfredo Cuzzocrea (1)
  • Rim Moussa (2)
  • Gianni Vercelli (3)
  1. ICAR-CNR, University of Trieste, Trieste, Italy
  2. LaTICE Laboratory, University of Tunis, Tunis, Tunisia
  3. University of Genoa, Genoa, Italy
