Abstract
Next-generation business intelligence (BI) enables enterprises to react quickly to changing business environments. Increasingly, data integration pipelines need to be merged with query pipelines to support real-time analytics over operational data. Hybrid analytic flows, which consist of a set of extract-transform-load (ETL) jobs together with analytic jobs running over multiple platforms with different functionality, are becoming increasingly attractive.
In traditional databases, materialized views are used to optimize query performance. In cross-platform, large-scale data transformation environments, similar challenges (e.g., view selection) arise when using materialized views. In this work, we propose an approach that generates materialized views in hybrid flows and maintains these views in a query-driven, incremental manner. To accelerate data integration processes, the location of a materialization point in a transformation flow varies dynamically based on metrics such as source update rates and maintenance cost in terms of flow operations. In addition, choosing the most suitable platform to host the views, for example materializing and maintaining intermediate results of Hadoop jobs in relational databases, is shown to yield better performance.
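To make the idea of a moving materialization point concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a linear flow of operators and a deliberately simple, hypothetical cost model: materializing after the first k operators costs the pending source updates times the per-update cost of those k operators (incremental maintenance), plus the cost of running the remaining operators over the materialized data at query time. The names `Operator`, `delta_cost`, `full_cost`, and `choose_materialization_point`, and all numbers, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Operator:
    """One step of a linear transformation flow (hypothetical cost units)."""
    name: str
    delta_cost: float  # assumed cost to push ONE source update through this operator
    full_cost: float   # assumed cost to run this operator over its full input at query time


def choose_materialization_point(
    flow: Sequence[Operator], updates_per_query: float
) -> Tuple[int, float]:
    """Return (k, cost): materialize the output of the first k operators.

    k = 0 means nothing is precomputed (the raw source is the "view");
    k = len(flow) means the final result is materialized.
    """
    best_k, best_cost = 0, float("inf")
    for k in range(len(flow) + 1):
        # Incremental maintenance: every pending update passes operators 0..k-1.
        maintenance = updates_per_query * sum(op.delta_cost for op in flow[:k])
        # Query-time work: operators k..n-1 still run over the materialized data.
        remaining = sum(op.full_cost for op in flow[k:])
        total = maintenance + remaining
        if total < best_cost:  # strict '<' keeps the earliest point on ties
            best_k, best_cost = k, total
    return best_k, best_cost


# Made-up example flow: filter -> join -> aggregate.
flow = [
    Operator("filter", delta_cost=1.0, full_cost=50.0),
    Operator("join", delta_cost=4.0, full_cost=200.0),
    Operator("aggregate", delta_cost=10.0, full_cost=80.0),
]

for rate in (5, 20, 100):
    k, cost = choose_materialization_point(flow, rate)
    print(f"updates per query = {rate}: materialize after {k} operator(s), cost {cost}")
```

With these made-up numbers, a low update rate (5) favors materializing the final result, a moderate rate (20) moves the point back to the join output, and a high rate (100) makes it cheapest to materialize nothing and compute everything at query time. A real system would replace the constants with measured update rates and operator costs, and would also weigh which platform hosts the view.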
Cite this article
Qu, W., Dessloch, S. A Real-time Materialized View Approach for Analytic Flows in Hybrid Cloud Environments. Datenbank Spektrum 14, 97–106 (2014). https://doi.org/10.1007/s13222-014-0155-0
DOI: https://doi.org/10.1007/s13222-014-0155-0