Stream Processing on Demand for Lambda Architectures

Kroß, Johannes; Brunnert, Andreas; Prehofer, Christian; Runkler, Thomas A.; Krcmar, Helmut

doi:10.1007/978-3-319-23267-6_16

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9272)

Included in the following conference series:

European Workshop on Performance Engineering

Abstract

Growing amounts of data and the demand to process them within time constraints have led to the development of big data systems. A generic design principle for such systems that allows for low-latency results is the lambda architecture. It specifies that data is analyzed twice, combining batch and stream processing techniques, in order to provide a real-time view. This redundant processing makes the architecture very expensive. In cases where processing results are not continuously required with low latency, or where time constraints lie within several minutes, it is not yet possible to decide clearly whether both processing layers are necessary. Therefore, we propose stream processing on demand within the lambda architecture in order to use resources efficiently and reduce hardware investments. We use performance models as an analytical decision-making solution to predict response times of batch processes and to decide when to additionally deploy stream processes. Using a smart energy use case as an example, we implement our proposed solution and evaluate its accuracy.
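
The decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `BatchModel`, its linear cost formula, the function `needs_stream_layer`, and all numeric values are hypothetical and stand in for the performance models used in the paper. The sketch only shows the shape of the decision: predict the batch response time, compare it with the time constraint, and deploy stream processing only when batch processing alone would be too slow.

```python
from dataclasses import dataclass


@dataclass
class BatchModel:
    """Hypothetical stand-in for a performance model of the batch layer."""

    fixed_overhead_s: float    # assumed job startup and scheduling cost (seconds)
    seconds_per_record: float  # assumed linear processing cost per record

    def predict_response_time(self, pending_records: int) -> float:
        """Predict how long a batch run over the pending records would take."""
        return self.fixed_overhead_s + self.seconds_per_record * pending_records


def needs_stream_layer(model: BatchModel, pending_records: int,
                       time_constraint_s: float) -> bool:
    """Return True when batch processing alone would miss the time constraint."""
    return model.predict_response_time(pending_records) > time_constraint_s


# Illustrative values only; they are not taken from the paper.
model = BatchModel(fixed_overhead_s=30.0, seconds_per_record=0.001)
for records in (100_000, 1_000_000):
    print(records, needs_stream_layer(model, records, time_constraint_s=300.0))
```

With these assumed values, the smaller workload stays within the five-minute constraint using batch processing alone, while the larger one would call for additionally deploying the stream layer.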

Author information

Authors and Affiliations

fortiss GmbH, Guerickestr. 25, 80805, Munich, Germany
Johannes Kroß, Andreas Brunnert & Christian Prehofer
Siemens AG, Corporate Technology, Otto-Hahn-Ring 6, 81739, Munich, Germany
Thomas A. Runkler
Technische Universität München, Boltzmannstr. 3, 85748, Garching, Germany
Helmut Krcmar

Corresponding author

Correspondence to Johannes Kroß.

Editor information

Editors and Affiliations

Universidad Rey Juan Carlos, Mostoles (Madrid), Spain
Marta Beltrán
Imperial College London, London, United Kingdom
William Knottenbelt
Imperial College London, London, United Kingdom
Jeremy Bradley

Cite this paper

Kroß, J., Brunnert, A., Prehofer, C., Runkler, T.A., Krcmar, H. (2015). Stream Processing on Demand for Lambda Architectures. In: Beltrán, M., Knottenbelt, W., Bradley, J. (eds) Computer Performance Engineering. EPEW 2015. Lecture Notes in Computer Science, vol 9272. Springer, Cham. https://doi.org/10.1007/978-3-319-23267-6_16

DOI: https://doi.org/10.1007/978-3-319-23267-6_16
Published: 22 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23266-9
Online ISBN: 978-3-319-23267-6
eBook Packages: Computer Science, Computer Science (R0)
