Stream Processing on Demand for Lambda Architectures

  • Conference paper
Computer Performance Engineering (EPEW 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9272)

Abstract

Growing amounts of data and the demand to process them within time constraints have led to the development of big data systems. A generic design principle for such systems that allows for low-latency results is the lambda architecture. It prescribes that data be analyzed twice, by both batch and stream processing, in order to provide a real-time view. This redundant processing makes the architecture very expensive. When results are not continuously required at low latency, or when time constraints are on the order of several minutes, it is not yet possible to decide clearly whether both processing layers are necessary. Therefore, we propose stream processing on demand within the lambda architecture in order to use resources efficiently and reduce hardware investments. We use performance models as an analytical decision-making solution to predict response times of batch processes and to decide when to additionally deploy stream processes. Using a smart energy use case as an example, we implement our proposed solution and evaluate its accuracy.
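
The decision step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the performance model is not reproduced here, so its output is passed in as a plain number. The names `Decision`, `decide_stream_on_demand`, and `safety_margin` are hypothetical and introduced only for illustration. The sketch assumes that the stream layer is deployed only when the predicted batch response time would exceed the time constraint, reduced by a safety margin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Outcome of one on-demand check (hypothetical structure)."""
    deploy_stream: bool
    predicted_batch_seconds: float
    budget_seconds: float


def decide_stream_on_demand(predicted_batch_seconds: float,
                            deadline_seconds: float,
                            safety_margin: float = 0.1) -> Decision:
    """Decide whether to additionally deploy stream processing.

    predicted_batch_seconds: response time of the batch process as
        predicted by some performance model (assumed to be given).
    deadline_seconds: time constraint for delivering results.
    safety_margin: fraction of the deadline held back as a buffer
        against prediction error (an assumption of this sketch).
    """
    if predicted_batch_seconds < 0:
        raise ValueError("predicted_batch_seconds must be non-negative")
    if deadline_seconds <= 0:
        raise ValueError("deadline_seconds must be positive")
    if not 0.0 <= safety_margin < 1.0:
        raise ValueError("safety_margin must be in [0, 1)")

    # Time the batch layer may take before the stream layer is needed.
    budget = deadline_seconds * (1.0 - safety_margin)
    return Decision(
        deploy_stream=predicted_batch_seconds > budget,
        predicted_batch_seconds=predicted_batch_seconds,
        budget_seconds=budget,
    )


if __name__ == "__main__":
    # Batch layer is predicted to finish well within a 5-minute constraint.
    print(decide_stream_on_demand(120.0, 300.0))
    # Batch layer is predicted to come too close to the constraint.
    print(decide_stream_on_demand(290.0, 300.0))
```

In this sketch the batch layer alone is considered sufficient whenever the prediction stays within the budget, which is where the resource savings described in the abstract would come from. The size of the safety margin would depend on the prediction accuracy of the performance model in use.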

Corresponding author

Correspondence to Johannes Kroß.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kroß, J., Brunnert, A., Prehofer, C., Runkler, T.A., Krcmar, H. (2015). Stream Processing on Demand for Lambda Architectures. In: Beltrán, M., Knottenbelt, W., Bradley, J. (eds) Computer Performance Engineering. EPEW 2015. Lecture Notes in Computer Science, vol 9272. Springer, Cham. https://doi.org/10.1007/978-3-319-23267-6_16

  • DOI: https://doi.org/10.1007/978-3-319-23267-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23266-9

  • Online ISBN: 978-3-319-23267-6

  • eBook Packages: Computer Science, Computer Science (R0)
