European Workshop on Performance Engineering

EPEW 2015: Computer Performance Engineering pp 243–257

Stream Processing on Demand for Lambda Architectures

  • Johannes Kroß
  • Andreas Brunnert
  • Christian Prehofer
  • Thomas A. Runkler
  • Helmut Krcmar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9272)

Abstract

Growing amounts of data and the demand to process them within time constraints have led to the development of big data systems. A generic design principle for such systems that allows for low-latency results is the lambda architecture. It prescribes that data is analyzed twice, by combining batch and stream processing techniques, in order to provide a real-time view. This redundant processing of data makes the architecture very expensive. In cases where results are not continuously required with low latency, or where time constraints lie within several minutes, it is not yet possible to clearly decide whether both processing layers are necessary. We therefore propose stream processing on demand within the lambda architecture in order to use resources efficiently and reduce hardware investments. We use performance models as an analytical decision-making solution to predict the response times of batch processes and to decide when to additionally deploy stream processes. Using a smart energy use case, we implement our proposed solution and evaluate its accuracy.
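
The decision mechanism outlined in the abstract can be illustrated with a small sketch: a performance model predicts the batch layer's response time for an expected load, and the stream layer is deployed only when that prediction would violate the time constraint. The sketch below is a hypothetical illustration; the linear response-time model, its parameters, and all function names are assumptions made for exposition and are not taken from the paper.

```python
# Hypothetical sketch of stream processing on demand: deploy the stream layer
# only when the predicted batch response time exceeds the time constraint.
# The linear response-time model and its parameters are illustrative assumptions.

def predict_batch_response_time(load_events_per_s: float) -> float:
    """Placeholder performance model: response time assumed linear in arrival rate."""
    base_s, per_event_s = 60.0, 0.002          # assumed model parameters
    return base_s + per_event_s * load_events_per_s

def decide_stream_deployment(load_events_per_s: float,
                             time_constraint_s: float) -> bool:
    """Return True if the stream layer should be deployed for this load."""
    predicted_rt_s = predict_batch_response_time(load_events_per_s)
    return predicted_rt_s > time_constraint_s

if __name__ == "__main__":
    # Example with a 5-minute (300 s) constraint: stream processing switches on
    # once the predicted batch response time exceeds the constraint.
    for load in (10_000, 100_000, 150_000):
        print(load, decide_stream_deployment(load, time_constraint_s=300.0))
```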

Keywords

Lambda architecture · Big data · Performance · Model · Evaluation

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Johannes Kroß (1)
  • Andreas Brunnert (1)
  • Christian Prehofer (1)
  • Thomas A. Runkler (2)
  • Helmut Krcmar (3)
  1. fortiss GmbH, Munich, Germany
  2. Siemens AG, Corporate Technology, Munich, Germany
  3. Technische Universität München, Garching, Germany