Online Workload Forecasting

  • Nikolas Herbst
  • Ayman Amin
  • Artur Andrzejak
  • Lars Grunske
  • Samuel Kounev
  • Ole J. Mengshoel
  • Priya Sundararajan
Chapter

Abstract

This chapter summarizes state-of-the-art approaches from different research fields that can be applied to continuously forecast the future development of time series data streams. More specifically, the input time series contain continuously monitored metrics that quantify the number of workload units arriving at a self-aware system. The goal of this chapter is to identify and present the approaches for online workload forecasting that a self-aware system needs in order to act proactively, that is, to prevent problems and optimize its operation based on likely changes in its usage. The research fields covered are machine learning and time series analysis. For each forecasting method, we describe its explicit limitations and advantages.

Keywords

Bayesian Network, ARIMA Model, Load Forecast, Exponential Smoothing, Nonlinear Time Series
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Nikolas Herbst (1, email author)
  • Ayman Amin (2)
  • Artur Andrzejak (3)
  • Lars Grunske (4)
  • Samuel Kounev (1)
  • Ole J. Mengshoel (5)
  • Priya Sundararajan (5)

  1. University of Würzburg, Würzburg, Germany
  2. Swinburne University of Technology, Melbourne, Australia
  3. Heidelberg University, Heidelberg, Germany
  4. Humboldt-Universität zu Berlin, Berlin, Germany
  5. Carnegie Mellon University, Pittsburgh, USA
