Abstract
Today, enterprise applications impose ever-greater resource requirements in order to support a growing number of clients and to deliver an acceptable Quality of Service (QoS) to them. To ensure that such requirements are met, it is essential to apply appropriate resource and application monitoring techniques. These techniques collect data that enable predictions and actions that can improve system performance. Typically, however, system administrators must consider several data sources and work out the relationships among them on their own. To address this gap in the context of general network-based systems, we present a survey that combines the discussion of system monitoring, data prediction, and resource management procedures in a unified view. The article discusses resource and application monitoring, resource management, and data forecasting from both the performance and the architectural perspectives of enterprise systems. We describe well-established subjects, such as monitoring metrics and resource scheduling, together with novel trends, including cloud elasticity and artificial intelligence-based load prediction algorithms. The survey links these three pillars, emphasizing the relationships among them and pointing out opportunities and research challenges in the area.
Acknowledgements
This article was partially supported by the following Brazilian agencies: CAPES, CNPq and FAPERGS. We also thank DELL for supporting this research.
Additional information
The final version of the present manuscript was approved by Marcio Lena on behalf of Dell Technologies.
Cite this article
da Rosa Righi, R., Lehmann, M., Gomes, M.M. et al. A Survey on Global Management View: Toward Combining System Monitoring, Resource Management, and Load Prediction. J Grid Computing 17, 473–502 (2019). https://doi.org/10.1007/s10723-018-09471-x
DOI: https://doi.org/10.1007/s10723-018-09471-x