Abstract
This paper presents a cross-layer reactive monitoring approach for Cloud computing environments. Based on complex event processing (CEP), our proposal monitors and analyzes performance metrics across Cloud layers to detect and repair performance-related problems. The approach uses novel CEP analysis rules and a new action manager framework. The analysis rules are derived from a comprehensive analysis of the interactions between Cloud layers. The results of this analysis are used to reduce the number of monitored parameters, define the analysis rules, and identify the causes of performance-related problems. The action manager framework assigns a set of repair actions to each performance-related problem and checks whether the applied action succeeded. The results of several experiments indicate that the time needed to fix a performance-related problem is reasonably short and that the CPU overhead of our approach is negligible. Moreover, compared to baseline methods, our approach speeds up the repair and reduces the number of triggered alarms.
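The action manager idea described above (a set of repair actions per problem, with a success check after each applied action) can be pictured with a minimal Python sketch. It is not the authors' implementation: the class `ActionManager`, the problem name `high_io`, the two repair functions, and the threshold of 500 requests are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical types: a repair action applies some fix; a health check
# returns True when the problem is no longer observed.
RepairAction = Callable[[], None]
HealthCheck = Callable[[], bool]


class ActionManager:
    """Illustrative only: maps each problem name to candidate repair actions."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[RepairAction]] = {}

    def register(self, problem: str, action: RepairAction) -> None:
        # Actions are kept in registration order; no ranking is assumed.
        self._actions.setdefault(problem, []).append(action)

    def repair(self, problem: str, is_fixed: HealthCheck) -> Optional[str]:
        """Apply actions one by one until the check passes.

        Returns the name of the action that succeeded, or None if no
        registered action fixed the problem.
        """
        for action in self._actions.get(problem, []):
            action()
            if is_fixed():
                return action.__name__
        return None


# Simulated state standing in for a monitored I/O metric (made-up numbers).
state = {"io_requests": 900}


def throttle_io() -> None:
    state["io_requests"] -= 200


def migrate_vm() -> None:
    state["io_requests"] = 100


manager = ActionManager()
manager.register("high_io", throttle_io)
manager.register("high_io", migrate_vm)

# The first action is not enough here, so the manager falls through
# to the second one and reports it as the successful repair.
result = manager.repair("high_io", lambda: state["io_requests"] < 500)
print(result)
```

In this sketch the success check is a plain callable, so the same loop works whether the check reads a live metric or a simulated value as above.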
Notes
A data point represents one measurement of the studied metric.
The recovery actions are identified by a domain expert and are not sorted in advance.
An I/O performance-related problem is related to a high number of I/O requests to the physical disk.
Acknowledgments
This work is partly supported by the German Ministry of Education and Research (BMBF) and the German Academic Exchange Service (DAAD).
Cite this article
Mdhaffar, A., Halima, R.B., Jmaiel, M. et al. Reactive performance monitoring of Cloud computing environments. Cluster Comput 20, 2465–2477 (2017). https://doi.org/10.1007/s10586-016-0676-4
DOI: https://doi.org/10.1007/s10586-016-0676-4