Abstract
The goal of anomaly detection is to identify observations that do not conform to the patterns considered normal behavior. In distributed systems, anomaly detection helps monitor and manage system performance. Several machine learning (ML) algorithms have been proposed in the literature to detect anomalies, and each one reports anomalies according to the particular mechanism it uses in its search. For a cloud orchestrator managing a data center, this paper presents new insights for health monitoring based on coupling different anomaly detection techniques according to the portfolio principle. We do not compute a weather forecast for the whole cloud but for a single cloud system. The portfolio principle executes several ML anomaly detection algorithms that rely on different detection mechanisms. The idea is to detect as many anomalies as possible, then consolidate the results according to their similarities so as to keep only the anomalies that are relevant from a system administrator's point of view. We also propose a metric for the health of the cloud and a new cloud service: a barometer indicator that helps experts keep the cloud in operational condition and assess the quality of the analyzed cloud. The proposed service makes it possible to compare the behavior of the cloud at different times and to trigger actions that remedy the detected anomalies. To validate our approach experimentally, we use datasets from the 2018 Alibaba Cloud traces, which cover about 4000 machines over eight days. Compared to a baseline algorithm, we show that the portfolio approach is more stringent regarding the quality of anomaly detection. We also demonstrate that our method can effectively help the cloud administrator predict the cloud weather. For example, the administrator can easily monitor the state of the cloud every day in a quasi-automatic way.
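As a rough illustration of the portfolio principle described in the abstract, the following Python sketch runs several detectors over the same series, measures how similar their results are, and consolidates them. Everything in it is an assumption made for illustration: the three statistical detectors are simple stand-ins rather than the ML algorithms evaluated in the paper, the majority-vote consolidation rule (`min_votes`) is one possible way to keep only the anomalies that detectors agree on, and `health_score` is a hypothetical barometer-style metric, not the metric defined by the authors.

```python
from statistics import mean, median, pstdev


def zscore_detector(values, threshold=2.0):
    """Flag indices whose distance from the mean exceeds `threshold` std devs."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold}


def iqr_detector(values, k=1.5):
    """Flag indices outside [q1 - k*IQR, q3 + k*IQR] (rough quartile estimate)."""
    ordered = sorted(values)
    n = len(ordered)
    q1, q3 = ordered[n // 4], ordered[(3 * n) // 4]
    spread = q3 - q1
    return {i for i, v in enumerate(values)
            if v < q1 - k * spread or v > q3 + k * spread}


def mad_detector(values, threshold=3.5):
    """Flag indices whose modified z-score (median/MAD based) exceeds `threshold`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return set()
    return {i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold}


def jaccard(a, b):
    """Jaccard similarity of two index sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def consolidate(results, min_votes=2):
    """Keep indices flagged by at least `min_votes` detectors (assumed rule)."""
    votes = {}
    for flagged in results:
        for i in flagged:
            votes[i] = votes.get(i, 0) + 1
    return {i for i, count in votes.items() if count >= min_votes}


def health_score(n_points, anomalies):
    """Hypothetical barometer: share of observations not flagged as anomalous."""
    return 1.0 - len(anomalies) / n_points


if __name__ == "__main__":
    # Hypothetical CPU utilisation samples (percent) for one machine.
    cpu = [20, 22, 21, 23, 19, 95, 22, 20, 21, 24, 29, 22]
    portfolio = [zscore_detector, iqr_detector, mad_detector]
    results = [detect(cpu) for detect in portfolio]
    kept = consolidate(results, min_votes=2)
    print("per-detector:", results)
    print("consolidated:", sorted(kept))
    print("health:", round(health_score(len(cpu), kept), 3))
```

In this toy series only the quartile-based detector flags the mild spike at index 10, so the vote discards it, while the large spike at index 5 is kept because all three detectors agree. Tracking the resulting score over successive time windows is one way such a barometer could be used to compare the state of a system at different times.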
Acknowledgements
We thank Nicolas Grenèche, the system administrator of the MAGI cluster at Sorbonne Paris Nord University (see https://www-magi.univ-paris13.fr/), for his help in using the testbed. This work was carried out as part of a PNE (Programme National Exceptionnel) scholarship. Amina KHEDIMI is grateful to the Algerian Ministry of Scientific Research (MESRS). This work was also carried out within the framework of a CNRS delegation at the University of Grenoble Alpes (UGA) and the Grenoble Computer Science Laboratory (LIGLAB) in the DATAMOVE-INRIA team. It was also partially supported by the research programme on edge intelligence at the Multi-disciplinary Institute on Artificial Intelligence (MIAI) at Grenoble Alpes (ANR-19-P3IA-0003).
The authors have no conflicts of interest to declare that are relevant to the content of this article.
About this article
Cite this article
Khedimi, A., Menouer, T., Cérin, C. et al. A cloud weather forecasting service and its relationship with anomaly detection. SOCA 16, 191–208 (2022). https://doi.org/10.1007/s11761-022-00346-4
DOI: https://doi.org/10.1007/s11761-022-00346-4