
A cloud weather forecasting service and its relationship with anomaly detection

  • Original Research Paper

Service Oriented Computing and Applications

Abstract

The main goal of anomaly detection is to identify observations that do not adhere to the general patterns considered normal behavior. In distributed systems, anomaly detection helps manage and monitor system performance. Several machine learning (ML) algorithms have been proposed in the literature to detect anomalies, each returning anomalies according to the particular mechanism used in its search process. For a cloud orchestrator managing a data center, this paper presents new insights into health monitoring based on coupling different anomaly detection techniques according to the portfolio principle. We do not forecast the weather of the whole cloud, but rather that of a single cloud system. The portfolio principle consists of executing several ML anomaly detection algorithms that rely on different detection mechanisms. The idea is to detect as many anomalies as possible, then consolidate the results according to their similarities so as to keep only the anomalies that are relevant from a system administrator's point of view. We also propose a metric for the health of the cloud and a new cloud service that adds a barometer indicator, which helps experts keep the cloud in operational condition and assess the quality of the analyzed cloud. The proposed service makes it possible to compare the behavior of the cloud at different times and thus to trigger actions that remedy the problems behind the anomalies. Experimentally, to validate our approach, we use datasets from the Alibaba Cloud 2018 traces, which cover about 4000 machines over eight days. Compared to a baseline algorithm, we show that the portfolio approach is more stringent regarding the quality of anomaly detection. We also demonstrate that our method can effectively help the cloud administrator predict the cloud weather; for example, the administrator can easily monitor the state of the cloud daily, in a quasi-automatic way.
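The abstract describes the portfolio principle only at a high level. The following is a minimal sketch under stated assumptions, not the authors' method: it runs several detectors with different mechanisms on the same series, keeps an observation when at least `min_votes` detectors flag it (a simple stand-in for the paper's similarity-based consolidation), and maps the share of normal observations to a weather-style label as a hypothetical barometer. The three detectors (z-score, median absolute deviation, k-nearest-neighbour distance) are standard-library stand-ins for the R packages listed in the notes (solitude, DDoutlier, dbscan), which implement isolation forest and density-based methods. All function names, thresholds, the sample data and the `sunny`/`cloudy`/`stormy` scale are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Callable, Dict, Sequence, Set, Tuple

# A detector maps a series of observations to the set of indices it flags.
Detector = Callable[[Sequence[float]], Set[int]]


def zscore_detector(xs: Sequence[float], threshold: float = 2.0) -> Set[int]:
    """Flag points lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return set()
    return {i for i, x in enumerate(xs) if abs(x - mu) / sigma > threshold}


def mad_detector(xs: Sequence[float], threshold: float = 3.5) -> Set[int]:
    """Flag points whose modified z-score (based on the median absolute deviation) exceeds `threshold`."""
    med = median(xs)
    mad = median(abs(x - med) for x in xs)
    if mad == 0:
        return set()
    return {i for i, x in enumerate(xs) if 0.6745 * abs(x - med) / mad > threshold}


def knn_distance_detector(xs: Sequence[float], k: int = 3, factor: float = 3.0) -> Set[int]:
    """Flag points whose mean distance to their k nearest neighbours exceeds factor * median score."""
    if len(xs) <= k:
        raise ValueError("need more than k observations")
    scores = []
    for i, x in enumerate(xs):
        dists = sorted(abs(x - y) for j, y in enumerate(xs) if j != i)
        scores.append(mean(dists[:k]))
    cutoff = factor * median(scores)
    return {i for i, s in enumerate(scores) if s > cutoff}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity between two anomaly sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def portfolio(
    xs: Sequence[float], detectors: Dict[str, Detector], min_votes: int = 2
) -> Tuple[Dict[str, Set[int]], Set[int]]:
    """Run every detector, then keep only the indices flagged by at least `min_votes` of them."""
    results = {name: detect(xs) for name, detect in detectors.items()}
    votes: Dict[int, int] = {}
    for flagged in results.values():
        for i in flagged:
            votes[i] = votes.get(i, 0) + 1
    consolidated = {i for i, v in votes.items() if v >= min_votes}
    return results, consolidated


def barometer(n_observations: int, n_anomalies: int) -> str:
    """Hypothetical health indicator: share of normal observations mapped to a weather label."""
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    health = 1.0 - n_anomalies / n_observations
    if health >= 0.95:
        return "sunny"
    if health >= 0.85:
        return "cloudy"
    return "stormy"


# Usage with hypothetical CPU-utilisation samples (%) from a single machine.
cpu = [41, 43, 40, 42, 52, 39, 41, 97, 42, 40, 43, 5, 41, 42]
detectors = {"zscore": zscore_detector, "mad": mad_detector, "knn": knn_distance_detector}
per_detector, anomalies = portfolio(cpu, detectors)
print(sorted(anomalies))                    # [4, 7, 11]
print(barometer(len(cpu), len(anomalies)))  # stormy
```

Here the z-score detector misses the moderate spike at index 4 (52%) because the two extreme values inflate the standard deviation, while the two robust detectors flag it; the vote keeps it. Requiring agreement between mechanisms trades some recall for precision. The `jaccard` helper only hints at how similarity between detector outputs could drive consolidation; the paper's actual consolidation rule may differ.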


Figs. 1–33 (not reproduced in this preview)


Notes

  1. See https://www-magi.univ-paris13.fr/.

  2. https://cran.r-project.org/web/packages/solitude/solitude.pdf.

  3. https://cran.r-project.org/web/packages/DDoutlier/DDoutlier.pdf.

  4. https://cran.r-project.org/web/packages/dbscan/dbscan.pdf.

  5. Such as this one: https://www.alibabacloud.com/help/en/application-real-time-monitoring-service/latest/use-an-intelligent-detector-to-detect-data-anomalies.


Acknowledgements

We want to thank Nicolas Grenèche, the system administrator of the MAGI cluster at Sorbonne Paris Nord University (see https://www-magi.univ-paris13.fr/), for his help in using the testbed. This work was carried out as part of a PNE (Programme National Exceptionnel) scholarship; Amina KHEDIMI is grateful to the Algerian Ministry of Scientific Research (MESRS). This work was also carried out within the framework of a CNRS delegation at the University of Grenoble Alpes (UGA) and the Grenoble Computer Science Laboratory (LIGLAB) in the DATAMOVE-INRIA team. It was partially supported by the research programme on edge intelligence at the Multi-disciplinary Institute on Artificial Intelligence (MIAI) at Grenoble Alpes (ANR-19-P3IA-0003).

Author information

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Corresponding author

Correspondence to Tarek Menouer.


Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Khedimi, A., Menouer, T., Cérin, C. et al. A cloud weather forecasting service and its relationship with anomaly detection. SOCA 16, 191–208 (2022). https://doi.org/10.1007/s11761-022-00346-4
