An Approach for Detecting Abnormal Parallel Applications Based on Time Series Analysis Methods

  • Denis Shaykhislamov
  • Vadim Voevodin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10777)

Abstract

Low efficiency of parallel program execution is one of the most serious problems in high-performance computing. Many studies and software tools aim to analyze and improve the performance of a particular program, but the task of detecting which applications need to be analyzed is still far from solved.

In this research, we develop methods for detecting abnormal program behavior in the overall supercomputer task flow. There are no clear criteria for anomalous behavior, and such criteria can differ significantly between computing systems; therefore, machine learning methods are used. These methods take system monitoring data as input, since this data provides the most complete information about the dynamics of program execution.

In this article, we propose a method based on time series analysis of the dynamic characteristics that describe program behavior. In this method, the time series is divided into a set of intervals, among which the anomalous ones are detected. The final classification of the entire application is then performed based on the results of the interval classification. The developed method is being tested on real-life data from the petaflops-level Lomonosov-2 supercomputer.
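The three stages described above (dividing the series into intervals, classifying each interval, and classifying the whole application) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how intervals are chosen or which classifier is used, so the fixed interval length, the mean-below-threshold rule, the majority-style vote, and all function and parameter names here are assumptions made for illustration only.

```python
from statistics import mean


def split_into_intervals(series, length):
    """Cut a series into consecutive intervals of a fixed length
    (assumption: the last interval may be shorter)."""
    return [series[i:i + length] for i in range(0, len(series), length)]


def is_anomalous_interval(interval, low_threshold):
    """Hypothetical stand-in for a trained interval classifier:
    flag an interval whose mean value falls below a threshold."""
    return mean(interval) < low_threshold


def classify_job(series, length=4, low_threshold=10.0, min_fraction=0.5):
    """Label a whole job from its per-interval results.

    The job is called "abnormal" when at least `min_fraction` of its
    intervals are flagged (an assumed aggregation rule).
    """
    if not series:
        raise ValueError("series must contain at least one sample")
    intervals = split_into_intervals(series, length)
    flags = [is_anomalous_interval(iv, low_threshold) for iv in intervals]
    fraction = sum(flags) / len(flags)
    label = "abnormal" if fraction >= min_fraction else "normal"
    return label, flags


# Made-up CPU load samples (percent) for two illustrative jobs.
healthy = [80, 85, 90, 88, 92, 87, 91, 89]
stalled = [80, 85, 90, 88, 2, 1, 0, 3, 1, 2, 0, 1]

print(classify_job(healthy))  # ('normal', [False, False])
print(classify_job(stalled))  # ('abnormal', [False, True, True])
```

In a setting closer to the one described, the threshold rule would presumably be replaced by a model trained on monitoring data, and the fixed-length split by a data-driven segmentation; the two-level structure (interval decisions first, job decision second) is the only part taken from the text.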

Keywords

High-performance computing, Efficiency analysis, Parallel program, Task flow, Time series analysis, Anomaly detection, Machine learning

Notes

Acknowledgments

This work was funded in part by the Russian Foundation for Basic Research (grant 16-07-00972) and a Russian Presidential study grant (SP-1981.2016.5).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Lomonosov Moscow State University, Moscow, Russia
