LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres

Barbhuiya, Sakil; Papazachos, Zafeirios; Kilpatrick, Peter; Nikolopoulos, Dimitrios S.

doi:10.1007/978-3-319-29582-4_8

Sakil Barbhuiya¹³,
Zafeirios Papazachos¹³,
Peter Kilpatrick¹³ &
…
Dimitrios S. Nikolopoulos¹³

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 581))

Included in the following conference series:

International Conference on Cloud Computing and Services Science

793 Accesses

Abstract

Cloud data centres are implemented as large-scale clusters with demanding requirements for service performance, availability and cost of operation. As a result of scale and complexity, data centres typically exhibit large numbers of system anomalies resulting from operator error, resource over/under provisioning, hardware or software failures and security issus anomalies are inherently difficult to identify and resolve promptly via human inspection. Therefore, it is vital in a cloud system to have automatic system monitoring that detects potential anomalies and identifies their source. In this paper we present a lightweight anomaly detection tool for Cloud data centres which combines extended log analysis and rigorous correlation of system metrics, implemented by an efficient correlation algorithm which does not require training or complex infrastructure set up. The LADT algorithm is based on the premise that there is a strong correlation between node level and VM level metrics in a cloud system. This correlation will drop significantly in the event of any performance anomaly at the node-level and a continuous drop in the correlation can indicate the presence of a true anomaly in the node. The log analysis of LADT assists in determining whether the correlation drop could be caused by naturally occurring cloud management activity such as VM migration, creation, suspension, termination or resizing. In this way, any potential anomaly alerts are reasoned about to prevent false positives that could be caused by the cloud operator’s activity. We demonstrate LADT with log analysis in a Cloud environment to show how the log analysis is combined with the correlation of systems metrics to achieve accurate anomaly detection.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Lou, J.G., Fu, Q., Yang, S., Xu, Y., Li, J.: Mining invariants from console logs for system problem detection. In: Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC 2010, pp. 24–24. USENIX Association, Berkeley, CA, USA (2010)
Google Scholar
Xu, W., Huang, L., Fox, A., Patterson, D., Jordan, M.I.: Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP 2009, pp. 117–132. ACM, New York, NY, USA (2009)
Google Scholar
Tan, J., Kavulya, S., Gandhi, R., Narasimhan, P.: Light-weight black-box failure detection for distributed systems. In: Proceedings of the 2012 Workshop on Management of Big Data Systems, MBDS 2012, pp. 13–18. ACM, New York (2012)
Google Scholar
Wang, C.: Ebat: Online methods for detecting utility cloud anomalies. In: Proceedings of the 6th Middleware Doctoral Symposium, MDS 2009, pp. 4:1–4:6. ACM, New York (2009)
Google Scholar
Ward, J.S., Barker, A.: Varanus: In situ monitoring for large scale cloud systems. In: Proceedings of the 2013 IEEE International Conference on Cloud Computing Technology and Science, CLOUDCOM 2013, Computer Society, vol. 02, pp. 341–344. IEEE, Washington, DC (2013)
Google Scholar
Kang, H., Chen, H., Jiang, G.: Peerwatch: a fault detection and diagnosis tool for virtualized consolidation systems. In: Proceedings of the 7th International Conference on Autonomic Computing, ICAC 2010, pp. 119–128. ACM, New York (2010)
Google Scholar
Jiang, M., Munawar, M.A., Reidemeister, T., Ward, P.A.: System monitoring with metric-correlation models: problems and solutions. In: Proceedings of the 6th International Conference on Autonomic Computing, ICAC 2009, pp. 13–22. ACM, New York (2009)
Google Scholar
Barbhuiya, S., Papazachos, Z., Kilpatrick, P., Nikolopoulos, D.: In: A Lightweight Tool for Anomaly Detection in Cloud Data Centres, SCITEPRESS Digital Library, pp. 343–351 (2015)
Google Scholar
Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 1099–1110. ACM, New York (2008)
Google Scholar
Oppenheimer, D., Ganapathi, A., Patterson, D.A.: Why do internet services fail, and what can be done about it? In: Proceedings of the 4th Conference on USENIX Symposium on Internet Technologies and Systems, USITS 2003, vol. 4, p. 1. USENIX Association, Berkeley, CA, USA (2003)
Google Scholar
Kumar, V., Cooper, B.F., Eisenhauer, G., Schwan, K.: iManage: policy-driven self-management for enterprise-scale systems. In: Cerqueira, R., Campbell, R.H. (eds.) Middleware 2007. LNCS, vol. 4834, pp. 287–307. Springer, Heidelberg (2007)
Chapter Google Scholar
Pertet, S., Narasimhan, P.: Causes of failure in web applications. Technical report, CMU-PDL-05-109 (2005)
Google Scholar
Kephart, J.O., Chess, D.M.: The vision of autonomic computing. Computer 36, 41–50 (2003)
Article Google Scholar
Rouillard, J.P.: Refereed papers: real-time log file analysis using the simple event correlator (sec). In: Proceedings of the 18th USENIX Conference on System Administration, LISA 2004, pp. 133–150. USENIX Association, Berkeley, CA, USA (2004)
Google Scholar
Prewett, J.E.: Analyzing cluster log files using logsurfer. In: in Proceedings of the 4th Annual Conference on Linux Clusters (2003)
Google Scholar
Hansen, S.E., Atkins, E.T.: Automated system monitoring and notification with swatch. In: Proceedings of the 7th USENIX Conference on System Administration, LISA 1993, pp. 145–152. USENIX Association, Berkeley, CA, USA (1993)
Google Scholar
Azmandian, F., Moffie, M., Alshawabkeh, M., Dy, J., Aslam, J., Kaeli, D.: Virtual machine monitor-based lightweight intrusion detection. ACM SIGOPS Operating Syst. Rev. 45, 38–53 (2011)
Article Google Scholar
Rabkin, A., Katz, R.: Chukwa: a system for reliable large-scale log collection. In: Proceedings of the 24th International Conference on Large Installation System Administration, LISA 2010, pp. 1–15. USENIX Association, Berkeley, CA, USA (2010)
Google Scholar
Vora, M.: Hadoop-hbase for large-scale data. In: 2011 International Conference on Computer Science and Network Technology (ICCSNT), vol. 1, pp. 601–605 (2011)
Google Scholar
Sigar: https://support.hyperic.com/display/sigar (2014)
Virt-Top: http://virt-tools.org/about/ (2014)
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST 2010, Computer Society, pp. 1–10. IEEE, Washington, DC (2010)
Google Scholar
Ferdman, M., Adileh, A., Kocberber, O., Volos, S., Alisafaee, M., Jevdjic, D., Kaynak, C., Popescu, A.D., Ailamaki, A., Falsafi, B.: Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In: Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2012, pp. 37–48. ACM, New York (2012)
Google Scholar
Dahbur, K., Mohammad, B., Tarakji, A.B.: A survey of risks, threats and vulnerabilities in cloud computing. In: Proceedings of the 2011 International Conference on Intelligent Semantic Web-Services and Applications, ISWSA 2011, pp. 12:1–12:6. ACM, New York (2011)
Google Scholar
Antunes, J., Neves, N., Verissimo, P.: Detection and prediction of resource-exhaustion vulnerabilities. In: 19th International Symposium on Software Reliability Engineering, ISSRE 2008, pp. 87–96 (2008)
Google Scholar
Li, D., Jin, H., Liao, X., Zhang, Y., Zhou, B.: Improving disk i/o performance in a virtualized system. J. Comput. Syst. Sci. 79, 187–200 (2013)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

School of Electronics, Electrical Engineering and Computer Science, Queens University Belfast, Belfast, BT7 1NN, UK
Sakil Barbhuiya, Zafeirios Papazachos, Peter Kilpatrick & Dimitrios S. Nikolopoulos

Authors

Sakil Barbhuiya
View author publications
You can also search for this author in PubMed Google Scholar
Zafeirios Papazachos
View author publications
You can also search for this author in PubMed Google Scholar
Peter Kilpatrick
View author publications
You can also search for this author in PubMed Google Scholar
Dimitrios S. Nikolopoulos
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zafeirios Papazachos .

Editor information

Editors and Affiliations

Dublin City University, Dublin 9, Ireland
Markus Helfert
Universitat Autònoma de Barcelona, Bellaterra, Spain
Víctor Méndez Muñoz
Dell, ROUND ROCK, USA
Donald Ferguson

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Barbhuiya, S., Papazachos, Z., Kilpatrick, P., Nikolopoulos, D.S. (2016). LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres. In: Helfert, M., Méndez Muñoz, V., Ferguson, D. (eds) Cloud Computing and Services Science. CLOSER 2015. Communications in Computer and Information Science, vol 581. Springer, Cham. https://doi.org/10.1007/978-3-319-29582-4_8

Download citation

DOI: https://doi.org/10.1007/978-3-319-29582-4_8
Published: 03 February 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29581-7
Online ISBN: 978-3-319-29582-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics