A Scalable Monitoring Solution for Large-Scale Distributed Systems

  • Andreea Buga
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9520)


Applications running in large-scale distributed systems face many challenges. Constraints imposed on such systems need to be checked thoroughly to ensure proper service delivery to the client. This paper proposes a monitoring solution for large-scale distributed systems that relies on abstract state machines. Data gathered from the monitoring components are used to calculate metrics and to establish a diagnosis for the system. Emphasis is placed on failure detection and on ensuring non-functional requirements of the system, such as fault tolerance and resilience. The model introduced in this paper will be integrated into a cloud-enabled large-scale distributed system. The novelty of the solution lies in finding the best integration architecture for state-of-the-art algorithms and tools and in refining them into an efficient version for large-scale distributed systems.


Large scale distributed systems, Monitoring, Decentralization, Formal modelling



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Christian-Doppler Laboratory for Client-Centric Cloud Computing (CDCC), Hagenberg im Mühlkreis, Austria
