A Scalable Monitor for Large Systems

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 512)


Current monitoring solutions are poorly suited to large data centers for several reasons: they lack scalability, give a poor representation of global state conditions, cannot guarantee persistent service delivery, and cannot monitor multi-tenant applications. In this paper, we present a novel monitoring architecture that aims to address these problems. It combines a hierarchical scheme for monitoring the resources within a cluster with a distributed hash table (DHT) that disseminates system state information among different monitors. The architecture is designed for high scalability, effectiveness and resilience, and makes it possible to monitor services spanning different clusters or even different data centers of the cloud provider. We evaluate the scalability of the proposed architecture experimentally and measure the overhead of the DHT-based communication scheme.
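The two-tier idea in the abstract (per-cluster hierarchical monitoring plus a DHT shared by the monitors) can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that each cluster monitor reduces per-node samples to a small summary (count, mean, max) and that a consistent-hashing ring stands in for the DHT. It shows DHT-style key placement (publish/lookup by key), not a broadcast or multicast layer. The names `HashRing`, `MonitorNetwork`, `cluster_summary`, the monitor IDs and keys such as `cluster-eu/cpu` are hypothetical. A real DHT would also handle multi-hop routing, replication and node churn, all of which this sketch omits.

```python
import hashlib
from bisect import bisect_left
from statistics import mean


def ring_position(key: str) -> int:
    """Map a key to a 32-bit position on the hash ring (first 4 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Toy consistent-hashing ring standing in for a DHT overlay.

    Hypothetical: no multi-hop routing, replication or churn handling.
    """

    def __init__(self, monitor_ids):
        self._ring = sorted((ring_position(m), m) for m in monitor_ids)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the monitor whose position is the first at or after the key's
        hash, wrapping around to the start of the ring."""
        idx = bisect_left(self._positions, ring_position(key)) % len(self._ring)
        return self._ring[idx][1]


def cluster_summary(node_samples: dict) -> dict:
    """First tier (hypothetical): a cluster monitor reduces per-node samples to a summary."""
    values = list(node_samples.values())
    return {"nodes": len(values), "mean": mean(values), "max": max(values)}


class MonitorNetwork:
    """Second tier (hypothetical): cluster monitors publish summaries into the ring,
    so a cluster's state is located by key rather than through a central collector."""

    def __init__(self, monitor_ids):
        self.ring = HashRing(monitor_ids)
        self.stores = {m: {} for m in monitor_ids}  # per-monitor local key/value store

    def publish(self, key: str, summary: dict) -> None:
        # Store the summary only on the monitor that owns the key.
        self.stores[self.ring.owner(key)][key] = summary

    def lookup(self, key: str):
        # Resolve the owning monitor through the ring, then read its local store.
        return self.stores[self.ring.owner(key)].get(key)


# Usage: one cluster monitor publishes a CPU summary; the ring resolves who holds it.
net = MonitorNetwork(["mon-a", "mon-b", "mon-c"])
cpu = {"node-1": 0.42, "node-2": 0.91, "node-3": 0.10}
net.publish("cluster-eu/cpu", cluster_summary(cpu))
print(net.ring.owner("cluster-eu/cpu"), net.lookup("cluster-eu/cpu"))
```

Because each key hashes to exactly one monitor, adding a cluster adds storage and lookup load only to the peer that owns its keys rather than to a single central collector. This is one plausible reading of the scalability argument, shown here in miniature.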


Keywords: Monitoring architecture · Cloud computing · Large-scale · Scalability · Multi-tenancy


Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Department of Physics, Computer Science and Mathematics, University of Modena and Reggio Emilia, Modena, Italy
  2. Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Modena, Italy
