Journal of Computer Science and Technology, Volume 29, Issue 1, pp 38–52

Improving Scalability of Cloud Monitoring Through PCA-Based Clustering of Virtual Machines

  • Claudia Canali
  • Riccardo Lancellotti
Original Paper


Cloud computing has recently emerged as a leading paradigm that allows customers to run their applications in large-scale virtualized data centers. Existing solutions for monitoring and managing these infrastructures treat virtual machines (VMs) as independent entities, each with its own characteristics. However, these approaches suffer from scalability issues as the number of VMs in modern cloud data centers grows. We claim that these scalability issues can be addressed by leveraging the similarity in VM behavior in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster VMs starting from the usage of multiple resources, assuming no knowledge of the services executed on them. The innovative contribution of the proposed methodology is the use of the statistical technique known as principal component analysis (PCA) to automatically select the most relevant information for clustering similar VMs. We apply the methodology to two case studies: a virtualized testbed and a real enterprise data center. In both case studies, the automatic data selection based on PCA achieves high performance, with between 80% and 100% of VMs correctly clustered even for short time series (1 day) of monitored data. Furthermore, we estimate the potential reduction in the amount of collected data to demonstrate how our proposal may address the scalability issues related to monitoring and management in cloud computing data centers.
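The general idea of combining PCA with k-means clustering can be illustrated with a minimal sketch. This is not the authors' implementation: the per-VM feature vectors (mean CPU, memory, network, and disk utilisation), the VM names, the number of retained components, and the helper functions `pca_project` and `kmeans` are all assumptions made for illustration, and the paper's actual feature extraction and component selection may differ.

```python
import math
import random


def pca_project(rows, n_components, iterations=500, seed=0):
    """Project feature rows onto their top principal components.

    Uses power iteration with deflation on the sample covariance matrix,
    which is adequate for the small feature counts assumed here.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids))))
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical per-VM mean utilisation: [cpu, memory, network, disk].
vm_names = ["web-1", "web-2", "web-3", "db-1", "db-2", "db-3"]
vm_features = [
    [0.80, 0.30, 0.70, 0.10],
    [0.75, 0.35, 0.65, 0.15],
    [0.85, 0.25, 0.75, 0.12],
    [0.30, 0.80, 0.20, 0.70],
    [0.35, 0.85, 0.25, 0.75],
    [0.25, 0.78, 0.15, 0.65],
]

projected = pca_project(vm_features, n_components=2)  # keep 2 components
labels = kmeans(projected, k=2)
for name, label in zip(vm_names, labels):
    print(name, "-> cluster", label)
```

On this toy input, the three web-like VMs end up in one cluster and the three database-like VMs in the other. In a setting like the one the abstract describes, a monitoring system could then, in principle, collect detailed data from only a few representatives per cluster rather than from every VM.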


Keywords: cloud computing, resource monitoring, principal component analysis, k-means clustering



Supplementary material

11390_2013_1410_MOESM1_ESM.doc (28 kb)



Copyright information

© Springer Science+Business Media New York & Science Press, China 2014

Authors and Affiliations

  1. Department of Information Engineering, University of Modena and Reggio Emilia, Modena, Italy
