Automated Software Engineering, Volume 21, Issue 3, pp 319–344

Exploiting ensemble techniques for automatic virtual machine clustering in cloud systems

  • Claudia Canali
  • Riccardo Lancellotti


Cloud computing has recently emerged as a new paradigm for providing computing services through large data centers in which customers run their applications in a virtualized environment. The flexibility and cost advantages of the cloud encourage many enterprises to migrate from local data centers to cloud platforms, contributing to the success of such infrastructures. However, as cloud infrastructures grow in size and complexity, their monitoring and management processes face scalability issues. These issues are exacerbated because available solutions typically treat each virtual machine (VM) as a black box with independent characteristics and monitor it at fine granularity for management purposes, generating huge amounts of data to handle. We claim that scalability issues can be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper, we propose an automated methodology that clusters similar VMs based on their resource usage information, assuming no knowledge of the software they execute. The methodology is innovative in that it combines the Bhattacharyya distance and ensemble techniques to provide a stable evaluation of the similarity between the probability distributions of multiple VM resource usage metrics, considering both system- and network-related data. We evaluate the methodology through a set of experiments on data from an enterprise data center. We show that our proposal achieves high and stable performance in automatic VM clustering while significantly reducing the amount of data collected, which lightens the monitoring requirements of a cloud data center.
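The abstract combines three ingredients: a probability distribution of each resource metric per VM, the Bhattacharyya distance between those distributions, and an ensemble step that merges the per-metric results into one stable clustering. The following is a minimal Python sketch of how these pieces can fit together under stated assumptions, not the authors' implementation. It assumes metrics already normalized to [0, 1] and summarized as fixed 10-bin histograms, uses a simple distance threshold (0.1) with connected components as a stand-in base clustering per metric, and uses a co-association majority vote as the consensus rule. All function names, parameters, and sample data are hypothetical.

```python
import math
from itertools import combinations


def histogram(samples, bins=10, lo=0.0, hi=1.0):
    """Empirical probability distribution of samples over equal-width bins on [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        # Clamp so that x == hi (or values just outside the range) land in an edge bin.
        idx = min(max(int((x - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(samples) for c in counts]


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln BC(p, q), where BC(p, q) = sum_i sqrt(p_i * q_i)."""
    # Cap BC at 1.0 so floating-point rounding cannot yield a negative distance.
    bc = min(sum(math.sqrt(a * b) for a, b in zip(p, q)), 1.0)
    return math.inf if bc == 0.0 else max(0.0, -math.log(bc))


def link_components(n, linked):
    """Label the connected components of the graph with an edge (i, j) wherever linked(i, j)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if linked(i, j):
            parent[find(i)] = find(j)
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def cluster_vms(usage, bins=10, threshold=0.1, quorum=0.5):
    """usage maps a metric name to per-VM lists of samples normalized to [0, 1].

    Returns (base labelings, one per metric; consensus labeling).
    """
    base = []
    for samples_per_vm in usage.values():
        hists = [histogram(s, bins) for s in samples_per_vm]
        # Hypothetical base clustering for one metric: link VMs whose distributions are close.
        base.append(link_components(
            len(hists),
            lambda i, j, h=hists: bhattacharyya_distance(h[i], h[j]) <= threshold))

    k = len(base)

    def agree(i, j):
        # Co-association: fraction of base clusterings that put VMs i and j together.
        return sum(lab[i] == lab[j] for lab in base) / k > quorum

    return base, link_components(len(base[0]), agree)


# Hypothetical data: VMs 0-1 behave alike on CPU and network, VMs 2-3 likewise,
# while memory alone would pair them differently.
usage = {
    "cpu": [[0.12, 0.14, 0.16, 0.18], [0.13, 0.15, 0.17, 0.19],
            [0.72, 0.74, 0.76, 0.78], [0.73, 0.75, 0.77, 0.79]],
    "mem": [[0.32, 0.34, 0.36, 0.38], [0.62, 0.64, 0.66, 0.68],
            [0.33, 0.35, 0.37, 0.39], [0.63, 0.65, 0.67, 0.69]],
    "net": [[0.82, 0.84, 0.86, 0.88], [0.83, 0.85, 0.87, 0.89],
            [0.22, 0.24, 0.26, 0.28], [0.23, 0.25, 0.27, 0.29]],
}
base, consensus = cluster_vms(usage)
print(base)       # [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
print(consensus)  # [0, 0, 1, 1]
```

In this toy run the memory metric alone would group VM 0 with VM 2, but the consensus follows the two metrics that agree, which illustrates why an ensemble can give a more stable result than any single metric. The threshold rule chains transitively (single linkage) and is used here only because it needs no external libraries; a real deployment would more likely run a dedicated clustering algorithm on the per-metric distance matrices, and the co-association vote is one common consensus scheme, not necessarily the one used in the paper.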


Keywords: Clustering, Clustering ensemble, Bhattacharyya distance, Cloud computing


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Modena, Italy
