The Journal of Supercomputing

Volume 73, Issue 1, pp 215–226

A clustering-based knowledge discovery process for data centre infrastructure management

  • Diego García-Saiz
  • Marta Zorrilla
  • José Luis Bosque


Data centre infrastructure management (DCIM) is the integration of information technology and facility management disciplines to centralise monitoring and management in data centres. One of the most important problems facing DCIM tools is the analysis of the huge amount of data obtained from the real-time monitoring of thousands of resources. In this paper, an adaptation of the knowledge discovery process for dealing with data analysis in DCIM tools is proposed. A case study based on monitoring and labelling the nodes of a high performance computing data centre in real time is presented. It shows that characterising the state of the nodes according to a reduced and relevant set of metrics is feasible and that its outcome is directly usable, consequently simplifying the decision-making process in these complex infrastructures.
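As a rough illustration of what labelling nodes from a reduced set of metrics can look like, the following minimal sketch clusters a handful of nodes on two metrics and names each cluster. Everything in it is an assumption made for illustration: the abstract does not name a clustering algorithm, so plain k-means stands in as one common choice, and the node names, the two metrics (CPU load and memory use), the sample values and the "idle"/"busy" labels are all hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. The first k points seed the centroids,
    which is a simplification chosen to keep the sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical monitoring samples: (CPU load fraction, memory use fraction).
nodes = {
    "node01": (0.05, 0.10),
    "node02": (0.90, 0.80),
    "node03": (0.10, 0.15),
    "node04": (0.95, 0.85),
}

labels, centroids = kmeans(list(nodes.values()), k=2)

# Name the clusters by mean CPU load: the lower one "idle", the higher "busy".
order = sorted(range(2), key=lambda c: centroids[c][0])
state_names = {order[0]: "idle", order[1]: "busy"}

# Final per-node labels, the kind of outcome an operator could act on directly.
node_states = {name: state_names[lab] for name, lab in zip(nodes, labels)}
print(node_states)
```

In a real deployment the metrics would come from a monitoring system and would normally be scaled to comparable ranges before clustering; the number of clusters and their names would be chosen with the help of cluster validation measures and domain knowledge rather than fixed in advance.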


Keywords: Data mining, Data centres, DCIM, Monitoring



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Diego García-Saiz (1)
  • Marta Zorrilla (1)
  • José Luis Bosque (1)
  1. Department of Computer Science and Electronic, University de Cantabria, Santander, Spain
