Monitoring energy consumption with SIOX

Autonomous monitoring triggered by abnormal energy consumption
  • Julian M. Kunkel
  • Alvaro Aguilera
  • Nathanael Hübbe
  • Marc Wiedemann
  • Michaela Zimmer
Special Issue Paper


In the face of the growing complexity of HPC systems, their rising energy costs, and the increasing difficulty of running applications efficiently, a number of monitoring tools have been developed in recent years. SIOX is one such endeavor, with a uniquely holistic approach: rather than recording only one kind of data, it aims to make all relevant data available for analysis and optimization. Among other sources, this encompasses data from hardware energy counters and trace data from different hardware/software layers. However, not all data that can be recorded should be recorded. SIOX therefore needs good heuristics to determine when data needs to be collected and which data to collect, and energy consumption can provide an important signal that the system is in a state that deserves closer attention. In this paper, we show that SIOX can use Likwid to collect and report the energy consumption of applications, and we present how this data can be visualized using SIOX’s web interface. Furthermore, we outline how SIOX can use this information to intelligently adjust the amount of data it collects, allowing it to reduce the monitoring overhead while still providing complete information about critical situations.
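The idea of letting energy consumption decide when to collect more data can be illustrated with a minimal sketch. It is not SIOX's actual heuristic and does not use the Likwid API: the class `EnergyTrigger`, the function `monitoring_levels`, the rolling-window size, the three-standard-deviation threshold, and the power samples are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, stdev


class EnergyTrigger:
    """Hypothetical detector: flags power samples that deviate from a rolling baseline."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # recent "normal" power samples (watts)
        self.threshold = threshold           # allowed deviation, in standard deviations
        self.min_samples = min_samples       # samples required before judging anything

    def observe(self, watts):
        """Return True if this sample looks abnormal compared with the baseline."""
        abnormal = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            # Guard against a perfectly flat baseline (sigma == 0).
            abnormal = abs(watts - mu) > self.threshold * max(sigma, 1e-9)
        if not abnormal:
            # Only normal samples update the baseline, so a sustained anomaly stays flagged.
            self.history.append(watts)
        return abnormal


def monitoring_levels(samples, trigger=None):
    """Map each power sample to an assumed monitoring level: 'coarse' or 'detailed'."""
    if trigger is None:
        trigger = EnergyTrigger()
    return ["detailed" if trigger.observe(w) else "coarse" for w in samples]


if __name__ == "__main__":
    # Made-up power readings in watts with one spike.
    print(monitoring_levels([100, 101, 99, 100, 102, 100, 180, 101]))
```

In this sketch, detailed tracing would be switched on only for the sample that departs from the baseline, which reflects the stated goal of reducing monitoring overhead while still capturing critical situations.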


Keywords: Parallel I/O, Energy consumption, Monitoring, Analysis, Optimization



We want to express our gratitude to the German Aerospace Center (DLR) as the responsible project agency and to the Federal Ministry of Education and Research (BMBF) for the financial support under grant 01IH11008 A-C.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Julian M. Kunkel (1, email author)
  • Alvaro Aguilera (2)
  • Nathanael Hübbe (3)
  • Marc Wiedemann (3)
  • Michaela Zimmer (3)
  1. DKRZ GmbH, Hamburg, Germany
  2. ZIH, TU Dresden, Dresden, Germany
  3. University of Hamburg, Hamburg, Germany
