
Cluster Computing, Volume 16, Issue 1, pp 91–115

Parallel application-level behavioral attributes for performance and energy management of high-performance computing systems

  • Jeffrey J. Evans
  • Charles E. Lucas
Article

Abstract

Run time variability of parallel applications continues to present significant challenges to their performance and energy efficiency in high-performance computing (HPC) systems. When run times are extended and unpredictable, application developers perceive this as a degradation of system (or subsystem) performance. Extended run times directly contribute to proportionally higher energy consumption, potentially negating efforts by applications, or the HPC system, to optimize energy consumption using low-level control techniques, such as dynamic voltage and frequency scaling (DVFS). Therefore, successful systemic management of application run time performance can result in less wasted energy, or even energy savings.

We have been studying run time variability in terms of communication time, from the perspective of the application, focusing on the interconnection network. More recently, our focus has shifted to developing a more complete understanding of the effects of HPC subsystem interactions on parallel applications. In this context, the set of executing applications on the HPC system is treated as a subsystem, along with more traditional subsystems like the communication subsystem, storage subsystem, etc.

To gain insight into the run time variability problem, our earlier work developed a framework to emulate parallel applications (PACE) that stresses the communication subsystem. Evaluation of the run time sensitivity of real applications to network performance is performed with a tool called PARSE, which uses PACE. In this paper, we propose a model defining application-level behavioral attributes that collectively describe how applications behave in terms of their run time performance, as functions of their process distribution on the system (spatial locality) and subsystem interactions (communication subsystem degradation). These subsystem interactions are produced when multiple applications execute concurrently on the same HPC system. We also revisit our evaluation framework and tools to demonstrate the flexibility of our application characterization techniques and the ease with which attributes can be quantified. The validity of the model is demonstrated using our tools with several parallel benchmarks and application fragments. Results suggest that it is possible to articulate application-level behavioral attributes as a tuple of numeric values that describe coarse-grained performance behavior.
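The idea of reducing an application's coarse-grained behavior to a tuple of numbers can be sketched as follows. This is an illustrative assumption, not the authors' PARSE implementation: we fit a linear model of observed run time versus an injected level of communication-subsystem degradation, yielding a (baseline, sensitivity) tuple. The function name and the synthetic measurements are hypothetical.

```python
# Hypothetical sketch (not the PARSE tool itself): quantify one
# application-level behavioral attribute as a numeric tuple by a
# least-squares fit of run time against injected communication
# degradation. All names and data below are illustrative.

def sensitivity_tuple(degradation, run_times):
    """Fit run_time = t0 + s * degradation by least squares.

    Returns (t0, s): the baseline run time and the sensitivity slope,
    a coarse-grained behavioral attribute of the application.
    """
    n = len(degradation)
    mean_d = sum(degradation) / n
    mean_t = sum(run_times) / n
    cov = sum((d - mean_d) * (t - mean_t)
              for d, t in zip(degradation, run_times))
    var = sum((d - mean_d) ** 2 for d in degradation)
    s = cov / var            # sensitivity slope (s per unit degradation)
    t0 = mean_t - s * mean_d  # extrapolated undisturbed run time
    return (t0, s)

# Synthetic measurements: run times (seconds) observed while a load
# generator degrades the communication subsystem by a given fraction.
degradation = [0.0, 0.1, 0.2, 0.4]
run_times = [100.0, 104.9, 110.1, 120.0]

t0, s = sensitivity_tuple(degradation, run_times)
print(round(t0, 1), round(s, 1))
```

With the synthetic data above, the fit recovers a baseline near 100 s and a slope near 50 s per unit of degradation; an application with a larger slope is more sensitive to contention from concurrently executing applications.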

Keywords

Performance and energy management · Run time attributes · Performance measurement

Notes

Acknowledgements

This material is based upon work supported by the Department of Energy under award number DE-SC0004596.


Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Purdue University, West Lafayette, USA
  2. PC Krause and Associates, Inc., West Lafayette, USA
