End-to-End Modeling and Simulation of High-Performance Computing Systems

Minkenberg, Cyriel; Denzel, Wolfgang; Rodriguez, German; Birke, Robert

doi:10.1007/978-3-642-28777-0_11

Abstract

Designing large-scale High-Performance Computing (HPC) systems, including architecture design space exploration and performance prediction, is a daunting task that can benefit enormously from discrete event simulation techniques, as the interactions between the various components of such a system generally render analytic approaches intractable. The work described in this chapter specifically deals with end-to-end, full-system simulation, as opposed to simulation of individual components or nodes. The tools described here can be used in the design phase of a new HPC system to optimize system design for a given set of workloads, or to create performance forecasts for new workloads on existing systems.

We take a network-centric approach: current high-end HPC systems comprise hundreds of thousands of processing cores, so communication among those cores is a key factor in determining overall system performance. To this end, we developed an Omnest-based simulation environment for studying how an HPC machine's communication subsystem affects overall system performance for specific workloads.

Full-system simulation at an abstraction level that retains a reasonably high level of detail is infeasible without parallel simulation; the main limiting factors are simulation run time and memory footprint. Parallel Discrete Event Simulation techniques make it possible to exploit the power of modern parallel computers to perform such simulations at large scale.
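As a rough illustration of the discrete event simulation technique named in the abstract, the following Python sketch processes timestamped events from a priority queue to model messages sharing one network link. It is not the chapter's Omnest-based environment; the function name `simulate_link`, its parameters, and the single-link FIFO model are assumptions made purely for illustration.

```python
import heapq
from itertools import count


def simulate_link(messages, bandwidth, latency):
    """Sequential discrete event simulation of one shared FIFO link.

    messages  : list of (inject_time_s, size_bytes) tuples (hypothetical input)
    bandwidth : link bandwidth in bytes per second
    latency   : propagation delay in seconds
    Returns a dict mapping message index to delivery time in seconds.
    """
    queue = []          # future event list, ordered by timestamp
    tie = count()       # tie-breaker so equal timestamps pop in insertion order
    for msg_id, (inject_time, size) in enumerate(messages):
        heapq.heappush(queue, (inject_time, next(tie), "inject", msg_id, size))

    link_free_at = 0.0  # time at which the link finishes its current transfer
    delivered = {}

    while queue:
        # Advance simulated time to the earliest pending event.
        now, _, kind, msg_id, size = heapq.heappop(queue)
        if kind == "inject":
            # The message waits if the link is still busy, then serializes.
            start = max(now, link_free_at)
            link_free_at = start + size / bandwidth
            # Schedule the matching delivery event in the future.
            heapq.heappush(
                queue,
                (link_free_at + latency, next(tie), "deliver", msg_id, size),
            )
        else:  # "deliver"
            delivered[msg_id] = now
    return delivered


if __name__ == "__main__":
    # Two 1000-byte messages injected at t=0 on a 1000 B/s link, 0.5 s latency.
    print(simulate_link([(0.0, 1000), (0.0, 1000)], 1000.0, 0.5))
```

In this example the second message has to wait until the first has finished serializing before it can use the link. A parallel simulator would partition such an event queue across several processes and keep their simulated clocks consistent, which this sequential sketch does not attempt.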

Author information

Authors and Affiliations

IBM Research – Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland
Cyriel Minkenberg, Wolfgang Denzel, German Rodriguez & Robert Birke

Corresponding author

Correspondence to Cyriel Minkenberg.

Editor information

Editors and Affiliations

Freiligrathstraße 23, Zwickau, 08058, Germany
Steffen Bangsow

About this chapter

Cite this chapter

Minkenberg, C., Denzel, W., Rodriguez, G., Birke, R. (2012). End-to-End Modeling and Simulation of High-Performance Computing Systems. In: Bangsow, S. (ed.) Use Cases of Discrete Event Simulation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28777-0_11

DOI: https://doi.org/10.1007/978-3-642-28777-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28776-3
Online ISBN: 978-3-642-28777-0
eBook Packages: Engineering, Engineering (R0)
