End-to-End Modeling and Simulation of High-Performance Computing Systems

Chapter in: Use Cases of Discrete Event Simulation

Abstract

Designing large-scale High-Performance Computing (HPC) systems, including architecture design space exploration and performance prediction, is a daunting task that can benefit enormously from discrete event simulation techniques, as the interactions between the various components of such a system generally render analytic approaches intractable. The work described in this chapter specifically deals with end-to-end, full-system simulation, as opposed to simulation of individual components or nodes. The tools described here can be used in the design phase of a new HPC system to optimize system design for a given set of workloads, or to create performance forecasts for new workloads on existing systems.

We have taken a network-centric approach: current high-end HPC systems comprise hundreds of thousands of processing cores, so communication among these cores is a key factor in overall system performance. To this end, we developed an Omnest-based simulation environment that lets us study how an HPC machine's communication subsystem affects overall system performance for specific workloads.
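To illustrate why communication interactions resist simple analytic treatment, here is a toy discrete event simulation. It is a minimal sketch, not the authors' Omnest model: several equal-sized messages share one FIFO link, and an event queue ordered by timestamp captures how contention serializes the transfers. The function `simulate_link` and its bandwidth and latency parameters are hypothetical.

```python
import heapq
import itertools


def simulate_link(messages, bandwidth_bytes_per_ns, latency_ns):
    """Hypothetical toy model: messages share one FIFO link.

    messages: list of (inject_time_ns, size_bytes, msg_id).
    Returns a dict mapping msg_id -> delivery time in ns.
    """
    events = []               # heap of (time, seq, kind, payload)
    seq = itertools.count()   # tie-breaker: equal timestamps stay in FIFO order
    for t, size, mid in messages:
        heapq.heappush(events, (t, next(seq), "arrive", (size, mid)))

    link_free_at = 0.0        # time at which the link finishes its current transfer
    delivered = {}
    while events:
        now, _, kind, payload = heapq.heappop(events)
        if kind == "arrive":
            size, mid = payload
            start = max(now, link_free_at)          # queue behind earlier messages
            link_free_at = start + size / bandwidth_bytes_per_ns
            heapq.heappush(events, (link_free_at + latency_ns, next(seq), "deliver", mid))
        else:
            delivered[payload] = now
    return delivered


if __name__ == "__main__":
    msgs = [(0, 1000, i) for i in range(4)]
    print(simulate_link(msgs, bandwidth_bytes_per_ns=1.0, latency_ns=100))
    # {0: 1100.0, 1: 2100.0, 2: 3100.0, 3: 4100.0}
```

In isolation, each message would arrive after 1100 ns. Under contention, the last of the four arrives at 4100 ns. A full-system model exhibits the same effect across many links and switches at once.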

Full-system simulation at an abstraction level that still retains a reasonably high level of detail is infeasible without resorting to parallel simulation; the main limiting factors are simulation run time and memory footprint. By applying Parallel Discrete Event Simulation (PDES) techniques, the power of modern parallel computers can be exploited to great effect to perform such simulations at large scale.
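The abstract does not say which synchronization protocol is used. As one illustration, the sketch below shows conservative PDES in the style of the Chandy-Misra null-message algorithm. A logical process (LP) only processes events whose timestamp does not exceed the clock of its input channel, and it sends null messages stamped with its local clock plus a lookahead so that its neighbour can advance. Everything here is a hypothetical toy: the ring topology, the `hops` forwarding rule, and the function `run_ring` are assumptions. The LPs are also stepped round-robin in a single Python process rather than run in parallel. The lookahead must be strictly positive, otherwise the clocks never advance.

```python
import heapq
from collections import deque


def run_ring(n_lps, lookahead, initial_events, end_time):
    """Toy conservative (null-message) simulation of LPs connected in a ring.

    Hypothetical model: initial_events is a list of (lp, timestamp, hops).
    Processing an event with hops > 0 sends a real message with hops - 1 to
    the next LP, timestamped now + lookahead. Requires lookahead > 0.
    Returns (lp, timestamp) pairs in the order real events were processed.
    """
    pending = [[] for _ in range(n_lps)]     # per-LP heap of (timestamp, hops)
    inbox = [deque() for _ in range(n_lps)]  # FIFO channel from the predecessor
    chan_clock = [0.0] * n_lps               # last timestamp received per channel
    clock = [0.0] * n_lps                    # per-LP simulation clock
    for lp, t, hops in initial_events:
        heapq.heappush(pending[lp], (t, hops))
    log = []
    while min(clock) < end_time:
        for i in range(n_lps):
            nxt = (i + 1) % n_lps
            # Drain the input channel. Real and null messages both advance the
            # channel clock; only real ones (hops is not None) become events.
            while inbox[i]:
                t, hops = inbox[i].popleft()
                chan_clock[i] = t
                if hops is not None:
                    heapq.heappush(pending[i], (t, hops))
            # Safe to process: each channel delivers non-decreasing timestamps,
            # so nothing earlier than chan_clock can still arrive.
            bound = chan_clock[i]
            while pending[i] and pending[i][0][0] <= bound and pending[i][0][0] < end_time:
                t, hops = heapq.heappop(pending[i])
                clock[i] = t
                log.append((i, t))
                if hops > 0:
                    inbox[nxt].append((t + lookahead, hops - 1))
            # Remaining events lie beyond the bound (or end_time), so the LP
            # clock can advance to the bound, capped at end_time.
            clock[i] = max(clock[i], min(bound, end_time))
            # Null message: promise the successor nothing earlier than this.
            inbox[nxt].append((clock[i] + lookahead, None))
    return log


if __name__ == "__main__":
    print(run_ring(2, 1.0, [(0, 0.5, 2)], 10.0))
    # [(0, 0.5), (1, 1.5), (0, 2.5)]
```

In the example, LP 0 cannot process its event at t = 0.5 in the first round because its input channel clock is still 0. The event becomes safe only after LP 1's null message (timestamp 2.0) arrives. The efficiency of this scheme depends heavily on the lookahead: small values generate many null messages per unit of simulated time.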

Author information

Corresponding author: Cyriel Minkenberg.

Copyright information

© 2012 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Minkenberg, C., Denzel, W., Rodriguez, G., Birke, R. (2012). End-to-End Modeling and Simulation of High-Performance Computing Systems. In: Bangsow, S. (ed.) Use Cases of Discrete Event Simulation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28777-0_11

  • DOI: https://doi.org/10.1007/978-3-642-28777-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28776-3

  • Online ISBN: 978-3-642-28777-0

  • eBook Packages: Engineering (R0)
