A Contention-Aware Performance Model for HPC-Based Networks: A Case Study of the InfiniBand Network

  • Maxime Martinasso
  • Jean-François Méhaut
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6852)

Abstract

Multi-core clusters are cost-effective and widely used in high-performance computing. Parallel applications that communicate by message passing can exhibit complex communication behaviours on such clusters. By sending and receiving data simultaneously to and from several nodes, they create concurrent accesses to network resources. In this paper, we present a general model of network resource sharing, characterised by a dynamic contention graph. The model is based on a linear system weighted by bandwidth distribution factors, called penalty coefficients, that are specific to a network technology. We propose a method to solve the linear system and present an analysis to determine the penalty coefficients for InfiniBand technology. We assess the model's prediction accuracy on complex network conflicts.
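
To make the idea of penalty coefficients on a dynamic contention graph more concrete, here is a minimal Python sketch. It is not the paper's linear system, and it does not use the InfiniBand coefficients the paper derives. As a stand-in assumption, each communication's penalty is a simple fair share: the larger of the number of active communications leaving its source node and the number entering its destination node. The function names `penalties` and `predict_times` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def penalties(active):
    """Assumed fair-share penalty for each active communication.

    active maps a communication id to a (source_node, destination_node) pair.
    The penalty is the larger of the number of active communications that
    share its source node and the number that share its destination node.
    This is an illustrative assumption, not a measured InfiniBand coefficient.
    """
    out_deg = Counter(src for src, _ in active.values())
    in_deg = Counter(dst for _, dst in active.values())
    return {cid: max(out_deg[src], in_deg[dst])
            for cid, (src, dst) in active.items()}


def predict_times(comms, sizes, bandwidth):
    """Predict the completion time of each communication.

    comms: {id: (source_node, destination_node)}
    sizes: {id: message size in bytes}
    bandwidth: contention-free bandwidth in bytes per second

    The contention graph is "dynamic" in the sense that penalties are
    recomputed every time a communication finishes.
    """
    active = dict(comms)
    remaining = {cid: float(sizes[cid]) for cid in comms}
    now = 0.0
    finish = {}
    while active:
        pen = penalties(active)
        # Time each communication still needs at its current, penalised rate.
        eta = {cid: remaining[cid] * pen[cid] / bandwidth for cid in active}
        first = min(eta, key=eta.get)
        dt = eta[first]
        now += dt
        # Advance every active communication by dt at its penalised rate.
        for cid in active:
            remaining[cid] -= dt * bandwidth / pen[cid]
        # Retire the earliest finisher and anything that finished with it.
        done = [cid for cid in active
                if cid == first or remaining[cid] <= 1e-6]
        for cid in done:
            finish[cid] = now
            del active[cid]
    return finish
```

For example, with `comms = {"a": (0, 1), "b": (0, 2)}`, `sizes = {"a": 100, "b": 300}` and `bandwidth = 100.0`, both messages leave node 0 and share its bandwidth until `a` completes, after which `b` proceeds at full rate. The sketch then predicts a completion time of 2.0 s for `a` and 4.0 s for `b`, compared with 3.0 s for `b` without contention.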

Keywords

Contention model, performance prediction, InfiniBand

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Maxime Martinasso
    • 1
  • Jean-François Méhaut
    • 1
  1. Computer Science Laboratory LIG, University of Grenoble, Grenoble, France
