A Simple Concept for the Performance Analysis of Cluster-Computing

  • Heinz Kredel
  • Sabine Richling
  • Jan Philipp Kruse
  • Erich Strohmaier
  • Hans-Günther Kruse
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7905)


There seems to be a lack of reliable rules of thumb for estimating the size and performance of clusters with respect to applications. Since modern cluster architectures are based on multi-core processors, we follow a concept derived by S. Williams et al. for the analysis of such systems. The performance is described by the dimensionless speed-up as a function of important hardware and application parameters. The hardware parameters are the number of processing units, the theoretical performance of each unit, and the bandwidth of the network. The application parameters are the total number of operations performed on a number of bytes and the total number of bytes communicated between the processing units. To test our theoretical concept, we apply our model to the scalar product of vectors, matrix multiplication, Linpack, and the TOP500 list.
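
The abstract names the model's inputs but not its formula. As a minimal sketch of how such parameters could combine into a dimensionless speed-up, the Python function below assumes that computation and communication do not overlap, so their times simply add. That additive form, the function name `speedup`, and all parameter names are illustrative assumptions, not the formula used in the paper.

```python
def speedup(p, ops, peak_flops, comm_bytes, bandwidth):
    """Estimate the dimensionless speed-up on p processing units.

    Assumption (not taken from the paper): compute time and
    communication time do not overlap, so they are added.

    p          -- number of processing units
    ops        -- total number of operations of the application
    peak_flops -- theoretical performance of one unit (operations/s)
    comm_bytes -- total bytes communicated between the units
    bandwidth  -- network bandwidth (bytes/s)
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    t_serial = ops / peak_flops            # run time on a single unit
    t_compute = ops / (p * peak_flops)     # ideally parallelised work
    # A single unit has nobody to communicate with.
    t_comm = comm_bytes / bandwidth if p > 1 else 0.0
    return t_serial / (t_compute + t_comm)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    for units in (1, 10, 100):
        s = speedup(units, ops=1e12, peak_flops=1e10,
                    comm_bytes=1e9, bandwidth=1e9)
        print(units, round(s, 2))
```

Under this assumption the speed-up equals `p` when nothing is communicated and falls below `p` as the communication term grows relative to the per-unit compute time.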


Keywords: performance model, performance analysis, compute clusters, roofline model




  1. Kruse, H.G.: Leistungsbewertung bei Computer-Systemen. Springer (2009)
  2. Kredel, H., Kruse, H.-G., Richling, S.: Zur Leistung von verteilten, homogenen Clustern. PIK (2), 166–171 (2010); English summary, see section V in [3]
  3. Richling, S., Hau, S., Kredel, H., Kruse, H.-G.: Operating Two InfiniBand Grid Clusters over 28 km Distance. In: Proc. 3PGCIC 2010. IEEE (2010)
  4. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multi-core architectures. Commun. ACM 52(4), 65–76 (2009)
  5. Amdahl, G.: Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities. In: AFIPS Conference Proceedings, vol. 30, pp. 483–485 (1967)
  6. Gustafson, J.: Reevaluating Amdahl’s law. Commun. ACM 31(5), 532–533 (1988)
  7. Hockney, R.W.: Parametrization of computer performance. Parallel Computing 5(1-2), 97–103 (1987)
  8. Hockney, R.W., Jesshope, C.R.: Parallel Computers 2: architecture, programming and algorithms. Adam Hilger, Bristol (1988)
  9. Hockney, R.W.: Computational similarity. Concurrency – Practice and Experience 7(2), 147–166 (1995)
  10. Kredel, H., Kruse, H.-G., Richling, S.: Einige Überlegungen zur Leistung von Cluster-Computern. PIK (3), 207–211 (2012); For a partial English summary and extensions see section 3 in [11]
  11. Kredel, H., Kruse, H.G., Richling, S., Strohmaier, E.: Performance Analysis and Prediction for distributed homogenous clusters. In: Computer Science – Research and Development, Special Issue ISC 2012, Hamburg (May 2012)
  12. LinPack and HPL, Linear Algebra Package and High Performance LinPack, (accessed January 2012)
  13. Dongarra, J., et al.: ScaLAPack documentation, (accessed January 2012)
  14. Luszczek, P., Dongarra, J.: Reducing the Time to Tune Parallel Dense Linear Algebra Routines with Partial Execution and Performance Modeling. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) PPAM 2011, Part I. LNCS, vol. 7203, pp. 730–739. Springer, Heidelberg (2012)
  15. Richling, S., Hau, S., Kredel, H., Kruse, H.-G.: A Long-distance InfiniBand Interconnection between two Clusters in Production use. In: Proc. Supercomputing, November 12-18. IEEE (2011)
  16. Numrich, R.W.: Computational Force: A Unifying Concept for Scalability Analysis. In: Proc PARCO, pp. 107–112 (2007)
  17. Numrich, R.W.: A metric space for computer programs and the principle of computational least action. Journal of Supercomputing 43(3), 281–298 (2008)
  18. Numrich, R.W.: Computer performance analysis and the Pi Theorem. Comput. Sci. Res. Dev. (2010)
  19. bwGRiD, Member of the German D-Grid initiative, funded by the Ministry of Education and Research and the Ministry for Science, Research and Arts Baden-Württemberg, Universities of Baden-Württemberg, 2007-2010, 2007-2012, (accessed December 2012)
  20. Meuer, H., Strohmaier, E., Dongarra, J., Simon, H.: Top 500 Supercomputer Sites, (accessed November 2012)
  21.
  22. Uno, A.: K computer system overview, (accessed May 2012)
  23. Graph 500 Steering Committee, Benchmarks for data intensive supercomputer applications,
  24. Kredel, H., Kruse, H.-G., Ott, I.: Performance analysis and performance modeling of Web-applications. In: Proc. 3PGCIC 2011, pp. 115–122. IEEE (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Heinz Kredel (1)
  • Sabine Richling (2)
  • Jan Philipp Kruse (3)
  • Erich Strohmaier (4)
  • Hans-Günther Kruse (1)

  1. IT-Center, University of Mannheim, Germany
  2. IT-Center, University of Heidelberg, Germany
  3. Institute of Geosciences, Goethe University Frankfurt, Germany
  4. Future Technology Group, Lawrence Berkeley National Laboratory, Berkeley, USA
