Quality Assurance for Clusters: Acceptance-, Stress-, and Burn-In Tests for General Purpose Clusters

  • Matthias S. Müller
  • Guido Juckeland
  • Matthias Jurenz
  • Michael Kluge
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4782)

Abstract

Although common sense suggests that all nodes of a cluster should behave identically, since they consist of exactly the same hardware parts and run the same software, experience shows otherwise.

We present a collection of programs and tools gathered over several years during various cluster installations at different sites, with clusters from various vendors. The collection contains programs to check the setup, functionality, and performance of clusters. Components such as CPU, memory, disk, network, MPI, and file system are checked. Alongside a short description of each tool, we describe our experience using it.
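The core idea of such acceptance tests, running identical checks on every node and flagging nodes whose results deviate from the rest, can be sketched as follows. This is a minimal illustration, not the authors' actual tooling; the node names, bandwidth figures, and 5% tolerance are purely hypothetical:

```python
import statistics

def flag_outliers(results, rel_tol=0.05):
    """Given {node: measured_value}, return the nodes whose value
    deviates from the cluster-wide median by more than rel_tol."""
    median = statistics.median(results.values())
    return sorted(
        node for node, value in results.items()
        if abs(value - median) > rel_tol * median
    )

# Hypothetical per-node memory bandwidth in MB/s (e.g. from a STREAM run)
bandwidth = {"node01": 5210.0, "node02": 5198.0,
             "node03": 4120.0, "node04": 5225.0}
print(flag_outliers(bandwidth))  # prints ['node03']
```

Comparing against the median rather than a fixed threshold makes the check self-calibrating: a node with mis-seated DIMMs or a wrong BIOS setting stands out even when the absolute performance of the whole cluster is not known in advance.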

Keywords

High Performance Computing · Batch System · Execution Environment · High Performance Cluster Computing · Large System Performance

These keywords were added by machine and not by the authors; the process is experimental, and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Matthias S. Müller (1)
  • Guido Juckeland (1)
  • Matthias Jurenz (1)
  • Michael Kluge (1)

  1. Technische Universität Dresden, Center for Information Services and High Performance Computing (ZIH), 01062 Dresden, Germany