European Conference on Parallel Processing

Euro-Par 2011: Parallel Processing Workshops, pp. 211–220

The Malthusian Catastrophe Is Upon Us! Are the Largest HPC Machines Ever Up?

  • Patricia Kovatch,
  • Matthew Ezell &
  • Ryan Braby

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7156)

Abstract

Thomas Malthus, an English political economist who lived from 1766 to 1834, predicted that the earth’s population would be limited by starvation, since population grows geometrically while the food supply grows only linearly. He wrote, “the power of population is indefinitely greater than the power in the earth to provide subsistence for man,” thus defining the Malthusian Catastrophe. Conventional wisdom about super-large machines draws a parallel: application problem size and machine complexity are growing geometrically, yet mitigation techniques are improving only linearly.
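
The growth argument above can be made concrete with a small sketch. It assumes the simplest reading of "geometric" and "linear" growth, p0 * ratio**t versus f0 + increment * t, and finds the first step at which the geometric quantity overtakes the linear one. The function name first_crossover and all parameter values are hypothetical illustrations and do not come from the paper.

```python
from typing import Optional


def first_crossover(p0: float, ratio: float, f0: float, increment: float,
                    max_steps: int = 10_000) -> Optional[int]:
    """Return the first step t at which geometric growth p0 * ratio**t
    strictly exceeds linear growth f0 + increment * t, or None if that
    does not happen within max_steps steps."""
    for t in range(max_steps + 1):
        geometric = p0 * ratio ** t     # e.g. population or problem size
        linear = f0 + increment * t     # e.g. food supply or mitigation capacity
        if geometric > linear:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical numbers: start 10x smaller but double every step,
    # versus a quantity that starts at 10 and grows by 10 per step.
    print(first_crossover(p0=1, ratio=2, f0=10, increment=10))
```

With these hypothetical inputs the function returns 7: a quantity that starts ten times smaller but doubles each step overtakes one that grows by 10 per step after seven steps. Mathematically, any ratio greater than 1 eventually crosses any linear function, which is the core of Malthus's argument.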

To examine whether the largest machines are usable, the authors collected and analyzed component failure rates and Mean Time Between System Failures (MTBF) data from the world’s largest production machines, including Oak Ridge National Laboratory’s Jaguar and the University of Tennessee’s Kraken. They also collected MTBF data for a variety of Cray XT series machines from around the world, together representing over 6 Petaflops of compute power. The paper presents an analysis of these data and outlines plans for future work. High performance computing’s Malthusian Catastrophe has not happened yet, and advances in system resiliency should keep it at bay for many years to come.
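
MTBF is commonly computed as total operating time divided by the number of failures observed. The sketch below shows that calculation and also the MTBF of a system that fails whenever any one component fails. The second part relies on the common textbook assumption, which the abstract does not state, that component failures are independent and exponentially distributed (constant failure rates). Under that assumption, failure rates add, so the system MTBF is the reciprocal of the summed rates. The function names and example numbers are hypothetical.

```python
import math
from typing import Sequence


def mtbf(operating_hours: float, failures: int) -> float:
    """Observed MTBF: total operating time divided by the number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined with zero observed failures")
    return operating_hours / failures


def series_system_mtbf(component_mtbfs: Sequence[float]) -> float:
    """MTBF of a system that fails when any single component fails.

    Assumes independent, exponentially distributed component failures,
    so the system failure rate is the sum of the component rates.
    """
    total_rate = math.fsum(1.0 / m for m in component_mtbfs)  # failures per hour
    return 1.0 / total_rate


if __name__ == "__main__":
    # Hypothetical log: 3 system failures over a 720-hour month.
    print(mtbf(720, 3))
    # Hypothetical machine: 10,000 nodes, each with a 100,000-hour MTBF.
    print(series_system_mtbf([100_000.0] * 10_000))
```

Under these hypothetical numbers, 3 failures in 720 hours gives an MTBF of 240 hours. A 10,000-node machine whose nodes each have a 100,000-hour MTBF would see a system MTBF of only about 10 hours under this model, which illustrates why scale alone puts pressure on system-level MTBF.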

Keywords

  • high performance computing
  • resiliency
  • MTBF
  • failures
  • scalability

Author information

Authors and Affiliations

  1. National Institute for Computational Sciences, The University of Tennessee, Knoxville, USA

    Patricia Kovatch, Matthew Ezell & Ryan Braby

Editor information

Editors and Affiliations

  1. Scilytics, Koellnerhofgasse 3/15A, 1010, Vienna, Austria

    Michael Alexander

  2. ICAR-CNR, Via P. Castellino, 111, 80131, Napoli, Italy

    Pasqua D’Ambra

  3. University of Amsterdam, 1090, Amsterdam, Netherlands

    Adam Belloum

  4. Innovative Computing Laboratory, The University of Tennessee, US

    George Bosilca

  5. Department of Experimental Medicine and Clinic, University Magna Græcia, 88100, Catanzaro, Italy

    Mario Cannataro

  6. Computer Science Department, University of Pisa, Italy

    Marco Danelutto

  7. Second University of Naples, Italy

    Beniamino Di Martino

  8. TU München, Boltzmannstr. 3, 85748, Garching, Germany

    Michael Gerndt

  9. Equipe Runtime, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Emmanuel Jeannot & Raymond Namyst

  10. Equipe HIEPACS, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Jean Roman

  11. Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831-6164, Oak Ridge, TN, USA

    Stephen L. Scott

  12. Department of Scientific Computing, University of Vienna, Nordbergstr. 15/3C, 1090, Vienna, Austria

    Jesper Larsson Traff

  13. Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA

    Geoffroy Vallée

  14. Technische Universität München, Germany

    Josef Weidendorfer

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kovatch, P., Ezell, M., Braby, R. (2012). The Malthusian Catastrophe Is Upon Us! Are the Largest HPC Machines Ever Up? In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_25

  • DOI: https://doi.org/10.1007/978-3-642-29740-3_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29739-7

  • Online ISBN: 978-3-642-29740-3

  • eBook Packages: Computer Science, Computer Science (R0)
