Skip to main content

The Impact of Noise on the Scaling of Collectives: A Theoretical Approach

  • Conference paper
High Performance Computing – HiPC 2005 (HiPC 2005)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 3769))

Included in the following conference series:

Abstract

The performance of parallel applications running on large clusters is known to degrade due to the interference of kernel and daemon activities on individual nodes, often referred to as noise. In this paper, we focus on an important class of parallel applications, which repeatedly perform computation, followed by a collective operation such as a barrier. We model this theoretically and demonstrate, in a rigorous way, the effect of noise on the scalability of such applications. We study three natural and important classes of noise distributions: The exponential distribution, the heavy-tailed distribution, and the Bernoulli distribution. We show that the systems scale well in the presence of exponential noise, but the performance goes down drastically in the presence of heavy-tailed or Bernoulli noise.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Gioiosa, R., Petrini, F., Davis, K., Lebaillif-Delamare, F.: Analysis of System Overhead on Parallel Computers. In: The 4th IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2004), Rome, Italy (December 2004)

    Google Scholar 

  2. Jones, T.R., Brenner, L.B., Fier, J.M.: Impacts of Operating Systems on the Scalibility of Parallel Applications. Tech. Rep. UCRL-MI-202629, Lawrence Livermore National Laboratory (Mar 2003)

    Google Scholar 

  3. Petrini, F., Kerbyson, D.J., Pakin, S.: The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q. In: ACM/IEEE Conference on Supercomputing (SC 2003), Phoenix, Arizona, USA (November 2003)

    Google Scholar 

  4. Kramer, W.T.C., Ryan, C.: Performance Variability of Highly Parallel Architectures. In: International Conference on Computational Science (ICCS 2003), Melbourne, Australia (June 2003)

    Google Scholar 

  5. Frachtenberg, E., Petrini, F., Fernandez, J., Pakin, S., Coll, S.: STORM: Lightning-Fast Resource Management. In: ACM/IEEE Conference on Supercomputing (SC 2002), Baltimore, Maryland, USA (November 2002)

    Google Scholar 

  6. Hori, A., Tezuka, H., Ishikawa, Y.: Highly Efficient Gang Scheduling Implementation. In: ACM/IEEE Conference on Supercomputing (SC 1998), Orlando, FL, USA (November 1998)

    Google Scholar 

  7. Jones, T., Dawson, S., Neely, R., Tuel, W., Brenner, L., Fier, J., Blackmore, R., Caffrey, P., Maskell, B., Tomlinson, P., Roberts, M.: Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System. In: ACM/IEEE Conference on Supercomputing (SC 2003), Phoenix, Arizona, USA (November 2003)

    Google Scholar 

  8. Frachtenberg, E., Feitelson, D., Petrini, F., Fernández, J.: Flexible Coscheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources. In: International Parallel and Distributed Processing Symposium 2003 (IPDPS 2003), Nice, France (April 2003)

    Google Scholar 

  9. Agarwal, S., Choi, G.S., Das, C.R., Yoo, A.B., Nagar, S.: Co-ordinated Coscheduling in Time-Sharing Clusters through a Generic Framework. In: IEEE International Conference on Cluster Computing (CLUSTER 2003), Hong Kong (December 2003)

    Google Scholar 

  10. Agarwal, S., Garg, R., Vishnoi, N.: The Impact of Noise on the Scaling of Collectives: A Theoretical Approach., Tech. Rep. RI-05003, IBM Research Report (February 2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agarwal, S., Garg, R., Vishnoi, N.K. (2005). The Impact of Noise on the Scaling of Collectives: A Theoretical Approach. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds) High Performance Computing – HiPC 2005. HiPC 2005. Lecture Notes in Computer Science, vol 3769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602569_31

Download citation

  • DOI: https://doi.org/10.1007/11602569_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30936-9

  • Online ISBN: 978-3-540-32427-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics