Benchmarking the effects of operating system interference on extreme-scale parallel machines

Cluster Computing

Abstract

We investigate operating system noise, which we identify as one of the main reasons for a lack of synchronicity in parallel applications. Using a microbenchmark, we measure the noise on several contemporary platforms and find that, even with a general-purpose operating system, noise can be limited if certain precautions are taken. We then inject artificially generated noise into a massively parallel system and measure its influence on the performance of collective operations. Our experiments indicate that on extreme-scale platforms, the performance is correlated with the largest interruption to the application, even if the probability of such an interruption on a single process is extremely small. We demonstrate that synchronizing the noise can significantly reduce its negative influence.
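The scale effect described above can be illustrated with a toy probability model. This is a minimal sketch under stated assumptions, not the benchmark or noise injector used in the article: each process is assumed to be interrupted independently with probability `p` during one step of a collective operation, every interruption lasts a fixed `detour`, and the collective finishes only when the slowest process does. The function names and the example values of `p`, `detour`, and the process counts are hypothetical.

```python
def prob_any_interrupted(p: float, n_procs: int) -> float:
    """Probability that at least one of n_procs independent processes
    is interrupted during a step, given per-process probability p."""
    return 1.0 - (1.0 - p) ** n_procs


def expected_delay_unsynchronized(p: float, detour: float, n_procs: int) -> float:
    """Expected extra time per collective step when interruptions hit
    processes independently: one interrupted process delays everyone."""
    return detour * prob_any_interrupted(p, n_procs)


def expected_delay_synchronized(p: float, detour: float) -> float:
    """Expected extra time per step when all processes are interrupted
    at the same moment (assumed idealized synchronization), so the
    delay no longer grows with the number of processes."""
    return detour * p


if __name__ == "__main__":
    p = 1e-4         # assumed per-process interruption probability per step
    detour = 100e-6  # assumed interruption length in seconds
    for n in (1, 1_000, 100_000):
        unsync = expected_delay_unsynchronized(p, detour, n)
        sync = expected_delay_synchronized(p, detour)
        print(f"N={n:>7}: unsynchronized {unsync:.3e} s, synchronized {sync:.3e} s")
```

In this simplified model, the unsynchronized delay approaches the full interruption length as the process count grows, while the synchronized delay stays at `detour * p`. That is consistent in spirit with the observations summarized in the abstract, though the model ignores the structure of real collective algorithms.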

Author information

Corresponding author

Correspondence to Kamil Iskra.

About this article

Cite this article

Beckman, P., Iskra, K., Yoshii, K. et al. Benchmarking the effects of operating system interference on extreme-scale parallel machines. Cluster Comput 11, 3–16 (2008). https://doi.org/10.1007/s10586-007-0047-2

  • DOI: https://doi.org/10.1007/s10586-007-0047-2
