Experimental analysis of operating system jitter caused by page reclaim

The Journal of Supercomputing

Abstract

Operating system jitter is one of the causes of runtime overhead in high-performance computing (HPC) applications. Many HPC applications issue bursts of I/O accesses, and such bursts consume a large amount of memory. When free memory runs low, the Linux kernel wakes dedicated kernel threads to reclaim memory pages. If these threads are woken frequently, application performance degrades, both because of the threads' own resource consumption and because the application incurs more page faults and more migrations between CPU cores. In this study, we empirically analyze the impact of jitter caused by page reclaim, and we propose a method for reducing it. The proposed method reclaims memory pages before the kernel threads need to. Because it reclaims more pages at a time than the kernel threads do, it reduces the frequency of page reclaim and therefore the impact of jitter. We conducted experiments using practical weather forecast software, and the results showed that the proposed method minimized the performance degradation caused by jitter.
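The key argument above is that reclaiming more pages per invocation means reclaim runs less often. The toy simulation below is a minimal sketch of that arithmetic only; it is not the authors' implementation. It assumes memory is a single pool of reclaimable pages, that the application consumes a fixed number of pages per I/O burst, and that a reclaimer wakes whenever free pages fall below a low watermark and then frees pages until a target level is restored (loosely modeled on how Linux's kswapd operates between its low and high watermarks). The names `simulate`, `ReclaimStats`, `low_wmark`, and `target_free` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReclaimStats:
    wakeups: int = 0          # how many times the reclaimer ran
    pages_reclaimed: int = 0  # total pages it freed


def simulate(allocations, total_pages, low_wmark, target_free):
    """Toy model of watermark-driven page reclaim (hypothetical, not kernel code).

    Every page the application consumes is assumed to be reclaimable (for
    example, clean page cache). Whenever free pages drop below low_wmark, the
    reclaimer wakes once and frees pages until target_free pages are free again.
    """
    if target_free <= low_wmark:
        raise ValueError("target_free must exceed low_wmark")
    free = total_pages
    stats = ReclaimStats()
    for request in allocations:
        free -= request  # one I/O burst consumes `request` pages
        if free < low_wmark:
            # One wakeup restores free memory to the target level.
            stats.wakeups += 1
            stats.pages_reclaimed += target_free - free
            free = target_free
    return stats


bursts = [64] * 1000  # 1,000 bursts of 64 pages each
small_batch = simulate(bursts, total_pages=10_000, low_wmark=1_000, target_free=1_500)
large_batch = simulate(bursts, total_pages=10_000, low_wmark=1_000, target_free=5_000)
print(small_batch.wakeups, large_batch.wakeups)  # → 108 14
```

With the same 64,000 pages consumed, raising the reclaim target from 1,500 to 5,000 free pages cuts the number of reclaimer wakeups from 108 to 14, while the total number of pages reclaimed stays roughly the same. In the setting the abstract describes, each avoided wakeup is one fewer opportunity for jitter. The model ignores the cost of each reclaim pass, dirty-page writeback, and CPU scheduling, so it says nothing about actual run-time impact.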

References

  1. Akkan H, Lang M, Liebrock LM (2012) Stepping towards noiseless Linux environment. In: Proceedings of the 2nd international workshop on runtime and operating systems for supercomputers

  2. Argonne Leadership Computing Facility: Mira/Cetus/Vesta. http://www.alcf.anl.gov/user-guides/mira-cetus-vesta. Accessed 20 Mar 2016

  3. Beckman P, Iskra K, Yoshii K, Coghlan S (2006) The influence of operating systems on the performance of collective operations at extreme scale. In: Proceedings of the 2006 IEEE international conference on cluster computing

  4. Betti E, Cesati M, Gioiosa R, Piermaria F (2009) A global operating system for HPC clusters. In: Proceedings of the 2009 IEEE international conference on cluster computing

  5. Chinner D, Higdon J (2006) Exploring high bandwidth filesystems on large systems. In: Proceedings of the Ottawa Linux Symposium 2006, pp 177–191

  6. De P, Kothari R, Mann V (2007) Identifying sources of operating system jitter through fine-grained kernel instrumentation. In: Proceedings of the 2007 IEEE international conference on cluster computing, pp 331–340

  7. De P, Mann V, Mittaly U (2009) Handling OS jitter on multicore multithreaded systems. In: Proceedings of the 23rd IEEE international symposium on parallel and distributed processing

  8. Dunigan TH (1994) Early experiences and performance of the Intel Paragon. Tech. Rep. ORNL/TM-12194, Oak Ridge National Laboratory

  9. Ferreira KB, Bridges P, Brightwell R (2008) Characterizing application sensitivity to OS interference using kernel-level noise injection. In: Proceedings of the 2008 ACM/IEEE conference on supercomputing

  10. Giampapa M, Gooding T, Inglett T, Wisniewski RW (2010) Experiences with a lightweight supercomputer kernel: lessons learned from Blue Gene’s CNK. In: Proceedings of SC10

  11. Gioiosa R, Petrini F, Davis K, Lebaillif-Delamare F (2004) Analysis of system overhead on parallel computers. In: Proceedings of the 4th IEEE international symposium on signal processing and information technology, pp 387–390

  12. GlusterFS. http://www.gluster.org/. Accessed 20 Mar 2016

  13. Hoefler T, Schneider T, Lumsdaine A (2010) Characterizing the influence of system noise on large-scale applications by simulation. In: Proceedings of SC10

  14. Isaila F, Balaprakash P, Wild SM, Kimpe D, Latham R, Ross R, Hovland P (2015) Collective I/O tuning using analytical and machine learning models. In: Proceedings of the 2015 IEEE international conference on cluster computing, pp 128–137

  15. Jones T (2011) Linux kernel co-scheduling for bulk synchronous parallel applications. In: Proceedings of the 1st international workshop on runtime and operating systems for supercomputers, pp 57–64

  16. Kuo CS, Shah A, Nomura A, Matsuoka S, Wolf F (2014) How file access patterns influence interference among cluster applications. In: Proceedings of 2014 IEEE international conference on cluster computing, pp 185–193

  17. Morari A, Gioiosa R, Wisniewski RW, Cazorla FJ, Valero M (2011) A quantitative analysis of OS noise. In: Proceedings of the 2011 IEEE international parallel and distributed processing symposium, pp 852–863

  18. Moriya S (2011) Tunable watermark. https://lwn.net/Articles/422291/. Accessed 20 Mar 2016

  19. Nataraj A, Morris A, Malony AD, Sottile M, Beckman P (2007) The ghost in the machine: observing the effects of kernel operation on parallel application performance. In: Proceedings of SC07

  20. Oral S, Wang F, Shipman GM, Dillow D, Miller R, Maxwell D, Becklehimer J, Larkin J, Henseler D (2010) Reducing application runtime variability on Jaguar XT5. Cray User Group (CUG) Meeting

  21. Oyama Y, Ishiguro S, Murakami J, Sasaki S, Matsumiya R, Tatebe O (2014) Reduction of operating system jitter caused by page reclaim. In: Proceedings of the 4th international workshop on runtime and operating systems for supercomputers (ROSS’14)

  22. Park Y, Hensbergen EV, Hillenbrand M, Inglett T, Rosenburg B, Ryu KD, Wisniewski RW (2012) FusedOS: fusing LWK performance with FWK functionality in a heterogeneous environment. In: Proceedings of the 24th international symposium on computer architecture and high performance computing, pp 211–218

  23. Rosenthal E, León EA, Moody AT (2013) Mitigating system noise with simultaneous multi-threading. In: Proceedings of SC13, poster session

  24. Schwan P (2003) Lustre: building a file system for 1000-node clusters. In: Proceedings of the 2003 Linux symposium

  25. Seelam S, Fong L, Lewars J, Divirgilio J, Veale BF, Gildea K (2011) Characterization of system services and their performance impact in multi-core nodes. In: Proceedings of the 25th IEEE international parallel and distributed processing symposium, pp 104–117

  26. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: Proceedings of the 26th IEEE symposium on massive storage systems and technologies

  27. Sumimoto S (2013) Performance evaluation of FEFS on K computer and Fujitsu’s roadmap toward Lustre 2.x. Lustre User Group 2013. http://www.opensfs.org/events/lug13/. Accessed 20 Mar 2016

  28. Tatebe O, Hiraga K, Soda N (2010) Gfarm grid file system. New Gener Comput 28(3):257–275

  29. Tsafrir D, Etsion Y, Feitelson DG, Kirkpatrick S (2005) System noise, OS clock ticks, and fine-grained parallel applications. In: Proceedings of the 19th ACM international conference on supercomputing, pp 303–312

  30. van Riel R (2011) Add extra free kbytes tunable. https://lkml.org/lkml/2011/9/1/188. Accessed 20 Mar 2016

  31. Vicente E Jr, Matias R (2012) Exploratory study on the Linux OS jitter. In: Proceedings of the 2012 Brazilian symposium on computing system engineering, pp 19–24

  32. WRF. http://www.wrf-model.org/. Accessed 20 Mar 2016

  33. Yuan Q, Zhao J, Chen M, Sun N (2010) GenerOS: an asymmetric operating system kernel for multi-core systems. In: Proceedings of the 24th IEEE international parallel and distributed processing symposium

Acknowledgments

We are grateful for the insightful discussion with Hiroko Midorikawa of Seikei University. We also appreciate the many helpful comments from the anonymous reviewers.

Author information

Corresponding author

Correspondence to Yoshihiro Oyama.

Additional information

This research was supported by CREST, JST.

This paper is an extended version of the paper titled “Reduction of Operating System Jitter Caused by Page Reclaim,” which appeared in the Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2014) [21].

About this article

Cite this article

Oyama, Y., Ishiguro, S., Murakami, J. et al. Experimental analysis of operating system jitter caused by page reclaim. J Supercomput 72, 1946–1972 (2016). https://doi.org/10.1007/s11227-016-1703-1

  • DOI: https://doi.org/10.1007/s11227-016-1703-1
