The Journal of Supercomputing, Volume 71, Issue 8, pp 3054–3093

WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers

  • Fang Lv
  • Lei Liu
  • Hui-min Cui
  • Lei Wang
  • Ying Liu
  • Xiao-bing Feng
  • Pen-Chung Yew

Abstract

Datacenter servers are entering an era of powerful multi-/many-core processors. On these large-scale platforms, severe problems such as I/O contention pose an unprecedented challenge. Prior studies have primarily treated I/O bandwidth as the major performance bottleneck. However, our work reveals that in many cases the fundamental cause of I/O contention is the inefficiency of OS schedulers. Modern systems are not aware of this and thus suffer from poor I/O performance, especially on datacenter servers. Based on our findings, we propose a new software-based scheduling approach, WiseThrottling, to reduce I/O contention. WiseThrottling performs asynchronous, self-adjusting scheduling of concurrent tasks. We evaluate our approach across a wide range of C/OpenMP/MapReduce workloads on a 64-core server in the Dawning Cluster datacenter. The experimental results show that WiseThrottling effectively reduces the I/O bottleneck and improves overall system performance by up to 207%.

Keywords

Multi-/many-core server, I/O contention, Scheduling, Resource description

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. State Key Laboratory of Computer Architecture, ICT, CAS, Beijing, China
  2. Department of Computer Science and Engineering, University of Minnesota at Twin-Cities, Minneapolis, USA
