Memory-based scheduling of scientific computing clusters

The Journal of Supercomputing

Abstract

This study examines how increased memory utilisation affects throughput and energy consumption in scientific computing, particularly in high-energy physics. Our aim is to minimise the energy consumed by a set of jobs without increasing the processing time. Earlier tests indicated that, especially in data analysis, processing multiple jobs in parallel per CPU core can increase throughput by over 100% and decrease energy consumption by 50%. Because jobs are heterogeneous, no single number of parallel jobs is optimal for all workloads. A better solution is to base scheduling on memory utilisation, but finding an optimum memory threshold is not straightforward. We therefore developed a fuzzy logic-based algorithm that dynamically adapts the memory threshold to the overall load. In this way, memory consumption can be kept stable across different workloads while achieving significantly higher throughput and energy efficiency than traditional approaches based on a fixed number of jobs or a fixed memory threshold.
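The abstract does not specify the membership functions or rules of the fuzzy algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the overall load is available as a normalised value in [0, 1], and all names (memberships, adjust_threshold, can_start_job), breakpoints and step sizes are hypothetical choices made for illustration.

```python
def clamp(x, lo=0.0, hi=1.0):
    """Restrict x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def memberships(load):
    """Fuzzy degrees to which a normalised load is 'low', 'ok' or 'high'.

    The breakpoints (0.5, 0.8, 0.95) are illustrative assumptions.
    """
    low = clamp((0.8 - load) / 0.3)      # 1 at load <= 0.5, 0 at load >= 0.8
    high = clamp((load - 0.8) / 0.15)    # 0 at load <= 0.8, 1 at load >= 0.95
    ok = clamp(1.0 - low - high)         # whatever degree remains
    return low, ok, high


def adjust_threshold(threshold, load, step=0.05, floor=0.3, ceiling=0.95):
    """Move the memory threshold with three rules:

    load is low  -> raise the threshold by `step`
    load is ok   -> leave it unchanged
    load is high -> lower the threshold by `step`

    The rule outputs are combined as a weighted average of the rule
    strengths (a zero-order Sugeno-style defuzzification).
    """
    low, ok, high = memberships(load)
    delta = (low * step + ok * 0.0 + high * -step) / (low + ok + high)
    return clamp(threshold + delta, floor, ceiling)


def can_start_job(memory_used_fraction, threshold):
    """Admit a new job only while memory use is below the threshold."""
    return memory_used_fraction < threshold


if __name__ == "__main__":
    threshold = 0.7
    for load in (0.4, 0.65, 0.8, 0.9, 0.97):
        threshold = adjust_threshold(threshold, load)
        print(f"load={load:.2f} -> threshold={threshold:.3f}")
```

In this sketch a scheduler would call adjust_threshold periodically and use can_start_job to decide whether another job may run on a node. The design point is that the threshold itself follows the load, instead of being a fixed number of jobs per core or a fixed memory limit.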



Author information

Corresponding author

Correspondence to Tapio Niemi.


About this article

Cite this article

Niemi, T., Hameri, AP. Memory-based scheduling of scientific computing clusters. J Supercomput 61, 520–544 (2012). https://doi.org/10.1007/s11227-011-0612-6


  • DOI: https://doi.org/10.1007/s11227-011-0612-6
