Abstract
This study examines how increased memory utilisation affects throughput and energy consumption in scientific computing, particularly in high-energy physics. Our aim is to minimise the energy consumed by a set of jobs without increasing the processing time. Earlier tests indicated that, especially for data analysis, processing multiple jobs in parallel per CPU core can increase throughput by over 100% and reduce energy consumption by 50%. Because jobs are heterogeneous, no single optimal number of parallel jobs can be found. A better approach is based on memory utilisation, but finding an optimal memory threshold is not straightforward. We therefore developed a fuzzy logic-based algorithm that dynamically adapts the memory threshold to the overall load. This keeps memory consumption stable across different workloads while achieving significantly higher throughput and energy efficiency than traditional approaches that use either a fixed number of jobs or a fixed memory threshold.
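The abstract does not specify the controller's inputs or rule base. As a rough illustration only, here is one way a fuzzy controller could adapt a memory admission threshold. It is a zero-order Sugeno-style controller with a single input (node memory utilisation as a fraction of RAM, standing in for "overall load"), three assumed fuzzy sets (LOW, OK, HIGH) and assumed output adjustments. All breakpoints, step sizes and the names `fuzzy_delta`, `update_threshold` and `can_admit` are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch: fuzzy adaptation of a memory admission threshold.

Not the authors' controller: the membership breakpoints, output
singletons and rule base below are illustrative assumptions.
"""


def ramp_down(x: float, a: float, b: float) -> float:
    """Membership 1 for x <= a, falling linearly to 0 at x >= b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def ramp_up(x: float, a: float, b: float) -> float:
    """Membership 0 for x <= a, rising linearly to 1 at x >= b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangle(x: float, a: float, b: float, c: float) -> float:
    """Membership 0 outside (a, c), peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed rule base: each fuzzy set on memory utilisation maps to a crisp
# output singleton (a zero-order Sugeno controller).
#   LOW  -> raise the threshold by 5 % of RAM
#   OK   -> keep the threshold unchanged
#   HIGH -> lower the threshold by 10 % of RAM
RULES = [
    (lambda u: ramp_down(u, 0.70, 0.85), 0.05),
    (lambda u: triangle(u, 0.70, 0.85, 0.95), 0.0),
    (lambda u: ramp_up(u, 0.85, 0.95), -0.10),
]


def fuzzy_delta(mem_util: float) -> float:
    """Fire all rules and defuzzify by weighted average of the singletons."""
    fired = [(member(mem_util), out) for member, out in RULES]
    total = sum(w for w, _ in fired)
    if total == 0.0:
        return 0.0
    return sum(w * out for w, out in fired) / total


def update_threshold(threshold: float, mem_util: float,
                     lower: float = 0.50, upper: float = 0.95) -> float:
    """Nudge the admission threshold and clamp it to [lower, upper]."""
    return min(upper, max(lower, threshold + fuzzy_delta(mem_util)))


def can_admit(mem_util: float, threshold: float) -> bool:
    """Start another job only while used memory is below the threshold."""
    return mem_util < threshold


if __name__ == "__main__":
    threshold = 0.80  # assumed initial threshold, as a fraction of node RAM
    for observed in (0.60, 0.78, 0.93, 0.97):
        threshold = update_threshold(threshold, observed)
        print(f"utilisation={observed:.2f} threshold={threshold:.3f} "
              f"admit={can_admit(observed, threshold)}")
```

In this sketch the controller never sets the threshold outright; it only nudges it and clamps the result, so a single noisy utilisation reading cannot swing admission decisions sharply. The paper's algorithm may use other inputs and rules than this single-input example.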
Cite this article
Niemi, T., Hameri, AP. Memory-based scheduling of scientific computing clusters. J Supercomput 61, 520–544 (2012). https://doi.org/10.1007/s11227-011-0612-6