Abstract
In this paper, we conduct an in-depth evaluation of a broad spectrum of scheduling alternatives for clusters. These include the widely used batch scheduling; local scheduling; gang scheduling; most prior communication-driven coscheduling algorithms, namely Dynamic Coscheduling (DCS), Spin Block (SB), Periodic Boost (PB), and Co-ordinated Coscheduling (CC); and a newly proposed HYBRID coscheduling algorithm. All are evaluated on a 16-node, Myrinet-connected Linux cluster.
Performance and energy measurements using several NAS, LLNL, and ANL benchmarks on the Linux cluster lead to several conclusions. First, although batch scheduling is currently used in most clusters, blocking-based coscheduling techniques such as SB, CC, and HYBRID, as well as gang scheduling, can provide much better performance even on a dedicated cluster platform. Second, in contrast to some prior studies, we observe that blocking-based schemes like SB and HYBRID can outperform spin-based techniques like PB on a Linux platform. Third, the proposed HYBRID scheduling provides the best performance-energy behavior and can be implemented on any cluster with little effort. Together, these results suggest that blocking-based coscheduling techniques are viable candidates for use in clusters, offering significant performance-energy benefits.
Additional information
This research was supported in part by NSF grants CCR-0098149, EIA-0202007, CCR-0208734, CCF-0429631 and CNS-0509251.
About this article
Cite this article
Choi, G.S., Kim, JH., Ersoz, D. et al. A comprehensive performance and energy consumption analysis of scheduling alternatives in clusters. J Supercomput 40, 159–184 (2007). https://doi.org/10.1007/s11227-006-0018-z
DOI: https://doi.org/10.1007/s11227-006-0018-z