Load Balancing in Cluster Using BLCR Checkpoint/Restart
Modern computation is becoming complex in a way that the resource requirement is gradually increasing. High Throughput Computing is one technique to deal with such a complexity. After a significant amount of time, computing clusters gets highly overloaded resulting in degradation of performance. Since there is no central coordinator in Computer Supported Cooperative Working (CSCW) load-balancing is more complex. An overloaded node does not participate in a CSCW network as they are already overloaded. This paper proposes migration of computation intensive jobs from overloaded nodes, which will allow overloaded nodes to be able to participate in CSCW. The proposed solution improves the performance by making more nodes participating in CSCW by migrating compute intensive jobs from overloaded nodes to underloaded nodes. Evaluation of proposed approach shows that the availability and performance of the CSCW clusters is improved by 30%-40% with fault-tolerance based load balancing.
KeywordsCheckpoint/Restart CSCW Fault Tolerance Job Migration Load Balancing
Unable to display preview. Download preview PDF.
- 6.Payli, R.U., et al.: DLB—a dynamic load balancing tool for grid computing. Scientific International Journal for Parallel and Distributed Computing 07(02) (2004)Google Scholar
- 9.Nehra, N., Patel, R.B., Bhatt, V.K.: A framework for distributed dynamic load balancing in heterogeneous cluster. Journal of Computer Science (2007)Google Scholar
- 10.Hargrove, P.H., Duell, J.C.: Berkeley lab checkpoint/restart (BLCR) for Linux clusters, https://ftg.lbl.gov/assets/projects/CheckpointRestart/Pubs/LBNL-60520.pdf