
Load Balancing in Cluster Using BLCR Checkpoint/Restart

  • Hemant Hariyale
  • Manu Vardhan
  • Ankit Pandey
  • Ankit Mishra
  • Dharmender Singh Kushwaha
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 176)

Abstract

Modern computations are growing more complex, and their resource requirements are steadily increasing. High Throughput Computing is one technique for dealing with this complexity. After running for a significant amount of time, computing clusters become heavily overloaded, which degrades performance. Load balancing is more complex in Computer Supported Cooperative Working (CSCW) because there is no central coordinator. An overloaded node has no spare capacity and therefore does not participate in a CSCW network. This paper proposes migrating computation-intensive jobs from overloaded nodes to underloaded nodes, so that the previously overloaded nodes can participate in CSCW again and more nodes take part overall. Evaluation of the proposed approach shows that the availability and performance of CSCW clusters improve by 30%-40% with fault-tolerance-based load balancing.
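The migration idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no thresholds or selection policy, so `OVERLOAD_THRESHOLD`, `UNDERLOAD_THRESHOLD`, the `Job` and `Node` types, and the "heaviest job to least-loaded node" rule are all assumptions made for illustration. The sketch only plans which jobs to move. In a BLCR-based setup, each planned move would presumably be carried out by checkpointing the process on the source node (BLCR provides the `cr_checkpoint` utility) and restarting it on the target node (`cr_restart`).

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; the abstract does not state actual values.
OVERLOAD_THRESHOLD = 0.80   # a node above this load is "overloaded"
UNDERLOAD_THRESHOLD = 0.40  # a node below this load is "underloaded"


@dataclass
class Job:
    pid: int
    cpu_share: float  # fraction of one node's CPU used by this job


@dataclass
class Node:
    name: str
    jobs: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Node load is modelled as the sum of its jobs' CPU shares.
        return sum(job.cpu_share for job in self.jobs)


def plan_migrations(nodes):
    """Return (pid, source, target) tuples that move the heaviest jobs
    off overloaded nodes onto the least-loaded underloaded nodes."""
    plan = []
    for src in nodes:
        # Consider the most compute-intensive jobs first.
        for job in sorted(src.jobs, key=lambda j: j.cpu_share, reverse=True):
            if src.load <= OVERLOAD_THRESHOLD:
                break  # source is no longer overloaded
            # Targets must be underloaded and must not become overloaded.
            candidates = [
                n for n in nodes
                if n is not src
                and n.load < UNDERLOAD_THRESHOLD
                and n.load + job.cpu_share <= OVERLOAD_THRESHOLD
            ]
            if not candidates:
                continue  # no node can take this job; try a lighter one
            dst = min(candidates, key=lambda n: n.load)
            src.jobs.remove(job)
            dst.jobs.append(job)
            plan.append((job.pid, src.name, dst.name))
    return plan
```

Each node can evaluate the same rule on load information it has gathered from its peers, which fits the abstract's point that CSCW has no central coordinator. How that load information is exchanged is outside the scope of this sketch.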

Keywords

Checkpoint/Restart, CSCW, Fault Tolerance, Job Migration, Load Balancing



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hemant Hariyale (1)
  • Manu Vardhan (1)
  • Ankit Pandey (1)
  • Ankit Mishra (1)
  • Dharmender Singh Kushwaha (1)

  1. Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Allahabad, India
