Journal of Grid Computing, Volume 9, Issue 4, pp 455–478

Adaptive Executions of Multi-Physics Coupled Applications on Batch Grids

  • Sivagama Sundari Murugavel
  • Sathish S Vadhiyar
  • Ravi S Nanjundiah

Abstract

Long-running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long simulation durations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we present an adaptive middleware framework for executing long-running multi-physics coupled applications across multiple batch systems of a Grid. Apart from coordinating the executions of an application's component jobs on different batch systems, the framework automatically resubmits the jobs to the batch queues as many times as needed to continue and sustain long-running executions. As the set of active batch systems available for execution changes, the framework migrates and reschedules components using a robust rescheduling decision algorithm. We have used the framework to improve the application throughput of a prominent long-running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.

Keywords

Adaptive framework; Batch systems; Climate models; Multi-component applications; Rescheduling



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Sivagama Sundari Murugavel (1)
  • Sathish S Vadhiyar (1)
  • Ravi S Nanjundiah (2)
  1. Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India
  2. Centre for Atmospheric & Oceanic Sciences, Indian Institute of Science, Bangalore, India
