
An Application-Level Solution for the Dynamic Reconfiguration of MPI Applications

  • Conference paper
High Performance Computing for Computational Science – VECPAR 2016 (VECPAR 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10150)


Abstract

Current parallel environments aggregate large numbers of computational resources with a high rate of change in their availability and load conditions. To obtain the best performance on this type of infrastructure, parallel applications must be able to adapt to these changing conditions. This paper presents an application-level proposal to automatically and transparently adapt MPI applications to the available resources. The architecture includes automatic code transformation of the parallel applications, a system to reschedule processes on available nodes, and migration capabilities based on checkpoint-and-restart techniques to move selected processes to target nodes. Experimental results show a good degree of adaptability and good performance in different availability scenarios.
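The rescheduling and migration steps named in the abstract can be illustrated with a small, self-contained sketch. It is hypothetical: the abstract does not describe the paper's actual scheduling policy or checkpoint format, so the `Process` record, the `reschedule` and `migrate` helpers, and the "keep in place, otherwise least-loaded node" rule are assumptions made for illustration. Python's `pickle` stands in for an application-level checkpoint file, and everything runs in one process rather than across MPI ranks.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical record for one MPI process: its rank, host node and the
    application-level state that would be saved at a safe point."""
    rank: int
    node: str
    state: dict = field(default_factory=dict)


def reschedule(processes, capacity):
    """Hypothetical policy (not the paper's). `capacity` maps each currently
    available node to its number of free cores. Processes stay put while their
    node is available and not over-subscribed; the rest go to the least loaded
    available node. Returns {rank: target_node} for the ranks that must move."""
    load = {node: 0 for node in capacity}
    to_move = []
    for p in processes:
        if p.node in load and load[p.node] < capacity[p.node]:
            load[p.node] += 1          # process can stay on its current node
        else:
            to_move.append(p)          # node gone or full: must migrate
    plan = {}
    for p in to_move:
        free = [n for n in capacity if load[n] < capacity[n]]
        if not free:
            raise RuntimeError("not enough cores for all processes")
        # Lowest relative load first; ties broken by node name for determinism.
        target = min(free, key=lambda n: (load[n] / capacity[n], n))
        load[target] += 1
        plan[p.rank] = target
    return plan


def migrate(process, target):
    """Checkpoint-and-restart in miniature: serialize the state on the source
    (checkpoint) and rebuild the process from it on the target (restart)."""
    checkpoint = pickle.dumps(process.state)
    return Process(rank=process.rank, node=target, state=pickle.loads(checkpoint))


if __name__ == "__main__":
    # Ranks 0-1 run on n0 and ranks 2-3 on n1; n1 then becomes unavailable
    # and a new node n2 with two free cores is added.
    procs = [Process(r, "n0" if r < 2 else "n1", {"iteration": 7}) for r in range(4)]
    plan = reschedule(procs, {"n0": 2, "n2": 2})
    procs = [migrate(p, plan[p.rank]) if p.rank in plan else p for p in procs]
    print(plan)                      # {2: 'n2', 3: 'n2'}
    print([p.node for p in procs])   # ['n0', 'n0', 'n2', 'n2']
```

Keeping processes in place whenever their node remains available limits the number of migrations, each of which costs a checkpoint, a transfer and a restart. A realistic policy would likely also weigh communication locality between ranks, which this sketch ignores.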


Notes

  1. This time is measured by executing the application with a different number of processes depending on the available hardware (the 16-process version when only 1 node is available, the 32-process version when 2 nodes are available, etc.).


Acknowledgments

This research was partially supported by the Ministry of Economy and Competitiveness of Spain and FEDER funds of the EU (Project TIN2013-42148-P), by the Galician Government and FEDER funds of the EU (consolidation program of competitive reference groups GRC2013/055) and by the EU under the COST programme Action IC1305, Network for Sustainable Ultrascale Computing.

Author information

Correspondence to María J. Martín.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Cores, I., González, P., Jeannot, E., Martín, M.J., Rodríguez, G. (2017). An Application-Level Solution for the Dynamic Reconfiguration of MPI Applications. In: Dutra, I., Camacho, R., Barbosa, J., Marques, O. (eds) High Performance Computing for Computational Science – VECPAR 2016. VECPAR 2016. Lecture Notes in Computer Science, vol 10150. Springer, Cham. https://doi.org/10.1007/978-3-319-61982-8_18


  • DOI: https://doi.org/10.1007/978-3-319-61982-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61981-1

  • Online ISBN: 978-3-319-61982-8

  • eBook Packages: Computer Science, Computer Science (R0)
