Accelerating Application Migration in HPC

High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)

Abstract

It is predicted that the number of cores per node will rapidly increase with the upcoming era of exascale supercomputers. As a result, multiple applications will have to share one node and compete for the (often scarce) resources available on this node. Furthermore, the growing number of hardware components causes a decrease in the mean time between failures. Application migration between nodes has been proposed as a tool to mitigate these two problems: Bottlenecks due to resource sharing can be addressed by load balancing schemes which migrate applications; and hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.

Virtual Machine (VM) migration currently seems to be the most promising technique for such approaches as it provides a strong level of isolation. However, migrating a virtual machine takes longer than migrating the corresponding processes, which can be explained by the additional virtualization layer in the memory hierarchy.

In this paper, we propose a technique for the acceleration of VM migration. We take advantage of the fact that freed memory regions within the guest system are not recognized by the hypervisor. Therefore, we fill them with zeros such that zero-page detection and compression can work more efficiently. We demonstrate that the approach reduces migration time by up to 19 % with a negligible overhead for some applications.
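The zero-filling idea can be illustrated with a minimal C sketch. This is not the authors' preload library: the names `zero_and_free`, `is_zero_page`, `count_zero_pages` and the constant `GUEST_PAGE_SIZE` are hypothetical, the 4 KiB page size is an assumption, and the caller passes the block size explicitly. A real preload library would instead interpose on `free` and would have to obtain the size from the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed guest page size; 4 KiB is common but not universal. */
#define GUEST_PAGE_SIZE 4096u

/* Calling memset through a volatile function pointer keeps the compiler
 * from dropping the store as "dead" because the block is freed right after. */
static void *(*const volatile memset_noopt)(void *, int, size_t) = memset;

/* Hypothetical helper: overwrite a heap block with zeros, then release it,
 * so that the pages it occupied can later be recognized as zero pages. */
void zero_and_free(void *ptr, size_t size)
{
    if (ptr == NULL)
        return;
    memset_noopt(ptr, 0, size);
    free(ptr);
}

/* Returns 1 if every byte of the page is zero, 0 otherwise.
 * This is the kind of check a zero-page detector performs. */
int is_zero_page(const unsigned char *page, size_t page_size)
{
    for (size_t i = 0; i < page_size; i++) {
        if (page[i] != 0)
            return 0;
    }
    return 1;
}

/* Counts the whole pages in buf that consist only of zero bytes. */
size_t count_zero_pages(const unsigned char *buf, size_t len, size_t page_size)
{
    assert(page_size > 0);
    size_t zero_pages = 0;
    for (size_t off = 0; off + page_size <= len; off += page_size)
        zero_pages += (size_t)is_zero_page(buf + off, page_size);
    return zero_pages;
}
```

In this sketch, a buffer whose freed parts were overwritten with zeros yields a higher `count_zero_pages` result than one still holding stale data, which is the property that zero-page detection and compression can exploit during migration.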

Acknowledgment and Availability

This work was supported by the German Ministry for Education and Research (BMBF) under project grant 01IH13004 (FAST).

The zeroing preload library is publicly available under https://version.zdv.Uni-Mainz.DE/anonscm/git/memory-zeroing/memory-zeroing.git.

Author information

Corresponding authors

Correspondence to Ramy Gad or Simon Pickartz.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Gad, R., Pickartz, S., Süß, T., Nagel, L., Lankes, S., Brinkmann, A. (2016). Accelerating Application Migration in HPC. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_46

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science (R0)
