Accelerating Application Migration in HPC

  • Ramy Gad
  • Simon Pickartz
  • Tim Süß
  • Lars Nagel
  • Stefan Lankes
  • André Brinkmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)


It is predicted that the number of cores per node will rapidly increase with the upcoming era of exascale supercomputers. As a result, multiple applications will have to share one node and compete for the (often scarce) resources available on this node. Furthermore, the growing number of hardware components causes a decrease in the mean time between failures. Application migration between nodes has been proposed as a tool to mitigate these two problems: Bottlenecks due to resource sharing can be addressed by load balancing schemes which migrate applications; and hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.

Virtual Machine (VM) migration currently seems to be the most promising technique for such approaches as it provides a strong level of isolation. However, the migration time of virtual machines is higher than that of process-level migration. This can be explained by the additional virtualization layer in the memory hierarchy.

In this paper, we propose a technique for the acceleration of VM migration. We take advantage of the fact that memory regions freed within the guest system are not recognized as free by the hypervisor. Therefore, we fill them with zeros such that zero-page detection and compression can work more efficiently. We demonstrate that, for some applications, the approach reduces the migration time by up to 19 % with negligible overhead.
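The core idea can be sketched in a few lines of C: before a heap block is returned to the allocator, its contents are overwritten with zeros so that the backing pages become zero pages the hypervisor can detect and compress during live migration. This is an illustrative sketch only, not the paper's actual preload library; the function name and the use of the glibc-specific `malloc_usable_size()` are assumptions.

```c
#include <malloc.h>   /* malloc_usable_size() (glibc-specific) */
#include <stdlib.h>
#include <string.h>

/* Zero a heap block before handing it back to the allocator.  Once the
 * dirtied pages are zero-filled, the hypervisor's zero-page detection
 * and compression can skip them when transferring guest memory during
 * live migration.  Returns the number of bytes zeroed. */
static size_t zeroing_free(void *ptr)
{
    size_t usable = 0;

    if (ptr) {
        usable = malloc_usable_size(ptr);
        memset(ptr, 0, usable);
    }
    free(ptr);
    return usable;
}
```

In a preload-library setting, the same zeroing step would instead be placed in an interposed `free()` loaded via `LD_PRELOAD`, which makes the behavior transparent to unmodified guest applications at the cost of one `memset()` per deallocation.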



Acknowledgment and Availability

This work was supported by the German Ministry for Education and Research (BMBF) under project grant 01|H13004 (FAST).

The zeroing preload library is publicly available under https://version.zdv.Uni-Mainz.DE/anonscm/git/memory-zeroing/memory-zeroing.git.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Zentrum für Datenverarbeitung, Johannes Gutenberg-Universität Mainz, Mainz, Germany
  2. Institute for Automation of Complex Power Systems, E.ON Energy Research Center, RWTH Aachen, Aachen, Germany
