Abstract
The number of cores per node is predicted to increase rapidly in the upcoming era of exascale supercomputers. As a result, multiple applications will have to share one node and compete for the (often scarce) resources available on it. Furthermore, the growing number of hardware components decreases the mean time between failures. Application migration between nodes has been proposed as a tool to mitigate both problems: bottlenecks due to resource sharing can be addressed by load balancing schemes that migrate applications, and hardware errors can often be tolerated by the system if faulty nodes are detected and processes are migrated ahead of time.
Virtual machine (VM) migration currently seems to be the most promising technique for such approaches because it provides a strong level of isolation. However, migrating a virtual machine takes longer than migrating the corresponding workload at the process level. This can be explained by the additional virtualization layer in the memory hierarchy.
In this paper, we propose a technique for accelerating VM migration. We take advantage of the fact that the hypervisor does not recognize memory regions that have been freed within the guest system, and therefore still treats their stale contents as data to be migrated. We fill these regions with zeros so that zero-page detection and compression can work more efficiently. We demonstrate that the approach reduces migration time by up to 19 % with a negligible overhead for some applications.
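The effect of zeroing freed memory can be illustrated with a small simulation. The following Python sketch is not the authors' implementation: the page size, the one-byte marker for zero pages, the use of zlib as the compressor, and the names `is_zero_page`, `transfer_size`, and `simulate` are all assumptions made for illustration. It models guest memory as a sequence of pages holding incompressible data and compares the amount of data a migration would transfer with and without zeroing the freed pages.

```python
import random
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def is_zero_page(page: bytes) -> bool:
    """Return True if every byte of the page is zero."""
    return page.count(0) == len(page)


def transfer_size(memory: bytes) -> int:
    """Estimate the bytes sent when migrating `memory` page by page.

    Zero pages cost a hypothetical one-byte marker; all other pages
    are sent zlib-compressed.
    """
    total = 0
    for offset in range(0, len(memory), PAGE_SIZE):
        page = memory[offset:offset + PAGE_SIZE]
        if is_zero_page(page):
            total += 1
        else:
            total += len(zlib.compress(page))
    return total


def simulate(num_pages: int = 64, freed_fraction: float = 0.5,
             zero_on_free: bool = False, seed: int = 0) -> int:
    """Fill memory with random data, 'free' a fraction of the pages,
    optionally zero them, and return the estimated transfer size."""
    rng = random.Random(seed)
    memory = bytearray(rng.getrandbits(8) for _ in range(num_pages * PAGE_SIZE))
    freed_pages = range(int(num_pages * freed_fraction))
    if zero_on_free:
        # Without this step the freed pages keep their stale contents,
        # which look like live data from the outside.
        for page_index in freed_pages:
            start = page_index * PAGE_SIZE
            memory[start:start + PAGE_SIZE] = bytes(PAGE_SIZE)
    return transfer_size(bytes(memory))


if __name__ == "__main__":
    print("stale freed pages: ", simulate(zero_on_free=False), "bytes")
    print("zeroed freed pages:", simulate(zero_on_free=True), "bytes")
```

In this toy model the saving is proportional to the fraction of freed pages; real workloads, hypervisors, and compressors will behave differently, so the numbers carry no meaning beyond illustrating the mechanism.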
Acknowledgment and Availability
This work was supported by the German Ministry for Education and Research (BMBF) under project grant 01IH13004 (FAST).
The zeroing preload library is publicly available at https://version.zdv.Uni-Mainz.DE/anonscm/git/memory-zeroing/memory-zeroing.git.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Gad, R., Pickartz, S., Süß, T., Nagel, L., Lankes, S., Brinkmann, A. (2016). Accelerating Application Migration in HPC. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_46
DOI: https://doi.org/10.1007/978-3-319-46079-6_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science (R0)