Surfing the Grid - Dynamic Task Migration in the Polder Metacomputer Project

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2474))

Abstract

Traditionally, PVM and MPI programs live on message-passing systems, from clusters of non-dedicated workstations to MPP machines. The performance of a parallel program in such an environment is usually determined by the single worst-performing task in that program. In a homogeneous, stable environment, such as an MPP machine, this can only be repaired by improving the workload balance between the individual tasks. In a cluster of workstations, differences in the performance of individual nodes and network components can be an important cause of imbalance. Moreover, these differences will be time-dependent, as the load generated by other users plays an important role. Worse yet, nodes may be dynamically removed from the available pool of workstations. In such a dynamically changing environment, redistributing tasks over the available nodes can help to maintain the performance of individual programs and of the pool as a whole. Condor [1] solves this task migration problem for sequential programs. However, the migration of tasks in a parallel program presents a number of additional challenges, for the migrator as well as for the scheduler. For PVM programs, there are a number of solutions, including Dynamite [2]; Hector [3] was designed to migrate MPI tasks and to checkpoint complete MPI programs. The latter capability is very desirable for long-running programs in an unreliable environment.

This brings us to the Grid, where both the performance and the availability of resources vary dynamically and where reliability is an important issue. Once again, Livny with his Condor-G [4] provides a solution for sequential programs, including provisions for fault tolerance. In the Polder Metacomputer Project, building on our experience with Dynamite, we are currently investigating the additional challenges in creating a task-migration and checkpointing capability for the Grid environment. These challenges include the handling of shared resources, such as open files, and differences between administrative domains. Eventually, the migration of parallel programs will allow large parallel applications to surf the Grid and ride the waves in this highly dynamic environment.
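The abstract's observation that a parallel program runs only as fast as its slowest task, and that redistributing tasks can restore performance when a node slows down, can be illustrated with a small model. The sketch below is not the algorithm used by Dynamite, Hector, or Condor; the function names, the work and speed figures, and the greedy single-move policy are all assumptions made for illustration, and migration cost and communication are ignored.

```python
from typing import Dict, List

Assignment = Dict[str, List[float]]  # node name -> work units of each task on it


def node_finish_times(assignment: Assignment, speed: Dict[str, float]) -> Dict[str, float]:
    """Time each node needs: total work placed on it divided by its current speed."""
    return {node: sum(tasks) / speed[node] for node, tasks in assignment.items()}


def makespan(assignment: Assignment, speed: Dict[str, float]) -> float:
    """The program finishes when its slowest node finishes."""
    return max(node_finish_times(assignment, speed).values())


def migrate_once(assignment: Assignment, speed: Dict[str, float]) -> Assignment:
    """Hypothetical greedy step: move one task off the bottleneck node if,
    and only if, that lowers the makespan. Returns the original object
    unchanged when no single move helps."""
    times = node_finish_times(assignment, speed)
    src = max(times, key=times.get)  # bottleneck node
    best = makespan(assignment, speed)
    best_assignment = assignment
    for i, task in enumerate(assignment[src]):
        for dst in assignment:
            if dst == src:
                continue
            # Copy the assignment, then move task i from src to dst.
            candidate = {n: list(t) for n, t in assignment.items()}
            candidate[src].pop(i)
            candidate[dst].append(task)
            m = makespan(candidate, speed)
            if m < best:
                best, best_assignment = m, candidate
    return best_assignment


if __name__ == "__main__":
    # Node "b" has lost half its speed to load from other users (assumed figures).
    speed = {"a": 1.0, "b": 0.5}
    before = {"a": [4.0, 4.0], "b": [4.0, 4.0]}
    after = migrate_once(before, speed)
    print(makespan(before, speed), makespan(after, speed))
```

In this toy setting a single migration shortens the run, and a second call finds no further improving move. A real migrator would also have to weigh the cost of moving a task and of rerouting its messages, which is where the additional challenges for parallel programs arise.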

References

  1. M. Litzkow, T. Tannenbaum, J. Basney, M. Livny, Checkpoint and migration of Unix processes in the Condor distributed processing system, Technical Report 1346, University of Wisconsin, WI, USA, 1997.

  2. K. A. Iskra, F. van der Linden, Z. W. Hendrikse, G. D. van Albada, B. J. Overeinder, P. M. A. Sloot, The implementation of Dynamite - an environment for migrating PVM tasks, Operating Systems Review, vol. 34, no. 3, pp. 40–55, Association for Computing Machinery, Special Interest Group on Operating Systems, July 2000.

  3. J. Robinson, S. H. Russ, B. Flachs, B. Heckel, A task migration implementation of the Message Passing Interface, Proceedings of the 5th IEEE international symposium on high performance distributed computing, 61–68, 1996.

  4. J. Frey, T. Tannenbaum, I. Foster, M. Livny, S. Tuecke, Condor-G: A Computation Management Agent for Multi-Institutional Grids, Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10) San Francisco, California, August 7–9, 2001.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

van Albada, D., Sloot, P. (2002). Surfing the Grid - Dynamic Task Migration in the Polder Metacomputer Project. In: Kranzlmüller, D., Volkert, J., Kacsuk, P., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2002. Lecture Notes in Computer Science, vol 2474. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45825-5_3

  • DOI: https://doi.org/10.1007/3-540-45825-5_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44296-7

  • Online ISBN: 978-3-540-45825-8

  • eBook Packages: Springer Book Archive
