Scheduling a Metacomputer with Uncooperative Sub-schedulers

  • Jörn Gehring
  • Thomas Preiss
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1659)

Abstract

The main advantage of a metacomputer is not its peak performance but the better utilization of its machines. Efficient scheduling strategies are therefore vitally important to any metacomputing project. A real metacomputer management system will not gain exclusive access to all of its resources, because the participating centers are not willing to give up their autonomy. As a consequence, the scheduling algorithm has to deal with a set of local sub-schedulers that perform individual machine management. Based on the proposal made by Feitelson and Rudolph in 1998, we developed a scheduling model that takes these circumstances into account. It has been implemented as a generic simulation environment, which we make available to the public. Using this tool, we examined the behavior of several well-known scheduling algorithms in a metacomputing scenario. The results demonstrate that interaction with the sub-schedulers, the communication of parallel applications, and the sheer size of the metacomputer are among the most important aspects of scheduling a metacomputer. Based upon these observations, we developed a new technique that makes it possible to apply scheduling algorithms designed for less realistic machine models to real-world metacomputing projects. Simulation runs demonstrate that this technique yields far better results than the algorithms currently used in metacomputer management systems.
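
The setting the abstract describes, a meta-level scheduler that can only submit work to autonomous per-site sub-schedulers rather than control them, can be made concrete with a short sketch. The Python fragment below is purely illustrative and is not the authors' simulator or algorithm: the class names (Job, SubScheduler, MetaScheduler) and the backlog-based dispatch policy are assumptions chosen for the example.

```python
# Illustrative sketch only: a meta-scheduler restricted to submitting jobs
# to autonomous sub-schedulers and querying coarse load information.
# All names and the dispatch policy are assumptions, not the paper's method.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    job_id: int
    width: int       # number of processors requested
    runtime: float   # estimated runtime in abstract time units


@dataclass
class SubScheduler:
    """An uncooperative local scheduler: it accepts submissions but runs its
    own policy; the meta-level cannot reorder or preempt its queue."""
    name: str
    processors: int
    queue: List[Job] = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.queue.append(job)  # from here on, the local policy decides

    def estimated_backlog(self) -> float:
        # The only information exposed to the meta-level: a coarse
        # work estimate, not the local schedule itself.
        return sum(j.width * j.runtime for j in self.queue) / self.processors


class MetaScheduler:
    """Meta-level policy (assumed for illustration): send each job to the
    least-loaded site among those large enough to run it."""

    def __init__(self, sites: List[SubScheduler]):
        self.sites = sites

    def dispatch(self, job: Job) -> SubScheduler:
        feasible = [s for s in self.sites if s.processors >= job.width]
        if not feasible:
            raise ValueError(f"no site can run job {job.job_id}")
        target = min(feasible, key=SubScheduler.estimated_backlog)
        target.submit(job)
        return target


if __name__ == "__main__":
    sites = [SubScheduler("A", 64), SubScheduler("B", 256)]
    meta = MetaScheduler(sites)
    for i, (width, runtime) in enumerate([(32, 10.0), (128, 5.0), (16, 2.0)]):
        site = meta.dispatch(Job(i, width, runtime))
        print(f"job {i} ({width} PEs) -> site {site.name}")
```

The constraint modeled here is informational: the meta-scheduler sees only an aggregate backlog estimate per site, never the local queue or its ordering, which mirrors the paper's premise that participating centers retain scheduling autonomy.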

References

  1. Academic Computing Services Amsterdam. The SARA Metacomputing Project. WWW page. http://www.sara.nl/hec/projects/meta/.
  2. Carl Albing. Cray NQS: production batch for a distributed computing world. In Proceedings of the 11th Sun User Group Conference and Exhibition, pages 302–309, Brookline, MA, USA, December 1993. Sun User Group, Inc.
  3. J. Almond and D. Snelling. UNICORE: Secure and Uniform Access to Distributed Resources via the World Wide Web, 1998. http://www.kfa-juelich.de/zam/RD/coop/unicore/.
  4. Stergios V. Anastasiadis and Kenneth C. Sevcik. Parallel application scheduling on networks of workstations. Journal of Parallel and Distributed Computing, 43(2):109–124, June 1997.
  5. T. E. Anderson, D. E. Culler, and D. A. Patterson. A case for NOW (Networks of Workstations). IEEE Micro, 15(1):54–64, February 1995.
  6. R. Baraglia, R. Ferrini, D. Laforenza, and A. Lagana. Metacomputing to overcome the power limits of a single machine. Lecture Notes in Computer Science, 1225:982ff, 1997.
  7. M. Calzarossa and G. Serazzi. A characterization of the variation in time of workload arrival patterns. IEEE Transactions on Computers, C-34(2):156–162, 1985.
  8. Olivier Catoni. Solving scheduling problems by simulated annealing. SIAM Journal on Control and Optimization, 36(5):1539–1575, September 1998.
  9. Steve J. Chapin, Dimitrios Katramatos, John Karpovich, and Andrew S. Grimshaw. Resource management in Legion. Technical Report CS-98-09, Department of Computer Science, University of Virginia, February 11, 1998.
  10. Su-Hui Chiang, Rajesh K. Mansharamani, and Mary K. Vernon. Use of application characteristics and limited preemption for run-to-completion parallel processor scheduling policies. In Proceedings of the 1994 ACM SIGMETRICS Conference, pages 33–44, February 1994.
  11. Cray Research. NQE. Commercial product.
  12. Thomas A. DeFanti, Ian Foster, Michael E. Papka, Rick Stevens, and Tim Kuhfuss. Overview of the I-WAY: Wide-area visual supercomputing. The International Journal of Supercomputer Applications and High Performance Computing, 10(2/3):123–131, Summer/Fall 1996.
  13. Jack Dongarra, Hans Meuer, and Erich Strohmaier. Top 500 Report. WWW page, 1998. http://www.netlib.org/benchmark/top500/top500.list.html.
  14. Allen B. Downey. A parallel workload model and its implications for processor allocation. Technical Report CSD-96-922, University of California, Berkeley, November 6, 1996.
  15. Allen B. Downey. A model for speedup of parallel programs. Technical Report CSD-97-933, University of California, Berkeley, January 30, 1997.
  16. D. G. Feitelson. Packing schemes for gang scheduling. Lecture Notes in Computer Science, 1162:89ff, 1996.
  17. D. G. Feitelson and B. Nitzberg. Job characteristics of a production parallel scientific workload on the NASA Ames iPSC/860. Lecture Notes in Computer Science, 949:337ff, 1995.
  18. D. G. Feitelson and L. Rudolph. Metrics and benchmarking for parallel job scheduling. Lecture Notes in Computer Science, 1459:1ff, 1998.
  19. D. G. Feitelson, L. Rudolph, U. Schwiegelshohn, and K. C. Sevcik. Theory and practice in parallel job scheduling. Lecture Notes in Computer Science, 1291:1ff, 1997.
  20. I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. The International Journal of Supercomputer Applications and High Performance Computing, 11(2):115–128, Summer 1997.
  21. J. Gehring and F. Ramme. Architecture-independent request-scheduling with tight waiting-time estimations. Lecture Notes in Computer Science, 1162:65ff, 1996.
  22. J. Gehring, A. Reinefeld, and A. Weber. PHASE and MICA: Application-specific metacomputing. In Proceedings of Euro-Par '97, Passau, Germany, 1997.
  23. Genias Software GmbH, Erzgebirgstr. 2B, D-93073 Neutraubling. CODINE User's Guide, 1993. http://www.genias.de/genias/english/codine/.
  24. C. A. R. Hoare. Quicksort. In C. A. R. Hoare and C. B. Jones (Eds.), Essays in Computing Science. Prentice Hall, 1989.
  25. Chao-Ju Hou and Kang G. Shin. Implementation of decentralized load sharing in networked workstations using the Condor package. Journal of Parallel and Distributed Computing, 40(2):173–184, February 1997.
  26. IBM Corporation. Using and Administering LoadLeveler (Release 3.0), 4th edition, August 1996. Document Number SC23-3989-00.
  27. K. Koski. A step towards large-scale parallelism: building a parallel computing environment from heterogeneous resources. Future Generation Computer Systems, 11(4-5):491–498, August 1995.
  28. Robert R. Lipman and Judith E. Devaney. WebSubmit: running supercomputer applications via the Web. In Supercomputing '96, Pittsburgh, PA, November 1996.
  29. Walter T. Ludwig. Algorithms for scheduling malleable and nonmalleable parallel tasks. Technical Report CS-TR-95-1279, University of Wisconsin, Madison, August 1995.
  30. The NRW Metacomputing Initiative. WWW page. http://www.uni-paderborn.de/pc2/nrwmc/.
  31. B. J. Overeinder and P. M. A. Sloot. Breaking the curse of dynamics by task migration: Pilot experiments in the Polder metacomputer.
  32. E. W. Parsons and K. C. Sevcik. Implementing multiprocessor scheduling disciplines. Lecture Notes in Computer Science, 1291:166ff, 1997.
  33. Platform Computing Corporation. LSF Product Information. WWW page, October 1996. http://www.platform.com/.
  34. F. Ramme and K. Kremer. Scheduling a metacomputer by an implicit voting system. In Int. IEEE Symposium on High-Performance Distributed Computing, 1994.
  35. A. Reinefeld, R. Baraglia, T. Decker, J. Gehring, D. Laforenza, F. Ramme, T. Römke, and J. Simon. The MOL project: An open, extensible metacomputer. In Debra Hensgen, editor, Proceedings of the 6th Heterogeneous Computing Workshop, pages 17–31, Washington, April 1, 1997. IEEE Computer Society Press.
  36. V. Sander, D. Erwin, and V. Huber. High-performance computer management based on Java. Lecture Notes in Computer Science, 1401:526ff, 1998.
  37. M. Schwehm and T. Walter. Mapping and scheduling by genetic algorithms. Lecture Notes in Computer Science, 854:832ff, 1994.
  38. Uwe Schwiegelshohn. Preemptive weighted completion time scheduling of parallel jobs. In Josep Díaz and Maria Serna, editors, Algorithms ESA '96, Fourth Annual European Symposium, volume 1136 of Lecture Notes in Computer Science, pages 39–51, Barcelona, Spain, 25–27 September 1996. Springer.
  39. Uwe Schwiegelshohn and Ramin Yahyapour. Analysis of first-come-first-serve parallel job scheduling. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 629–638, San Francisco, California, 25–27 January 1998.
  40. A comparative analysis of static processor partitioning policies for parallel computers. In Internat. Workshop on Modeling and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 283–286, January 1993.
  41. Jon Siegel. CORBA: Fundamentals and Programming. John Wiley & Sons, Inc., New York, 1st edition, 1996.
  42. Larry Smarr and Charles E. Catlett. Metacomputing. Communications of the ACM, 35(6):44–52, June 1992.
  43. W. Smith, I. Foster, and V. Taylor. Predicting application run times using historical information. Lecture Notes in Computer Science, 1459:122ff, 1998.
  44. A. W. van Halderen, Benno J. Overeinder, Peter M. A. Sloot, R. van Dantzig, Dick H. J. Epema, and Miron Livny. Hierarchical resource management in the Polder metacomputing initiative. Submitted to Parallel Computing, 1997.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Jörn Gehring 1
  • Thomas Preiss 2

  1. Paderborn Center for Parallel Computing, Paderborn, Germany
  2. Paderborn Center for Parallel Computing, Paderborn, Germany
