Scheduling a Metacomputer with Uncooperative Sub-schedulers
The main advantage of a metacomputer is not its peak performance but better utilization of its machines. Efficient scheduling strategies are therefore vitally important to any metacomputing project. A real metacomputer management system will not gain exclusive access to all its resources, because participating centers will not be willing to give up autonomy. As a consequence, the scheduling algorithm has to deal with a set of local sub-schedulers, each performing its own machine management. Based on the proposal made by Feitelson and Rudolph in 1998, we developed a scheduling model that takes these circumstances into account. It has been implemented as a generic simulation environment, which we make available to the public. Using this tool, we examined the behavior of several well-known scheduling algorithms in a metacomputing scenario. The results demonstrate that interaction with the sub-schedulers, communication of parallel applications, and the huge size of the metacomputer are among the most important aspects of scheduling a metacomputer. Based on these observations, we developed a new technique that makes it possible to use scheduling algorithms developed for less realistic machine models in real-world metacomputing projects. Simulation runs demonstrate that this technique leads to far better results than the algorithms currently used in metacomputer management systems.
Keywords: Schedule Algorithm, Parallel Machine, Partition Size, Workload Model, Local Queue
- 1. Academic Computing Services Amsterdam. The SARA Metacomputing Project. WWW Page. http://www.sara.nl/hec/projects/meta/.
- 2. Carl Albing. Cray NQS: production batch for a distributed computing world. In Proceedings of the 11th Sun User Group Conference and Exhibition, pages 302–309, Brookline, MA, USA, December 1993. Sun User Group, Inc.
- 3. J. Almond and D. Snelling. UNICORE: Secure and Uniform Access to Distributed Resources via the World Wide Web, 1998. http://www.kfajuelich.de/zam/RD/coop/unicore/.
- 5. T. E. Anderson, D. E. Culler, and D. A. Patterson. A case for NOW (Networks of Workstations). IEEE Micro, 15(1):54–64, February 1995.
- 6. R. Baraglia, R. Ferrini, D. Laforenza, and A. Lagana. Metacomputing to overcome the power limits of a single machine. Lecture Notes in Computer Science, 1225:982ff, 1997.
- 9. Steve J. Chapin, Dimitrios Katramatos, John Karpovich, and Andrew S. Grimshaw. Resource management in Legion. Technical Report CS-98-09, Department of Computer Science, University of Virginia, February 11, 1998.
- 10. Su-Hui Chiang, Rajesh K. Mansharamani, and Mary K. Vernon. Use of Application Characteristics and Limited Preemption for Run-To-Completion Parallel Processor Scheduling Policies. In Proceedings of the 1994 ACM SIGMETRICS Conference, pages 33–44, February 1994.
- 11. Cray Research. NQE. Commercial product.
- 13. Jack Dongarra, Hans Meuer, and Erich Strohmaier. Top 500 Report. WWW Page, 1998. http://www.netlib.org/benchmark/top500/top500.list.html.
- 14. Allen B. Downey. A parallel workload model and its implications for processor allocation. Technical Report CSD-96-922, University of California, Berkeley, November 6, 1996.
- 15. Allen B. Downey. A model for speedup of parallel programs. Technical Report CSD-97-933, University of California, Berkeley, January 30, 1997.
- 17. D. G. Feitelson and B. Nitzberg. Job characteristics of a production parallel scientific workload on the NASA Ames iPSC/860. Lecture Notes in Computer Science, 949:337ff, 1995.
- 18. D. G. Feitelson and L. Rudolph. Metrics and benchmarking for parallel job scheduling. Lecture Notes in Computer Science, 1459:1ff, 1998.
- 19. D. G. Feitelson, L. Rudolph, U. Schwiegelshohn, and K. C. Sevcik. Theory and practice in parallel job scheduling. Lecture Notes in Computer Science, 1291:1ff, 1997.
- 21. J. Gehring and F. Ramme. Architecture-independent request-scheduling with tight waiting-time estimations. Lecture Notes in Computer Science, 1162:65ff, 1996.
- 22. J. Gehring, A. Reinefeld, and A. Weber. PHASE and MICA: Application specific metacomputing. In Proceedings of Europar 97, Passau, Germany, 1997.
- 23. Genias Software GmbH, Erzgebirgstr. 2B, D-93073 Neutraubling. CODINE User’s Guide, 1993. http://www.genias.de/genias/english/codine/.
- 24. C. A. R. Hoare. Quicksort. In C. A. R. Hoare and C. B. Jones (Eds.), Essays in Computing Science. Prentice Hall, 1989.
- 26. IBM Corporation. Using and Administering LoadLeveler (Release 3.0), 4th edition, August 1996. Document Number SC23-3989-00.
- 28. Robert R. Lipman and Judith E. Devaney. Websubmit – running supercomputer applications via the web. In Supercomputing ’96, Pittsburgh, PA, November 1996.
- 29. Walter T. Ludwig. Algorithms for scheduling malleable and nonmalleable parallel tasks. Technical Report CS-TR-95-1279, University of Wisconsin, Madison, August 1995.
- 30. The NRW Metacomputing Initiative. WWW Page. http://www.unipaderborn.de/pc2/nrwmc/.
- 31. B. J. Overeinder and P. M. A. Sloot. Breaking the curse of dynamics by task migration: Pilot experiments in the polder metacomputer.
- 32. E. W. Parsons and K. C. Sevcik. Implementing multiprocessor scheduling disciplines. Lecture Notes in Computer Science, 1291:166ff, 1997.
- 33. Platform Computing Corporation. LSF Product Information. WWW Page, October 1996. http://www.platform.com/.
- 34. F. Ramme and K. Kremer. Scheduling a metacomputer by an implicit voting system. In Int. IEEE Symposium on High-Perform94
- 35. A. Reinefeld, R. Baraglia, T. Decker, J. Gehring, D. Laforenza, F. Ramme, T. Römke, and J. Simon. The MOL project: An open, extensible metacomputer. In Debra Hensgen, editor, Proceedings of the 6th Heterogeneous Computing Workshop, pages 17–31, Washington, April 1, 1997. IEEE Computer Society Press.
- 36. V. Sander, D. Erwin, and V. Huber. High-performance computer management based on Java. Lecture Notes in Computer Science, 1401:526ff, 1998.
- 37. M. Schwehm and T. Walter. Mapping and scheduling by genetic algorithms. Lecture Notes in Computer Science, 854:832ff, 1994.
- 38. Uwe Schwiegelshohn. Preemptive weighted completion time scheduling of parallel jobs. In Josep Díaz and Maria Serna, editors, Algorithms ESA ’96, Fourth Annual European Symposium, volume 1136 of Lecture Notes in Computer Science, pages 39–51, Barcelona, Spain, 25–27 September 1996. Springer.
- 39. Uwe Schwiegelshohn and Ramin Yahyapour. Analysis of first-come-first-serve parallel job scheduling. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 629–638, San Francisco, California, 25–27 January 1998.
- 40. … i. A comparative analysis of static processor partitioning policies for parallel computers. In Internat. Workshop on Modeling and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 283–286, January 1993.
- 41. Jon Siegel. CORBA: Fundamentals and Programming. John Wiley & Sons Inc., New York, 1st edition, 1996.
- 43. W. Smith, I. Foster, and V. Taylor. Predicting application run times using historical information. Lecture Notes in Computer Science, 1459:122ff, 1998.
- 44. A. W. van Halderen, Benno J. Overeinder, Peter M. A. Sloot, R. van Dantzig, Dick H. J. Epema, and Miron Livny. Hierarchical resource management in the polder metacomputing initiative. Submitted to Parallel Computing, 1997.