Parallel application characterization for multiprocessor scheduling policy design
Much of the recent work on multiprocessor scheduling disciplines has used abstract workload models to explore the fundamental, high-level properties of the various alternatives. As continuing work on these policies increases their level of sophistication, however, it is clear that the choice of appropriate policies must be guided at least in part by the typical behavior of actual parallel applications. Our goal in this paper is to examine a variety of such applications, providing measurements of properties relevant to scheduling policy design. We give measurements for both hand-coded parallel programs (from the SPLASH benchmark suites) and compiler-parallelized programs (from the PERFECT Club suite) running on a KSR-2 shared-memory multiprocessor.
Among the questions we consider are:
- In the spectrum between aggressively dynamic and static allocation policies, what is an appropriate rate at which reallocations should take place?
- Can measurements of application speedup and efficiency be taken at runtime that are accurate enough to guide allocation decisions?
First, we examine application speedup, and the sources of speedup loss. Our results confirm that there is considerable variation in job speedup, and that the bulk of the speedup loss is due to communication and idleness.
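The decomposition described above can be sketched as follows. This is a minimal illustration, not the paper's instrumentation: it assumes each processor's execution time has already been split into useful computation, communication, and idle components (the function and variable names are ours).

```python
# Illustrative sketch: attributing efficiency loss to communication and
# idleness, given per-processor time totals. The three-way split of
# processor time is an assumption for this example, not the paper's tool.

def efficiency_breakdown(busy, comm, idle):
    """busy/comm/idle: per-processor time totals (seconds) for one run."""
    p = len(busy)
    total = sum(busy) + sum(comm) + sum(idle)  # total processor-time consumed
    useful = sum(busy)
    efficiency = useful / total                # fraction spent on useful work
    comm_loss = sum(comm) / total              # loss attributed to communication
    idle_loss = sum(idle) / total              # loss attributed to idleness
    speedup = efficiency * p                   # assumes T(1) == sum(busy)
    return speedup, efficiency, comm_loss, idle_loss

# Example: 4 processors, each 8 s of computation, 1 s each of
# communication and idleness.
s, e, c, i = efficiency_breakdown([8.0] * 4, [1.0] * 4, [1.0] * 4)
# → speedup 3.2, efficiency 0.8, with 0.1 of the loss from each source
```

Here efficiency loss (here 0.2) partitions exactly into the communication and idleness shares, which is the form of accounting the measurements above rely on.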
Next, we examine runtime measurement of speedup information. We begin by looking at how such information might be acquired accurately and at acceptable cost. We then investigate the extent to which recent measurements of speedup accurately predict the future, and so the extent to which such measurements might reasonably be expected to guide allocation decisions.
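One simple way such recent measurements could be turned into a forecast is a moving window over per-interval efficiency samples. The sketch below is an assumed mechanism for illustration (window size, class, and method names are ours, not from the paper); the open question the text raises is precisely how well such a window predicts the near future.

```python
# Illustrative sketch: forecasting near-term efficiency from a sliding
# window of recent interval measurements. All names and the window size
# are assumptions for this example.

from collections import deque

class EfficiencyPredictor:
    def __init__(self, window=3):
        # Keep only the most recent `window` samples.
        self.samples = deque(maxlen=window)

    def record(self, useful_time, elapsed_time, procs):
        # Efficiency over one interval: useful processor-time divided by
        # the processor-time made available (elapsed time * processors).
        self.samples.append(useful_time / (elapsed_time * procs))

    def predict(self):
        # Forecast = mean of the retained samples; None before any data.
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

est = EfficiencyPredictor(window=3)
for useful, elapsed in [(3.0, 1.0), (2.5, 1.0), (2.0, 1.0)]:
    est.record(useful, elapsed, procs=4)
# est.predict() → mean of [0.75, 0.625, 0.5] = 0.625
```

A scheduler could compare such forecasts across jobs when deciding where an additional processor is likely to be used most efficiently.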
Finally, we examine the durations of individual processor idle periods, and relate these to the cost of reallocating a processor at those times. These results shed light on the potential for aggressively dynamic policies to improve performance.
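The comparison at issue can be framed as follows: a dynamic policy profits from releasing a processor only during idle periods long enough to repay the reallocation cost. The sketch below computes, for a set of observed idle-period durations, the fraction of idle time falling in such periods; the function name and the example durations and cost are illustrative assumptions, not measurements from the paper.

```python
# Illustrative sketch: what fraction of total idle time lies in periods
# long enough to repay the cost of reallocating the processor? The
# durations and cost below are made-up example values.

def reclaimable_fraction(idle_periods_us, realloc_cost_us):
    """Fraction of total idle time in periods longer than the
    round-trip reallocation cost (all values in microseconds)."""
    total = sum(idle_periods_us)
    if total == 0:
        return 0.0
    reclaimable = sum(d for d in idle_periods_us if d > realloc_cost_us)
    return reclaimable / total

# Example: four idle periods against an assumed 200 us reallocation cost.
frac = reclaimable_fraction([100.0, 300.0, 50.0, 1000.0], 200.0)
# → 1300/1450 ≈ 0.897: most idle time, but not all, is reclaimable
```

If most idle time sits in periods shorter than the reallocation cost, aggressively dynamic policies have little room to improve performance; the measurements in this section quantify exactly that balance.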
Keywords: Processor Time, Delay Interval, Idle Period, Benchmark Suite, Attraction Memory
- 1. A. Agarwal and A. Gupta. Memory-Reference Characteristics of Multiprocessor Applications under MACH. In Proceedings of the ACM SIGMETRICS Conference, pages 215–225, May 1988.
- 2. I. Ashok and J. Zahorjan. Scheduling a Mixed Interactive and Batch Workload on a Parallel, Shared Memory Supercomputer. In Supercomputing '92, pages 616–625, Nov. 1992.
- 3. M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, L. Pointer, R. Roloff, A. Sameh, E. Clementi, S. Chin, D. Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Scharzmeier, K. Lue, S. Orszag, F. Seidl, O. Johnson, R. Goodrum, and J. Martin. The PERFECT Club Benchmarks: Effective Performance Evaluation of Supercomputers. The International Journal of Supercomputer Applications, 3(3):5–40, 1989.
- 4. J. Chen, Y. Endo, K. Chan, D. Mazieres, A. Dias, M. Seltzer, and M. Smith. The Measured Performance of Personal Computer Operating Systems. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, pages 299–313, Dec. 1995.
- 5. S.-H. Chiang, R. K. Mansharamani, and M. K. Vernon. Use of Application Characteristics and Limited Preemption for Run-To-Completion Parallel Processor Scheduling Policies. In Proceedings of the ACM SIGMETRICS Conference, pages 33–44, May 1994.
- 6. E. C. Cooper and R. P. Draves. C Threads. Technical Report CMU-CS-88-154, Department of Computer Science, Carnegie-Mellon University, June 1988.
- 7. G. Cybenko, L. Kipp, L. Pointer, and D. Kuck. Supercomputer Performance Evaluation and the Perfect Benchmarks. In Proceedings of the 1990 International Conference on Supercomputing, ACM SIGARCH Computer Architecture News, pages 254–266, Sept. 1990.
- 8. R. Cypher, A. Ho, S. Konstantinidou, and P. Messina. Architectural Requirements of Parallel Scientific Applications with Explicit Communication. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 2–13, May 1993.
- 9. F. Darema-Rogers, G. Pfister, and K. So. Memory Access Patterns of Parallel Scientific Programs. In Proceedings of the ACM SIGMETRICS Conference, pages 46–58, May 1987.
- 10. J. J. Dongarra and T. Dunigan. Message-Passing Performance of Various Computers. Technical Report CS-95-299, University of Tennessee, July 1995.
- 11. R. Eigenmann, J. Hoeflinger, Z. Li, and D. Padua. Experience in the Parallelization of Four Perfect-Benchmark Programs. Technical Report 1193, Center for Supercomputing Research and Development, Aug. 1991.
- 12. D. G. Feitelson and B. Nitzberg. Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860. In Proceedings of the IPPS'95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 337–360, Apr. 1995.
- 13. K. Guha. Using Parallel Program Characteristics in Dynamic Processor Allocation Policies. Technical Report CS-95-03, Department of Computer Science, York University, May 1995.
- 14. A. Gupta, A. Tucker, and S. Urushibara. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In Proceedings of the ACM SIGMETRICS Conference, pages 120–133, May 1991.
- 15. A. Karlin, K. Li, M. S. Manasse, and S. Owicki. Empirical Studies of Competitive Spinning for a Shared-Memory Multiprocessor. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages 41–55, Oct. 1991.
- 16. Kendall Square Research Inc., 170 Tracer Lane, Waltham, MA 02154. KSR Fortran Programming, 1993.
- 17. S. T. Leutenegger and M. K. Vernon. The Performance of Multiprogrammed Multiprocessor Scheduling Policies. In Proceedings of the ACM SIGMETRICS Conference, pages 226–236, May 1990.
- 18. S.-P. Lo and V. Gligor. A Comparative Analysis of Multiprocessor Scheduling Algorithms. In Proceedings of the 7th International Conference on Distributed Computing Systems, pages 356–363, Sept. 1987.
- 19. S. Majumdar, D. L. Eager, and R. B. Bunt. Scheduling in Multiprogrammed Parallel Systems. In Proceedings of the ACM SIGMETRICS Conference, pages 104–113, May 1988.
- 21. A. J. Musciano and T. L. Sterling. Efficient Dynamic Scheduling of Medium-Grained Tasks for General Purpose Parallel Processing. In Proceedings of the International Conference on Parallel Processing, pages 166–175, Aug. 1988.
- 22. T. D. Nguyen, R. Vaswani, and J. Zahorjan. Maximizing Speedup Through Self-Tuning of Processor Allocation. In Proceedings of the 10th International Parallel Processing Symposium, pages 463–468, Apr. 1996.
- 23. T. D. Nguyen, R. Vaswani, and J. Zahorjan. Using Runtime Measured Workload Characteristics in Parallel Processor Scheduling. In Proceedings of the IPPS'96 Workshop on Job Scheduling Strategies for Parallel Processing, Apr. 1996.
- 24. T. D. Nguyen, R. Vaswani, and J. Zahorjan. Parallel Application Characterization for Multiprocessor Scheduling Policy Design. Technical report, Department of Computer Science and Engineering, University of Washington, In preparation.
- 25. J. K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Proceedings of the 3rd International Conference on Distributed Computing Systems, pages 22–30, Oct. 1982.
- 26. P. Petersen and D. Padua. Machine-Independent Evaluation of Parallelizing Compilers. Technical Report 1173, Center for Supercomputing Research and Development, 1992.
- 27. E. Rothberg, J. P. Singh, and A. Gupta. Working Sets, Cache Sizes, and Node Granularity Issues for Large-Scale Multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 14–25, May 1993.
- 28. K. C. Sevcik. Characterizations of Parallelism in Applications and their Use in Scheduling. In Proceedings of the ACM SIGMETRICS Conference, pages 171–180, May 1989.
- 31. R. L. Sites, editor. Alpha Architecture Reference Manual. Digital Press, 1992.
- 34. A. Tucker and A. Gupta. Process Control and Scheduling Issues for Multiprogrammed Shared-Memory Multiprocessors. In Proceedings of the 12th ACM Symposium on Operating Systems Principles, pages 159–166, Dec. 1989.
- 35. R. Vaswani and J. Zahorjan. The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages 26–40, Dec. 1991.
- 36. R. P. Wilson, R. S. French, C. S. Wilson, S. P. Amarasinghe, J. M. Anderson, S. W. K. Tjiang, S.-W. Liao, C.-W. Tseng, M. W. Hall, M. S. Lam, and J. L. Hennessy. SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers. Technical report, Computer Systems Laboratory, Stanford University.
- 37. S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 24–36, June 1995.