Parallel application characterization for multiprocessor scheduling policy design

  • Thu D. Nguyen
  • Raj Vaswani
  • John Zahorjan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1162)

Abstract

Much of the recent work on multiprocessor scheduling disciplines has used abstract workload models to explore the fundamental, high-level properties of the various alternatives. As continuing work on these policies increases their level of sophistication, however, it is clear that the choice of appropriate policies must be guided at least in part by the typical behavior of actual parallel applications. Our goal in this paper is to examine a variety of such applications, providing measurements of properties relevant to scheduling policy design. We give measurements for both hand-coded parallel programs (from the SPLASH benchmark suites) and compiler-parallelized programs (from the PERFECT Club suite) running on a KSR-2 shared-memory multiprocessor.

The measurements we present are intended primarily to address two aspects of multiprocessor scheduling policy design:
  • In the spectrum between aggressively dynamic and static allocation policies, what is an appropriate choice for the rate at which reallocations should take place?

  • Is it possible to take measurements of application speedup and efficiency at runtime that are sufficiently accurate to guide allocation decisions?

We address these questions through three sets of measurements:
  • First, we examine application speedup, and the sources of speedup loss. Our results confirm that there is considerable variation in job speedup, and that the bulk of the speedup loss is due to communication and idleness.

  • Next, we examine runtime measurement of speedup information. We begin by looking at how such information might be acquired accurately and at acceptable cost. We then investigate the extent to which recent measurements of speedup accurately predict the future, and so the extent to which such measurements might reasonably be expected to guide allocation decisions.

  • Finally, we examine the durations of individual processor idle periods, and relate these to the cost of reallocating a processor at those times. These results shed light on the potential for aggressively dynamic policies to improve performance.
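The second question above asks whether speedup can be estimated from quantities observable while the program runs. As a hedged illustration (not the paper's actual instrumentation), one standard way to do this is to have each processor report its busy time over an observation interval, then compute efficiency as the fraction of allocated processor-time spent doing useful work, and speedup as efficiency times the number of processors:

```python
# Illustrative sketch only: estimating efficiency and speedup from runtime
# measurements, assuming each processor reports the busy (useful-work) time
# it accumulated during a wall-clock observation interval of length `interval`.

def efficiency(busy_times, interval):
    """Fraction of allocated processor-time spent on useful work.

    busy_times: seconds of useful work per processor over the interval.
    interval:   wall-clock length of the observation interval, in seconds.
    """
    p = len(busy_times)
    return sum(busy_times) / (p * interval)

def speedup(busy_times, interval):
    """Estimated speedup on p processors: efficiency * p."""
    return efficiency(busy_times, interval) * len(busy_times)

# Example: 4 processors observed over a 10-second interval.
busy = [9.0, 8.5, 7.0, 7.5]      # per-processor useful-work time (seconds)
print(efficiency(busy, 10.0))    # 0.8
print(speedup(busy, 10.0))       # 3.2
```

The gap between `interval` and each processor's busy time is exactly the idleness and communication overhead the paper identifies as the main sources of speedup loss; whether such interval measurements predict the near future accurately enough to drive allocation decisions is what the paper's second set of measurements investigates.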


References

  1. A. Agarwal and A. Gupta. Memory-Reference Characteristics of Multiprocessor Applications under MACH. In Proceedings of the ACM SIGMETRICS Conference, pages 215–225, May 1988.
  2. I. Ashok and J. Zahorjan. Scheduling a Mixed Interactive and Batch Workload on a Parallel, Shared Memory Supercomputer. In Supercomputing '92, pages 616–625, Nov. 1992.
  3. M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, L. Pointer, R. Roloff, A. Sameh, E. Clementi, S. Chin, D. Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Scharzmeier, K. Lue, S. Orszag, F. Seidl, O. Johnson, R. Goodrum, and J. Martin. The PERFECT Club Benchmarks: Effective Performance Evaluation of Supercomputers. The International Journal of Supercomputer Applications, 3(3):5–40, 1989.
  4. J. Chen, Y. Endo, K. Chan, D. Mazieres, A. Dias, M. Seltzer, and M. Smith. The Measured Performance of Personal Computer Operating Systems. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, pages 299–313, Dec. 1995.
  5. S.-H. Chiang, R. K. Mansharamani, and M. K. Vernon. Use of Application Characteristics and Limited Preemption for Run-To-Completion Parallel Processor Scheduling Policies. In Proceedings of the ACM SIGMETRICS Conference, pages 33–44, May 1994.
  6. E. C. Cooper and R. P. Draves. C Threads. Technical Report CMU-CS-88-154, Department of Computer Science, Carnegie-Mellon University, June 1988.
  7. G. Cybenko, L. Kipp, L. Pointer, and D. Kuck. Supercomputer Performance Evaluation and the Perfect Benchmarks. In Proceedings of the 1990 International Conference on Supercomputing, ACM SIGARCH Computer Architecture News, pages 254–266, Sept. 1990.
  8. R. Cypher, A. Ho, S. Konstantinidou, and P. Messina. Architectural Requirements of Parallel Scientific Applications with Explicit Communication. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 2–13, May 1993.
  9. F. Darema-Rogers, G. Pfister, and K. So. Memory Access Patterns of Parallel Scientific Programs. In Proceedings of the ACM SIGMETRICS Conference, pages 46–58, May 1987.
  10. J. J. Dongarra and T. Dunigan. Message-Passing Performance of Various Computers. Technical Report CS-95-299, University of Tennessee, July 1995.
  11. R. Eigenmann, J. Hoeflinger, Z. Li, and D. Padua. Experience in the Parallelization of Four Perfect-Benchmark Programs. Technical Report 1193, Center for Supercomputing Research and Development, Aug. 1991.
  12. D. G. Feitelson and B. Nitzberg. Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860. In Proceedings of the IPPS'95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 337–360, Apr. 1995.
  13. K. Guha. Using Parallel Program Characteristics in Dynamic Processor Allocation Policies. Technical Report CS-95-03, Department of Computer Science, York University, May 1995.
  14. A. Gupta, A. Tucker, and S. Urushibara. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In Proceedings of the ACM SIGMETRICS Conference, pages 120–133, May 1991.
  15. A. Karlin, K. Li, M. S. Manasse, and S. Owicki. Empirical Studies of Competitive Spinning for a Shared-Memory Multiprocessor. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages 41–55, Oct. 1991.
  16. Kendall Square Research Inc., 170 Tracer Lane, Waltham, MA 02154. KSR Fortran Programming, 1993.
  17. S. T. Leutenegger and M. K. Vernon. The Performance of Multiprogrammed Multiprocessor Scheduling Policies. In Proceedings of the ACM SIGMETRICS Conference, pages 226–236, May 1990.
  18. S.-P. Lo and V. Gligor. A Comparative Analysis of Multiprocessor Scheduling Algorithms. In Proceedings of the 7th International Conference on Distributed Computing Systems, pages 356–363, Sept. 1987.
  19. S. Majumdar, D. L. Eager, and R. B. Bunt. Scheduling in Multiprogrammed Parallel Systems. In Proceedings of the ACM SIGMETRICS Conference, pages 104–113, May 1988.
  20. C. McCann, R. Vaswani, and J. Zahorjan. A Dynamic Processor Allocation Policy for Multiprogrammed Shared-Memory Multiprocessors. ACM Transactions on Computer Systems, 11(2):146–178, May 1993.
  21. A. J. Musciano and T. L. Sterling. Efficient Dynamic Scheduling of Medium-Grained Tasks for General Purpose Parallel Processing. In Proceedings of the International Conference on Parallel Processing, pages 166–175, Aug. 1988.
  22. T. D. Nguyen, R. Vaswani, and J. Zahorjan. Maximizing Speedup Through Self-Tuning of Processor Allocation. In Proceedings of the 10th International Parallel Processing Symposium, pages 463–468, Apr. 1996.
  23. T. D. Nguyen, R. Vaswani, and J. Zahorjan. Using Runtime Measured Workload Characteristics in Parallel Processor Scheduling. In Proceedings of the IPPS'96 Workshop on Job Scheduling Strategies for Parallel Processing, Apr. 1996.
  24. T. D. Nguyen, R. Vaswani, and J. Zahorjan. Parallel Application Characterization for Multiprocessor Scheduling Policy Design. Technical report, Department of Computer Science and Engineering, University of Washington. In preparation.
  25. J. K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Proceedings of the 3rd International Conference on Distributed Computing Systems, pages 22–30, Oct. 1982.
  26. P. Petersen and D. Padua. Machine-Independent Evaluation of Parallelizing Compilers. Technical Report 1173, Center for Supercomputing Research and Development, 1992.
  27. E. Rothberg, J. P. Singh, and A. Gupta. Working Sets, Cache Sizes, and Node Granularity Issues for Large-Scale Multiprocessors. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 14–25, May 1993.
  28. K. C. Sevcik. Characterizations of Parallelism in Applications and their Use in Scheduling. In Proceedings of the ACM SIGMETRICS Conference, pages 171–180, May 1989.
  29. K. C. Sevcik. Application Scheduling and Processor Allocation in Multiprogrammed Parallel Processing Systems. Performance Evaluation, 19(2/3):107–140, Mar. 1994.
  30. J. P. Singh, W.-D. Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared-Memory. Computer Architecture News, 20(1):5–44, 1992.
  31. R. L. Sites, editor. Alpha Architecture Reference Manual. Digital Press, 1992.
  32. M. Squillante and E. Lazowska. Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling. IEEE Transactions on Parallel and Distributed Systems, 4(2):131–143, Feb. 1993.
  33. D. Thiebaut and H. S. Stone. Footprints in the Cache. ACM Transactions on Computer Systems, 5(4):305–329, Nov. 1987.
  34. A. Tucker and A. Gupta. Process Control and Scheduling Issues for Multiprogrammed Shared-Memory Multiprocessors. In Proceedings of the 12th ACM Symposium on Operating Systems Principles, pages 159–166, Dec. 1989.
  35. R. Vaswani and J. Zahorjan. The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages 26–40, Dec. 1991.
  36. R. P. Wilson, R. S. French, C. S. Wilson, S. P. Amarasinghe, J. M. Anderson, S. W. K. Tjiang, S.-W. Liao, C.-W. Tseng, M. W. Hall, M. S. Lam, and J. L. Hennessy. SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers. Technical report, Computer Systems Laboratory, Stanford University.
  37. S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 24–36, June 1995.

Copyright information

© Springer-Verlag 1996

Authors and Affiliations

  • Thu D. Nguyen
  • Raj Vaswani
  • John Zahorjan

  Department of Computer Science and Engineering, University of Washington, Seattle, USA
