Symbiotic Space-Sharing on SDSC’s DataStar System

  • Jonathan Weinberg
  • Allan Snavely
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4376)

Abstract

Using a large HPC platform, we investigate the effectiveness of "symbiotic space-sharing", a technique that improves system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We demonstrate that relevant benchmarks commonly suffer a 10-60% runtime penalty due to memory resource bottlenecks, and penalties of up to several orders of magnitude due to I/O bottlenecks. We show that these penalties can often be mitigated, and sometimes virtually eliminated, by symbiotic space-sharing techniques, and we deploy a prototype scheduler that leverages these findings to improve system throughput by 20%.
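
The paper itself is not reproduced on this page, but the core idea named in the abstract, pairing jobs whose resource demands complement each other so that co-located jobs contend less for shared memory bandwidth, can be illustrated with a minimal sketch. The Job class, the mem_pressure scores, the benchmark-style job names, and the greedy high-with-low pairing heuristic below are all illustrative assumptions, not the authors' prototype scheduler:

    # Illustrative sketch of symbiosis-aware job pairing (assumptions, not
    # the authors' code). Each job carries a hypothetical memory-pressure
    # score in [0, 1], e.g. derived from hardware counters; the scheduler
    # co-locates the most memory-bound job with the least memory-bound one
    # so that node-level pressure on shared memory resources is balanced.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Job:
        name: str
        mem_pressure: float  # hypothetical profile score in [0, 1]

    def symbiotic_pairs(jobs: List[Job]) -> List[Tuple[Job, Job]]:
        """Greedily pair the most memory-bound job with the least memory-bound one."""
        ordered = sorted(jobs, key=lambda j: j.mem_pressure)
        pairs = []
        while len(ordered) >= 2:
            light = ordered.pop(0)   # least memory-bound job remaining
            heavy = ordered.pop(-1)  # most memory-bound job remaining
            pairs.append((heavy, light))
        # With an odd job count, one job is left to run without a partner.
        return pairs

    if __name__ == "__main__":
        jobs = [Job("cg", 0.9), Job("ep", 0.1), Job("ft", 0.7), Job("lu", 0.3)]
        for heavy, light in symbiotic_pairs(jobs):
            print(f"co-schedule {heavy.name} with {light.name}")

A real symbiotic space-sharing scheduler would additionally need per-application profiles (measured or predicted) and would weigh I/O contention, which the abstract identifies as the larger bottleneck; the sketch only conveys the pairing principle.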

Keywords

Parallel Code · Memory Operation · Gang Schedule · Hardware Counter · Improve System Throughput



Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Jonathan Weinberg (1)
  • Allan Snavely (1)
  1. San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093-0505, USA
