Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors
Autoscheduling is a parallel program compilation and execution model that uniquely combines three features: automatic extraction of loop and functional parallelism at any level of granularity, dynamic scheduling of parallel tasks, and dynamic program adaptability on multiprogrammed shared-memory multiprocessors. This paper presents a technique that enhances the performance of autoscheduling on Distributed Shared Memory (DSM) multiprocessors, targeting mainly medium- and large-scale systems, where poor data locality and excessive communication impose performance bottlenecks. Our technique partitions the application's Hierarchical Task Graph and maps the derived partitions to clusters of processors in the DSM architecture. Autoscheduling is then applied separately within each partition to enhance data locality and reduce communication costs. Our experimental results show that partitioning achieves remarkable performance improvements over both a standard autoscheduling environment and a commercial parallelizing compiler.
Keywords: Task Graph, Runtime System, Distributed Shared Memory, Dynamic Program Adaptability, Synthetic Benchmark
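The partition-and-map idea described in the abstract — splitting the task graph into partitions and assigning each to a processor cluster, so scheduling within a partition stays local to one cluster — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the function name `partition_task_graph`, the greedy BFS partitioning heuristic, and the toy graph are all assumptions for illustration.

```python
# Hypothetical sketch of partitioning a task graph and mapping the
# partitions to processor clusters. The greedy BFS heuristic below is
# an illustrative stand-in, not the paper's actual partitioning method.
from collections import deque

def partition_task_graph(graph, num_clusters):
    """Grow roughly equal-sized partitions by BFS over the task graph.

    graph: dict mapping task -> list of successor (spawned) tasks.
    Returns a dict mapping each task to a cluster id in [0, num_clusters).
    """
    tasks = list(graph)
    target = max(1, len(tasks) // num_clusters)  # tasks per partition
    assignment = {}
    cluster = 0
    for start in tasks:
        if start in assignment:
            continue
        queue = deque([start])
        count = 0
        while queue and count < target:
            t = queue.popleft()
            if t in assignment:
                continue
            assignment[t] = cluster  # task t runs on this cluster
            count += 1
            queue.extend(graph[t])   # pull in neighbors for locality
        if cluster < num_clusters - 1:
            cluster += 1  # partition is full; start filling the next one
    return assignment

# Toy hierarchical task graph: t0 spawns t1/t2, which spawn leaf tasks.
htg = {
    "t0": ["t1", "t2"],
    "t1": ["t3", "t4"],
    "t2": ["t5", "t6"],
    "t3": [], "t4": [], "t5": [], "t6": [],
}
mapping = partition_task_graph(htg, num_clusters=2)
```

Under such a mapping, an autoscheduler would dispatch each task only to processors of its assigned cluster, keeping a task's data traffic within one cluster of the DSM machine.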