Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors

  • Dimitrios S. Nikolopoulos
  • Eleftherios D. Polychronopoulos
  • Theodore S. Papatheodorou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1470)


Autoscheduling is a parallel program compilation and execution model that uniquely combines three features: automatic extraction of loop and functional parallelism at any level of granularity, dynamic scheduling of parallel tasks, and dynamic program adaptability on multiprogrammed shared memory multiprocessors. This paper presents a technique that enhances the performance of autoscheduling on Distributed Shared Memory (DSM) multiprocessors, targeting mainly medium- and large-scale systems, where poor data locality and excessive communication impose performance bottlenecks. Our technique partitions the application's Hierarchical Task Graph and maps the derived partitions to clusters of processors in the DSM architecture. Autoscheduling is then applied separately within each partition to enhance data locality and reduce communication costs. Our experimental results show that partitioning achieves remarkable performance improvements over a standard autoscheduling environment and a commercial parallelizing compiler.
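The core idea of the abstract, partitioning the Hierarchical Task Graph (HTG) and pinning each partition to a cluster of DSM processors so that scheduling stays local to a cluster, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' algorithm: the heuristic of assigning contiguous bands of topological levels to clusters, and the names `topo_levels` and `partition_htg`, are assumptions introduced here for illustration.

```python
# Illustrative sketch: partition a task graph into clusters so that
# autoscheduling can later dispatch each partition's tasks within a
# single DSM processor cluster. The level-band heuristic below is a
# hypothetical stand-in for the paper's partitioning technique.
from collections import defaultdict, deque

def topo_levels(deps):
    """Assign each task of a DAG its topological level.

    `deps` maps task -> list of tasks it depends on."""
    indeg = {t: len(d) for t, d in deps.items()}
    succ = defaultdict(list)
    for t, d in deps.items():
        for p in d:
            succ[p].append(t)
    level = {t: 0 for t, n in indeg.items() if n == 0}
    q = deque(level)
    while q:
        t = q.popleft()
        for s in succ[t]:
            indeg[s] -= 1
            level[s] = max(level.get(s, 0), level[t] + 1)
            if indeg[s] == 0:
                q.append(s)
    return level

def partition_htg(deps, n_clusters):
    """Map tasks to clusters by contiguous bands of levels, keeping
    dependent tasks on the same cluster where possible."""
    level = topo_levels(deps)
    n_levels = max(level.values()) + 1
    return {t: min(lvl * n_clusters // n_levels, n_clusters - 1)
            for t, lvl in level.items()}
```

Within each derived partition, the autoscheduling runtime would then dispatch ready tasks dynamically, but only to the processors of that partition's cluster, which is what confines communication and improves locality.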


Keywords: Task Graph, Runtime System, Distributed Shared Memory, Dynamic Program Adaptability, Synthetic Benchmark





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Dimitrios S. Nikolopoulos¹
  • Eleftherios D. Polychronopoulos¹
  • Theodore S. Papatheodorou¹

  1. High Performance Computing Architectures Laboratory, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
