Efficient runtime thread management for the nano-threads programming model

  • Dimitrios S. Nikolopoulos
  • Eleftherios D. Polychronopoulos
  • Theodore S. Papatheodorou
Workshop on Run-Time Systems for Parallel Programming. Matthew Haines, University of Wyoming, USA; Koen Langendoen, Vrije Universiteit, The Netherlands; Greg Benson, University of California at Davis, USA
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1388)


The nano-threads programming model was proposed to effectively integrate multiprogramming on shared-memory multiprocessors, with the exploitation of fine-grain parallelism from standard applications. A prerequisite for the applicability of the nano-threads programming model is the ability of the runtime environment to manage parallelism at any level of granularity with minimal overheads. In this paper, we introduce runtime techniques for efficient memory management and user-level scheduling in an experimental runtime system designed to support the nano-threads programming model. We evaluate the exploitation of processor affinity for the management of nano-thread contexts, and the use of hierarchical queues to implement user-level scheduling strategies for applications with inherent multilevel parallelism. The proposed mechanisms attempt to obtain maximum benefits from data locality on cache-coherent NUMA multiprocessors. Through the use of synthetic benchmarks, we find that our mechanism for memory management in the runtime system reduces overheads by 52% on average, compared to other known mechanisms. The use of hierarchical queues gives significant performance improvements between 17% and 40%, compared to scheduling strategies that use local queues.


Task Graph, Runtime System, Parallel Loop, Schedule Loop, Local Pool
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Dimitrios S. Nikolopoulos (1)
  • Eleftherios D. Polychronopoulos (1)
  • Theodore S. Papatheodorou (1)
  1. High Performance Computing Architectures Laboratory, Department of Computer Engineering and Informatics, Patras, Greece
