Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective

  • François Broquedis
  • Nathalie Furmento
  • Brice Goglin
  • Raymond Namyst
  • Pierre-André Wacrenier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5568)


Exploiting the full computational power of current hierarchical multiprocessor machines requires a careful distribution of threads and data across the underlying non-uniform architecture so as to avoid memory access penalties. Directive-based programming languages such as OpenMP provide programmers with an easy way to structure the parallelism of their application and to transmit this information to the runtime system.

Our runtime, which is based on a multi-level thread scheduler combined with a NUMA-aware memory manager, converts this information into “scheduling hints” to solve thread/memory affinity issues. It enables dynamic load distribution guided by application structure and hardware topology, thus helping to achieve performance portability. First experiments show that mixed solutions (migrating threads and data) outperform next-touch-based data distribution policies and open possibilities for new optimizations.


Keywords: OpenMP · Memory · NUMA · Hierarchical Thread Scheduling · Multi-Core





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • François Broquedis (1)
  • Nathalie Furmento (3)
  • Brice Goglin (2)
  • Raymond Namyst (1)
  • Pierre-André Wacrenier (1)
  1. University of Bordeaux, France
  2. INRIA, France
  3. CNRS LaBRI, Talence, France
