Exploiting Thread-Data Affinity in OpenMP with Data Access Patterns

  • Andrea Di Biagio
  • Ettore Speziale
  • Giovanni Agosta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6852)

Abstract

In modern NUMA architectures, preserving data access locality is key to performance. For the OpenMP programming model, we define a class of architecture-agnostic programmer hints that describe the behaviour of parallel loops. These hints depend only on features of the program, in particular on the data accessed by each loop iteration. The runtime combines this information with architectural information gathered during its initialization to guide the scheduling of loop iterations when dynamic loop scheduling is used. We demonstrate the effectiveness of the proposed technique on the NAS Parallel Benchmarks suite, achieving an average speedup of 1.21x.
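
The idea of combining a program-level hint with architectural information can be illustrated with a minimal sketch. This is not the hint syntax or runtime described in the paper: the types `access_hint` and `numa_info`, the functions `hint_page`, `hint_home_node` and `pick_chunk`, and the block-cyclic placement of pages on nodes are all assumptions made for illustration. The hint says only which bytes an iteration touches; the runtime side maps those bytes to a memory page and a node, and prefers handing a thread a chunk whose data is local to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical program-side hint: iteration i accesses memory starting at
 * byte offset base_offset + i * stride_bytes. No architecture details here. */
typedef struct {
    size_t base_offset;
    size_t stride_bytes;
} access_hint;

/* Hypothetical architecture-side information, gathered once at start-up.
 * Assumption: pages are placed on nodes block-cyclically, pages_per_node
 * consecutive pages per node. */
typedef struct {
    size_t page_size;
    size_t pages_per_node;
    int    num_nodes;
} numa_info;

/* Memory page touched by iteration i. */
size_t hint_page(const access_hint *h, const numa_info *m, size_t i)
{
    assert(m->page_size > 0);
    return (h->base_offset + i * h->stride_bytes) / m->page_size;
}

/* Node that holds the page touched by iteration i. */
int hint_home_node(const access_hint *h, const numa_info *m, size_t i)
{
    assert(m->pages_per_node > 0 && m->num_nodes > 0);
    size_t block = hint_page(h, m, i) / m->pages_per_node;
    return (int)(block % (size_t)m->num_nodes);
}

/* One dynamic-scheduling decision for a thread running on `node`:
 * return the first pending chunk whose first iteration is local to that
 * node; if none is local, fall back to the first pending chunk (a remote
 * access, but no thread stays idle). Returns -1 when no chunk is pending. */
int pick_chunk(const access_hint *h, const numa_info *m,
               const int *done, size_t num_chunks, size_t chunk_size,
               int node)
{
    int fallback = -1;
    for (size_t c = 0; c < num_chunks; ++c) {
        if (done[c])
            continue;
        if (hint_home_node(h, m, c * chunk_size) == node)
            return (int)c;
        if (fallback < 0)
            fallback = (int)c;
    }
    return fallback;
}
```

For example, with 8-byte elements, 4096-byte pages, two nodes and chunks of 512 iterations, each chunk covers exactly one page, so a thread on node 1 is first handed chunks 1 and 3 and only then falls back to the remote chunks. Classifying a chunk by its first iteration is a simplification that holds only when a chunk does not straddle pages on different nodes.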

Keywords

Iteration Space, Remote Access, Parallel Loop, Memory Page, Data Access Pattern

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Andrea Di Biagio (1)
  • Ettore Speziale (1)
  • Giovanni Agosta (1)
  1. Dipartimento di Elettronica ed Informazione, Politecnico di Milano, Italy
