Dynamic Thread Pinning for Phase-Based OpenMP Programs

  • Abdelhafid Mazouz
  • Sid-Ahmed-Ali Touati
  • Denis Barthou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8097)

Abstract

Thread affinity has emerged as an important technique for improving overall program performance and performance stability. However, for a program with multiple phases, a single thread affinity is unlikely to yield the best performance in every phase. In OpenMP, an application may have multiple parallel regions, each with a distinct inter-thread data sharing pattern. In this paper, we propose an approach that changes thread affinity dynamically (through thread migrations) between parallel regions at runtime to account for these distinct inter-thread data sharing patterns. We demonstrate that, as far as cache sharing is concerned for SPEC OMP01, not all of the tested OpenMP applications exhibit distinct phase behavior. However, we show that while fixing thread affinity for the whole execution may improve performance by up to 30%, dynamic thread pinning may improve performance by up to 40%. Furthermore, we analyze the conditions required to improve the effectiveness of the approach.
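As a rough illustration of re-pinning threads between regions, the following minimal Python sketch gives each region its own thread-to-core table and has every worker re-pin itself when it enters a region. It is not the authors' implementation: the table `PHASE_AFFINITY`, the helpers `pin_current_thread` and `run_phases`, and the slot values are all hypothetical, and the sketch assumes a Linux system where `os.sched_setaffinity` is available (elsewhere it simply runs unpinned).

```python
import os
import threading

# Hypothetical per-region pinning tables: PHASE_AFFINITY[region][thread_id]
# gives a core slot. The values are illustrative, not taken from the paper.
PHASE_AFFINITY = {
    "region_a": [0, 1, 2, 3],
    "region_b": [0, 2, 1, 3],
}


def pin_current_thread(slot, allowed):
    """Pin the calling thread to one CPU from `allowed`; return its id or None."""
    if not allowed or not hasattr(os, "sched_setaffinity"):
        return None  # no affinity API on this platform: leave placement to the OS
    cpu = allowed[slot % len(allowed)]  # wrap around on machines with few cores
    try:
        os.sched_setaffinity(0, {cpu})  # on Linux, pid 0 means the calling thread
    except OSError:
        return None  # pinning refused (e.g. sandbox): keep running unpinned
    return cpu


def run_phases(num_threads=4):
    """Run every region with its own pinning; return {(region, tid): cpu}."""
    # Read the allowed CPU set once, before any worker narrows its own mask.
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
    else:
        allowed = []
    placement = {}
    barrier = threading.Barrier(num_threads)  # stands in for the end-of-region barrier

    def worker(tid):
        for region, slots in PHASE_AFFINITY.items():
            slot = slots[tid % len(slots)]
            placement[(region, tid)] = pin_current_thread(slot, allowed)
            sum(i * i for i in range(10_000))  # placeholder for the region's work
            barrier.wait()  # all threads finish the region before re-pinning

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return placement


if __name__ == "__main__":
    for key, cpu in sorted(run_phases().items()):
        print(key, "->", cpu)
```

The sketch only shows the mechanics of switching affinity at region boundaries. How the per-region tables would be derived from the observed data sharing patterns is left open here.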

Keywords

OpenMP; thread-level parallelism; thread affinity; multicores

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Abdelhafid Mazouz (1)
  • Sid-Ahmed-Ali Touati (2)
  • Denis Barthou (3)
  1. University of Versailles Saint-Quentin-en-Yvelines, France
  2. University of Nice Sophia Antipolis, France
  3. University of Bordeaux, France
