Dynamic Thread Pinning for Phase-Based OpenMP Programs
Thread affinity has emerged as an important technique for improving overall program performance and performance stability. However, for a program with multiple phases, a single thread affinity is unlikely to give the best performance in every phase. OpenMP applications, for example, may have multiple parallel regions, each with a distinct inter-thread data sharing pattern. In this paper, we propose an approach that changes thread affinity dynamically (through thread migrations) between parallel regions at runtime to account for these distinct inter-thread data sharing patterns. We demonstrate that, as far as cache sharing is concerned, not all the tested OpenMP applications from SPEC OMP01 exhibit distinct phase behavior. However, we show that while fixing thread affinity for the whole execution may improve performance by up to 30%, dynamic thread pinning may improve it by up to 40%. Furthermore, we analyze the conditions required to improve the effectiveness of the approach.
Keywords: OpenMP, thread level parallelism, thread affinity, multicores
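The mechanism named in the abstract, re-pinning each thread when the program moves from one parallel region to the next, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes Linux, uses Python threads as stand-ins for OpenMP threads, and relies on `os.sched_setaffinity`, which with pid 0 applies to the calling thread on Linux. The phase names and the `PLACEMENTS` table are hypothetical; in practice each placement would be derived from the data sharing pattern measured for that region.

```python
import os
import threading

# Hypothetical per-phase placements: PLACEMENTS[phase][tid] is an index into
# the list of CPUs the process is allowed to run on. Real placements would be
# chosen from the sharing pattern observed in each parallel region.
PLACEMENTS = {
    "phase_a": [0, 1, 2, 3],  # threads 0/1 and 2/3 placed on neighbouring CPUs
    "phase_b": [0, 2, 1, 3],  # threads 0/2 and 1/3 placed on neighbouring CPUs
}


def allowed_cpus():
    """Return the sorted list of CPUs the calling thread may run on."""
    return sorted(os.sched_getaffinity(0))


def pin_current_thread(cpu):
    """Restrict the calling thread to a single CPU (Linux: pid 0 = this thread)."""
    os.sched_setaffinity(0, {cpu})


def run_phases(num_threads, phases, cpus):
    """Run every phase on num_threads threads, re-pinning at each phase start.

    Returns observed[phase][tid] = (requested_cpu, affinity_mask_after_pinning).
    """
    barrier = threading.Barrier(num_threads)
    observed = {phase: [None] * num_threads for phase in phases}

    def worker(tid):
        for phase in phases:
            table = PLACEMENTS[phase]
            slot = table[tid % len(table)]
            # Wrap around so the sketch still runs on machines with few CPUs.
            cpu = cpus[slot % len(cpus)]
            pin_current_thread(cpu)
            observed[phase][tid] = (cpu, sorted(os.sched_getaffinity(0)))
            # ... the work of this parallel region would go here ...
            barrier.wait()  # every thread finishes the phase before re-pinning

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    cpus = allowed_cpus()  # captured before any thread narrows its own mask
    result = run_phases(4, ["phase_a", "phase_b"], cpus)
    for phase, rows in result.items():
        for tid, (cpu, mask) in enumerate(rows):
            print(f"{phase}: thread {tid} pinned to CPU {cpu}, mask {mask}")
```

The barrier plays the role of the implicit synchronization at the end of an OpenMP parallel region: placements change only at phase boundaries, so the migration cost is paid once per region and not inside it.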