Assessing OpenMP Tasking Implementations on NUMA Architectures
The introduction of task-level parallelization promises to raise the level of abstraction compared to thread-centric expression of parallelism. However, tasks might exhibit poor performance on NUMA systems if locality cannot be maintained. In contrast to traditional OpenMP worksharing constructs for which threads can be bound, the behavior of tasks is much less predetermined by the OpenMP specification and implementations have a high degree of freedom implementing task scheduling.
Employing different approaches to express task-parallelism, namely the single-producer and parallel-producer patterns with different data initialization strategies, we compare the behavior and quality of OpenMP implementations with task-parallel codes on NUMA architectures. For the programmer, we propose recipies to express parallelism with tasks allowing to preserve data locality while optimizing the degree of parallelism. Our proposals are evaluated on reasonably large NUMA systems with both important application kernels as well as a real-world simulation code.
KeywordsStatic Schedule Work Package Chunk Size System Topology Task Creation
Unable to display preview. Download preview PDF.
- 4.Bull, J.M.: Measuring Synchronisation and Scheduling Overheads in OpenMP. In: Proceedings of First European Workshop on OpenMP, pp. 99–105 (1999)Google Scholar
- 5.Davis, T.A.: University of Florida Sparse Matrix Collection. NA Digest, 92 (1994)Google Scholar
- 7.Duran, A., Teruel, X., Ferrer, R., Martorell, X., Ayguade, E.: Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP. In: Parallel Processing, (ICPP 2009), pp. 124–131 (September 2009)Google Scholar
- 8.Gerndt, A., Sarholz, S., Wolter, M., Mey, D.A., Bischof, C., Kuhlen, T.: Nested OpenMP for Efficient Computation of 3D Critical Points in Multi-Block CFD Datasets. In: Proceedings of the ACM/IEEE, SC 2006 Conference, p. 46 (November 2006)Google Scholar
- 11.McCalpin, J.: STREAM: Sustainable Memory Bandwidth in High Performance Computers (1999), http://www.cs.virginia.edu/stream (accessed March 29, 2012)
- 13.OpenMP ARB. OpenMP Application Program Interface, v. 3.1, http://www.openmp.org
- 14.Teruel, X., Martorell, X., Duran, A., Ferrer, R., Ayguadé, E.: Support for OpenMP tasks in Nanos v4. In: Lyons, K.A., Couturier, C. (eds.) Proceedings of the 2007 Conference of the Centre for Advanced Studies on Collaborative Research, pp. 256–259. IBM (October 2007)Google Scholar