Affinity-Aware Synchronization in Work Stealing Run-Times for NUMA Multi-core Processors

Vikranth, B.; Wankar, Rajeev; Raghavendra Rao, C.

doi:10.1007/978-981-13-0680-8_12

Conference paper
First Online: 06 July 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 702)

Abstract

Modern high-performance server systems are typically built from several multi-core chips combined in a single system. Each chip is connected to its local memory through an integrated memory controller (IMC) and behaves as a node, so the machine as a whole behaves as a non-uniform memory access (NUMA) architecture. Various user-level run-time systems adopt the work stealing load-balancing technique on multi-core processors. These run-times have to be aware of the topology of the processor on which they run. They typically rely on lock-based synchronization to guarantee the coherency of shared mutable state, and synchronization constructs such as mutex locks, condition variables, and barriers are used extensively in their implementation. On multi-socket NUMA processors, the locality of these lock variables has a considerable impact on the performance of the run-time systems. This paper studies the effect of the locality of these synchronization constructs and proposes making them NUMA-aware. The proposed methodology is implemented using a source-to-source translator of the OpenMP run-time and evaluated using OpenMP microbenchmark programs.
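To make the idea of lock locality concrete, the following is a minimal sketch in C and not the paper's implementation. It assumes a hypothetical topology in which cores are numbered consecutively within each node, a 64-byte cache line, and at most 8 nodes; the names `node_lock_t`, `node_of_core`, `node_trylock`, and `node_unlock` are illustrative only. It keeps one spinlock per NUMA node, each padded to its own cache line, so that workers on one node do not contend on a line used by another node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define MAX_NODES  8    /* assumed upper bound on NUMA nodes */

/* One spinlock per NUMA node, aligned and padded to a full cache line
 * so that locks belonging to different nodes never share a line. */
typedef struct {
    _Alignas(CACHE_LINE) atomic_flag held;
} node_lock_t;

static_assert(sizeof(node_lock_t) % CACHE_LINE == 0,
              "each lock must occupy whole cache lines");

static node_lock_t node_locks[MAX_NODES];

/* Hypothetical mapping: cores are numbered consecutively within a node. */
static int node_of_core(int core, int cores_per_node) {
    return core / cores_per_node;
}

/* Put every per-node lock into the released state. */
static void node_locks_init(void) {
    for (int i = 0; i < MAX_NODES; i++) {
        atomic_flag_clear(&node_locks[i].held);
    }
}

/* Try to take the lock of the given node; returns true on success. */
static bool node_trylock(int node) {
    return !atomic_flag_test_and_set_explicit(&node_locks[node].held,
                                              memory_order_acquire);
}

/* Release the lock of the given node. */
static void node_unlock(int node) {
    atomic_flag_clear_explicit(&node_locks[node].held,
                               memory_order_release);
}
```

A worker pinned to core `c` would call `node_trylock(node_of_core(c, cores_per_node))` before touching its node's shared queue. The static array here only separates the locks by cache line; a real run-time would additionally need to place each lock in memory local to its node, for example with `numa_alloc_onnode` from libnuma on Linux.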

Author information

Authors and Affiliations

CVR College of Engineering, Hyderabad, 501510, India
B. Vikranth
SCIS, University of Hyderabad, Hyderabad, 500046, India
Rajeev Wankar & C. Raghavendra Rao

Corresponding author

Correspondence to B. Vikranth.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Engineering, Technology and Management, University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Computational Science Division, Saha Institute of Nuclear Physics, Kolkata, West Bengal, India
Dhananjay Bhattacharyya
Department of Computer Science and Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
Nitin Auluck

About this paper

Cite this paper

Vikranth, B., Wankar, R., Raghavendra Rao, C. (2019). Affinity-Aware Synchronization in Work Stealing Run-Times for NUMA Multi-core Processors. In: Mandal, J., Bhattacharyya, D., Auluck, N. (eds) Advanced Computing and Communication Technologies. Advances in Intelligent Systems and Computing, vol 702. Springer, Singapore. https://doi.org/10.1007/978-981-13-0680-8_12

DOI: https://doi.org/10.1007/978-981-13-0680-8_12
Published: 06 July 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0679-2
Online ISBN: 978-981-13-0680-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
