Affinity-Aware Synchronization in Work Stealing Run-Times for NUMA Multi-core Processors

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 702)

Abstract

Modern high-performance server systems are typically built from several multi-core chips combined in a single system. Each chip is connected to its local memory through an integrated memory controller (IMC) and behaves as a node, so the single machine behaves as a non-uniform memory architecture (NUMA). Various user-level run-time systems adopt the work stealing load-balancing technique on multi-core processors. These work stealing run-times have to be aware of the topology of the processor on which they are running. They typically rely on lock-based synchronization to guarantee the coherency of shared mutable state, and synchronization constructs such as mutex locks, condition variables, and barriers are used extensively in their implementation. In multi-socket NUMA processors, the locality of these lock variables has a considerable impact on the performance of the run-time systems. This paper studies the effect of the locality of these synchronization constructs and proposes making them NUMA-aware. The proposed methodology is implemented using a source-to-source translator of an OpenMP run-time and is evaluated using OpenMP microbenchmark programs.
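
To make the idea of lock locality concrete, the following is a minimal sketch in C and not the implementation evaluated in the paper. It assumes a 64-byte cache line, at most eight NUMA nodes, and CPUs numbered contiguously within each node. The names `node_lock_table_t`, `node_of_cpu`, `node_trylock`, and `node_unlock` are hypothetical. Each node gets its own spinlock on a separate cache line, so a worker contends only on the lock that belongs to its own node.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */
#define MAX_NODES  8  /* assumed upper bound on NUMA nodes */

/* One spinlock per NUMA node, aligned so that no two locks
 * share a cache line (avoids false sharing between nodes). */
typedef struct {
    alignas(CACHE_LINE) atomic_flag flag;
} node_lock_t;

typedef struct {
    node_lock_t locks[MAX_NODES];
    int num_nodes;
    int cpus_per_node;
} node_lock_table_t;

void table_init(node_lock_table_t *t, int num_nodes, int cpus_per_node)
{
    assert(num_nodes > 0 && num_nodes <= MAX_NODES && cpus_per_node > 0);
    t->num_nodes = num_nodes;
    t->cpus_per_node = cpus_per_node;
    for (int i = 0; i < MAX_NODES; i++)
        atomic_flag_clear(&t->locks[i].flag);
}

/* Hypothetical topology mapping: CPUs 0..k-1 sit on node 0,
 * k..2k-1 on node 1, and so on. Real systems may differ. */
int node_of_cpu(const node_lock_table_t *t, int cpu)
{
    return (cpu / t->cpus_per_node) % t->num_nodes;
}

/* Try to take the lock of the node that owns `cpu`.
 * Returns true if the lock was acquired. */
bool node_trylock(node_lock_table_t *t, int cpu)
{
    atomic_flag *f = &t->locks[node_of_cpu(t, cpu)].flag;
    return !atomic_flag_test_and_set_explicit(f, memory_order_acquire);
}

/* Release the lock of the node that owns `cpu`. */
void node_unlock(node_lock_table_t *t, int cpu)
{
    atomic_flag *f = &t->locks[node_of_cpu(t, cpu)].flag;
    atomic_flag_clear_explicit(f, memory_order_release);
}
```

Padding only keeps the locks apart; it does not by itself place each lock in its node's local memory. On Linux, that placement could be done with an allocator such as libnuma's `numa_alloc_onnode`, or by relying on first-touch allocation from a thread pinned to the target node. Both options are assumptions here, not details taken from the paper.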

Author information

Correspondence to B. Vikranth.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Vikranth, B., Wankar, R., Raghavendra Rao, C. (2019). Affinity-Aware Synchronization in Work Stealing Run-Times for NUMA Multi-core Processors. In: Mandal, J., Bhattacharyya, D., Auluck, N. (eds) Advanced Computing and Communication Technologies. Advances in Intelligent Systems and Computing, vol 702. Springer, Singapore. https://doi.org/10.1007/978-981-13-0680-8_12
