A Study of the Usefulness of Producer/Consumer Synchronization

Lin, Hao; Bae, Hansang; Midkiff, Samuel P.; Eigenmann, Rudolf; Kim, Soohong P.

doi:10.1007/978-3-642-36036-7_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7146)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

In the early 1980s, shared-memory mini-supercomputers had buses and memory whose speeds were fast relative to processor speeds. This led to the widespread use of various producer/consumer (post/wait) synchronization schemes for enforcing data dependences within parallel doacross loops. The rise of the “killer micro”, instruction sets optimized for serial programs, and rapidly increasing processor clock rates driven by Moore’s law led to special-purpose synchronization instructions being replaced by software barriers combined with loop fission, which allows the barriers to enforce dependences. One cost of this approach is poorer cache behavior, because variables on which a dependence exists are now accessed in separate loops. With the advent of the multicore era, producer/consumer synchronization again appears plausible. In this paper we compare the performance of hardware and software producer/consumer synchronization schemes to barrier synchronization, and show that either hardware- or software-based producer/consumer synchronization can provide applications with superior performance.
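
The contrast between the two approaches can be illustrated with a small sketch. It is not taken from the paper: the loop body (`a[i] = b[i] + 1` followed by `c[i] = a[i-1] * 2`), the function names, and the use of Python's `threading.Event` and `threading.Barrier` as stand-ins for post/wait and barrier primitives are all assumptions made for illustration. The first function keeps a single doacross loop and enforces the loop-carried dependence with a per-iteration post/wait flag; the second applies loop fission and places a barrier between the two resulting loops, so `a` is written in one loop and read in another. Python threads will not reproduce the cache or timing effects the abstract describes; the sketch only shows the structure of each scheme.

```python
import threading


def run_workers(worker, num_threads):
    """Start one thread per worker id and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def doacross_post_wait(b, num_threads=4):
    """Single doacross loop; the dependence on a[i-1] is enforced by post/wait."""
    n = len(b)
    a, c = [0] * n, [0] * n
    # Hypothetical post/wait flags: posted[i] is set once a[i] has been written.
    posted = [threading.Event() for _ in range(n)]

    def worker(tid):
        # Iterations are dealt out cyclically: tid, tid + T, tid + 2T, ...
        for i in range(tid, n, num_threads):
            a[i] = b[i] + 1             # S1: produces a[i]
            posted[i].set()             # post: a[i] is now available
            if i > 0:
                posted[i - 1].wait()    # wait: until a[i-1] has been produced
                c[i] = a[i - 1] * 2     # S2: consumes a[i-1] in the same loop

    run_workers(worker, num_threads)
    return a, c


def fission_with_barrier(b, num_threads=4):
    """Loop fission: S1 and S2 run in separate loops with a barrier between them."""
    n = len(b)
    a, c = [0] * n, [0] * n
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        for i in range(tid, n, num_threads):
            a[i] = b[i] + 1             # loop 1: every S1 instance
        barrier.wait()                  # all of a[] is written past this point
        for i in range(tid, n, num_threads):
            if i > 0:
                c[i] = a[i - 1] * 2     # loop 2: every S2 instance re-reads a[]

    run_workers(worker, num_threads)
    return a, c
```

Both functions compute the same arrays. In the post/wait version each iteration posts before it waits, and iteration 0 never waits, so every wait is eventually satisfied regardless of how the threads are scheduled.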

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA
Hao Lin, Hansang Bae, Samuel P. Midkiff, Rudolf Eigenmann & Soohong P. Kim

Editor information

Editors and Affiliations

Computer Science Department, Colorado State University, 80523-1873, Fort Collins, CO, USA
Sanjay Rajopadhye & Michelle Mills Strout

About this paper

Cite this paper

Lin, H., Bae, H., Midkiff, S.P., Eigenmann, R., Kim, S.P. (2013). A Study of the Usefulness of Producer/Consumer Synchronization. In: Rajopadhye, S., Mills Strout, M. (eds) Languages and Compilers for Parallel Computing. LCPC 2011. Lecture Notes in Computer Science, vol 7146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36036-7_10

DOI: https://doi.org/10.1007/978-3-642-36036-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36035-0
Online ISBN: 978-3-642-36036-7
eBook Packages: Computer Science, Computer Science (R0)
