A Study of the Usefulness of Producer/Consumer Synchronization

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7146)

Abstract

In the early 1980s, shared-memory mini-supercomputers had buses and memory whose speeds were fast relative to processor speeds. This led to the widespread use of producer/consumer (post/wait) synchronization schemes for enforcing data dependences within parallel doacross loops. The rise of the “killer micro”, instruction sets optimized for serial programs, and rapidly increasing processor clock rates driven by Moore’s law led to special-purpose synchronization instructions being replaced by software barriers combined with loop fission, which allows the barriers to enforce dependences. One cost of this approach is poorer cache behavior, because variables on which a dependence exists are now accessed in separate loops. With the advent of the multicore era, producer/consumer synchronization again appears plausible. In this paper we compare the performance of hardware and software producer/consumer synchronization schemes to barrier synchronization, and show that either hardware- or software-based producer/consumer synchronization can provide applications with superior performance.
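To make the two schemes named in the abstract concrete, the following is a minimal sketch of one loop parallelized both ways. It is an illustration under stated assumptions, not the paper's implementation: the loop body (statement S1 writes `a[i]`, statement S2 reads `a[i-1]`), the helper `produce`, and the function names are all hypothetical, and Python threads only show the synchronization structure, not the cache or timing effects the paper measures.

```python
import threading


def produce(i):
    """Hypothetical stand-in for the work done by statement S1."""
    return i * i


def sequential(n):
    """Serial reference: S1 writes a[i], S2 reads a[i-1] and a[i]."""
    a = [produce(i) for i in range(n)]
    return [a[i] + (a[i - 1] if i > 0 else 0) for i in range(n)]


def _run_threads(target, workers):
    threads = [threading.Thread(target=target, args=(t,)) for t in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def doacross_post_wait(n, workers=4):
    """Single doacross loop; post/wait enforces the S1 -> S2 dependence."""
    a = [0] * n
    b = [0] * n
    posted = [threading.Event() for _ in range(n)]  # one flag per iteration

    def run(tid):
        for i in range(tid, n, workers):  # cyclic iteration assignment
            a[i] = produce(i)             # S1
            posted[i].set()               # post(i): a[i] is now available
            if i > 0:
                posted[i - 1].wait()      # wait(i-1): need a[i-1] from S1
            b[i] = a[i] + (a[i - 1] if i > 0 else 0)  # S2, same loop as S1

    _run_threads(run, workers)
    return b


def fission_with_barrier(n, workers=4):
    """Loop split in two; a barrier between the loops enforces the dependence."""
    a = [0] * n
    b = [0] * n
    barrier = threading.Barrier(workers)

    def run(tid):
        for i in range(tid, n, workers):
            a[i] = produce(i)             # first loop: all of S1
        barrier.wait()                    # every a[i] written before any S2
        for i in range(tid, n, workers):  # second loop revisits a[] later
            b[i] = a[i] + (a[i - 1] if i > 0 else 0)  # S2

    _run_threads(run, workers)
    return b
```

In the post/wait version, `a[i]` is consumed in the same iteration that produced it, while the fissioned version touches `a` again only after the barrier, in a separate loop. That separation is the source of the poorer cache behavior the abstract describes.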

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, H., Bae, H., Midkiff, S.P., Eigenmann, R., Kim, S.P. (2013). A Study of the Usefulness of Producer/Consumer Synchronization. In: Rajopadhye, S., Mills Strout, M. (eds) Languages and Compilers for Parallel Computing. LCPC 2011. Lecture Notes in Computer Science, vol 7146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36036-7_10

  • DOI: https://doi.org/10.1007/978-3-642-36036-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36035-0

  • Online ISBN: 978-3-642-36036-7

  • eBook Packages: Computer Science, Computer Science (R0)
