Support for Fine-Grained Synchronization in Shared-Memory Multiprocessors

  • Vladimir Vlassov
  • Oscar Sierra Merino
  • Csaba Andras Moritz
  • Konstantin Popov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4671)


Hardware-supported fine-grained synchronization has already been shown to provide a significant performance improvement over coarse-grained synchronization mechanisms such as barriers. Support for fine-grained synchronization on individual data items is particularly important for efficiently exploiting the thread-level parallelism available on multithreaded and multi-core processors. Fine-grained synchronization can be achieved using full/empty tagged shared memory. We define the complete set of synchronizing memory instructions as well as the architecture of the full/empty tagged shared memory that supports these operations. We develop a snoopy cache coherence protocol for an SMP with centralized full/empty tagged memory.
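To make the idea of full/empty tagged memory concrete, the following is a minimal software sketch, assuming the commonly described semantics in which each memory word carries a one-bit tag and a synchronizing access waits until the tag has the required state. The class `FullEmptyCell`, its method names, and the helper `pipeline` are hypothetical and chosen only for illustration; they are not the instruction set or the hardware architecture defined in the paper, and a lock with a condition variable stands in for what the hardware would do in the memory system.

```python
import threading


class FullEmptyCell:
    """One memory word paired with a full/empty tag (illustrative names only)."""

    def __init__(self):
        self._value = None
        self._full = False                  # tag: False = empty, True = full
        self._cond = threading.Condition()  # software stand-in for hardware waiting

    def write_when_empty(self, value):
        """Wait until the cell is empty, store the value, then mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_and_empty(self):
        """Wait until the cell is full, return the value, then mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value

    def read_when_full(self):
        """Wait until the cell is full and return the value, leaving it full."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            return self._value


def pipeline(items):
    """Pass items from a producer to a consumer thread through a single cell."""
    cell = FullEmptyCell()
    received = []

    def consumer():
        for _ in items:
            received.append(cell.read_and_empty())

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        cell.write_when_empty(item)  # blocks until the previous item is consumed
    worker.join()
    return received
```

In this sketch the producer and consumer synchronize on each individual item through the tag of a single cell, with no barrier between them, which is the contrast with coarse-grained synchronization that the abstract draws.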


Shared Memory, Memory Location, Computer Architecture, Cache Line, Transactional Memory
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Vladimir Vlassov (1)
  • Oscar Sierra Merino (1)
  • Csaba Andras Moritz (2)
  • Konstantin Popov (3)

  1. Royal Institute of Technology (KTH), Stockholm, Sweden
  2. University of Massachusetts (UMASS), Amherst, MA, U.S.A.
  3. Swedish Institute of Computer Science (SICS), Stockholm, Sweden
