Distributed Computing, Volume 31, Issue 5, pp. 367–388

TM²C: a software transactional memory for many-cores

  • Vincent Gramoli
  • Rachid Guerraoui
  • Vasileios Trigonakis


Transactional memory is an appealing paradigm for concurrent systems. Many software implementations of the paradigm were proposed in the past two decades for both shared-memory multi-core systems and clusters of distributed machines. However, chip manufacturers have started producing many-core architectures with low network-on-chip communication latencies and limited support for cache coherence, which renders existing transactional-memory implementations inapplicable. This paper presents TM²C, the first software transactional memory protocol for many-core systems, whose transactions are both distributed and able to leverage shared memory. TM²C exploits fast message passing over the network-on-chip to keep accesses to shared data coherent. In particular, it makes read accesses visible so that conflicts are detected eagerly, and it incorporates the first distributed contention manager that guarantees the commit of all transactions. We evaluate TM²C on Intel, AMD and Tilera architectures, ranging from common multi-cores to experimental many-cores. We build upon new message-passing protocols, based on both software and hardware, that are interesting in their own right. Our results on various benchmarks, including realistic banking and MapReduce applications, show that TM²C scales well regardless of the underlying platform.
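To make the combination of visible reads, eager conflict detection and a commit-guaranteeing contention manager more concrete, here is a minimal Python sketch of a single lock-service partition. Everything in it is an illustrative assumption rather than TM²C's implementation: the names `LockService`, `begin`, `read`, `write`, `release` and `commit` are hypothetical, messages over the network-on-chip are replaced by direct method calls, and the contention policy is a Greedy-style "older timestamp wins" rule (in the spirit of Guerraoui, Herlihy and Pochon's Greedy manager), not TM²C's own contention manager. Because readers register themselves, a later writer sees them and the conflict is resolved when the lock is requested rather than at commit time.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class LockEntry:
    readers: Set[int] = field(default_factory=set)  # visible readers of the address
    writer: Optional[int] = None                     # current write-lock holder


class LockService:
    """Hypothetical single-partition lock service with visible reads.

    Conflicts are detected eagerly, when a read or write lock is requested.
    The contention policy (older timestamp wins) is an illustrative
    assumption, NOT TM2C's actual contention manager.
    """

    def __init__(self) -> None:
        self.locks: Dict[int, LockEntry] = {}
        self.ts: Dict[int, int] = {}  # tx id -> timestamp, kept across aborts

    def begin(self, tx: int, ts: int) -> None:
        # Keep the first timestamp, so a retried transaction only gets older.
        self.ts.setdefault(tx, ts)

    def _entry(self, addr: int) -> LockEntry:
        return self.locks.setdefault(addr, LockEntry())

    def _resolve(self, tx: int, enemies: Set[int]) -> Tuple[bool, Set[int]]:
        # The requester wins only if it is older than every conflicting transaction.
        if all(self.ts[tx] < self.ts[e] for e in enemies):
            for e in enemies:
                self.release(e)  # abort the younger enemies
            return True, enemies
        self.release(tx)  # the requester loses: it aborts and drops its locks
        return False, {tx}

    def read(self, tx: int, addr: int) -> Tuple[bool, Set[int]]:
        e = self._entry(addr)
        enemies = {e.writer} if e.writer not in (None, tx) else set()
        granted, aborted = self._resolve(tx, enemies)
        if granted:
            e.readers.add(tx)  # the read becomes visible to later writers
        return granted, aborted

    def write(self, tx: int, addr: int) -> Tuple[bool, Set[int]]:
        e = self._entry(addr)
        holders = e.readers | ({e.writer} if e.writer is not None else set())
        granted, aborted = self._resolve(tx, holders - {tx})
        if granted:
            e.writer = tx
        return granted, aborted

    def release(self, tx: int) -> None:
        # Drop every lock held by tx (used on abort and on commit).
        for e in self.locks.values():
            e.readers.discard(tx)
            if e.writer == tx:
                e.writer = None

    def commit(self, tx: int) -> None:
        self.release(tx)
        self.ts.pop(tx, None)  # the next transaction gets a fresh timestamp


svc = LockService()
svc.begin(1, ts=10)
svc.begin(2, ts=20)
print(svc.read(2, 0x40))   # (True, set())
print(svc.write(1, 0x40))  # (True, {2}): the older writer aborts the younger reader
print(svc.read(2, 0x40))   # (False, {2}): the younger reader loses to the writer
```

Assuming unique timestamps that a transaction keeps across its retries, and transactions of finite length, the oldest running transaction never loses a conflict under this rule, so it eventually commits; this is the standard argument behind Greedy-style managers and is one way, not necessarily TM²C's way, to guarantee that every transaction commits.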


Keywords: Transactional memory; Many-cores; Concurrent programming; Contention management



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Vincent Gramoli (2)
  • Rachid Guerraoui (3)
  • Vasileios Trigonakis (1, 3), corresponding author
  1. Oracle Labs, Zürich, Switzerland
  2. NICTA and University of Sydney, Concurrent Systems Research Group, Sydney, Australia
  3. EPFL, ICT, LPD, Lausanne, Switzerland
