Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures

  • Dimitra Papagiannopoulou
  • Andrea Marongiu
  • Tali Moreshet
  • Luca Benini
  • Maurice Herlihy
  • R. Iris Bahar


High-end embedded systems, like their general-purpose counterparts, are turning to many-core, cluster-based architectures that provide a shared-memory abstraction subject to non-uniform memory access costs. To keep the cores and memory hierarchy simple, many-core embedded systems tend to employ scratchpad-like memories rather than hardware-managed caches, which require some form of cache-coherence management. These “coherence-free” systems still need some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches can provide this synchronization, but they may lead to both usability and performance issues. Speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores, so the lack of such a protocol poses new challenges for the design of hardware speculative support. In this article, we present a new scheme for hardware transactional memory (HTM) support within a cluster-based, many-core embedded system that lacks an underlying cache-coherence protocol. We propose two alternative data versioning implementations for the HTM support, Full-Mirroring and Distributed Logging, and we compare their performance. To the best of our knowledge, these are the first designs for speculative synchronization for this type of architecture. Through a set of benchmark experiments on our simulation platform, we show that our designs can achieve significant performance improvements over traditional lock-based schemes.


Keywords: Transactional memory; Embedded systems; Parallel processing; Coherence-free memory architectures



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Dimitra Papagiannopoulou (1)
  • Andrea Marongiu (2, 3)
  • Tali Moreshet (4)
  • Luca Benini (2, 3)
  • Maurice Herlihy (1)
  • R. Iris Bahar (1)

  1. Brown University, Providence, USA
  2. ETH Zurich, Zurich, Switzerland
  3. DEI, University of Bologna, Bologna, Italy
  4. Boston University, Boston, USA
