The Journal of Supercomputing, Volume 68, Issue 2, pp 914–934

Selective dynamic serialization for reducing energy consumption in hardware transactional memory systems

  • Epifanio Gaona
  • J. Rubén Titos-Gil
  • Juan Fernández
  • Manuel E. Acacio
Article

Abstract

In the search for new paradigms to simplify multithreaded programming, Transactional Memory (TM) is being advocated as a promising alternative to deadlock-prone lock-based synchronization. Future many-core CMP architectures may therefore need to provide hardware support for TM. At the same time, power dissipation is a first-class consideration in multicore processor designs. In this work, we propose Selective Dynamic Serialization (SDS), a new technique that reduces energy consumption without degrading performance in applications with conflicting transactions by avoiding the work wasted in aborted transactions. Our proposal, which is implemented on top of a hardware transactional memory (HTM) system with an eager conflict management policy, detects and serializes conflicting transactions dynamically (at run-time). In its simplest form, when a conflict occurs, one transaction is allowed to continue while the rest are completely stalled. Once the executing transaction has finished, it wakes up several of the stalled transactions. More elaborate implementations of SDS delay this behavior until serializing transactions is profitable, seeking the best trade-off between performance, energy savings and network traffic. SDS implementations differ from each other in the condition that triggers the serialization mode. We have evaluated several SDS schemes using GEMS, a full-system simulator implementing the LogTM-SE Eager–Eager HTM system, and several benchmarks from the STAMP suite. Results for a 16-core CMP show that SDS reduces energy consumption by 6% on average (more than 20% in high-contention scenarios) across a wide range of benchmarks without affecting, on average, execution time. At the same time, network traffic is also reduced by 22%.
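
The simplest form of SDS described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism evaluated in the article: the class `SerializationManager`, its method names, and the `wake_count` parameter are hypothetical, and the model ignores how conflicts are detected and which condition triggers serialization mode.

```python
from collections import deque


class SerializationManager:
    """Toy model of the simplest SDS form (hypothetical names throughout):
    on a conflict one transaction keeps running and the others stall
    until it finishes."""

    def __init__(self, wake_count=1):
        # Assumption: wake_count is an illustrative knob for how many
        # stalled transactions are released when the running one finishes.
        self.wake_count = wake_count
        self.stalled = deque()  # transactions waiting, in arrival order
        self.running = set()    # transactions currently executing

    def begin(self, tx):
        """Register a transaction as executing."""
        self.running.add(tx)

    def on_conflict(self, winner, losers):
        """Let `winner` continue and stall every running loser."""
        if winner not in self.running:
            raise ValueError("winner must be a running transaction")
        for tx in losers:
            if tx in self.running:
                self.running.remove(tx)
                self.stalled.append(tx)

    def commit(self, tx):
        """Finish `tx` and wake up to wake_count stalled transactions."""
        self.running.discard(tx)
        woken = []
        while self.stalled and len(woken) < self.wake_count:
            nxt = self.stalled.popleft()
            self.running.add(nxt)
            woken.append(nxt)
        return woken


if __name__ == "__main__":
    mgr = SerializationManager(wake_count=2)
    for name in ("T0", "T1", "T2", "T3"):
        mgr.begin(name)
    mgr.on_conflict("T0", ["T1", "T2", "T3"])  # T0 continues, the rest stall
    print(mgr.commit("T0"))                    # prints ['T1', 'T2']
```

In this model, stalled transactions are woken in first-in, first-out order. That ordering is a choice made for the sketch and is not stated in the abstract.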

Keywords

Many-core CMPs, Hardware transactional memory, Transactions, Run-time serialization, Energy consumption, Execution time

Notes

Acknowledgments

This work was supported by the Spanish MINECO, as well as European Commission FEDER funds, under grant “TIN2012-38341-C04-03”. Epifanio Gaona Ramírez is supported by fellowship 09503/FPI/08 from Fundación Séneca, Agencia Regional de Ciencia y Tecnología de la Región de Murcia (II PCTRM).

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Epifanio Gaona (1)
  • J. Rubén Titos-Gil (2)
  • Juan Fernández (3)
  • Manuel E. Acacio (1)

  1. Universidad de Murcia, Murcia, Spain
  2. Chalmers University of Technology, Göteborg, Sweden
  3. Intel Barcelona Research Center, Barcelona, Spain
