Multicore vs Manycore: The Energy Cost of Concurrency

  • Martin Groen
  • Vincent Gramoli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9833)


In this paper, we study the relation between performance and energy in concurrent programs. As energy efficiency has become a key challenge for the computing industry, it is crucial to seek solutions that achieve high performance at a reasonable carbon footprint. We show, however, that concurrency dramatically impacts energy and that, because of the number of running threads and their level of contention, the energy consumed remains difficult to predict even when the application and the thermal design power (TDP) are given.

To this end, we evaluated concurrent algorithms on a 2.1 GHz multicore platform and a 1.2 GHz manycore platform. Our results show that even though the throughput on the manycore is lower than on the multicore, we could not find a single concurrent algorithm for which the multicore consistently offers higher performance per watt than the manycore. More importantly, we identified some benchmarks on which the manycore offers up to 4.3× more operations per second per watt than the multicore.
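Performance per watt, as used above, is throughput divided by average power; since one watt is one joule per second, (operations/s)/W is the same as operations per joule of energy. The minimal Python sketch below illustrates only that arithmetic. The names `Measurement` and `efficiency_ratio` are hypothetical, and the numbers are made up for illustration rather than taken from the paper's measurements. It shows how a platform with lower throughput can still come out ahead on this metric when it draws proportionally less power.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One hypothetical benchmark run: throughput (ops/s) and average power (W)."""
    throughput_ops_per_s: float
    avg_power_w: float

    def ops_per_s_per_watt(self) -> float:
        # Performance per watt: throughput divided by average power draw.
        # Since 1 W = 1 J/s, (ops/s)/W equals operations per joule.
        if self.avg_power_w <= 0:
            raise ValueError("average power must be positive")
        return self.throughput_ops_per_s / self.avg_power_w


def efficiency_ratio(a: Measurement, b: Measurement) -> float:
    """How many times more operations per second per watt `a` achieves than `b`."""
    return a.ops_per_s_per_watt() / b.ops_per_s_per_watt()


if __name__ == "__main__":
    # Illustrative numbers only, NOT measurements from the paper.
    multicore = Measurement(throughput_ops_per_s=40e6, avg_power_w=200.0)
    manycore = Measurement(throughput_ops_per_s=20e6, avg_power_w=50.0)
    print(f"multicore: {multicore.ops_per_s_per_watt():.0f} ops/s/W")
    print(f"manycore:  {manycore.ops_per_s_per_watt():.0f} ops/s/W")
    print(f"ratio:     {efficiency_ratio(manycore, multicore):.1f}x")
```

With these invented inputs the manycore has half the throughput but a quarter of the power, so it delivers twice the operations per second per watt. The sketch treats power as a single average value; it does not model how the number of threads or their level of contention changes that value, which is the part the abstract reports as hard to predict.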


Keywords: Power consumption, Energy, Manycore, Concurrency



This research was supported under the Australian Research Council’s Discovery Projects funding scheme (project number 160104801) entitled “Data Structures for Multi-Core”. Vincent Gramoli is the recipient of the Australian Research Council Discovery International Award.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. University of Sydney, Sydney, Australia
