Journal of Signal Processing Systems, Volume 80, Issue 3, pp. 295–307

Performance and Energy Evaluation of Different Multi-Threading Interfaces in Embedded and General Purpose Systems

  • Arthur Francisco Lorenzon
  • Márcia Cristina Cera
  • Antonio Carlos Schneider Beck


In current systems, while it is necessary to exploit the availability of multiple cores, it is also mandatory to consume less energy. To speed up the development process and make it as transparent as possible to the programmer, parallelism is exploited through Application Programming Interfaces (APIs). However, each of these APIs implements a different way of exchanging data through shared memory regions and, as a consequence, has a different level of energy consumption. In this paper, considering both general-purpose and embedded systems, we show how each API influences performance, energy consumption, and the Energy-Delay Product (EDP). For example, averaged over all benchmarks, Pthreads consumes 12% less energy than OpenMP and MPI. We also demonstrate that the difference in EDP among the APIs can be up to 81%, while the level of efficiency (e.g., performance or energy consumption per core) changes as the number of threads increases, depending on whether the system is embedded or general purpose.


Keywords: Embedded systems · Parallel programming · Performance · Energy efficiency evaluation



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Arthur Francisco Lorenzon (1)
  • Márcia Cristina Cera (2)
  • Antonio Carlos Schneider Beck (1)
  1. Informatics Institute, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
  2. Federal University of Pampa, Alegrete, Brazil
