Case Studies of Multi-core Energy Efficiency in Task Based Programs

  • Hallgeir Lien
  • Lasse Natvig
  • Abdullah Al Hasib
  • Jan Christian Meyer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7453)

Abstract

In this paper, we present three performance and energy case studies of benchmark applications in the OmpSs environment for task based programming. Different parallel and vectorized implementations are evaluated on an Intel® Core™ i7-2600 quad-core processor. Using FLOPS/W derived from chip MSR registers, we find AVX code to be clearly the most energy efficient in general. The peak on-chip GFLOPS/W rates are: Black-Scholes (BS) 0.89, FFTW 1.38 and Matrix Multiply (MM) 1.97. Experiments cover variable degrees of thread parallelism and different OmpSs task pool scheduling policies. We find that maximum energy efficiency for small and medium sized problems is obtained by limiting the number of parallel threads. Comparison of AVX variants with non-vectorized code shows ≈ 6–7× (BS) and ≈ 3–5× (FFTW) improvements in on-chip energy efficiency, depending on the problem size and degree of multithreading.
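
The FLOPS/W metric used in the abstract can be illustrated with a minimal sketch. It assumes the energy reading comes from a wrapping 32-bit package energy counter whose unit is 1/2^ESU joules, as in Intel's RAPL interface; the abstract only says "chip MSR registers", so that layout is an assumption, and the function names are hypothetical. The one certain point is the identity GFLOPS/W = GFLOP/J, because elapsed time cancels out.

```python
# Sketch only: helper names are hypothetical, and the RAPL-style counter
# layout is an assumption about how the chip MSRs were read.

COUNTER_BITS = 32  # assumed width of the wrapping energy counter


def energy_joules(raw_start, raw_end, energy_unit_exponent):
    """Convert two raw counter readings to joules, tolerating one wrap-around.

    Each counter tick is assumed to represent 1 / 2**energy_unit_exponent joules.
    """
    delta = (raw_end - raw_start) % (1 << COUNTER_BITS)
    return delta / float(1 << energy_unit_exponent)


def gflops_per_watt(flop_count, joules):
    """Return GFLOPS/W, which equals GFLOP per joule (time cancels out)."""
    if joules <= 0:
        raise ValueError("energy must be positive")
    return flop_count / joules / 1e9
```

In use, one would read the counter before and after a benchmark run, pass both readings to `energy_joules`, and divide the benchmark's known operation count by the result. Any numbers fed to these helpers are the caller's own measurements; none are taken from the paper.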

Keywords

performance evaluation, energy efficiency, task based programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hallgeir Lien (1)
  • Lasse Natvig (1)
  • Abdullah Al Hasib (1)
  • Jan Christian Meyer (2)
  1. Dept. of Computer and Information Science (IDI), NTNU, Trondheim, Norway
  2. High Performance Computing Section, IT Dept., NTNU, Norway
