Case Studies of Multi-core Energy Efficiency in Task Based Programs
In this paper, we present three performance and energy case studies of benchmark applications in the OmpSs environment for task based programming. Different parallel and vectorized implementations are evaluated on an Intel® Core™ i7-2600 quad-core processor. Using FLOPS/W derived from on-chip model-specific registers (MSRs), we find AVX code to be clearly the most energy efficient in general. The peak on-chip GFLOPS/W rates are: Black-Scholes (BS) 0.89, FFTW 1.38 and Matrix Multiply (MM) 1.97. Experiments cover variable degrees of thread parallelism and different OmpSs task pool scheduling policies. We find that maximum energy efficiency for small and medium sized problems is obtained by limiting the number of parallel threads. Comparison of AVX variants with non-vectorized code shows ≈ 6–7× (BS) and ≈ 3–5× (FFTW) improvements in on-chip energy efficiency, depending on the problem size and degree of multithreading.
Keywords: performance evaluation, energy efficiency, task based programming
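The FLOPS/W metric used in the abstract reduces to floating-point operations per joule, because the run time cancels out of the ratio. A minimal sketch of that arithmetic follows, assuming the MSRs are RAPL-style energy counters (the abstract does not name them): a wrapping 32-bit count scaled by a unit of 1/2^ESU joules, with ESU taken from bits 12:8 of a power-unit register. The function names and sample values are hypothetical, and nothing here reads real hardware.

```python
"""Sketch: GFLOPS/W from raw energy-counter readings (all inputs hypothetical)."""

COUNTER_BITS = 32  # assumption: the energy counter is a wrapping 32-bit value


def energy_unit_joules(power_unit_msr: int) -> float:
    """Decode joules-per-count from a RAPL-style power-unit register value.

    Assumes the energy-status-unit (ESU) field sits in bits 12:8 and
    encodes the unit as 1 / 2**ESU joules.
    """
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_joules(start: int, end: int, unit_j: float) -> float:
    """Energy between two raw readings, tolerating one counter wraparound."""
    delta = (end - start) % (1 << COUNTER_BITS)
    return delta * unit_j


def gflops_per_watt(flop_count: float, joules: float) -> float:
    """FLOPS/W equals FLOP per joule, so elapsed time is not needed."""
    if joules <= 0:
        raise ValueError("energy must be positive")
    return flop_count / joules / 1e9


if __name__ == "__main__":
    # Hypothetical register value whose ESU field is 16 (about 15.3 microjoules per count).
    unit = energy_unit_joules(0xA1003)
    joules = energy_joules(start=1_000, end=1_000 + 65_536, unit_j=unit)
    print(gflops_per_watt(flop_count=2e9, joules=joules))
```

Under these assumptions, a workload of 2e9 floating-point operations that consumes 1 J scores 2 GFLOPS/W no matter how long it ran, which is why energy efficiency and raw speed can favor different thread counts.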