Performance and Energy Efficiency Analysis of Data Reuse Transformation Methodology on Multicore Processor

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)


Memory latency and energy efficiency are two key constraints on high-performance computing systems. Data reuse transformations aim to reduce memory latency by exploiting temporal locality in data accesses. At the same time, modern multicore processors offer the opportunity to improve performance while reducing energy dissipation through parallelization. In this paper, we investigate to what extent data reuse transformations, combined with a parallel programming model on a multicore processor, can address the constraints of memory latency and energy efficiency. As a test case, a "full-search motion estimation" kernel is run on the Intel® Core™ i7-2600 processor. The Energy Delay Product (EDP) is used as the metric for comparing energy efficiency. The results show that performance and energy efficiency can be improved by factors of more than 6 and 15, respectively, by applying a data reuse transformation methodology and a parallel programming model on a multicore system.
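To make the two technical terms in the abstract concrete, the following is a minimal Python sketch of a full-search block-matching kernel and of the EDP metric. It is not the paper's implementation: the function names (`sad`, `full_search`, `energy_delay_product`), the list-of-rows frame layout, and the boundary handling are assumptions made for illustration, and neither a data reuse transformation nor any parallelization is applied here.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows (an assumed layout for this sketch)."""
    total = 0
    for y in range(n):
        cur_row = cur[by + y]
        ref_row = ref[ry + y]
        for x in range(n):
            total += abs(cur_row[bx + x] - ref_row[rx + x])
    return total


def full_search(cur, ref, bx, by, n, m):
    """Test every displacement (dx, dy) in [-m, m] x [-m, m] for the
    block at (bx, by) and return (dx, dy, best_sad).
    Returns None if no candidate lies inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            # Strict '<' keeps the first candidate found on ties.
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def energy_delay_product(energy_joules, delay_seconds):
    """EDP = energy x delay; a lower value means better energy efficiency."""
    return energy_joules * delay_seconds


if __name__ == "__main__":
    # Hypothetical 8 x 8 reference frame in which every pixel is unique.
    ref = [[10 * y + x for x in range(8)] for y in range(8)]
    # Current frame: all zeros except a 2 x 2 block at (2, 2) that was
    # copied from the reference frame at (3, 1).
    cur = [[0] * 8 for _ in range(8)]
    for y in range(2):
        for x in range(2):
            cur[2 + y][2 + x] = ref[1 + y][3 + x]
    print(full_search(cur, ref, bx=2, by=2, n=2, m=2))  # (1, -1, 0)
```

Neighbouring candidate blocks overlap heavily in the reference frame, so the same reference pixels are read many times; this repeated access is the temporal locality that a data reuse transformation would exploit. The search for each block is independent of the others, which makes the per-block loop a natural unit to distribute across cores.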


Keywords: performance, energy efficiency, data reuse transformation methodology, parallel programming



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
  2. Department of Electronics and Telecommunications, Norwegian University of Science and Technology, Trondheim, Norway
