Journal of Real-Time Image Processing, Volume 14, Issue 3, pp 713–728

An extended analysis of memory hierarchies for efficient implementations of image processing applications

  • Christian Hartmann
  • Dietmar Fey
Special Issue Paper


Through the continued miniaturization of electronic devices, embedded smart cameras are becoming increasingly important, and smaller cameras broaden the spectrum of possible applications. In industry, smart cameras are used for tasks ranging from quality monitoring and position tracking to the calibration of production machines. In the consumer market, a distinct boom in action cameras combined with fused sensor information can be observed. However, all of these applications share a common bottleneck: the memory architecture. Most image processing applications are memory-bound, so the time spent transferring data decisively affects the application's total processing time. Different memory access patterns require different memory configurations and hierarchies, and a poor match between an image processing application and the memory architecture degrades the performance of the whole image processing system, resulting in longer processing times and higher energy consumption. This work introduces new methods for classifying image processing applications by their memory access patterns in order to map them onto memory architectures. Our work combines a simulation framework, the Heterogeneous Memory Simulator, with an analytical framework, the Memory Analyzer, to find bottlenecks inside an image processing application and to help select a suitable, application-specific memory configuration in terms of processing time and energy consumption.
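The abstract does not state how the Memory Analyzer characterizes memory access patterns. Purely as an illustration, the sketch below uses LRU reuse (stack) distance, a standard cache-modelling metric. This is our assumption and not necessarily the authors' method. The helpers `point_op_trace`, `stencil3x3_trace`, `reuse_distances` and `lru_hit_rate` are hypothetical. They assume a row-major image with one address per pixel and an idealized, fully associative LRU cache whose capacity is counted in pixels.

```python
from collections import OrderedDict


def point_op_trace(width, height):
    """Hypothetical trace of a point operation: every pixel is read exactly once, row-major."""
    return [y * width + x for y in range(height) for x in range(width)]


def stencil3x3_trace(width, height):
    """Hypothetical trace of a 3x3 local operator applied to all interior pixels, row-major."""
    trace = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    trace.append((y + dy) * width + (x + dx))
    return trace


def reuse_distances(trace):
    """LRU stack distance per access: the number of distinct other addresses touched
    since the previous access to the same address, or None on the first access.
    Naive O(N) per access, which is adequate for small illustrative traces."""
    stack = OrderedDict()  # ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack.keys())
            # Addresses after addr in the LRU order were used more recently than addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)  # cold (compulsory) miss
            stack[addr] = True
    return distances


def lru_hit_rate(trace, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` pixels:
    an access hits exactly when its stack distance is smaller than the capacity."""
    if not trace:
        return 0.0
    hits = sum(1 for d in reuse_distances(trace) if d is not None and d < capacity)
    return hits / len(trace)


if __name__ == "__main__":
    w, h = 8, 8
    for cap in (8, 32, 64):
        print(cap,
              lru_hit_rate(point_op_trace(w, h), cap),
              round(lru_hit_rate(stencil3x3_trace(w, h), cap), 3))
```

On an 8x8 image, the 3x3 operator issues 324 reads to 64 distinct pixels. With a capacity of 64 pixels, every read except the first touch of each pixel hits (260 of 324). The point operation never reuses data and gets no hits at any capacity. This contrast illustrates why two memory-bound kernels can favour very different memory configurations.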


Keywords: Image processing · Memory · Cache · Energy analysis · Performance analysis · Data locality



This work is supported by the Bavarian Research Foundation (BFS) as part of its research project “FORMUS3IC”.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Chair of Computer Architecture, University of Erlangen-Nuremberg, Erlangen, Germany
