Journal of Real-Time Image Processing, Volume 14, Issue 3, pp 565–583

A novel global methodology to analyze the embeddability of real-time image processing algorithms

  • Romain Saussard
  • Boubker Bouzid
  • Marius Vasiliu
  • Roger Reynaud
Special Issue Paper


Advanced driver assistance system (ADAS) applications increasingly rely on cameras and image processing algorithms. To embed these algorithms and execute them in real time, semiconductor companies offer heterogeneous systems-on-chip (SoCs). Embedding algorithms on this type of hardware is not trivial: one needs to determine how to partition the computational load across the different processing units. In addition, it is not easy to predict whether a given algorithm can be executed on a given heterogeneous SoC while meeting real-time constraints. We propose a novel global methodology to assist with embedding image processing algorithms on heterogeneous SoCs while meeting real-time constraints (using a soft real-time analysis). Our approach provides several heuristics that predict delays and execution times, and it is based on a set of multi-level test vectors that extract key features of heterogeneous architectures.


Embedded heterogeneous architectures · Kernel mapping · Performance and real-time constraints analysis · ADAS · GPU



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Romain Saussard (1, 2)
  • Boubker Bouzid (1)
  • Marius Vasiliu (2)
  • Roger Reynaud (2)
  1. Renault S.A.S., Guyancourt, France
  2. SATIE, Université Paris Sud, Université Paris Saclay, Orsay, France
