Journal of Real-Time Image Processing, Volume 16, Issue 1, pp 143–160

A templated programmable architecture for highly constrained embedded HD video processing

  • Mathieu Thevenin (email author)
  • Michel Paindavoine
  • Renaud Schmit
  • Barthelemy Heyrman
  • Laurent Letellier
Special Issue Paper


The implementation of a video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. These algorithms require high computing capacity: several dozen GOPs for real-time HD 1080p video streams. Today’s embedded design constraints limit both silicon budget and power consumption, typically to 2 mm\(^2\) and half a watt. This paper presents the eISP architecture that is able to reach 188 MOPs/mW with 94 GOPs/mm\(^2\) and 378 GOPs/mW using TSMC 65-nm integration technology. This fully programmable and modular architecture is based on an analysis of video-processing algorithms. Synthesizable VHDL is generated from a set of parameters, which simplifies the sizing and characterization of the architecture.
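The orders of magnitude quoted above can be checked with a back-of-envelope calculation. The sketch below is illustrative only: the frame rate and the per-pixel operation count are assumptions chosen for the example and are not taken from the paper, while the area, power and density figures are the ones stated in the abstract. GOPs is read here as giga-operations per second.

```python
# Back-of-envelope sizing for a real-time 1080p pipeline.
# ASSUMPTIONS (not from the paper): FPS and OPS_PER_PIXEL are hypothetical.

WIDTH, HEIGHT = 1920, 1080      # HD 1080p frame size
FPS = 30                        # assumed frame rate
OPS_PER_PIXEL = 500             # assumed cost of the full pipeline per pixel

AREA_BUDGET_MM2 = 2.0           # silicon budget quoted in the abstract
POWER_BUDGET_MW = 500.0         # "half a watt", expressed in mW
DENSITY_GOPS_PER_MM2 = 94.0     # compute density quoted in the abstract


def required_gops(width, height, fps, ops_per_pixel):
    """Throughput needed to process every pixel in real time, in GOPs."""
    return width * height * fps * ops_per_pixel / 1e9


def budget_gops(area_mm2, density_gops_per_mm2):
    """Throughput that fits in the area budget at the given density, in GOPs."""
    return area_mm2 * density_gops_per_mm2


need = required_gops(WIDTH, HEIGHT, FPS, OPS_PER_PIXEL)
have = budget_gops(AREA_BUDGET_MM2, DENSITY_GOPS_PER_MM2)
# Convert GOPs to MOPs, then divide by the power budget in mW.
mops_per_mw = have * 1000.0 / POWER_BUDGET_MW

print(f"required: {need:.1f} GOPs")
print(f"fits in {AREA_BUDGET_MM2} mm^2: {have:.1f} GOPs")
print(f"at {POWER_BUDGET_MW:.0f} mW: {mops_per_mw:.0f} MOPs/mW")
```

Under these assumed parameters the pipeline needs about 31 GOPs, which is consistent with "several dozen GOPs"; changing `FPS` or `OPS_PER_PIXEL` scales the requirement linearly.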


SIMD, VLIW, Programmable, Low silicon footprint, Low-power



The authors are grateful to Nicola Martin, Dominique Debize, John Rander and Jacques Bouchard for their valuable assistance in proofreading and in improving the accuracy of the written English.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Mathieu Thevenin (1), email author
  • Michel Paindavoine (2)
  • Renaud Schmit (1)
  • Barthelemy Heyrman (2)
  • Laurent Letellier (1)

  1. CEA, LIST, CEA Saclay, Saclay, France
  2. University of Burgundy, Burgundy, France
