A templated programmable architecture for highly constrained embedded HD video processing
Abstract
A video reconstruction pipeline is required to improve the quality of images delivered by highly constrained devices. Its algorithms demand high computing capacity: several dozen GOPs for real-time HD 1080p video streams. Today’s embedded design constraints limit both silicon budget and power consumption, typically to 2 mm\(^2\) for half a watt. This paper presents the eISP architecture that is able to reach 188 MOPs/mW with 94 GOPs/mm\(^2\) and 378 GOPs/mW using TSMC 65-nm integration technology. This fully programmable and modular architecture is based on an analysis of video-processing algorithms. Synthesizable VHDL is generated from a set of parameters, which simplifies architecture sizing and characterization.
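The budget figures quoted above can be related to each other with a little arithmetic. The following is only a rough sketch, not the authors' sizing method: the frame rate (30 frames/s) and the workload value (36 GOPs, one possible reading of "several dozen GOPs") are assumptions introduced for illustration, and the function names are hypothetical. The area, power, density, and efficiency values are the ones stated in the abstract.

```python
# Back-of-the-envelope check of figures quoted in the abstract.
# FPS and WORKLOAD_GOPS are assumptions, not values from the source.

WIDTH, HEIGHT = 1920, 1080       # HD 1080p frame size
FPS = 30                         # assumption: frame rate is not stated
WORKLOAD_GOPS = 36               # assumption: one reading of "several dozen GOPs"

AREA_BUDGET_MM2 = 2.0            # silicon budget quoted in the abstract
POWER_BUDGET_MW = 500.0          # "half a watt"
DENSITY_GOPS_PER_MM2 = 94.0      # quoted compute density
EFFICIENCY_MOPS_PER_MW = 188.0   # quoted power efficiency


def pixel_rate(width, height, fps):
    """Pixels to process per second for a given frame size and rate."""
    return width * height * fps


def ops_per_pixel(workload_gops, pixels_per_second):
    """Average operations available per pixel for a given workload."""
    return workload_gops * 1e9 / pixels_per_second


def area_limited_gops(area_mm2, gops_per_mm2):
    """Peak throughput that fits in the silicon budget."""
    return area_mm2 * gops_per_mm2


def power_limited_gops(power_mw, mops_per_mw):
    """Throughput sustainable within the power budget (MOPs -> GOPs)."""
    return power_mw * mops_per_mw / 1000.0


if __name__ == "__main__":
    rate = pixel_rate(WIDTH, HEIGHT, FPS)
    print("pixels/s:", rate)
    print("ops/pixel:", round(ops_per_pixel(WORKLOAD_GOPS, rate), 1))
    print("area-limited GOPs:", area_limited_gops(AREA_BUDGET_MM2, DENSITY_GOPS_PER_MM2))
    print("power-limited GOPs:", power_limited_gops(POWER_BUDGET_MW, EFFICIENCY_MOPS_PER_MW))
```

Under these assumptions, a 36 GOPs workload corresponds to a few hundred operations per pixel, and both the area-limited and power-limited estimates sit above that workload.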
Keywords
SIMD, VLIW, Programmable, Low silicon footprint, Low-power
Notes
Acknowledgements
The authors are grateful to Nicola Martin, Dominique Debize, John Rander and Jacques Bouchard for their valuable assistance in proofreading the manuscript and improving the accuracy of its written English.