Applied Intelligence, Volume 48, Issue 5, pp 1288–1301

Convolutional neural network acceleration with hardware/software co-design

  • Andrew Tzer-Yeu Chen
  • Morteza Biglari-Abhari
  • Kevin I-Kai Wang
  • Abdesselam Bouzerdoum
  • Fok Hing Chi Tivive


Convolutional Neural Networks (CNNs) have a broad range of applications, such as image processing and natural language processing. Inspired by the mammalian visual cortex, CNNs have achieved impressive results on a number of computer vision challenges, but often at the cost of large amounts of processing power and without any timing constraints. This paper presents a design methodology for accelerating CNNs using Hardware/Software Co-design techniques to balance performance and flexibility, particularly for resource-constrained systems. The methodology is applied to a gender recognition case study, using an ARM processor and FPGA fabric to create an embedded system that can process facial images in real time.


Computer vision, Embedded system, Neural network, Co-design, Hardware acceleration, FPGA, Real-time, Gender recognition



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, The University of Auckland, Auckland, New Zealand
  2. School of Electrical, Computer, and Telecommunications Engineering, University of Wollongong, Wollongong, Australia
  3. College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
