In recent years, the rapid growth of big data and computational demand has drawn wide attention to high-performance and heterogeneous computing. For object detection algorithms, training time tends to matter less than running time, energy efficiency, and processing latency. FPGAs offer data-parallel operation, low power consumption, low latency, and reprogrammability, providing substantial computing power together with flexibility. In this paper, we use the Xilinx SDAccel tool to implement a CPU+FPGA heterogeneous computing platform for face detection, in which the FPGA serves as a coprocessor that accelerates the face detection algorithm. A high-level synthesis (HLS) approach lets developers focus on the architecture of the design and lowers the barrier to entry for software developers. We take the implementation of the Viola-Jones face detection algorithm on an FPGA as an example to demonstrate the SDAccel development process, explore the potential parallelism of the algorithm, and show how to optimize the hardware circuit using a high-level language. Our final design is 70 times faster than a single-threaded CPU implementation.


FPGA, Heterogeneous, Face detection, Architecture, High-level synthesis, SDAccel



Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2020

Authors and Affiliations

  1. Dalian University of Technology, Dalian, China
