Exploring FPGA-GPU Heterogeneous Architecture for ADAS: Towards Performance and Energy

  • Xiebing Wang
  • Linlin Liu
  • Kai Huang
  • Alois Knoll
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10393)


This paper investigates the feasibility of using heterogeneous computing for future advanced driver assistance systems (ADAS) applications. We take a lane detection algorithm (LDA) as a test case and customize it into FPGA-GPU heterogeneous implementations that can be executed under either a workload-constant or a workload-balanced scheme. The heterogeneous executions are then evaluated in terms of performance and energy consumption and compared with single-accelerator runs. Experiments show that heterogeneous execution alleviates both the performance and energy bottlenecks that arise when only a single accelerator is used. Moreover, compared with FPGA-only execution, the workload-balanced scheme increases performance by 236.9% and 42.9% on our two tested platforms, respectively, while keeping energy cost low.
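To make the idea of a workload-balanced scheme concrete, here is a minimal sketch. It assumes balancing means splitting the input frames between the two accelerators in proportion to their measured throughput, which the abstract does not spell out. The function names and the frame rates in the usage note are hypothetical and are not taken from the paper. The last helper only restates the arithmetic behind a reported percentage increase.

```python
def balanced_split(total_frames, fpga_fps, gpu_fps):
    """Split frames in proportion to device throughput (an assumed policy)."""
    if fpga_fps <= 0 or gpu_fps <= 0:
        raise ValueError("throughputs must be positive")
    # The faster device receives proportionally more frames.
    fpga_frames = round(total_frames * fpga_fps / (fpga_fps + gpu_fps))
    return fpga_frames, total_frames - fpga_frames


def makespan(fpga_frames, gpu_frames, fpga_fps, gpu_fps):
    """Time until the slower device finishes, assuming both run concurrently."""
    return max(fpga_frames / fpga_fps, gpu_frames / gpu_fps)


def speedup_from_percent_increase(percent):
    """Convert 'performance increased by X%' into a speedup factor."""
    return 1.0 + percent / 100.0
```

With hypothetical throughputs of 100 and 300 frames per second, `balanced_split(1000, 100.0, 300.0)` assigns 250 frames to the FPGA and 750 to the GPU, so both finish after 2.5 seconds. Read as a speedup factor, the reported 236.9% increase corresponds to about 3.37 times the FPGA-only performance, and 42.9% to about 1.43 times.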


Keywords: Advanced Driver Assistance Systems (ADAS), OpenCL, FPGA, GPU



This work is supported in part by a scholarship from the China Scholarship Council (CSC) under Grant No. 201506270152.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Xiebing Wang (1, corresponding author)
  • Linlin Liu (2)
  • Kai Huang (2)
  • Alois Knoll (1)
  1. Technische Universität München, Garching bei München, Germany
  2. Sun Yat-sen University, Guangzhou, People’s Republic of China
