
A Close Look at Multi-tenant Parallel CNN Inference for Autonomous Driving

  • Conference paper
  • Published in: Network and Parallel Computing (NPC 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12639)


Abstract

Convolutional neural networks (CNNs) are widely used in vision-based autonomous driving, e.g., for detecting and localizing objects captured in live video streams. Although CNNs deliver state-of-the-art detection accuracy, processing multiple video streams with such models in real time poses a serious challenge for on-car computing systems. Without optimized system support, high processing latency can cause significant frame loss, which is unacceptable for safety-critical applications. To alleviate this problem, several optimization strategies have been proposed, including batching, GPU parallelism, and different CPU/GPU data transfer modes, on top of a variety of deep learning frameworks and GPUs. It is, however, unclear how these techniques interact with each other, which particular combination performs best, and under what settings. In this paper, we set out to answer these questions. We design and develop a Multi-Tenant Parallel CNN Inference Framework, MPInfer, to carefully evaluate the performance of various parallel execution modes under different CPU/GPU data transfer modes and GPU platforms. We find that on more powerful GPUs such as the GTX 1660, parallelism across CUDA contexts enhanced by the NVIDIA Multi-Process Service (MPS) performs best, achieving 147.06 FPS throughput at 14.50 ms latency. Meanwhile, on embedded GPUs such as the Jetson AGX Xavier, pipelining is the better choice, achieving 46.63 FPS throughput at 35.09 ms latency.
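To make the pipelining execution mode concrete, the following is a minimal illustrative sketch (not the authors' MPInfer code): a three-stage frame pipeline, preprocess → infer → postprocess, where each stage runs in its own thread so that work on consecutive frames overlaps. The stage bodies here are hypothetical stand-in functions; a real deployment would call into a GPU inference runtime for the middle stage.

```python
import queue
import threading

_SENTINEL = object()  # signals end-of-stream to downstream stages


def _stage(fn, inbox, outbox):
    """Pull items from inbox, apply fn, and push results to outbox in order."""
    while True:
        item = inbox.get()
        if item is _SENTINEL:
            outbox.put(_SENTINEL)
            return
        outbox.put(fn(item))


def run_pipeline(frames, preprocess, infer, postprocess):
    """Run frames through three overlapped stages; return outputs in frame order."""
    # Small bounded queues provide backpressure between stages.
    q1, q2, q3 = queue.Queue(2), queue.Queue(2), queue.Queue(2)
    threads = [
        threading.Thread(target=_stage, args=(preprocess, q1, q2)),
        threading.Thread(target=_stage, args=(infer, q2, q3)),
    ]
    for t in threads:
        t.start()
    for f in frames:
        q1.put(f)
    q1.put(_SENTINEL)
    results = []
    while True:
        item = q3.get()
        if item is _SENTINEL:
            break
        results.append(postprocess(item))  # final stage runs on the caller thread
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(
        range(5),
        preprocess=lambda x: x * 10,  # stand-in for resize/normalize
        infer=lambda x: x + 1,        # stand-in for the CNN forward pass
        postprocess=lambda x: x,      # stand-in for NMS/decoding
    )
    print(out)  # [1, 11, 21, 31, 41]
```

Because the stages overlap, per-frame latency is roughly the sum of all stage times while throughput is bounded only by the slowest stage; the CUDA-context/MPS mode evaluated in the paper instead runs independent inference workers in separate processes sharing the GPU.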



Acknowledgment

This work was partially funded by the National Major Program for Technological Innovation 2030–New Generation Artificial Intelligence (No. 2018AAA0100500) and the National Natural Science Foundation of China (No. 61772487).

Author information

Corresponding author: Yu Zhang.


Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper


Cite this paper

Huang, Y., Zhang, Y., Feng, B., Guo, X., Zhang, Y., Ding, Y. (2021). A Close Look at Multi-tenant Parallel CNN Inference for Autonomous Driving. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science, vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_8


  • DOI: https://doi.org/10.1007/978-3-030-79478-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79477-4

  • Online ISBN: 978-3-030-79478-1

  • eBook Packages: Computer Science (R0)
