
A Close Look at Multi-tenant Parallel CNN Inference for Autonomous Driving

  • Conference paper
  • Published in: Network and Parallel Computing (NPC 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12639)


Abstract

Convolutional neural networks (CNNs) are widely used in vision-based autonomous driving, e.g., for detecting and localizing objects captured in live video streams. Although CNNs deliver state-of-the-art detection accuracy, processing multiple video streams with such models in real time poses a serious challenge for on-car computing systems. Without optimized system support, high processing latency can cause significant frame loss, which is unacceptable for safety-critical applications. To alleviate this problem, several optimization strategies have been proposed, including batching, GPU parallelism, and different CPU/GPU data transfer modes, on top of a variety of deep learning frameworks and GPUs. It is, however, unclear how these techniques interact with each other, which particular combination performs best, and under what settings. In this paper, we set out to answer these questions. We design and develop a Multi-Tenant Parallel CNN Inference Framework, MPInfer, to carefully evaluate the performance of various parallel execution modes under different CPU/GPU data transfer modes and GPU platforms. We find that on more powerful GPUs such as the GTX 1660, parallelism across CUDA contexts enhanced by the NVIDIA Multi-Process Service (MPS) performs best, achieving 147.06 FPS throughput at 14.50 ms latency. Meanwhile, on embedded GPUs such as the Jetson AGX Xavier, pipelining is the better choice, achieving 46.63 FPS throughput at 35.09 ms latency.
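To make the pipelining execution mode concrete, the following is a minimal illustrative sketch (not the authors' MPInfer code): a three-stage frame pipeline, preprocess → infer → postprocess, where each stage runs in its own thread so that work on consecutive frames overlaps. The stage bodies here are hypothetical stand-in functions; a real deployment would call into a GPU inference runtime for the middle stage.

```python
import queue
import threading

_SENTINEL = object()  # signals end-of-stream to downstream stages


def _stage(fn, inbox, outbox):
    """Pull items from inbox, apply fn, and push results to outbox in order."""
    while True:
        item = inbox.get()
        if item is _SENTINEL:
            outbox.put(_SENTINEL)
            return
        outbox.put(fn(item))


def run_pipeline(frames, preprocess, infer, postprocess):
    """Run frames through three overlapped stages; return outputs in frame order."""
    # Small bounded queues provide backpressure between stages.
    q1, q2, q3 = queue.Queue(2), queue.Queue(2), queue.Queue(2)
    threads = [
        threading.Thread(target=_stage, args=(preprocess, q1, q2)),
        threading.Thread(target=_stage, args=(infer, q2, q3)),
    ]
    for t in threads:
        t.start()
    for f in frames:
        q1.put(f)
    q1.put(_SENTINEL)
    results = []
    while True:
        item = q3.get()
        if item is _SENTINEL:
            break
        results.append(postprocess(item))  # final stage runs on the caller thread
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(
        range(5),
        preprocess=lambda x: x * 10,  # stand-in for resize/normalize
        infer=lambda x: x + 1,        # stand-in for the CNN forward pass
        postprocess=lambda x: x,      # stand-in for NMS/decoding
    )
    print(out)  # [1, 11, 21, 31, 41]
```

Because the stages overlap, per-frame latency is roughly the sum of all stage times while throughput is bounded only by the slowest stage; the CUDA-context/MPS mode evaluated in the paper instead runs independent inference workers in separate processes sharing the GPU.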



Acknowledgment

This work was partially funded by the National Major Program for Technological Innovation 2030–New Generation Artificial Intelligence (No. 2018AAA0100500) and the National Natural Science Foundation of China (No. 61772487).

Author information

Corresponding author: Yu Zhang.


Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper


Cite this paper

Huang, Y., Zhang, Y., Feng, B., Guo, X., Zhang, Y., Ding, Y. (2021). A Close Look at Multi-tenant Parallel CNN Inference for Autonomous Driving. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science, vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_8


  • DOI: https://doi.org/10.1007/978-3-030-79478-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79477-4

  • Online ISBN: 978-3-030-79478-1

  • eBook Packages: Computer Science (R0)
