Abstract
Convolutional neural network (CNN)-based inference is a quintessential component of mobile machine-learning applications. Privacy and real-time response requirements compel applications to perform inference on the mobile (edge) devices themselves. Heterogeneous multi-processor systems-on-chip (HMPSoCs) within edge devices enable high-throughput, low-latency edge inference. An HMPSoC contains several types of processing cores, each capable of performing CNN inference independently. However, to meet stringent performance requirements, an application must engage all core types in inference simultaneously. A software-based CNN inference pipeline allows all the cores of an HMPSoC to work synergistically toward high-throughput, low-latency CNN inference. In this chapter, we present two such CNN inference pipeline designs. The first design creates a pipeline between the two different types of CPU cores. The second design extends this pipeline from the CPU to the GPU. We also provide a future perspective and research directions on the subject.
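The pipelining idea summarized above can be illustrated with a minimal sketch: split a network into two stages, run each stage in its own worker, and connect them with a bounded queue so that, while the second stage processes frame i, the first stage is already working on frame i+1. This is an illustrative assumption-laden toy (the stage functions here are trivial placeholders standing in for CNN layer groups mapped to, say, big CPU cores and the GPU), not the chapter's actual implementation.

```python
import queue
import threading

def stage_a(x):
    # Placeholder for the first group of CNN layers
    # (e.g., the sub-network mapped to the big CPU cores).
    return x + 1

def stage_b(x):
    # Placeholder for the remaining layers
    # (e.g., the sub-network mapped to the GPU).
    return x * 2

def pipelined_inference(frames):
    """Run the two stages concurrently on a stream of frames.

    A bounded queue decouples the stages while limiting buffering,
    which bounds both memory use and per-frame latency.
    """
    q = queue.Queue(maxsize=2)
    results = []

    def producer():
        for f in frames:
            q.put(stage_a(f))
        q.put(None)  # sentinel: end of stream

    def consumer():
        while True:
            item = q.get()
            if item is None:
                break
            results.append(stage_b(item))

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results

print(pipelined_inference([0, 1, 2]))  # [2, 4, 6]
```

With per-stage latencies roughly balanced, steady-state throughput approaches one frame per stage latency rather than one frame per whole-network latency, which is the motivation for engaging all core types at once.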
Acknowledgements
This research is partially supported by the National Research Foundation Singapore under its Competitive Research Program Award NRF-CRP23-2019-0003 and Singapore Ministry of Education Academic Research Fund T1 251RES1905.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Aghapour, E., Zhang, Y., Pathania, A., Mitra, T. (2024). Pipelined CNN Inference on Heterogeneous Multi-processor System-on-Chip. In: Pasricha, S., Shafique, M. (eds) Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-39932-9_16
Print ISBN: 978-3-031-39931-2
Online ISBN: 978-3-031-39932-9