Abstract
IoT edge devices sense and process data to support real-time decision-making in latency-sensitive and mission-critical applications such as autonomous driving, industrial automation, safety compliance, and security-threat monitoring. Running AI at the edge enables intelligent, real-time decisions on the device itself. Moreover, on-device AI is vital to preserving data privacy. Hence, edge AI is an active topic of research and engineering at major technology corporations, numerous start-ups, and academic institutions.
Deep learning models have achieved tremendous improvements in prediction accuracy, even surpassing human performance on several tasks. These models, however, are typically large and hence unsuitable for resource-constrained edge devices and real-time inference. It is also challenging to train deep learning models on an edge device because training requires large amounts of data and compute resources.
We present active, ongoing research in optimizing deep learning models for inference at the edge using connection pruning, model quantization, and knowledge distillation. We then describe techniques for training or retraining deep learning models on resource-constrained edge devices using new learning paradigms such as federated learning, weight imprinting, and training smaller models on less data.
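To make the quantization step concrete, the sketch below shows post-training quantization with the TensorFlow Lite converter. It is a minimal illustration, not the chapter's own method: it assumes TensorFlow 2.x is installed and uses a stock Keras MobileNetV2 purely as a stand-in model.

    import tensorflow as tf

    # Stand-in pretrained model; any Keras model could be substituted.
    model = tf.keras.applications.MobileNetV2(weights="imagenet")

    # Post-training quantization: with the DEFAULT optimization flag, the
    # converter stores weights as 8-bit integers instead of 32-bit floats,
    # shrinking the model roughly 4x.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    # Write the compact model for deployment on an edge device.
    with open("mobilenet_v2_quantized.tflite", "wb") as f:
        f.write(tflite_model)

Connection pruning and knowledge distillation follow the same deploy-time pattern: compress the model offline, then ship the smaller artifact to the device.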
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sharma, D., Sarkar, S. (2022). Enabling Inference and Training of Deep Learning Models for AI Applications on IoT Edge Devices. In: Pal, S., De, D., Buyya, R. (eds) Artificial Intelligence-based Internet of Things Systems. Internet of Things. Springer, Cham. https://doi.org/10.1007/978-3-030-87059-1_10
DOI: https://doi.org/10.1007/978-3-030-87059-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87058-4
Online ISBN: 978-3-030-87059-1
eBook Packages: Computer Science (R0)