Abstract
IoT edge devices sense and process data to support real-time decision-making in latency-sensitive and mission-critical applications such as autonomous driving, industrial automation, safety compliance, and security-threat monitoring. Running AI at the edge enables intelligent, real-time decisions on the device itself. Moreover, on-device AI is vital to preserving data privacy. Hence, edge AI is an active topic of research and engineering at major technology corporations, numerous start-ups, and academic institutions.
Deep learning models have achieved tremendous improvements in prediction accuracy, even surpassing human performance on several tasks. These models, however, are typically large and hence unsuitable for resource-constrained edge devices and real-time inference. It is also challenging to train deep learning models on an edge device because training requires large amounts of data and compute resources.
We present active, ongoing research in optimizing deep learning models for inference at the edge using connection pruning, model quantization, and knowledge distillation. We then describe techniques for training or retraining deep learning models on resource-constrained edge devices using new learning paradigms such as federated learning, weight imprinting, and training smaller models on less data.
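To make the quantization step concrete, the sketch below shows post-training quantization with the TensorFlow Lite converter. It is a minimal illustration, not the chapter's own method: it assumes TensorFlow 2.x is installed and uses a stock Keras MobileNetV2 purely as a stand-in model.

    import tensorflow as tf

    # Stand-in pretrained model; any Keras model could be substituted.
    model = tf.keras.applications.MobileNetV2(weights="imagenet")

    # Post-training quantization: with the DEFAULT optimization flag, the
    # converter stores weights as 8-bit integers instead of 32-bit floats,
    # shrinking the model roughly 4x.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    # Write the compact model for deployment on an edge device.
    with open("mobilenet_v2_quantized.tflite", "wb") as f:
        f.write(tflite_model)

Connection pruning and knowledge distillation follow the same deploy-time pattern: compress the model offline, then ship the smaller artifact to the device.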
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sharma, D., Sarkar, S. (2022). Enabling Inference and Training of Deep Learning Models for AI Applications on IoT Edge Devices. In: Pal, S., De, D., Buyya, R. (eds) Artificial Intelligence-based Internet of Things Systems. Internet of Things. Springer, Cham. https://doi.org/10.1007/978-3-030-87059-1_10
DOI: https://doi.org/10.1007/978-3-030-87059-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87058-4
Online ISBN: 978-3-030-87059-1
eBook Packages: Computer Science (R0)