Abstract
There is explosive demand for bringing machine learning (ML) powered intelligence into Internet-of-Things (IoT) devices. However, such intelligent functionality is effective only with in-situ, continuous model adaptation to new data and environments, while on-device computing and energy resources are usually extremely constrained. Neither traditional hand-crafted optimizers (e.g., SGD, Adagrad, and Adam) nor existing meta optimizers are designed to meet these challenges: the former require tedious hyper-parameter tuning, while the latter are often costly due to the meta algorithms' own overhead. To this end, we propose hardware-aware learning to optimize (HALO), a practical meta optimizer dedicated to resource-efficient on-device adaptation. HALO features: (1) faster adaptation, i.e., requiring fewer data samples or iterations to reach a specified accuracy, achieved by a new regularizer that promotes empirical generalization; and (2) lower per-iteration complexity, thanks to an enforced stochastic structural-sparsity regularizer. Furthermore, the optimizer itself is designed as a very lightweight RNN and thus incurs negligible overhead. Ablation studies and experiments on five datasets, six optimizees, and two state-of-the-art (SOTA) edge AI devices validate that HALO consistently achieves better accuracy (\(\uparrow \)0.46% - \(\uparrow \)20.28%) while greatly trimming the energy cost of adaptation (up to \(\downarrow \)60%), as quantified on an IoT device or a SOTA simulator. Code and pre-trained models are available at https://github.com/RICE-EIC/HALO.
The first two authors, Chaojian Li and Tianlong Chen, contributed equally.
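To make the mechanism concrete, below is a minimal PyTorch sketch of the general learning-to-optimize pattern the abstract describes: a tiny coordinate-wise LSTM proposes per-parameter updates, and an L1 penalty on those updates stands in for a sparsity-promoting regularizer. The toy quadratic optimizee, all names, and the exact loss terms are illustrative assumptions, not HALO's actual implementation; see the linked repository for the authors' code.

```python
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    """Coordinate-wise LSTM optimizer in the style of Andrychowicz et al.
    (2016); a hypothetical stand-in, not HALO's released implementation."""
    def __init__(self, hidden=8):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden)   # one scalar gradient per "batch" row
        self.head = nn.Linear(hidden, 1)     # maps hidden state to an update

    def forward(self, grad, state=None):
        g = grad.detach().reshape(-1, 1)     # detach: first-order meta-training
        h, c = self.cell(g, state)
        return self.head(h).reshape(grad.shape), (h, c)

def unrolled_meta_loss(opt_rnn, steps=20, dim=10, lam=1e-3):
    """Unroll the learned optimizer on a random quadratic optimizee
    f(theta) = ||A @ theta - b||^2 and accumulate the meta loss.
    The lam * |update| term mimics a sparsity regularizer on updates."""
    A, b = torch.randn(dim, dim), torch.randn(dim)
    theta = torch.zeros(dim, requires_grad=True)
    state, meta_loss = None, 0.0
    for _ in range(steps):
        loss = ((A @ theta - b) ** 2).sum()
        # create_graph keeps the graph alive so meta_loss can backprop
        (grad,) = torch.autograd.grad(loss, theta, create_graph=True)
        update, state = opt_rnn(grad, state)
        theta = theta + update               # differentiable parameter update
        meta_loss = meta_loss + loss + lam * update.abs().sum()
    return meta_loss

opt_rnn = LearnedOptimizer()
meta_opt = torch.optim.Adam(opt_rnn.parameters(), lr=1e-3)
for _ in range(100):                         # meta-training loop
    meta_opt.zero_grad()
    unrolled_meta_loss(opt_rnn).backward()
    meta_opt.step()
```

Note that HALO's regularizers additionally target empirical generalization and hardware-measured energy cost, which this sketch does not model.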
Acknowledgements
This work is supported by the National Science Foundation (NSF) through the Real-Time Machine Learning program (Award numbers 1937592 and 1937588).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Li, C., Chen, T., You, H., Wang, Z., Lin, Y. (2020). HALO: Hardware-Aware Learning to Optimize. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12354. Springer, Cham. https://doi.org/10.1007/978-3-030-58545-7_29
DOI: https://doi.org/10.1007/978-3-030-58545-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58544-0
Online ISBN: 978-3-030-58545-7
eBook Packages: Computer Science, Computer Science (R0)