Abstract
Transfer learning takes an artificial neural network (ANN) trained on one dataset (the source) and adapts it to a new, second dataset (the target). While transfer learning has proven powerful and is commonly used in modern statistical learning setups, its use has generally been restricted by architecture: to reuse the internal learned synaptic weights, the underlying topology of the ANN must remain the same across tasks, and a new output layer must be attached in place of the old one (discarding the old output layer's weights). This work removes that restriction by proposing a neuro-evolutionary approach that enables what we call adaptive structure transfer learning: an ANN can be transferred across tasks with different input and output dimensions while its internal latent structure is continuously optimized. We test the proposed optimizer on two challenging real-world time series prediction problems, adapting recurrent neural networks (RNNs) (1) to predict coal-fired power plant data before and after the addition of new sensors, and (2) to predict engine parameters when RNN estimators are transferred across airframes with different engines. Experiments show that the RNNs produced by the proposed neuro-evolutionary transfer learning process not only evolve and train faster on the target dataset than RNNs trained from scratch, but in many cases also generalize better, even after a long training and evolution process. To our knowledge, this work represents the first use of neuro-evolution for transfer learning, especially for RNNs, and the first methodological framework capable of adapting entire structures to arbitrary input/output spaces.
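To make the mechanism concrete, the following minimal Python sketch illustrates one way adaptive structure transfer could look at the genome level. It is an assumption-laden illustration, not the authors' actual implementation: the Node, Edge, and Genome classes, the transfer_genome function, and the Gaussian weight initialization are all hypothetical. It demonstrates only the idea stated above: the evolved hidden structure and its trained weights are kept, while input and output nodes are rebuilt to the target task's dimensions with freshly initialized connections, after which evolution and training continue on the target data.

import copy
import random

# Hypothetical genome representation for an evolvable RNN; the memory cell
# types (LSTM, GRU, etc.) and recurrent edges are omitted for brevity.
class Node:
    def __init__(self, node_id, kind):
        self.id = node_id
        self.kind = kind  # "input", "hidden", or "output"

class Edge:
    def __init__(self, src, dst, weight):
        self.src, self.dst, self.weight = src, dst, weight

class Genome:
    def __init__(self, nodes, edges):
        self.nodes = nodes
        self.edges = edges

def transfer_genome(source, n_target_inputs, n_target_outputs):
    """Adapt a genome evolved on a source task to a target task with
    different input/output dimensions, keeping the hidden structure."""
    g = copy.deepcopy(source)

    # 1. Drop the old input/output nodes and every edge touching them;
    #    the internal topology and its trained weights survive intact.
    io_ids = {n.id for n in g.nodes if n.kind in ("input", "output")}
    g.nodes = [n for n in g.nodes if n.kind == "hidden"]
    g.edges = [e for e in g.edges
               if e.src not in io_ids and e.dst not in io_ids]

    hidden_ids = [n.id for n in g.nodes]
    next_id = max(hidden_ids, default=0) + 1

    # 2. Attach new input/output nodes sized for the target task, fully
    #    connected to the retained structure with small random weights.
    for kind, count in (("input", n_target_inputs),
                        ("output", n_target_outputs)):
        for _ in range(count):
            g.nodes.append(Node(next_id, kind))
            for h in hidden_ids:
                src, dst = (next_id, h) if kind == "input" else (h, next_id)
                g.edges.append(Edge(src, dst, random.gauss(0.0, 0.1)))
            next_id += 1
    return g

Initializing the new edges with small weights lets the retained internal structure dominate early target-task training; continued neuro-evolution is then free to add, remove, or rewire nodes and edges as the target task demands.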
Acknowledgements
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Combustion Systems under Award Number FE0031547.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
ElSaid, A., Karnas, J., Lyu, Z., Krutz, D., Ororbia, A.G., Desell, T. (2020). Neuro-Evolutionary Transfer Learning Through Structural Adaptation. In: Castillo, P.A., Jiménez Laredo, J.L., Fernández de Vega, F. (eds.) Applications of Evolutionary Computation. EvoApplications 2020. Lecture Notes in Computer Science, vol. 12104. Springer, Cham. https://doi.org/10.1007/978-3-030-43722-0_39
DOI: https://doi.org/10.1007/978-3-030-43722-0_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43721-3
Online ISBN: 978-3-030-43722-0
eBook Packages: Computer Science, Computer Science (R0)