Abstract
In this chapter, we focus on modern parameter-based methods: the pre-training and fine-tuning approach. This chapter also marks our entry into deep transfer learning. The deep transfer learning methods in the following chapters focus on designing better network architectures and loss functions on top of a pre-trained network, so this chapter serves as their foundation. Pre-training and fine-tuning belongs to the category of parameter/model-based transfer learning methods, which perform knowledge transfer by sharing important information encoded in the model parameters. The basic assumption is that the source and target models share some common structural information that can be transferred.
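The idea sketched above can be illustrated with a minimal, hypothetical example (not from the chapter itself): a "backbone" weight matrix stands in for layers pre-trained on a large source task; it is copied into the target model and frozen, while a newly initialized head is trained on target data with plain logistic regression. All names (`W_backbone`, `w_head`, `features`) are illustrative assumptions, not code from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a network pre-trained on a large source task:
# a fixed "backbone" weight matrix plus a task-specific head.
W_backbone = rng.normal(size=(8, 4))   # pretend these weights were pre-trained
x = rng.normal(size=(32, 8))           # target-domain inputs
y = (x[:, 0] > 0).astype(float)        # toy binary labels for the target task

def features(x):
    # Frozen backbone: the shared knowledge transferred from the source task.
    return np.tanh(x @ W_backbone)

# Fine-tuning in its simplest form: re-initialize the head and
# train ONLY the head parameters on the target data.
w_head = np.zeros(4)
b = 0.0
lr = 0.5
W_before = W_backbone.copy()

for _ in range(200):
    z = features(x) @ w_head + b
    p = 1.0 / (1.0 + np.exp(-z))        # sigmoid
    grad = p - y                        # gradient of the logistic loss w.r.t. z
    w_head -= lr * features(x).T @ grad / len(y)
    b -= lr * grad.mean()

# The backbone is untouched: only the head's parameters were updated.
assert np.allclose(W_backbone, W_before)
acc = ((1.0 / (1.0 + np.exp(-(features(x) @ w_head + b))) > 0.5) == y).mean()
print(f"target training accuracy after fine-tuning the head: {acc:.2f}")
```

In practice the backbone may also be updated with a small learning rate rather than frozen; the sketch shows only the "feature extraction" end of that spectrum.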
Notes
1. The visualizations are made using the tool from https://poloclub.github.io/cnn-explainer/ and the complete image can be found at this link: https://github.com/jindongwang/tlbook-code/tree/main/chap08_pretrain_finetune.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Wang, J., Chen, Y. (2023). Pre-Training and Fine-Tuning. In: Introduction to Transfer Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-19-7584-4_8
Print ISBN: 978-981-19-7583-7
Online ISBN: 978-981-19-7584-4
eBook Packages: Computer Science (R0)