Abstract
In this chapter, we focus on modern parameter-based methods: the pre-training and fine-tuning approach. This chapter also marks our entry into deep transfer learning. The deep transfer learning methods in the following chapters focus on designing better network architectures and loss functions on top of a pre-trained network, so this chapter serves as their foundation. Pre-training and fine-tuning belongs to the category of parameter/model-based transfer learning methods, which perform knowledge transfer by sharing important information encoded in the model parameters. The basic assumption is that the source and target models share some common structural information that can be transferred.
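The idea sketched above can be illustrated with a minimal, hypothetical example (not from the chapter itself): a "backbone" weight matrix stands in for layers pre-trained on a large source task; it is copied into the target model and frozen, while a newly initialized head is trained on target data with plain logistic regression. All names (`W_backbone`, `w_head`, `features`) are illustrative assumptions, not code from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a network pre-trained on a large source task:
# a fixed "backbone" weight matrix plus a task-specific head.
W_backbone = rng.normal(size=(8, 4))   # pretend these weights were pre-trained
x = rng.normal(size=(32, 8))           # target-domain inputs
y = (x[:, 0] > 0).astype(float)        # toy binary labels for the target task

def features(x):
    # Frozen backbone: the shared knowledge transferred from the source task.
    return np.tanh(x @ W_backbone)

# Fine-tuning in its simplest form: re-initialize the head and
# train ONLY the head parameters on the target data.
w_head = np.zeros(4)
b = 0.0
lr = 0.5
W_before = W_backbone.copy()

for _ in range(200):
    z = features(x) @ w_head + b
    p = 1.0 / (1.0 + np.exp(-z))        # sigmoid
    grad = p - y                        # gradient of the logistic loss w.r.t. z
    w_head -= lr * features(x).T @ grad / len(y)
    b -= lr * grad.mean()

# The backbone is untouched: only the head's parameters were updated.
assert np.allclose(W_backbone, W_before)
acc = ((1.0 / (1.0 + np.exp(-(features(x) @ w_head + b))) > 0.5) == y).mean()
print(f"target training accuracy after fine-tuning the head: {acc:.2f}")
```

In practice the backbone may also be updated with a small learning rate rather than frozen; the sketch shows only the "feature extraction" end of that spectrum.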
Notes
1. The visualizations are made using the tool from https://poloclub.github.io/cnn-explainer/ and the complete image can be found at this link: https://github.com/jindongwang/tlbook-code/tree/main/chap08_pretrain_finetune.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Wang, J., Chen, Y. (2023). Pre-Training and Fine-Tuning. In: Introduction to Transfer Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-19-7584-4_8
Print ISBN: 978-981-19-7583-7
Online ISBN: 978-981-19-7584-4
eBook Packages: Computer Science (R0)