Pre-Training and Fine-Tuning

A chapter in Introduction to Transfer Learning

Abstract

In this chapter, we focus on a modern parameter-based method: the pre-training and fine-tuning approach. This chapter also marks our entry into deep transfer learning; the deep transfer learning methods in subsequent chapters focus on designing better network architectures and loss functions on top of a pre-trained network, so this chapter can be seen as their foundation. Pre-training and fine-tuning belongs to the category of parameter/model-based transfer learning methods, which perform knowledge transfer by sharing important information of the model structures. The basic assumption is that the source and target structures share some common information that can be transferred.
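As a concrete illustration of this idea, the following minimal PyTorch sketch shows the standard pre-train-then-fine-tune recipe: load a network whose parameters were pre-trained on a source dataset (ImageNet here), freeze the shared layers, replace the classification head for the target task, and update only the new parameters on target data. This is an illustrative sketch rather than the chapter's accompanying code; NUM_TARGET_CLASSES and target_loader are hypothetical placeholders, and the weights argument assumes torchvision 0.13 or later.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_TARGET_CLASSES = 10  # hypothetical: number of classes in the target task

# Load a network whose parameters were pre-trained on the source task
# (older torchvision versions use pretrained=True instead of weights=...).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the shared layers: these parameters carry the transferred knowledge.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head; the new layer is randomly initialized
# and is the only trainable part of the model.
model.fc = nn.Linear(model.fc.in_features, NUM_TARGET_CLASSES)

# Fine-tune: optimize only the trainable parameters on target-domain data.
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-3, momentum=0.9,
)
criterion = nn.CrossEntropyLoss()
# for inputs, labels in target_loader:  # hypothetical target DataLoader
#     optimizer.zero_grad()
#     loss = criterion(model(inputs), labels)
#     loss.backward()
#     optimizer.step()
```

A common variant, when more target data is available, is to unfreeze some or all of the backbone and fine-tune it with a smaller learning rate than the new head.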


Notes

  1. The visualizations were created with the tool at https://poloclub.github.io/cnn-explainer/; the complete image is available at https://github.com/jindongwang/tlbook-code/tree/main/chap08_pretrain_finetune.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wang, J., Chen, Y. (2023). Pre-Training and Fine-Tuning. In: Introduction to Transfer Learning. Machine Learning: Foundations, Methodologies, and Applications. Springer, Singapore. https://doi.org/10.1007/978-981-19-7584-4_8

  • DOI: https://doi.org/10.1007/978-981-19-7584-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7583-7

  • Online ISBN: 978-981-19-7584-4

  • eBook Packages: Computer Science; Computer Science (R0)
