Abstract
The remarkable performance boost of artificial intelligence (AI) algorithms is a result of the re-emergence of deep neural networks, which have been applied in a diverse set of applications. The success of deep learning stems from relaxing the need for the non-trivial task of feature engineering. However, this success is conditioned on manually annotating a large number of data points to generate suitable datasets for supervised training of these networks. Since manual data annotation is time-consuming and expensive in many applications, learning in data-scarce regimes has been a major recent focus of research in machine learning (ML) and AI. Transferring and reusing knowledge from a related learning problem is a core strategy for addressing the challenges of learning in data-scarce regimes. Transfer learning is not a new field in ML, and several excellent surveys exist on this topic [63, 95, 98, 105, 120]. However, these surveys are meant to be general and cover many works in the area. In this chapter, we survey a very specific subset of works in this area. Our goal is to explore a framework that unifies a broad range of knowledge transfer problems as learning cross-problem relations and similarities through representation learning. By representation learning, we mean representing the data from the input space in a latent embedding space. The latent embedding space serves as an intermediate space in which to explore relationships between several ML problems. We review recently developed algorithms that use this strategy to address several transfer learning settings in five primary areas: (i) online and offline multitask learning, (ii) lifelong learning and continual learning, (iii) low-shot learning, including few-shot learning and zero-shot learning, (iv) domain adaptation, and (v) collective/distributed learning. We discuss existing challenges and potential future research directions.
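As a rough illustration of relating several problems through a shared latent embedding space, the following toy sketch fits a linear predictor per task and then extracts a low-dimensional basis common to all of them. It is not taken from the chapter: the linear setting, the SVD-based factorization, and all names (`fit_shared_embedding`, `L`, `S`, `ridge`) are assumptions made for this example.

```python
import numpy as np

def fit_shared_embedding(tasks, k, ridge=1e-3):
    """Return (L, S) such that the weights of task t are approximately L @ S[:, t].

    tasks: list of (X, y) pairs, X of shape (n, d), y of shape (n,).
    L (d x k) maps inputs to a k-dimensional latent space via Z = X @ L.
    """
    d = tasks[0][0].shape[1]
    # Step 1: independent ridge-regression solution for every task.
    W = np.column_stack([
        np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
        for X, y in tasks
    ])
    # Step 2: the top-k left singular vectors span the subspace shared by the tasks.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    L = U[:, :k]
    # Step 3: task-specific coefficients expressed in the shared latent space.
    S = L.T @ W
    return L, S

# Synthetic, noise-free tasks whose true weights lie in a common 2-D subspace.
rng = np.random.default_rng(0)
d, k, num_tasks, n = 10, 2, 6, 50
L_true = rng.normal(size=(d, k))
W_true = L_true @ rng.normal(size=(k, num_tasks))
tasks = []
for t in range(num_tasks):
    X = rng.normal(size=(n, d))
    tasks.append((X, X @ W_true[:, t]))

L, S = fit_shared_embedding(tasks, k)
rel_err = np.linalg.norm(L @ S - W_true) / np.linalg.norm(W_true)
print(f"relative reconstruction error: {rel_err:.2e}")
```

In this sketch, a prediction for task `t` is `(X @ L) @ S[:, t]`: the inputs are first embedded in the shared space, and only the small vector `S[:, t]` is specific to the task, which is what would allow a new task with few labels to reuse `L`.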
References
Ahmad WU, Zhang Z, Ma X, Chang K-W, Peng N (2019) Cross-lingual dependency parsing with unlabeled auxiliary languages. In: Proceedings of the 23rd conference on computational natural language learning (CoNLL)
Baevski A, Zhou Y, Mohamed A, Auli M (2020) wav2vec 2.0: a framework for self-supervised learning of speech representations. In: Advances in neural information processing systems, vol 33
Baktashmotlagh M, Harandi M, Lovell B, Salzmann M (2013) Unsupervised domain adaptation by domain invariant projection. In: International conference on computer vision, pp 769–776
Baxter J (2000) A model of inductive bias learning. J Artif Intell Res 12:149–198
Bickel S, Bogojeska J, Lengauer T, Scheffer T (2008) Multi-task learning for HIV therapy screening. In: Proceedings of the 25th international conference on machine learning, pp 56–63
Candès EJ, Romberg J, Tao T (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory 52(2):489–509
Chang M-W, Ratinov L-A, Roth D, Srikumar V (2008) Importance of semantic representation: dataless classification. In: AAAI, vol 2, pp 830–835
Changpinyo S, Hu H, Sha F (2018) Multi-task learning for sequence tagging: an empirical study. In: Proceedings of the 27th international conference on computational linguistics, pp 2965–2977
Chen M, Chang K-W, Roth D (2020) Recent advances in transferable representation learning. In: AAAI tutorials
Chen S, Crammer K, He H, Roth D, Su WJ (2021) Weighted training for cross-task learning. arXiv:2105.14095
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning. PMLR, pp 1597–1607
Chen X, Sun Y, Athiwaratkun B, Cardie C, Weinberger K (2018) Adversarial deep averaging networks for cross-lingual sentiment classification. Trans Assoc Comput Linguist 6:557–570
Chen X, Chen M, Fan C, Uppunda A, Zaniolo C (2020) Cross-lingual knowledge graph completion via ensemble knowledge transfer. In: EMNLP
Chen Z, Liu B (2018) Lifelong machine learning. Synth Lect Artif Intell Mach Learn 12(3):1–207
Courty N, Flamary R, Tuia D, Rakotomamonjy A (2017) Optimal transport for domain adaptation. IEEE Trans Pattern Anal Mach Intell 39(9):1853–1865
Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (1)
Dinu G, Lazaridou A, Baroni M (2014) Improving ZSL by mitigating the hubness problem. arXiv:1412.6568
Donoho DL (2006) Compressed sensing. IEEE Trans Inf Theory 52(4):1289–1306
Du B, Wang S, Xu C, Wang N, Zhang L, Tao D (2018) Multi-task learning for blind source separation. IEEE Trans Image Process 27(9):4219–4231
Du SS, Hu W, Kakade SM, Lee JD, Lei Q (2021) Few-shot learning via learning the representation, provably. In: International conference on learning representations
Fernando B, Habrard A, Sebban M, Tuytelaars T (2013) Unsupervised visual domain adaptation using subspace alignment. In: International conference on computer vision, pp 2960–2967
FitzGerald N, Michael J, He L, Zettlemoyer L (2018) Large-scale QA-SRL parsing. In: ACL, pp 2051–2060
Freund Y, Iyer R, Schapire RE, Singer Y (2004) RankBoost: an efficient boosting algorithm for combining preferences. J Mach Learn Res (JMLR) 4(6):933–969
Gabourie A, Rostami M, Kolouri S, Kim K (2019) Learning a domain-invariant embedding for unsupervised domain adaptation using class-conditioned distribution alignment. In: Allerton conference on communication, control, and computing, pp 352–359
Ganin Y, Lempitsky V (2015) Unsupervised domain adaptation by backpropagation. In: Proceedings of international conference on machine learning
Gong B, Shi Y, Sha F, Grauman K (2012) Geodesic flow kernel for unsupervised domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2066–2073
Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville AC, Bengio Y (2014) Generative adversarial nets. In: Proceedings of the neural information processing systems
Guo J, Shah DJ, Barzilay R (2018) Multi-source domain adaptation with mixture of experts. In: EMNLP
Gupta A, Devin C, Liu Y, Abbeel P, Levine S (2017) Learning invariant feature spaces to transfer skills with reinforcement learning. In: Proceedings of the international conference on learning representations, pp 1–122
Hadsell R, Chopra S, LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol 2. IEEE, pp 1735–1742
Hao J, Ju C, Chen M, Sun Y, Zaniolo C, Wang W (2020) Bio-joie: joint representation learning of biological knowledge bases. In: Proceedings of the 11th ACM conference on bioinformatics, computational biology and biomedicine (BCB). ACM
Hao N, Oghbaee A, Rostami M, Derbinsky N, Bento J (2016) Testing fine-grained parallelism for the ADMM on a factor-graph. In: 2016 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, pp 835–844
He H, Ning Q, Roth D (2020) QuASE: question-answer driven sentence encoding. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 8743–8758
He H, Zhang M, Ning Q, Roth D (2021) Foreseeing the benefits of incidental supervision. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP)
He L, Lewis M, Zettlemoyer L (2015) Question-answer driven semantic role labeling: using natural language to annotate natural language. In: EMNLP, pp 643–653
Hoffman J, Tzeng E, Park T, Zhu J-Y, Isola P, Saenko K, Efros A, Darrell T (2018) Cycada: cycle-consistent adversarial domain adaptation. In: International conference on machine learning. PMLR, pp 1989–1998
Hwang GM, Schultz KM, Monaco JD, Zhang K (2021) Neuro-inspired dynamic replanning in swarms-theoretical neuroscience extends swarming in complex environments. Johns Hopkins APL Tech Digest 35:443–447
Isele D, Rostami M, Eaton E (2016) Using task features for zero-shot knowledge transfer in lifelong learning. In: Proceedings of the international joint conferences on artificial intelligence, pp 1620–1626
Jin X, Lin Y, Rostami M, Ren X (2021) Learn continually, generalize rapidly: lifelong knowledge accumulation for few-shot learning. In: Findings of EMNLP
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks
Kiros R, Zhu Y, Salakhutdinov RR, Zemel R, Urtasun R, Torralba A, Fidler S (2015) Skip-thought vectors. In: Advances in neural information processing systems, pp 3294–3302
Klein A, Mamou J, Pyatkin V, Stepanov D, He H, Roth D, Zettlemoyer L, Dagan I (2020) QANom: question-answer driven SRL for nominalizations. In: Proceedings of the 28th international conference on computational linguistics, pp 3069–3083
Kodirov E, Xiang T, Gong S (2017) Semantic autoencoder for zero-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Kodirov E, Xiang T, Fu Z, Gong S (2015) Unsupervised domain adaptation for zero-shot learning. In: International conference on computer vision, pp 2452–2460
Kolesnikov A, Zhai X, Beyer L (2019) Revisiting self-supervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1920–1929
Kolouri S, Rostami M, Owechko Y, Kim K (2018) Joint dictionaries for zero-shot learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 3431–3439
Kumar A, Daumé H (2012) Learning task grouping and overlap in multi-task learning. In: Proceedings of international conference on machine learning, pp 1383–1390
Lampert C, Nickisch H, Harmeling S (2009) Learning to detect unseen object classes by between-class attribute transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 951–958
Le D, Thai M, Nguyen T (2020) Multi-task learning for metaphor detection with graph convolutional neural networks and word sense disambiguation. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 8139–8146
Liu NF, Gardner M, Belinkov Y, Peters ME, Smith NA (2019) Linguistic knowledge and transferability of contextual representations. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1 (long and short papers), pp 1073–1094
Ma D, Ryant N, Liberman M (2021) Probing acoustic representations for phonetic properties. In: ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 311–315
Maurer A (2004) A note on the PAC-Bayesian theorem. arXiv:cs/0411099
Maurer A, Pontil M, Romera-Paredes B (2016) The benefit of multitask representation learning. J Mach Learn Res 17(1):2853–2884
McMahan B, Moore E, Ramage D, Hampson S, Aguera y Arcas B (2017) Communication-efficient learning of deep networks from decentralized data. In: Artificial intelligence and statistics. PMLR, pp 1273–1282
McNamara D, Balcan M-F (2017) Risk bounds for transferring representations with and without fine-tuning. In: International conference on machine learning, pp 2373–2381
Michael J, Stanovsky G, He L, Dagan I, Zettlemoyer L (2017) Crowdsourcing question-answer meaning representations. In: NAACL
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Mirtaheri M, Rostami M, Ren X, Morstatter F, Galstyan A (2021) One-shot learning for temporal knowledge graphs. In: 3rd conference on automated knowledge base construction
Morgenstern Y, Rostami M, Purves D (2014) Properties of artificial networks evolved to contend with natural spectra. Proc Natl Acad Sci 111(Supplement 3):10868–10872
Nigam I, Huang C, Ramanan D (2018) Ensemble knowledge transfer for semantic segmentation. In: WACV. IEEE, pp 1499–1508
Okamoto N, Minami S, Hirakawa T, Yamashita T, Fujiyoshi H (2021) Deep ensemble collaborative learning by using knowledge-transfer graph for fine-grained object classification. arXiv:2103.14845
Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Pei Z, Cao Z, Long M, Wang J (2018) Multi-adversarial domain adaptation. In: Thirty-second AAAI conference on artificial intelligence
Peng W, Tang Q, Dai W, Chen T (2022) Improving cancer driver gene identification using multi-task learning on graph convolutional network. Briefings Bioinf 23(1):bbab43
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. In: Proceedings of NAACL-HLT, pp 2227–2237
Pope PE, Kolouri S, Rostami M, Martin CE, Hoffmann H (2019) Explainability methods for graph convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10772–10781
Rakshit S, Tamboli D, Meshram PS, Banerjee B, Roig G, Chaudhuri S (2020) Multi-source open-set deep adversarial domain adaptation. In: European conference on computer vision. Springer, pp 735–750
Rehman A, Rostami M, Wang Z, Brunet D, Vrscay ER (2012) SSIM-inspired image restoration using sparse representation. EURASIP J Adv Signal Process 2012(1):1–12
Romera-Paredes B, Torr P (2015) An embarrassingly simple approach to ZSL. In: Proceedings of international conference on machine learning, pp 2152–2161
Rostami M, Huber D, Lu T (2018) A crowdsourcing triage algorithm for geopolitical event forecasting. In: ACM RecSys conference, pp 377–381
Rostami M, Isele D, Eaton E (2020) Using task descriptions in lifelong machine learning for improved performance and zero-shot transfer. J Artif Intell Res
Rostami M, Kolouri S, Kim K, Eaton E (2018) Multi-agent distributed lifelong learning for collective knowledge acquisition. In: International conference on autonomous agents and multiagent systems, pp 712–720
Rostami M, Kolouri S, Kim K, Eaton E (2019) Sar image classification using few-shot cross-domain transfer learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops
Rostami M, Kolouri S, McClelland J, Pilly P (2020) Generative continual concept learning. In: Proceedings of the AAAI conference on artificial intelligence
Rostami M, Kolouri S, Pilly P (2019) Complementary learning for overcoming catastrophic forgetting using experience replay. In: Proceedings of the international joint conferences on artificial intelligence, pp 3339–3345
Rostami M (2019) Learning transferable knowledge through embedding spaces. PhD thesis, University of Pennsylvania
Rostami M (2021) Lifelong domain adaptation via consolidated internal distribution. Advances in neural information processing systems, 34
Rostami M (2021) Transfer learning through embedding spaces. CRC Press
Rostami M, Babaie-Zadeh M, Samadi S, Jutten C (2011) Blind source separation of discrete finite alphabet sources using a single mixture. In: 2011 IEEE statistical signal processing workshop (SSP). IEEE, pp 709–712
Rostami M, Cheung N-M, Quek TQS (2013) Compressed sensing of diffusion fields under heat equation constraint. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 4271–4274
Rostami M, Galstyan A (2020) Learning a max-margin classifier for cross-domain sentiment analysis
Rostami M, Galstyan A (2020) Sequential unsupervised domain adaptation through prototypical distributions
Rostami M, Galstyan A (2021) Cognitively inspired learning of incremental drifting concepts. arXiv:2110.04662
Rostami M, Kolouri S, Eaton E, Kim K (2019) Deep transfer learning for few-shot sar image classification. Remote Sensing 11(11):1374
Rostami M, Kolouri S, Murez Z, Owechko Y, Eaton E, Kim K (2022) Zero-shot image classification using coupled dictionary embedding. Mach Learn with Appl 8:100278
Rostami M, Spinoulas L, Hussein M, Mathai J, Abd-Almageed W (2021) Detection and continual learning of novel face presentation attacks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 14851–14860
Ruvolo P, Eaton E (2013) ELLA: an efficient lifelong learning algorithm. In: Proceedings of international conference on machine learning, pp 507–515
Shamir O, Srebro N (2014) Distributed stochastic optimization and learning. In: 2014 52nd annual Allerton conference on communication, control, and computing (Allerton). IEEE, pp 850–857
Shin H, Lee J, Kim J, Kim J (2017) Continual learning with deep generative replay. In: Proceedings of the neural information processing systems, pp 2990–2999
Smith V, Chiang C-K, Sanjabi M, Talwalkar AS (2017) Federated multi-task learning. Advances in neural information processing systems, 30
Sorokin A, Forsyth D (2008) Utility data annotation with Amazon Mechanical Turk. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. IEEE, pp 1–8
Stan S, Rostami M (2021) Unsupervised model adaptation for continual semantic segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 2593–2601
Stan S, Rostami M (2021) Unsupervised model adaptation for continual semantic segmentation. In: Proceedings of the AAAI conference on artificial intelligence
Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C (2018) A survey on deep transfer learning. In: International conference on artificial neural networks. Springer, pp 270–279
Tenney I, Xia P, Chen B, Wang A, Poliak A, McCoy RT, Kim N, Van Durme B, Bowman SR, Das D et al (2018) What do you learn from context? probing for sentence structure in contextualized word representations. In: International conference on learning representations
Tommasi T, Quadrianto N, Caputo B, Lampert C (2012) Beyond dataset bias: multi-task unaligned shared knowledge transfer. In: Asian conference on computer vision, pp 1–15
Torrey L, Shavlik J (2010) Transfer learning. In: Handbook of research on machine learning applications and trends: algorithms, methods, and techniques. IGI Global, pp 242–264
Tripuraneni N, Jordan M, Jin C (2020) On the theory of transfer learning: the importance of task diversity. In: Advances in neural information processing systems, vol 33, pp 7852–7862
Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7167–7176
van de Ven GM, Siegelmann HT, Tolias AS (2020) Brain-inspired replay for continual learning with artificial neural networks. Nat Commun 11(1):1–14
von Oswald J, Henning C, Sacramento J, Grewe BF (2019) Continual learning with hypernetworks. In: International conference on learning representations
Wang A, Hula J, Xia P, Pappagari R, McCoy RT, Patel R, Kim N, Tenney I, Huang Y, Yu K et al (2019) Can you tell me how to get past sesame street? sentence-level pretraining beyond language modeling. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 4465–4476
Wang C, Niepert M, Li H (2019) Recsys-dan: discriminative adversarial networks for cross-domain recommender systems. IEEE Trans Neural Netw Learn Syst 31(8):2731–2740
Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big Data 3(1):1–40
Xie Z, Cao W, Wang X, Ming Z, Zhang J, Zhang J (2020) A biologically inspired feature enhancement framework for zero-shot learning. In: 2020 7th IEEE international conference on cyber security and cloud computing (CSCloud)/2020 6th IEEE international conference on edge computing and scalable cloud (EdgeCom). IEEE, pp 120–125
Xue D, Liao X, Carin L, Krishnapuram B (2007) Multi-task learning for classification with dirichlet process priors. J Mach Learn Res 8(1)
Yeganeh H, Rostami M, Wang Z (2015) Objective quality assessment of interpolated natural images. IEEE Trans Image Process 24(11):4651–4663
Yin W, Hay J, Roth D (2019) Benchmarking zero-shot text classification: datasets, evaluation and entailment approach. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 3914–3923
Zhang D, Shen D, Alzheimer's Disease Neuroimaging Initiative et al (2012) Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease. Neuroimage 59(2):895–907
Zhang L, Xiang T, Gong S (2017) Learning a deep embedding model for zero-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2021–2030
Zhang Y, Barzilay R, Jaakkola T (2017) Aspect-augmented adversarial networks for domain adaptation. Trans Assoc Comput Linguist 5:515–528
Zhang Z, Saligrama V (2015) Zero-shot learning via semantic similarity embedding. In: International conference on computer vision, pp 4166–4174
Zhang Z, Luo P, Loy CC, Tang X (2014) Facial landmark detection by deep multi-task learning. In: European conference on computer vision. Springer, pp 94–108
Zhao H, Zhang S, Wu G, Moura MFJ, Costeira JP, Gordon GJ (2018) Adversarial multiple source domain adaptation. Proc Neural Inf Process Syst 31:8559–8570
Zhao S, Li B, Xu P, Yue X, Ding G, Keutzer K (2021) Madan: multi-source adversarial domain aggregation network for domain adaptation. Int J Comput Vis 1–26
Zhou B, Khashabi D, Tsai C-T, Roth D (2018) Zero-shot open entity typing as type-compatible grounding. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 2065–2076
Zhou J, Liu J, Narayan VA, Ye J, Alzheimer's Disease Neuroimaging Initiative et al (2013) Modeling disease progression via multi-task learning. NeuroImage 78:233–248
Zhu J, Park T, Isola P, Efros A (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2223–2232
Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, Xiong H, He Q (2020) A comprehensive survey on transfer learning. Proc IEEE 109(1):43–76
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rostami, M., He, H., Chen, M., Roth, D. (2023). Transfer Learning via Representation Learning. In: Razavi-Far, R., Wang, B., Taylor, M.E., Yang, Q. (eds) Federated and Transfer Learning. Adaptation, Learning, and Optimization, vol 27. Springer, Cham. https://doi.org/10.1007/978-3-031-11748-0_10
DOI: https://doi.org/10.1007/978-3-031-11748-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11747-3
Online ISBN: 978-3-031-11748-0
eBook Packages: Intelligent Technologies and Robotics (R0)