
From multi-label learning to cross-domain transfer: a model-agnostic approach


Abstract

In multi-label learning it has been widely assumed in the literature that, to obtain the best accuracy, the dependence among labels should be explicitly modeled. This premise has led to a proliferation of methods for learning and predicting labels together (joint modeling). Even though it is now acknowledged that in many contexts a model of dependence is not required for optimal performance, such models continue to outperform independent models in some of those very contexts, suggesting explanations for their performance beyond label dependence. In this article we turn the original premise of multi-label learning on its head and approach the problem of joint modeling specifically under the absence of any measurable dependence among task labels. The insights from this study allow us to design a method for cross-domain transfer learning which, unlike most contemporary methods of this type, is model-agnostic (any base model class can be considered) and does not require any access to source data. The results we obtain have important implications, and we provide clear directions for future work in the areas of both multi-label and transfer learning.
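As a purely illustrative sketch of the two modeling styles contrasted above (not the method proposed in this article), the following Python snippet fits an independent model (one classifier per label, via scikit-learn's MultiOutputClassifier) and one common form of joint model (a classifier chain, in which each label additionally sees the preceding labels as features) on a synthetic multi-label dataset. The base learner, data, and evaluation choices are arbitrary placeholders.

```python
# Minimal sketch: independent vs. joint multi-label modeling (illustrative only).
# Assumes scikit-learn is installed; dataset and base learner are placeholders.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic multi-label data with 5 labels (stand-in for any benchmark dataset)
X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Independent modeling: one logistic regression per label (binary relevance)
independent = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)

# Joint modeling: a classifier chain, where each label also sees earlier labels
joint = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0).fit(X_tr, Y_tr)

for name, model in [("independent", independent), ("chain", joint)]:
    Y_hat = model.predict(X_te)
    print(name,
          "Hamming score:", 1 - hamming_loss(Y_te, Y_hat),
          "exact match:", accuracy_score(Y_te, Y_hat))
```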



Notes

  1. Throughout, we use the notation \(P(Y|x)\) as shorthand for \(P(Y|X=x)\): the conditional distribution conditioned on observation x, a realization of the random variable X

  2. Note that the complements of Hamming loss and 0/1 subset loss are known as Hamming score and exact match (or subset accuracy), respectively (illustrated in the sketch following these notes)

  3. Logistic regression provides a linear decision boundary; a non-linearity is produced by thresholding the regression score \(\in [0,1]\) into a label \(\in \{0,1\}\) (see the sketch following these notes)

  4. Even though there is no similarity beyond both belonging to the audio domain, we wish to emphasize later the truly cross-domain nature of the experiments
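To make Notes 2 and 3 concrete, here is a small numerical sketch (an illustration under assumed placeholder values, not code from the article): per-label scores in [0, 1] are thresholded into labels in {0, 1}, and Hamming score and exact match are computed as the complements of Hamming loss and 0/1 subset loss.

```python
import numpy as np

# Ground-truth label matrix and per-label scores in [0, 1] (e.g., logistic
# regression outputs); the values here are arbitrary placeholders.
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.6],
                   [0.4, 0.8, 0.3]])

# Note 3: the non-linearity comes from thresholding each score into {0, 1}.
Y_pred = (scores >= 0.5).astype(int)

# Note 2: Hamming score and exact match are the complements of
# Hamming loss and 0/1 subset loss, respectively.
hamming_loss = np.mean(Y_pred != Y_true)
hamming_score = 1.0 - hamming_loss                    # fraction of label entries correct
subset_01_loss = np.mean(np.any(Y_pred != Y_true, axis=1))
exact_match = 1.0 - subset_01_loss                    # fraction of label vectors fully correct

print(hamming_score, exact_match)
```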


Funding

The authors did not receive support from any organization for the submitted work, and no funding was received to assist with the preparation of this manuscript. The authors have no relevant financial or non-financial interests to disclose.

Author information

Corresponding author

Correspondence to Jesse Read.

Ethics declarations

Ethical standard

There are no ethical implications to discuss; the research in this paper did not involve human or animal participants. All datasets used in the current study are listed in Table 2: the real-world benchmark datasets are available from the web link supplied in the table caption, and the synthetic/toy datasets used for demonstration and illustration are described and displayed throughout the article (for example, Fig. 11).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Read, J. From multi-label learning to cross-domain transfer: a model-agnostic approach. Appl Intell 53, 25135–25153 (2023). https://doi.org/10.1007/s10489-023-04841-9

