Abstract
In multi-label learning it has been widely assumed in the literature that, to obtain the best accuracy, the dependence among the labels should be explicitly modeled. This premise led to a proliferation of methods offering techniques to learn and predict labels together (joint modeling). Even though it is now acknowledged that in many contexts a model of dependence is not required for optimal performance, such models continue to outperform independent models in some of those very contexts, suggesting explanations for their performance beyond label dependence. In this article we turn the original premise of multi-label learning on its head, and approach the problem of joint modeling specifically under the absence of any measurable dependence among task labels. The insights from this study allow us to design a method for cross-domain transfer learning which, unlike most contemporary methods of this type, is model-agnostic (any base model class can be considered) and does not require any access to source data. The results we obtain have important implications, and we provide clear directions for future work in both multi-label and transfer learning.
Notes
Throughout, we use the notation P(Y|x) as shorthand for \(P(Y|X=x)\): the distribution of Y conditioned on the observation x, a realization of the random variable X
Note that the complements of Hamming loss and 0/1 subset loss are known as Hamming score and exact match (or subset accuracy), respectively
Logistic regression provides a linear decision boundary; a non-linearity is introduced by thresholding the regression score \(\in [0,1]\) into a label \(\in \{0,1\}\)
Although the datasets share no similarity beyond belonging to the audio domain, we return to this point later to emphasize the truly cross-domain nature of the experiments
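The complement relationship between the losses and scores mentioned in the notes above can be made concrete with a small sketch. The label matrices below are illustrative values, not data from the article:

```python
# Toy multi-label predictions: 3 instances, 4 labels each (illustrative only).
Y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
Y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]]

n, L = len(Y_true), len(Y_true[0])

# Hamming loss: fraction of individual label positions predicted wrongly.
hamming_loss = sum(yt != yp
                   for row_t, row_p in zip(Y_true, Y_pred)
                   for yt, yp in zip(row_t, row_p)) / (n * L)

# 0/1 subset loss: fraction of instances whose label vector is not matched exactly.
subset_01_loss = sum(row_t != row_p for row_t, row_p in zip(Y_true, Y_pred)) / n

# Their complements are the Hamming score and exact match (subset accuracy).
hamming_score = 1 - hamming_loss    # 10 of 12 positions correct
exact_match = 1 - subset_01_loss    # 1 of 3 rows matched exactly
```

Note how the two metrics disagree: Hamming loss rewards per-label correctness, while subset 0/1 loss only rewards a perfectly predicted label vector.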
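The thresholding step described in the note on logistic regression can be sketched as follows; the weights here are hypothetical values for one label, chosen purely for illustration:

```python
import math

def sigmoid(z):
    """Squash a linear score into a probability in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights and bias for a single label (illustrative only).
w, b = [1.5, -2.0], 0.3

def predict_label(x, threshold=0.5):
    # Linear score mapped to a regression score in [0, 1] ...
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # ... then thresholded into a hard label in {0, 1}: the non-linearity.
    return 1 if p >= threshold else 0

predict_label([2.0, 0.5])   # sigmoid(2.3) > 0.5, so the label is 1
predict_label([-1.0, 1.0])  # sigmoid(-3.2) < 0.5, so the label is 0
```

The decision boundary itself (where p equals the threshold) remains a hyperplane in the input space; only the score-to-label conversion is non-linear.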
Funding
The authors did not receive support from any organization for the submitted work. No funding was received to assist with the preparation of this manuscript. The authors have no relevant financial or non-financial interests to disclose.
Ethics declarations
Ethical standard
There are no ethical implications to discuss; the research in this paper did not involve human or animal participants. All datasets involved in the current study are listed in Table 2; the real-world benchmark data sets are available from the web link supplied in the table caption, and the synthetic/toy data sets used for demonstration and illustration are described and displayed throughout the article, for example in Fig. 11.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Read, J. From multi-label learning to cross-domain transfer: a model-agnostic approach. Appl Intell 53, 25135–25153 (2023). https://doi.org/10.1007/s10489-023-04841-9