
From multi-label learning to cross-domain transfer: a model-agnostic approach


Abstract

In multi-label learning it has been widely assumed in the literature that, to obtain the best accuracy, the dependence among labels should be explicitly modeled. This premise has led to a proliferation of methods for learning and predicting labels together (joint modeling). Even though it is now acknowledged that in many contexts a model of dependence is not required for optimal performance, such models continue to outperform independent models in some of those very contexts, suggesting explanations for their performance beyond label dependence. In this article we turn the original premise of multi-label learning on its head and approach the problem of joint modeling specifically under the absence of any measurable dependence among task labels. The insights from this study allow us to design a method for cross-domain transfer learning which, unlike most contemporary methods of this type, is model-agnostic (any base model class can be considered) and does not require any access to source data. The results we obtain have important implications, and we provide clear directions for future work in the areas of both multi-label and transfer learning.
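As a purely illustrative sketch of the two modeling styles contrasted above (not the method proposed in this article), the following Python snippet fits an independent model (one classifier per label, via scikit-learn's MultiOutputClassifier) and one common form of joint model (a classifier chain, in which each label additionally sees the preceding labels as features) on a synthetic multi-label dataset. The base learner, data, and evaluation choices are arbitrary placeholders.

```python
# Minimal sketch: independent vs. joint multi-label modeling (illustrative only).
# Assumes scikit-learn is installed; dataset and base learner are placeholders.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic multi-label data with 5 labels (stand-in for any benchmark dataset)
X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Independent modeling: one logistic regression per label (binary relevance)
independent = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)

# Joint modeling: a classifier chain, where each label also sees earlier labels
joint = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0).fit(X_tr, Y_tr)

for name, model in [("independent", independent), ("chain", joint)]:
    Y_hat = model.predict(X_te)
    print(name,
          "Hamming score:", 1 - hamming_loss(Y_te, Y_hat),
          "exact match:", accuracy_score(Y_te, Y_hat))
```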



Notes

  1. Throughout, we use the notation \(P(Y|x)\) as shorthand for \(P(Y|X=x)\): the conditional distribution conditioned on observation x, a realization of the random variable X

  2. Note that the complements of Hamming loss and 0/1 subset loss are known as Hamming score and exact match (or subset accuracy), respectively (illustrated in the sketch following these notes)

  3. Logistic regression provides a linear decision boundary; a non-linearity is produced by thresholding the regression score \(\in [0,1]\) into a label \(\in \{0,1\}\) (see the sketch following these notes)

  4. Even though there is no similarity beyond both belonging to the audio domain, we wish to emphasize later the truly cross-domain nature of the experiments
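To make Notes 2 and 3 concrete, here is a small numerical sketch (an illustration under assumed placeholder values, not code from the article): per-label scores in [0, 1] are thresholded into labels in {0, 1}, and Hamming score and exact match are computed as the complements of Hamming loss and 0/1 subset loss.

```python
import numpy as np

# Ground-truth label matrix and per-label scores in [0, 1] (e.g., logistic
# regression outputs); the values here are arbitrary placeholders.
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
scores = np.array([[0.9, 0.2, 0.6],
                   [0.4, 0.8, 0.3]])

# Note 3: the non-linearity comes from thresholding each score into {0, 1}.
Y_pred = (scores >= 0.5).astype(int)

# Note 2: Hamming score and exact match are the complements of
# Hamming loss and 0/1 subset loss, respectively.
hamming_loss = np.mean(Y_pred != Y_true)
hamming_score = 1.0 - hamming_loss                    # fraction of label entries correct
subset_01_loss = np.mean(np.any(Y_pred != Y_true, axis=1))
exact_match = 1.0 - subset_01_loss                    # fraction of label vectors fully correct

print(hamming_score, exact_match)
```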


Funding

The authors did not receive support from any organization for the submitted work, and no funding was received to assist with the preparation of this manuscript. The authors have no relevant financial or non-financial interests to disclose.

Author information

Corresponding author

Correspondence to Jesse Read.

Ethics declarations

Ethical standard

There are no ethical implications to discuss; the research in this paper did not involve human or animal participants. All datasets used in the current study are listed in Table 2: the real-world benchmark datasets are available from the web link supplied in the table caption, and the synthetic/toy datasets used for demonstration and illustration are described and displayed throughout the article (for example, Fig. 11).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Read, J. From multi-label learning to cross-domain transfer: a model-agnostic approach. Appl Intell 53, 25135–25153 (2023). https://doi.org/10.1007/s10489-023-04841-9

