Abstract
This paper focuses on multi-label learning from small amounts of labelled data. We demonstrate that the binary-relevance extension of the interpolated label propagation algorithm, the harmonic function, is a competitive learning method with respect to many widely used evaluation measures. This is achieved by a new transition matrix that better captures the structure underlying classification, coupled with data-dependent thresholding strategies. Furthermore, we show that in the presence of label dependence, the outputs of a competitive learning model can be used as part of the input to the harmonic function to improve its performance. Finally, since we use multiple measures to evaluate the algorithm thoroughly, we propose to use the game-theoretic method of Kalai and Smorodinsky to output a single compromise solution across all measures. This method can be applied to any learning model, irrespective of the number of evaluation metrics used.
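The harmonic-function step at the core of the abstract can be sketched as follows. This is a minimal binary-relevance illustration that uses a generic Gaussian-kernel transition matrix; it does not reproduce the improved transition matrix or the data-dependent thresholding strategies the paper proposes, and the function name and toy data are illustrative only.

```python
import numpy as np

def harmonic_function(X, y_labelled, n_labelled, sigma=1.0):
    """Harmonic-function label propagation (Zhu et al., 2003) for one binary label.

    X: (n, d) features with the labelled rows first;
    y_labelled: (n_labelled,) labels in {0, 1}.
    Returns scores in [0, 1] for the unlabelled rows.
    """
    # Gaussian affinity matrix (a generic choice of similarity)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Row-normalise to a random-walk transition matrix P = D^{-1} W
    P = W / W.sum(axis=1, keepdims=True)
    l = n_labelled
    u = X.shape[0] - l
    P_uu = P[l:, l:]
    P_ul = P[l:, :l]
    # Harmonic solution on the unlabelled block: f_u = (I - P_uu)^{-1} P_ul y_l
    f_u = np.linalg.solve(np.eye(u) - P_uu, P_ul @ y_labelled)
    return f_u

# Toy example: two 1-D clusters, one labelled point in each;
# the unlabelled points inherit the label of their nearby cluster.
X = np.array([[0.0], [5.0], [0.2], [4.8]])
scores = harmonic_function(X, np.array([1.0, 0.0]), n_labelled=2)
```

Under binary relevance, this solve is repeated once per label; the scores are then converted to hard predictions by a thresholding rule, which is where the paper's data-dependent strategies come in.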
Notes
- 1.
The appendices are available at github.com/kmusayeva/M-LP.
- 2.
To see this, let (1, 0, 1, 1) be the true label set. Although both (1, 0, 0, 0) and (1, 1, 0, 1) get two labels wrong, the latter attains a higher F1 value because it increases recall without greatly degrading precision.
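The claim in this note can be checked numerically; the following is a minimal sketch in which the helper `f1` (instance-wise F1 over binary label vectors) is illustrative:

```python
def f1(true, pred):
    """Instance-wise F1 for binary label vectors."""
    tp = sum(1 for t, p in zip(true, pred) if t and p)  # true positives
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)  # fraction of predicted labels that are relevant
    recall = tp / sum(true)     # fraction of relevant labels that are predicted
    return 2 * precision * recall / (precision + recall)

true = (1, 0, 1, 1)
a = (1, 0, 0, 0)   # two errors: misses two relevant labels
b = (1, 1, 0, 1)   # two errors: one false positive, one false negative
print(f1(true, a))  # 0.5
print(f1(true, b))  # ~0.667: same Hamming error, higher F1
```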
- 3.
The code and the data are available at github.com/kmusayeva/M-LP.
- 4.
All datasets except for Fungi are taken from https://www.uco.es/kdis/mllresources/. The Fungi dataset has been kindly provided to us by C. Averill [1].
References
Averill, C., Werbin, Z., Atherton, K., Bhatnagar, J., Dietze, M.: Soil microbiome predictability increases with spatial and taxonomic scale. Nature Ecol. Evol. 5(6), 747–756 (2021)
Belkin, M., Niyogi, P.: Semi-supervised learning on Riemannian manifolds. Mach. Learn. 56(1), 209–239 (2004)
Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7(11) (2006)
Binois, M., Picheny, V., Taillandier, P., Habbal, A.: The Kalai-Smorodinsky solution for many-objective Bayesian optimization. J. Mach. Learn. Res. 21(150), 1–42 (2020)
Chapelle, O., Weston, J., Schölkopf, B.: Cluster kernels for semi-supervised learning. In: Advances in Neural Information Processing Systems. Citeseer (2002)
Cheng, W., Hüllermeier, E.: Combining instance-based learning and logistic regression for multilabel classification. Mach. Learn. 76, 211–225 (2009)
Chung, F.: Spectral graph theory, vol. 92. American Mathematical Soc. (1997)
Dembczyński, K., Jachnik, A., Kotlowski, W., Waegeman, W., Hüllermeier, E.: Optimizing the F-measure in multi-label classification: Plug-in rule approach versus structured loss minimization. In: International Conference on Machine Learning, pp. 1130–1138. PMLR (2013)
Dembczyński, K., Waegeman, W., Cheng, W., Hüllermeier, E.: Regret analysis for performance metrics in multi-label classification: the case of Hamming and subset zero-one loss. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 280–295. Springer (2010)
Dembczyński, K.K., Cheng, W., Hüllermeier, E.: Bayes optimal multilabel classification via probabilistic classifier chains. In: ICML (2010)
Dembczyński, K.K., Waegeman, W., Cheng, W., Hüllermeier, E.: On label dependence in multilabel classification. In: LastCFP: ICML Workshop on learning from multi-label data. Ghent University, KERMIT, Department of Applied Mathematics, Biometrics (2010)
Doyle, P., Snell, J.: Random walks and electric networks, vol. 22. American Mathematical Soc. (1984)
Fan, R., Lin, C.: A study on threshold selection for multi-label classification. Technical report, Department of Computer Science, National Taiwan University, pp. 1–23 (2007)
Hastie, T., Tibshirani, R., Friedman, J.H., Friedman, J.H.: The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer (2009)
Kalai, E., Smorodinsky, M.: Other solutions to Nash’s bargaining problem. Econometrica: J. Econometric Soc., 513–518 (1975)
Kong, X., Ng, M., Zhou, Z.: Transductive multilabel learning via label set propagation. IEEE Trans. Knowl. Data Eng. 25(3), 704–719 (2011)
Leathart, T., Frank, E., Holmes, G., Pfahringer, B.: Probability calibration trees. In: Asian Conference on Machine Learning, pp. 145–160 (2017)
Legendre, P., Gallagher, E.: Ecologically meaningful transformations for ordination of species data. Oecologia 129(2), 271–280 (2001)
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems, vol. 14 (2001)
Picheny, V., Binois, M.: GPGame: Solving Complex Game Problems using Gaussian Processes (2022). www.github.com/vpicheny/GPGame, R package version 1.2.0
Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10(3), 61–74 (1999)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2021). www.R-project.org/
Read, J., Pfahringer, B., Holmes, G.: Generating synthetic multi-label data streams. In: ECML/PKDD 2009 Workshop on Learning from Multi-label Data (MLD 2009), pp. 69–84. Citeseer (2009)
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85(3), 333–359 (2011)
Rivolli, A., de Carvalho, A.: The utiml package: Multi-label classification in R. R J. 10(2), 24 (2018)
Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
Sechidis, K., Tsoumakas, G., Vlahavas, I.: On the stratification of multi-label data. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS (LNAI), vol. 6913, pp. 145–158. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23808-6_10
Shi, C., Kong, X., Yu, P., Wang, B.: Multi-objective multi-label classification. In: Proceedings of the 2012 SIAM International Conference on Data Mining, pp. 355–366. SIAM (2012)
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. Math. Program. Comput. 12(4), 637–672 (2020)
Tsoumakas, G., Katakis, I.: Multi-label classification: an overview. Int. J. Data Warehousing Mining (IJDWM) 3(3), 1–13 (2007)
Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer (2009). https://doi.org/10.1007/978-0-387-09823-4_34
Tsoumakas, G., Katakis, I., Vlahavas, I.: Random k-labelsets for multilabel classification. IEEE Trans. Knowl. Data Eng. 23(7), 1079–1089 (2010)
Wang, B., Tu, Z., Tsotsos, J.: Dynamic label propagation for semi-supervised multi-class multi-label classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 425–432 (2013)
Wang, F., Zhang, C.: Label propagation through linear neighborhoods. IEEE Trans. Knowl. Data Eng. 20(1), 55–67 (2007)
Yang, Y.: A study of thresholding strategies for text categorization. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 137–145 (2001)
Zadrozny, B., Elkan, C.: Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 694–699 (2002)
Zhang, M., Li, Y., Yang, H., Liu, X.: Towards class-imbalance aware multi-label learning. IEEE Trans. Cybern. (2020)
Zhang, M., Zhou, Z.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn. 40(7), 2038–2048 (2007)
Zhang, M., Zhou, Z.: A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng. 26(8), 1819–1837 (2013)
Zhou, D., Bousquet, O., Lal, T., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems, pp. 321–328 (2004)
Zhou, D., Schölkopf, B.: Learning from labeled and unlabeled data using random walks. In: Rasmussen, C.E., Bülthoff, H.H., Schölkopf, B., Giese, M.A. (eds.) DAGM 2004. LNCS, vol. 3175, pp. 237–244. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28649-3_29
Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002)
Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 912–919 (2003)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Musayeva, K., Binois, M. (2023). Improved Multi-label Propagation for Small Data with Multi-objective Optimization. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43420-4
Online ISBN: 978-3-031-43421-1