Abstract
Multilabel classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Recent research has shown that, just as in conventional classification, instance-based learning algorithms relying on the nearest neighbor estimation principle can be used quite successfully in this context. However, since existing algorithms do not take correlations and interdependencies between labels into account, their potential has not yet been fully exploited. In this paper, we propose a new approach to multilabel classification based on a framework that unifies instance-based learning and logistic regression, comprising both methods as special cases. This approach makes it possible to capture interdependencies between labels and, moreover, to combine model-based and similarity-based inference for multilabel classification. As experimental studies show, our approach improves predictive accuracy in terms of several evaluation criteria for multilabel prediction.
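The abstract does not spell out the construction, so the following is only a minimal sketch of one way instance-based learning and logistic regression could be combined, not the authors' exact formulation. It assumes that the label frequencies among an instance's k nearest neighbors are used as input features to one logistic regression model per label, so that each model can weight the neighbors' evidence for every label, including labels other than its own. All function names, the toy data, and the plain gradient-descent fit are hypothetical choices made for illustration.

```python
import math

def knn_label_features(X, Y, query, k, exclude=None):
    """Fraction of the k nearest neighbours of `query` that carry each label."""
    dists = sorted(
        (math.dist(x, query), i) for i, x in enumerate(X) if i != exclude
    )
    neighbours = [i for _, i in dists[:k]]
    n_labels = len(Y[0])
    return [sum(Y[i][j] for i in neighbours) / len(neighbours)
            for j in range(n_labels)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(F, targets, epochs=500, lr=0.5):
    """Batch gradient descent on the log-loss; returns [bias, w_1, ..., w_m]."""
    w = [0.0] * (len(F[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, t in zip(F, targets):
            p = sigmoid(w[0] + sum(wi * fi for wi, fi in zip(w[1:], f)))
            err = p - t
            grad[0] += err
            for j, fj in enumerate(f):
                grad[j + 1] += err * fj
        w = [wi - lr * g / len(F) for wi, g in zip(w, grad)]
    return w

def fit(X, Y, k=3):
    """One logistic model per label, trained on leave-one-out neighbour features."""
    F = [knn_label_features(X, Y, x, k, exclude=i) for i, x in enumerate(X)]
    return [fit_logistic(F, [y[j] for y in Y]) for j in range(len(Y[0]))]

def predict_proba(X, Y, models, query, k=3):
    """Estimated probability of each label for `query`."""
    f = knn_label_features(X, Y, query, k)
    return [sigmoid(w[0] + sum(wi * fi for wi, fi in zip(w[1:], f)))
            for w in models]

# Toy data (hypothetical): two clusters, and the two labels always co-occur.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
Y = [[1, 1], [1, 1], [1, 1], [1, 1], [0, 0], [0, 0], [0, 0], [0, 0]]

models = fit(X, Y, k=3)
near_first = predict_proba(X, Y, models, (0.5, 0.5), k=3)
near_second = predict_proba(X, Y, models, (5.5, 5.5), k=3)
print(near_first, near_second)
```

In this sketch, each per-label model receives the neighbor frequencies of all labels, which is how evidence for one label can influence the prediction of another. Fixing the weights so that a label depends only on its own neighbor frequency would reduce it to a purely similarity-based predictor.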
Additional information
Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.
About this article
Cite this article
Cheng, W., Hüllermeier, E. Combining instance-based learning and logistic regression for multilabel classification. Mach Learn 76, 211–225 (2009). https://doi.org/10.1007/s10994-009-5127-5
DOI: https://doi.org/10.1007/s10994-009-5127-5