Abstract
Effective learning in multi-label classification (MLC) requires an appropriate level of abstraction for representing the relationship between each instance and multiple categories. Current MLC methods have focused on learning to map instances to categories in a relatively low-level feature space, such as the space of individual words. The fine-grained features in such a space may not be sufficiently expressive for learning to rank categories, which is essential in multi-label classification. This paper presents an alternative solution: the conventional representations of instances and categories are transformed into meta-level features, and successful learning-to-rank retrieval algorithms are applied over this feature space. Controlled experiments on six benchmark datasets using eight evaluation metrics show strong evidence for the effectiveness of the proposed approach, which significantly outperformed other state-of-the-art methods, such as Rank-SVM, ML-kNN (multi-label kNN), and IBLR-ML (instance-based logistic regression for multi-label classification), on most of the datasets. Thorough analyses are also provided to separate the factors responsible for the improved performance.
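The core idea of the abstract — mapping each (instance, category) pair into a small set of meta-level features and then ranking categories over that space — can be illustrated with a minimal sketch. The two features below (cosine similarity to a category centroid, and mean similarity to the category's k most similar training instances) are illustrative assumptions, not the paper's exact feature set; the `meta_features` function and the toy data are hypothetical.

```python
# Hedged sketch: building meta-level features for multi-label classification.
# The concrete features here are assumptions chosen for illustration only;
# the paper's actual meta-level feature construction may differ.
import numpy as np

def cosine(a, b):
    """Cosine similarity, with a zero-vector guard."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def meta_features(x, X_train, Y_train, k=3):
    """Map an instance x into a meta-level feature space over categories.

    Returns an (n_categories, 2) array: for each category, the cosine
    similarity of x to that category's centroid, and the mean similarity
    of x to the category's k most similar training instances.
    """
    n_cat = Y_train.shape[1]
    feats = np.zeros((n_cat, 2))
    sims = np.array([cosine(x, xi) for xi in X_train])
    for c in range(n_cat):
        members = np.where(Y_train[:, c] == 1)[0]
        if len(members) == 0:
            continue  # no training instances for this category
        centroid = X_train[members].mean(axis=0)
        feats[c, 0] = cosine(x, centroid)
        top_k = np.sort(sims[members])[::-1][:k]
        feats[c, 1] = top_k.mean()
    return feats

# Toy data: 4 training instances in a 2-D "word" space, 2 categories.
X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# A test instance close to category 0; any learning-to-rank model could
# now be trained on these meta-level features to order the categories.
f = meta_features(np.array([1.0, 0.05]), X_train, Y_train, k=2)
ranking = np.argsort(-f.sum(axis=1))  # category 0 should rank first
```

In the full framework, a learning-to-rank algorithm (e.g., a pairwise ranker) would be trained on such features so that, for each instance, the categories it belongs to are ranked above those it does not.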
References
Arya, S., Mount, D., Netanyahu, N., Silverman, R., & Wu, A. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 45(6), 891–923.
Boutell, M., Luo, J., Shen, X., & Brown, C. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In Proceedings of the 22nd international conference on machine learning (p. 96). New York: ACM.
Burges, C., Ragno, R., & Le, Q. (2007). Learning to rank with nonsmooth cost functions. Advances in Neural Information Processing Systems, 19, 193.
Cao, Z., Qin, T., Liu, T., Tsai, M., & Li, H. (2007). Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on machine learning (p. 136). New York: ACM.
Cheng, W., & Hüllermeier, E. (2009). Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 76(2–3), 211–225. doi:10.1007/s10994-009-5127-5, http://www.springerlink.com/content/m20342966250233x/.
Creecy, R., Masand, B., Smith, S., & Waltz, D. (1992). Trading MIPS and memory for knowledge engineering. Communications of the ACM, 35(8), 48–64.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Donmez, P., Svore, K., & Burges, C. (2009). On the local optimality of LambdaRank. In Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval (pp. 460–467). New York: ACM.
Elisseeff, A., & Weston, J. (2001). Kernel methods for multi-labelled classification and categorical regression problems. In Advances in neural information processing systems (Vol. 14, pp. 681–687). Cambridge: MIT Press.
Freund, Y., Iyer, R., Schapire, R., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933–969.
Ganapathiraju, A., Hamaker, J., & Picone, J. (1998). Support vector machines for speech recognition. In International conference on spoken language processing (pp. 2923–2926). New York: ACM.
García, S., & Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.
Gopal, S., & Yang, Y. (2010). Multilabel classification with meta-level features. In Proceeding of the 33rd international ACM SIGIR conference on research and development in information retrieval (pp. 315–322). New York: ACM.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning. New York: Springer.
Järvelin, K., & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (pp. 41–48). New York: ACM.
Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods: support vector learning. Cambridge: MIT Press.
Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 133–142). New York: ACM.
Kleinberg, J. (1997). Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on theory of computing (pp. 599–608). New York: ACM.
Lewis, D., Schapire, R., Callan, J., & Papka, R. (1996). Training algorithms for linear text classifiers. In Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval (pp. 298–306). New York: ACM.
Li, P., Burges, C., & Wu, Q. (2007). McRank: Learning to rank using multiple classification and gradient boosting. In Advances in neural information processing systems (Vol. 20). Cambridge: MIT Press.
Qin, T., Liu, T., Xu, J., & Li, H. (2010). LETOR: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 1–29.
Roussopoulos, N., Kelley, S., & Vincent, F. (1995). Nearest neighbor queries. In ACM sigmod record (Vol. 24, pp. 71–79). New York: ACM.
Schapire, R., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
Schapire, R., & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2), 135–168.
Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. (2008). Multilabel classification of music into emotions. In Proceedings of the 9th international conference on music information retrieval (ISMIR 2008), Philadelphia, PA, USA.
Tsai, M., Liu, T., Qin, T., Chen, H., & Ma, W. (2007). FRank: A ranking method with fidelity loss. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (p. 390). New York: ACM.
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., & Vlahavas, I. (2011). Mulan: A Java library for multi-label learning. Journal of Machine Learning Research, 12, 2411–2414.
Vapnik, V. (2000). The nature of statistical learning theory. Berlin: Springer.
Voorhees, E. (2003). Overview of TREC 2002. NIST special publication SP (pp. 1–16).
Xu, J., & Li, H. (2007). Adarank: A boosting algorithm for information retrieval. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (p. 398). New York: ACM.
Yang, Y. (1994). Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In ACM SIGIR conference on research and development in information retrieval (pp. 13–22). New York: Springer.
Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1), 69–90.
Yang, Y. (2001). A study of thresholding strategies for text categorization. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 137–145). New York: ACM.
Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 42–49). New York: ACM.
Yang, Y., & Pedersen, J. (1997). A comparative study on feature selection in text categorization. In Proceedings of the 14th international conference on machine learning (pp. 412–420). San Francisco: Morgan Kaufmann.
Yianilos, P. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM symposium on discrete algorithms (pp. 311–321). Philadelphia: Society for Industrial and Applied Mathematics.
Yue, Y., & Finley, T. (2007). A support vector method for optimizing average precision. In Proceedings of SIGIR07 (pp. 271–278). New York: ACM.
Zhang, M., & Zhou, Z. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.
Additional information
Editors: Grigorios Tsoumakas, Min-Ling Zhang, and Zhi-Hua Zhou.
The work is supported, in part, by the National Science Foundation (NSF) under grant IIS_0704689. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.
Cite this article
Yang, Y., Gopal, S. Multilabel classification with meta-level features in a learning-to-rank framework. Mach Learn 88, 47–68 (2012). https://doi.org/10.1007/s10994-011-5270-7