Abstract
Directly applying single-label classification methods to multi-label learning problems substantially limits both performance and speed due to the imbalance, dependence, and high dimensionality of the given label matrix. Existing methods either ignore these three problems or reduce one at the price of aggravating another. In this paper, we propose a {0,1} label matrix compression and recovery method termed "compressed labeling (CL)" to simultaneously solve, or at least reduce, these three problems. CL first compresses the original label matrix to improve balance and independence by preserving the signs of its Gaussian random projections. Afterward, popular binary classification methods (e.g., support vector machines) are applied directly, one per new label. A fast recovery algorithm is developed to recover the original labels from the predicted new labels. In the recovery algorithm, a "labelset distilling method" is designed to extract distilled labelsets (DLs), i.e., the frequently appearing label subsets, from the original labels via recursive clustering and subtraction. Given a distilled and an original label vector, we discover that the signs of their random projections have an explicit joint distribution that can be quickly computed from a geometric inference. Based on this observation, the original label vector is exactly determined after performing a series of Kullback-Leibler divergence based hypothesis tests on the distributions of the new labels. CL significantly improves the balance of the training samples and reduces the dependence between different labels. Moreover, it accelerates the learning process by training fewer binary classifiers for the compressed labels, and it exploits label dependence via the DL-based tests. Theoretically, we prove recovery bounds for CL that verify its effectiveness for label compression, as well as the improvement in multi-label classification performance brought by the label correlations preserved in DLs. We show the effectiveness, efficiency, and robustness of CL via five groups of experiments on 21 datasets from text classification, image annotation, scene classification, music categorization, genomics, and web page classification.
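The compression step described above admits a compact illustration: keep only the signs of Gaussian random projections of the {0,1} label matrix, so that each projected column becomes a roughly balanced binary target. Below is a minimal Python sketch of this idea, assuming Y is the n-by-k {0,1} label matrix and d < k is the number of compressed labels; the function name compress_labels, the tie-breaking rule for zero projections, and the fixed seed are illustrative assumptions, not details taken from the paper.

import numpy as np

def compress_labels(Y, d, seed=0):
    """Keep only the signs of Gaussian random projections of Y.

    Y : (n, k) binary {0,1} label matrix, one row per sample.
    d : number of compressed labels, with d < k.
    Returns Z, an (n, d) sign matrix in {-1, +1}, together with the
    Gaussian projection matrix A that a recovery step would reuse.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Y.shape[1], d))  # Gaussian random projection
    Z = np.sign(Y @ A)                        # preserve signs only
    Z[Z == 0] = 1                             # tie-break for all-zero label rows (assumption)
    return Z, A

# Each column Z[:, j] defines one binary classification problem, so only d
# (rather than k) off-the-shelf binary classifiers (e.g., SVMs) are trained;
# recovering the original labels from the predicted signs is the role of the
# DL-based hypothesis tests described in the abstract.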
Additional information
Editors: Grigorios Tsoumakas, Min-Ling Zhang, and Zhi-Hua Zhou.
Cite this article
Zhou, T., Tao, D. & Wu, X. Compressed labeling on distilled labelsets for multi-label learning. Mach Learn 88, 69–126 (2012). https://doi.org/10.1007/s10994-011-5276-1