Learning Semantic Similarity for Multi-label Text Categorization

Li, Li; Wang, Mengxiang; Zhang, Longkai; Wang, Houfeng

doi:10.1007/978-3-319-14331-6_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8922)

Included in the following conference series:

Workshop on Chinese Lexical Semantics

Abstract

Multi-label text categorization is a supervised learning task in which a document is associated with multiple labels simultaneously. Current multi-label text categorization approaches are limited when labelled text data is scarce and expensive but unlabelled text data is abundant, because they cannot exploit information from the unlabelled data. To address this problem, we learn word semantic similarity from unlabelled text data using deep learning, and then incorporate the learned similarity into current multi-label text categorization approaches. Experiments on the Slashdot and Tmc2007 datasets demonstrate that the proposed method greatly improves the performance of current multi-label text categorization approaches.
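
The abstract does not say how the learned similarity is incorporated into a classifier, so the following is only a minimal sketch under stated assumptions: word similarity is taken to be the cosine similarity between word vectors learned from unlabelled text, and a document's bag-of-words counts are smoothed by spreading each word's count onto similar words before the features are passed to any multi-label classifier. The vocabulary, the two-dimensional vectors, and all function names are hypothetical and chosen purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_matrix(vocab, embeddings):
    """Word-by-word cosine similarity; negative values are clipped to 0."""
    return [[max(0.0, cosine(embeddings[w1], embeddings[w2])) for w2 in vocab]
            for w1 in vocab]

def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    for t in tokens:
        if t in index:
            counts[index[t]] += 1.0
    return counts

def smooth(counts, sim):
    """Spread each word's count onto similar words: x' = x S."""
    n = len(counts)
    return [sum(counts[i] * sim[i][j] for i in range(n)) for j in range(n)]

# Made-up toy vectors (assumption); in practice these would be learned
# from a large unlabelled corpus.
VOCAB = ["linux", "unix", "movie"]
EMBEDDINGS = {
    "linux": [1.0, 0.0],
    "unix": [0.8, 0.6],
    "movie": [0.0, 1.0],
}

S = similarity_matrix(VOCAB, EMBEDDINGS)
doc = bag_of_words(["linux", "linux"], VOCAB)
smoothed = smooth(doc, S)  # approximately [2.0, 1.6, 0.0]
```

In this toy case a document that mentions only "linux" also receives weight on "unix", so a classifier trained on few labelled documents can still relate the two words. The smoothed vectors would simply replace the raw counts as input features.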

Author information

Authors and Affiliations

Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, Beijing, China
Li Li, Mengxiang Wang, Longkai Zhang & Houfeng Wang

Corresponding author

Correspondence to Houfeng Wang.

Editor information

Editors and Affiliations

Xiamen University, Xiamen, Fujian, China
Xinchun Su
Central China Normal University, Wuhan, Hubei, China
Tingting He

Cite this paper

Li, L., Wang, M., Zhang, L., Wang, H. (2014). Learning Semantic Similarity for Multi-label Text Categorization. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_26

DOI: https://doi.org/10.1007/978-3-319-14331-6_26
Published: 27 December 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14330-9
Online ISBN: 978-3-319-14331-6
eBook Packages: Computer Science, Computer Science (R0)
