Abstract
Automatic text categorization (ATC) is a prominent research area within information retrieval. This paper discusses a classification model for ATC in the multi-label domain. We propose a new multi-label text classification model that assigns a more relevant set of categories to each input text document. The model draws heavily on graph-based frameworks and semi-supervised learning. We demonstrate its effectiveness on the Enron, Slashdot, Bibtex and RCV1 datasets, and compare its performance with a few popular existing supervised techniques. Our experimental results indicate that using semi-supervised learning in multi-label text classification greatly improves the decision-making capability of the classifier.
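The abstract does not spell out the model itself, so the following is only a generic sketch of how graph-based semi-supervised learning can be applied to multi-label text, not the authors' method. It assumes a cosine-similarity graph over bag-of-words vectors and plain iterative label propagation with clamped seeds; the function names, the `threshold` parameter and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def propagate_labels(docs, seed_labels, n_labels, iters=50, threshold=0.4):
    """Assign a label set to every document by label propagation.

    docs        -- list of token lists (labelled and unlabelled together)
    seed_labels -- {doc_index: set of label ids} for the labelled documents
    threshold   -- assumed cut-off on the propagated score (not from the paper)
    """
    vecs = [Counter(d) for d in docs]
    n = len(docs)
    # Row-normalised similarity graph without self-loops.
    W = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for row in W:
        total = sum(row)
        if total:
            row[:] = [w / total for w in row]
    # Score matrix: one column per label, seeded from the labelled documents.
    F = [[1.0 if k in seed_labels.get(i, ()) else 0.0 for k in range(n_labels)]
         for i in range(n)]
    for _ in range(iters):
        # Each document takes the weighted average of its neighbours' scores.
        new_F = [[sum(W[i][j] * F[j][k] for j in range(n))
                  for k in range(n_labels)] for i in range(n)]
        # Clamp the labelled documents back to their known labels.
        for i, labels in seed_labels.items():
            new_F[i] = [1.0 if k in labels else 0.0 for k in range(n_labels)]
        F = new_F
    # A document receives every label whose score reaches the threshold.
    return [{k for k in range(n_labels) if F[i][k] >= threshold}
            for i in range(n)]


docs = [
    ["stock", "market"],                      # labelled: 0 (finance)
    ["football", "goal"],                     # labelled: 1 (sport)
    ["stock", "market", "price"],             # unlabelled
    ["football", "goal", "score"],            # unlabelled
    ["stock", "market", "football", "goal"],  # unlabelled, mixes both topics
]
predicted = propagate_labels(docs, {0: {0}, 1: {1}}, n_labels=2)
```

In this toy run the two labelled documents act as seeds, and the unlabelled document that shares vocabulary with both ends up with both labels. This is the multi-label behaviour that a single-label propagation scheme would not give.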
Copyright information
© 2014 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Dharmadhikari, S.C., Ingle, M., Kulkarni, P. (2014). Semi Supervised Learning Based Text Classification Model for Multi Label Paradigm. In: Das, V.V., Elkafrawy, P. (eds) Signal Processing and Information Technology. SPIT 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 117. Springer, Cham. https://doi.org/10.1007/978-3-319-11629-7_26
DOI: https://doi.org/10.1007/978-3-319-11629-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11628-0
Online ISBN: 978-3-319-11629-7
eBook Packages: Computer Science (R0)