Information Gain Based Term Weighting Method for Multi-label Text Classification Task

Mazyad, Ahmad; Teytaud, Fabien; Fonlupt, Cyril

doi:10.1007/978-3-030-01054-6_44

Conference paper
First Online: 09 November 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 868)

Abstract

In text classification, terms are assigned weights by a Term Weighting Scheme (TWS) in order to improve classification performance. Multi-label classification tasks are generally simplified into several single-label binary tasks. Thus, the term distribution is considered only in terms of positive and negative categories. In this paper, we propose a new TWS based on the information gain measure for multi-label classification tasks. This TWS tries to overcome this shortcoming without affecting the complexity of the problem. We compare our proposed TWS with eight well-known TWS on two popular problems using five learning algorithms. In our experimental results, the proposed method outperforms the other methods, especially with respect to the macro-averaged measure.
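
To make the information gain measure concrete, the following is a minimal sketch of the standard definition, IG(t) = H(C) - H(C | t), computed over all categories at once rather than per positive/negative split. It is not the exact weighting formula of the paper, which the abstract does not state. The function names, the toy corpus, and the choice to count each (document, label) assignment as one observation are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, term):
    """IG(term) = H(C) - H(C | term present or absent).

    docs is a list of (terms, labels) pairs. Assumption: every
    (document, label) assignment counts as one observation, so a
    document with two labels contributes two observations.
    """
    with_term, without_term = Counter(), Counter()
    for terms, labels in docs:
        target = with_term if term in terms else without_term
        target.update(labels)

    overall = with_term + without_term
    n = sum(overall.values())
    if n == 0:
        return 0.0

    # Entropy of the category distribution, averaged over the two partitions.
    conditional = sum(
        sum(part.values()) / n * entropy(part.values())
        for part in (with_term, without_term)
    )
    return entropy(overall.values()) - conditional


# Hypothetical toy corpus: (set of terms, set of category labels).
docs = [
    ({"wheat", "export"}, {"grain", "trade"}),
    ({"wheat", "harvest"}, {"grain"}),
    ({"oil", "export"}, {"crude", "trade"}),
    ({"oil", "price"}, {"crude"}),
]

ig_wheat = information_gain(docs, "wheat")
ig_export = information_gain(docs, "export")
print(f"IG(wheat)={ig_wheat:.4f}  IG(export)={ig_export:.4f}")
```

In this toy corpus, "wheat" separates the categories more sharply than "export", which occurs across all three categories, so it receives the higher score. A weighting scheme could multiply such a score by the term frequency, but that combination is again an assumption rather than something stated in the abstract.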

Notes

1. http://disi.unitn.it/moschitti/corpora.htm

Author information

Authors and Affiliations

LISIC, Université du Littoral Côte d’Opale, 50 Rue Ferdinand Buisson, 62100, Calais, France
Ahmad Mazyad, Fabien Teytaud & Cyril Fonlupt

Corresponding author

Correspondence to Ahmad Mazyad.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, UK
Rahul Bhatia

Cite this paper

Mazyad, A., Teytaud, F., Fonlupt, C. (2019). Information Gain Based Term Weighting Method for Multi-label Text Classification Task. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_44

DOI: https://doi.org/10.1007/978-3-030-01054-6_44
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01053-9
Online ISBN: 978-3-030-01054-6
eBook Packages: Intelligent Technologies and Robotics (R0)
