A New Inductive Learning Method for Multilabel Text Categorization
- Yu-Chuan Chang (Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology)
- Shyi-Ming Chen (Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology)
- Churn-Jung Liau (Institute of Information Science, Academia Sinica)
In this paper, we present a new inductive learning method for multilabel text categorization. The proposed method uses a mutual information measure to select terms and constructs document descriptor vectors for each category based on these terms. These document descriptor vectors form a document descriptor matrix. The method also uses the document descriptor vectors to construct a document-similarity matrix based on the cosine similarity measure. It then constructs a term-document relevance matrix by applying the inner product of the document descriptor matrix to the document-similarity matrix. The proposed method infers the degree of relevance of the selected terms to construct the category descriptor vector of each category. Then, the relevance score between each category and a testing document is calculated by applying the inner product of the category descriptor vector to the document descriptor vector of the testing document. The maximum relevance score L is then chosen. If the ratio of a category's relevance score to L is not less than a predefined threshold value λ, where λ lies between zero and one, then the document is classified into that category. We also compare the classification accuracy of the proposed method with that of existing learning methods (i.e., Find Similar, Naïve Bayes, Bayes Nets, and Decision Trees) in terms of the micro-averaged break-even point on the "Reuters-21578 Apté split" data set. The proposed method achieves a higher average accuracy than the existing methods.
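The final decision rule described in the abstract can be illustrated with a minimal Python sketch. It covers only the scoring and thresholding step, plus a cosine similarity helper of the kind used for the document-similarity matrix. The function names (`inner`, `cosine_similarity`, `classify`), the category names, and all vector values are hypothetical and chosen for illustration; they are not taken from the paper, and the sketch does not reproduce how the paper derives the category descriptor vectors.

```python
from math import sqrt


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = sqrt(inner(u, u)) * sqrt(inner(v, v))
    return inner(u, v) / norm if norm else 0.0


def classify(category_vectors, doc_vector, lam):
    """Return every category whose score divided by the maximum score L
    is not less than the threshold lam (0 <= lam <= 1)."""
    scores = {c: inner(vec, doc_vector) for c, vec in category_vectors.items()}
    if not scores:
        return []
    top = max(scores.values())  # the maximum relevance score L
    if top <= 0:
        # Assumption: with no positive score, assign no category.
        return []
    return sorted(c for c, s in scores.items() if s / top >= lam)


# Hypothetical category descriptor vectors over three selected terms.
category_vectors = {
    "grain": [0.5, 0.25, 0.0],
    "wheat": [0.5, 0.0, 0.0],
    "trade": [0.0, 0.0, 1.0],
}
doc = [1.0, 1.0, 0.0]  # hypothetical descriptor vector of a testing document

labels = classify(category_vectors, doc, lam=0.6)
print(labels)
```

Because each score is normalized by the maximum score L, the top-scoring category is always assigned whenever L is positive, and a lower λ admits more additional categories. This is what allows a single document to receive several labels.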
- Book Title
- Advances in Applied Artificial Intelligence
- Book Subtitle
- 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2006, Annecy, France, June 27-30, 2006. Proceedings
- pp 1249-1258
- Series Title
- Lecture Notes in Computer Science
- Springer Berlin Heidelberg
- Copyright Holder
- Springer-Verlag Berlin Heidelberg
- Editor Affiliations
- 18. Department of Computer Science, Texas State University-San Marcos
- 19. ESIA Laboratoire d’Informatique, Systèmes, Traitement de l’Information et de la Connaissance, Université de Savoie
- Author Affiliations
- 20. Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
- 21. Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.