Improving Text Classification with Concept Index Terms and Expansion Terms

Fu, XiangHua; Liu, LianDong; Gong, TianXue; Tao, Lan

doi:10.1007/978-3-642-21111-9_55

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6677)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

Feature selection methods are widely used to improve classification accuracy by removing redundant and noisy features. However, removing terms from documents may damage the integrity of their content. To balance document integrity against classification performance, we propose a novel two-step classification method. First, we select index terms and expansion terms through Maximum-Relevance and Minimum-Redundancy Analysis (MR2A). Then we combine the predictive power of the index terms and expansion terms via Concept Similarity Mapping (CSM). Experiments on the 20Newsgroups and SOGOU datasets are carried out with different classifiers. The results show that both CSM and MR2A outperform the baseline methods, Information Gain and Chi-square.
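The abstract does not spell out how MR2A scores terms, so the following is only a minimal sketch of the general maximum-relevance, minimum-redundancy idea, not the authors' algorithm. It assumes binary term-occurrence columns, uses mutual information for both relevance (term vs. class label) and redundancy (term vs. already selected terms), and picks terms greedily by relevance minus mean redundancy. The function names, the scoring rule, and the toy data are all hypothetical, and the CSM step is not covered.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mrmr(term_columns, labels, k):
    """Greedily pick up to k terms by relevance minus mean redundancy.

    Assumed scoring rule (illustrative only): relevance is MI(term, label),
    redundancy is the mean MI between the term and the terms picked so far.
    """
    relevance = {
        t: mutual_information(col, labels) for t, col in term_columns.items()
    }
    selected = []
    candidates = set(term_columns)

    def score(term):
        if not selected:
            return relevance[term]
        redundancy = sum(
            mutual_information(term_columns[term], term_columns[s])
            for s in selected
        ) / len(selected)
        return relevance[term] - redundancy

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy corpus: 8 documents, 1 = term occurs in the document.
labels = ["sport"] * 4 + ["tech"] * 4
term_columns = {
    "goal":  [1, 1, 1, 0, 0, 0, 0, 0],
    "match": [1, 1, 0, 0, 0, 0, 0, 0],  # largely overlaps with "goal"
    "cpu":   [0, 0, 0, 0, 1, 1, 0, 0],
    "the":   [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}
chosen = greedy_mrmr(term_columns, labels, k=2)
print(chosen)
```

In this toy data "match" and "cpu" are equally relevant to the label, but "match" mostly repeats what "goal" already says, so the redundancy penalty favours "cpu" as the second pick. A ranking by relevance alone would not separate the two.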

Author information

Authors and Affiliations

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen Guangdong, 518060, China
XiangHua Fu, LianDong Liu, TianXue Gong & Lan Tao

Editor information

Editors and Affiliations

Institute of Automation, Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences, 100190, Beijing, China
Derong Liu
College of Information Science and Engineering, Northeastern University, 110004, Shenyang, Liaoning, China
Huaguang Zhang
Department of Electrical and Computer Engineering, University of Cyprus, 75 Kallipoleos Avenue, 1678, Nicosia, Cyprus
Marios Polycarpou
Dipartimento di Elettronica, Politecnico di Milano, Piazza L. da Vinci 32, 20133, Milano, Italy
Cesare Alippi
Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, 02881, Kingston, RI, USA
Haibo He

Cite this paper

Fu, X., Liu, L., Gong, T., Tao, L. (2011). Improving Text Classification with Concept Index Terms and Expansion Terms. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds) Advances in Neural Networks – ISNN 2011. ISNN 2011. Lecture Notes in Computer Science, vol 6677. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21111-9_55

DOI: https://doi.org/10.1007/978-3-642-21111-9_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21110-2
Online ISBN: 978-3-642-21111-9
eBook Packages: Computer Science, Computer Science (R0)
