Adaptable Term Weighting Framework for Text Classification

Huynh, Dat; Tran, Dat; Ma, Wanli; Sharma, Dharmendra

doi:10.1007/978-3-642-19437-5_21


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

In text classification, term frequency and term co-occurrence factors are predominantly used to weight term features. Category relevance factors have recently been used to propose new term weighting approaches. However, these approaches mainly rely on purpose-built text classifiers to exploit category information, and so forgo the advantages of popular text classifiers. This paper proposes a term weighting framework for text classification tasks. First, the framework uses the provided category information to estimate feature weights. Second, based on feedback information, it continuously adjusts the feature weights to find the best representations for documents. Third, the framework makes it possible to use different text classifiers to classify the resulting text representations, which are based on category information. In experiments on several corpora with an SVM classifier, the proposed approach, given the predictions of the TFxIDF method as its initial status, improves accuracy and outperforms current text classification approaches.
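The abstract does not give the weighting formula or the feedback rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It starts from plain TFxIDF predictions, re-weights each document with a category-dependent factor, and repeats classification until the predictions stop changing. The `category_weights` formula, the function names, and the stopping rule are all hypothetical, and a nearest-centroid classifier stands in for the SVM used in the paper to keep the example dependency-free.

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TFxIDF: raw term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

def category_weights(docs, labels):
    """Hypothetical category factor (an assumption, not from the paper):
    log(2 + docs with the term inside the category / (1 + docs outside))."""
    total = Counter(t for d in docs for t in set(d))
    inside = {c: Counter() for c in set(labels)}
    for d, y in zip(docs, labels):
        inside[y].update(set(d))
    return {c: {t: math.log(2 + inside[c][t] / (1 + total[t] - inside[c][t]))
                for t in total}
            for c in inside}

def reweight(doc, cat_w):
    """Represent a document using the weights of one (assumed) category;
    terms unseen in training get weight 0."""
    return {t: c * cat_w.get(t, 0.0) for t, c in Counter(doc).items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroids(vectors, labels):
    """Per-category sum of vectors (scale does not matter for cosine)."""
    sums = {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, Counter())
        for t, w in v.items():
            acc[t] += w
    return sums

def classify(vec, cents):
    # Stand-in classifier: nearest centroid; ties go to the first label in sorted order.
    return max(sorted(cents), key=lambda c: cosine(vec, cents[c]))

def adaptive_classify(train_docs, train_labels, test_docs, max_iter=10):
    # Initial status: predictions obtained from plain TFxIDF representations.
    vecs = tfidf(train_docs + test_docs)
    k = len(train_docs)
    preds = [classify(v, centroids(vecs[:k], train_labels)) for v in vecs[k:]]

    # Training documents are re-weighted with their known category.
    cw = category_weights(train_docs, train_labels)
    train_rep = [reweight(d, cw[y]) for d, y in zip(train_docs, train_labels)]
    cents = centroids(train_rep, train_labels)

    # Feedback loop: re-weight test documents with their predicted category,
    # re-classify, and stop once the predictions no longer change.
    for _ in range(max_iter):
        test_rep = [reweight(d, cw[p]) for d, p in zip(test_docs, preds)]
        new_preds = [classify(v, cents) for v in test_rep]
        if new_preds == preds:
            break
        preds = new_preds
    return preds
```

Because the category information lives in the document representations rather than in the classifier, `classify` could in principle be swapped for any other classifier without touching the weighting code, which is the property the abstract emphasises.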



Author information

Authors and Affiliations

Faculty of Information Sciences and Engineering, University of Canberra, ACT 2601, Australia
Dat Huynh, Dat Tran, Wanli Ma & Dharmendra Sharma


Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh


About this paper

Cite this paper

Huynh, D., Tran, D., Ma, W., Sharma, D. (2011). Adaptable Term Weighting Framework for Text Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_21


DOI: https://doi.org/10.1007/978-3-642-19437-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
