Abstract
In text classification, term features are predominantly weighted by term frequency and term co-occurrence factors. Category relevance factors have recently been used as the basis for new term weighting approaches. However, these approaches mostly rely on purpose-built text classifiers to exploit category information, and so forgo the advantages of popular text classifiers. This paper proposes a term weighting framework for text classification tasks. First, the framework uses the provided category information to estimate feature weights. Second, it uses feedback information to adjust the feature weights continuously in search of the best document representations. Third, the framework can work with different text classifiers, which classify the category-informed text representations. Experiments on several corpora with an SVM classifier show that, given the predictions of the TFxIDF method as the initial state, the proposed approach improves accuracy and outperforms current text classification approaches.
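As background for the TFxIDF baseline named in the abstract, the following is a minimal sketch of one common TFxIDF variant. It is not the paper's framework: the abstract does not say which variant the authors used, so the formula (raw term count times log(N / document frequency)), the function name `tfidf_weights`, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus: three already-tokenized documents.
docs = [
    ["price", "market", "price"],
    ["market", "team"],
    ["team", "match", "match"],
]
weights = tfidf_weights(docs)
```

In this toy corpus, "price" occurs twice in a single document and so receives a higher weight there than "market", which occurs once and is shared by two documents. Weights of this kind use no category labels, which is the gap that category-based weighting is meant to address.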
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huynh, D., Tran, D., Ma, W., Sharma, D. (2011). Adaptable Term Weighting Framework for Text Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_21
DOI: https://doi.org/10.1007/978-3-642-19437-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)