
A New Improved Term Weighting Scheme for Text Categorization

  • Nguyen Pham Xuan
  • Hieu Le Quang
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 244)

Abstract

In text categorization, term weighting is the task of assigning weights to terms during the document representation phase, so it directly affects classification performance. In this paper, we propose a new term weighting scheme, logtf.rf_max. It improves on tf.rf, one of the most effective term weighting schemes to date. We conducted experiments comparing the new scheme with tf.rf and other schemes on common text categorization benchmark data sets. The experimental results show that logtf.rf_max consistently outperforms tf.rf as well as the other schemes. Furthermore, our new scheme is simpler than tf.rf.
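
The abstract does not state the formulas, so the following Python sketch rests on stated assumptions. It uses the relevance frequency of Lan et al. (2009), rf = log2(2 + a / max(1, c)), where a and c count the training documents inside and outside a category that contain the term. It assumes that "logtf" means log2(1 + tf) and that rf_max is the largest one-vs-rest rf of a term over all categories. Under that reading, each term gets a single weight shared by every category, which is one plausible sense in which the scheme is simpler than tf.rf. The function names `rf`, `rf_max_table`, and `logtf_rf_max` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def rf(a: int, c: int) -> float:
    """Relevance frequency of Lan et al. (2009).

    a: training documents in the positive category that contain the term
    c: training documents outside that category that contain the term
    """
    return math.log2(2 + a / max(1, c))


def rf_max_table(train_docs, labels):
    """Assumed rf_max: for each term, the largest one-vs-rest rf over all categories."""
    df_by_cat = {}          # category -> Counter(term -> document frequency)
    total_df = Counter()    # term -> document frequency over the whole training set
    for doc, cat in zip(train_docs, labels):
        terms = set(doc)    # count each term at most once per document
        df_by_cat.setdefault(cat, Counter()).update(terms)
        total_df.update(terms)
    table = {}
    for term, df in total_df.items():
        # a = in-category document frequency, c = the remainder of the term's df
        table[term] = max(
            rf(counts[term], df - counts[term]) for counts in df_by_cat.values()
        )
    return table


def logtf_rf_max(doc, table):
    """Weight a tokenized document as log2(1 + tf) * rf_max (assumed form).

    Terms not seen during training are dropped (an assumption of this sketch).
    """
    tf = Counter(doc)
    return {t: math.log2(1 + n) * table[t] for t, n in tf.items() if t in table}


if __name__ == "__main__":
    # Hypothetical toy training set: tokenized documents and their categories.
    train_docs = [
        ["ball", "goal", "goal"],
        ["goal", "match"],
        ["vote", "party"],
        ["party", "election", "vote"],
    ]
    labels = ["sport", "sport", "politics", "politics"]
    table = rf_max_table(train_docs, labels)
    print(table["goal"])  # 2.0
    print(logtf_rf_max(["goal", "goal", "ball"], table))
```

On the toy data, `goal` appears in both sport documents and in no politics document, so its sport rf is log2(2 + 2/1) = 2, its politics rf is log2(2) = 1, and rf_max keeps the larger value. Because rf_max does not depend on the category being classified, one weighted vector per document can be fed to every binary classifier.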

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam