Keywords Weights Improvement and Application of Information Extraction

  • Yang Junhui
  • Huang Chan
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 144)


In keyword extraction, the TF-IDF algorithm is commonly used to calculate keyword weights. The algorithm is relatively simple and achieves relatively high precision and recall, but it has many defects. Building on the traditional TF-IDF weighting formula, this article proposes an improved TF-IDF formula that also weights a keyword by its location and its length. Experimental results show that the proposed method outperforms TF-IDF in precision and recall.
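The abstract does not give the improved formula itself, so the following is only a minimal sketch of the general idea: a classic TF-IDF score scaled by a location factor and a length factor. The function names, the multiplicative combination, the choice of "appears in the title" as the location signal, and the `title_boost` and `length_boost` parameters are all assumptions made for illustration, not the authors' formula.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Classic TF-IDF: relative term frequency times log inverse document frequency."""
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    # Number of documents in the corpus that contain the term.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def improved_weight(term, title_tokens, body_tokens, corpus,
                    title_boost=0.5, length_boost=0.5):
    """Hypothetical variant: TF-IDF scaled by location and length factors.

    Assumptions (not taken from the paper): a term that occurs in the title
    is boosted by `title_boost`, and longer terms are boosted in proportion
    to their length relative to the longest token in the document.
    """
    doc_tokens = title_tokens + body_tokens
    base = tf_idf(term, doc_tokens, corpus)
    # Location factor: terms in the title are assumed to matter more.
    location_factor = (1.0 + title_boost) if term in title_tokens else 1.0
    # Length factor: normalise the term length by the longest token.
    longest = max((len(tok) for tok in doc_tokens), default=1)
    length_factor = 1.0 + length_boost * len(term) / longest
    return base * location_factor * length_factor


if __name__ == "__main__":
    title = ["keyword", "extraction"]
    body = ["tf", "idf", "keyword", "weight"]
    corpus = [title + body, ["tf", "idf", "ranking"], ["neural", "network"]]
    for term in ("keyword", "tf"):
        print(term,
              round(tf_idf(term, title + body, corpus), 4),
              round(improved_weight(term, title, body, corpus), 4))
```

In this sketch a term that appears in the title and is comparatively long ends up with a larger weight than its plain TF-IDF score, which is the qualitative behaviour the abstract describes. The actual factors would have to be taken from the full paper.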





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. Jiangxi University of Science and Technology, Ganzhou, China
  2. Department of Computer, GanNan Teach College, Ganzhou, China
