Keyword Extraction Based on Multi-feature Fusion for Chinese Web Pages

  • Qi He
  • Hong-Wei Hao
  • Xu-Cheng Yin
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 144)

Abstract

To overcome the lack of comprehensiveness of traditional keyword extraction, this paper proposes a keyword extraction method based on multi-feature fusion for Chinese web pages. First, the part-of-speech and position information of candidate words are incorporated into an improved TF-IDF algorithm. Second, the mutual information with the web page title is taken into account when calculating the weight of candidate words. Third, the multi-feature fusion score is formed as a linear combination of the improved TF-IDF score and the mutual information, and keywords are extracted on the basis of this fused score. Comparative experiments show that keywords extracted by our method achieve higher precision and recall than those produced by the classical TF-IDF algorithm.
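
The three steps above can be illustrated with a minimal Python sketch. The abstract gives neither the weights nor the exact formulas, so everything concrete here is an assumption made for illustration: the part-of-speech and position weights, the sentence-level estimate of mutual information, the max-normalisation, the mixing parameter `alpha`, and all function names. It should not be read as the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical weights; the abstract does not state the values used in the paper.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}        # noun, verb, adjective
POSITION_WEIGHT = {"title": 2.0, "body": 1.0}


def improved_tfidf(tokens, doc_freq, n_docs):
    """TF-IDF scaled by part-of-speech and position weights.

    tokens: list of (word, pos_tag, position) triples for one page.
    doc_freq: number of corpus documents containing each word.
    """
    counts = Counter(word for word, _, _ in tokens)
    scores = {}
    for word, pos, position in tokens:
        tf = counts[word] / len(tokens)
        idf = math.log((n_docs + 1) / (doc_freq.get(word, 0) + 1)) + 1.0
        weight = POS_WEIGHT.get(pos, 0.2) * POSITION_WEIGHT.get(position, 1.0)
        # Keep the best-weighted occurrence of each word.
        scores[word] = max(scores.get(word, 0.0), tf * idf * weight)
    return scores


def title_mutual_information(word, title_words, sentences):
    """Average positive pointwise MI between a word and the title words,
    estimated from sentence-level co-occurrence (an assumed estimator)."""
    n = len(sentences)
    if n == 0 or not title_words:
        return 0.0

    def prob(*words):
        return sum(all(w in s for w in words) for s in sentences) / n

    total = 0.0
    for t in title_words:
        joint = prob(word, t)
        if joint > 0:
            total += max(0.0, math.log(joint / (prob(word) * prob(t))))
    return total / len(title_words)


def normalise(scores):
    """Scale scores into [0, 1] by dividing by the maximum."""
    top = max(scores.values(), default=0.0)
    return {w: (s / top if top > 0 else 0.0) for w, s in scores.items()}


def fuse(tfidf, mi, alpha=0.5):
    """Linear combination of the two normalised feature scores."""
    tfidf, mi = normalise(tfidf), normalise(mi)
    return {w: alpha * tfidf[w] + (1 - alpha) * mi.get(w, 0.0) for w in tfidf}


def extract_keywords(tokens, title_words, sentences, doc_freq, n_docs,
                     k=3, alpha=0.5):
    """Return the k candidate words with the highest fused score."""
    tfidf = improved_tfidf(tokens, doc_freq, n_docs)
    mi = {w: title_mutual_information(w, title_words, sentences) for w in tfidf}
    fused = fuse(tfidf, mi, alpha)
    return sorted(fused, key=fused.get, reverse=True)[:k]
```

For Chinese text, `tokens` would come from a word segmenter and part-of-speech tagger run over the page, and `sentences` would be the sets of segmented words per sentence. Setting `alpha=1.0` reduces the fused score to the weighted TF-IDF term alone, which makes it easy to compare the two features in isolation.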

Keywords

  • Mutual Information
  • Chinese Word
  • Candidate Word
  • Keyword Extraction
  • Haidian District
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. University of Science and Technology Beijing, Beijing, China
