Keyword Extraction Based on Multi-feature Fusion for Chinese Web Pages
To overcome the lack of comprehensiveness of traditional keyword extraction, this paper proposes a keyword extraction method based on multi-feature fusion for Chinese web pages. First, the part of speech and the position information of candidate words are incorporated into an improved TF-IDF algorithm. Second, mutual information with the web page title is taken into account when calculating the weight of candidate words. Third, the multi-feature fusion score is formed as a linear combination of the improved TF-IDF weight and the mutual information. Keywords are then extracted according to this fused score. Comparative experiments show that the keywords extracted by our method achieve higher precision and recall than those of the classical TF-IDF algorithm.
Keywords: Mutual Information, Chinese Word, Candidate Word, Keyword Extraction, Haidian District
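The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas or parameter values, so the part-of-speech weights (`POS_WEIGHT`), the title position multiplier (`TITLE_BONUS`), the mixing coefficient (`ALPHA`), the smoothed IDF, and the document-level pointwise mutual information estimate are all assumptions chosen for illustration. The sketch also assumes that word segmentation and part-of-speech tagging have already been done, so each document arrives as a list of `(word, pos)` pairs.

```python
import math
from collections import Counter

# All constants below are hypothetical; the abstract does not specify them.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}  # assumed part-of-speech weights
DEFAULT_POS_WEIGHT = 0.2                     # assumed weight for other tags
TITLE_BONUS = 2.0                            # assumed multiplier for title words
ALPHA = 0.7                                  # assumed mixing coefficient


def improved_tfidf(doc_tokens, title_words, corpus):
    """TF-IDF scaled by part-of-speech and position weights (assumed form).

    doc_tokens: list of (word, pos) pairs for one page.
    corpus: list of sets of words, one set per background document.
    """
    counts = Counter(word for word, _ in doc_tokens)
    pos_of = {word: pos for word, pos in doc_tokens}
    n_docs, total = len(corpus), len(doc_tokens)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed IDF
        pos_w = POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
        loc_w = TITLE_BONUS if word in title_words else 1.0
        scores[word] = (count / total) * idf * pos_w * loc_w
    return scores


def title_mi(word, title_words, corpus):
    """Mean positive pointwise mutual information with the title words,
    estimated from document-level co-occurrence (an assumed estimator)."""
    if not title_words:
        return 0.0
    n_docs = len(corpus)
    p_w = sum(1 for doc in corpus if word in doc) / n_docs
    total = 0.0
    for t in title_words:
        p_t = sum(1 for doc in corpus if t in doc) / n_docs
        p_wt = sum(1 for doc in corpus if word in doc and t in doc) / n_docs
        if p_wt > 0:  # implies p_w > 0 and p_t > 0
            total += max(0.0, math.log(p_wt / (p_w * p_t)))
    return total / len(title_words)


def normalize(scores):
    """Scale scores to [0, 1] so the two features are comparable."""
    top = max(scores.values(), default=0.0)
    return {w: (s / top if top > 0 else 0.0) for w, s in scores.items()}


def extract_keywords(doc_tokens, title_words, corpus, k=3):
    """Rank candidates by a linear combination of the two features."""
    tfidf = normalize(improved_tfidf(doc_tokens, title_words, corpus))
    mi = normalize({w: title_mi(w, title_words, corpus) for w in tfidf})
    fused = {w: ALPHA * tfidf[w] + (1 - ALPHA) * mi[w] for w in tfidf}
    return sorted(fused, key=fused.get, reverse=True)[:k]


# Toy data in English for readability; real input would be segmented Chinese.
corpus = [
    {"keyword", "extraction", "web"},
    {"web", "page", "news"},
    {"keyword", "web", "the"},
    {"the", "news"},
]
doc_tokens = [("keyword", "n"), ("extraction", "n"), ("the", "u"),
              ("the", "u"), ("web", "n"), ("keyword", "n")]
title_words = {"keyword", "extraction"}
print(extract_keywords(doc_tokens, title_words, corpus, k=2))
```

Normalizing both features before the linear combination keeps either one from dominating purely because of its scale; whether the paper normalizes in this way is not stated in the abstract.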