Apply the Dynamic N-gram to Extract the Keywords of Chinese News

  • Ren-Xiang Lin
  • Heng-Li Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8482)

Abstract

The explosive growth of information on the Internet has created a great demand for new and powerful tools to acquire useful information. The first step in retrieving information from a Chinese article is word segmentation. However, two major problems, ambiguity and long words, can reduce segmentation accuracy. In this paper, we propose a novel character-based approach, dynamic N-gram (DNG), to deal with these two problems, and we apply it to Chinese news articles to evaluate the accuracy of the approach. The evaluation results indicate that most readers agreed that the dynamic N-gram approach could extract meaningful keywords. The keyword extraction results also showed no significant difference across news categories. The primary contribution of this approach is that dynamic N-gram helps extract the most meaningful keywords from different types of Chinese articles without having to consider the number of grams.
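
The abstract does not spell out how DNG works, so the following is only a minimal sketch of the general idea of character-based keyword extraction with a variable gram length. It is not the authors' algorithm. The function names (`char_ngrams`, `variable_length_keywords`), the parameters, and the rule that a shorter gram is dropped when an equally frequent longer gram contains it are all assumptions made for illustration.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def variable_length_keywords(text, min_n=2, max_n=5, min_count=2):
    """Count character n-grams of several lengths and keep the longest ones.

    Assumed heuristic (not taken from the paper): a gram is discarded when
    a longer gram with the same frequency contains it, so the caller does
    not have to pick a single value of n in advance.
    """
    # Split on punctuation and whitespace so grams never cross a clause boundary.
    segments = re.split(r"[^\w]+", text)

    counts = Counter()
    for segment in segments:
        for n in range(min_n, max_n + 1):
            counts.update(char_ngrams(segment, n))

    # Keep only grams that repeat often enough to be keyword candidates.
    frequent = {g: c for g, c in counts.items() if c >= min_count}

    kept = {}
    for gram, count in frequent.items():
        subsumed = any(
            len(other) > len(gram) and gram in other and frequent[other] == count
            for other in frequent
        )
        if not subsumed:
            kept[gram] = count

    # Most frequent first, then longer grams, then lexical order for stability.
    return sorted(kept.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))


sample = "台北市政府今天宣布，台北市政府將推動新政策。"
print(variable_length_keywords(sample))
```

In this toy input the repeated string 台北市政府 is returned as a single five-character candidate, rather than as the overlapping two-, three-, and four-character fragments that a fixed n would produce.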

Keywords

Chinese Word Segmentation, Dynamic N-gram, Information Retrieval

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ren-Xiang Lin (1)
  • Heng-Li Yang (1)
  1. Dept. of MIS, National Chengchi University, Taiwan
