Apply the Dynamic N-gram to Extract the Keywords of Chinese News

Lin, Ren-Xiang; Yang, Heng-Li

doi:10.1007/978-3-319-07467-2_42

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8482)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

The explosive growth of information on the Internet has created a great demand for new and powerful tools to acquire useful information. The first step in retrieving information from a Chinese article is word segmentation. Two major problems, ambiguity and long words, can reduce the accuracy of word segmentation. In this paper, we propose a novel character-based approach, dynamic N-gram (DNG), to deal with these two problems, and we apply it to Chinese news articles to evaluate the accuracy of dynamic N-gram. The evaluation results indicate that most readers agreed that the dynamic N-gram approach could extract meaningful keywords. Across different news categories, the keyword extraction results showed no significant difference. The primary contribution of this approach is that dynamic N-gram helps extract the most meaningful keywords from different types of Chinese articles without having to consider the number of grams.
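The abstract does not spell out how DNG selects the gram length, so the following is only a minimal sketch of the general idea of character-based keyword extraction without a fixed N. It is not the authors' algorithm. It assumes a generic heuristic (count character n-grams of several lengths, then drop any shorter n-gram that a longer one contains with the same frequency). The function names, `max_n`, and `min_count` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def variable_length_keywords(sentences, max_n=6, min_count=2):
    """Illustrative heuristic, NOT the paper's DNG method.

    Counts character n-grams for every n in 2..max_n, keeps those seen at
    least `min_count` times, and discards a shorter n-gram when a longer
    frequent n-gram contains it with exactly the same count.
    """
    counts = Counter()
    for sentence in sentences:
        for n in range(2, max_n + 1):
            counts.update(char_ngrams(sentence, n))

    frequent = {g: c for g, c in counts.items() if c >= min_count}

    kept = {}
    for gram, count in frequent.items():
        absorbed = any(
            len(other) > len(gram) and gram in other and frequent[other] == count
            for other in frequent
        )
        if not absorbed:
            kept[gram] = count

    # Most frequent first; ties broken by longer gram, then by text.
    return sorted(kept.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))


print(variable_length_keywords(["台北市政府宣布", "台北市政府表示"]))
# → [('台北市政府', 2)]
```

In this toy input, every substring of 台北市政府 occurs twice, so the shorter candidates are absorbed into the single five-character keyword without the caller choosing N = 5 in advance.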

Author information

Authors and Affiliations

Dept. of MIS, National Chengchi University, Taiwan
Ren-Xiang Lin & Heng-Li Yang

Editor information

Editors and Affiliations

Department of Computer Science, Texas State University, 78666, San Marcos, TX, USA
Moonis Ali
Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, No. 415, Chien Kung Road, 80778, Kaohsiung, Taiwan
Jeng-Shyang Pan
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Shyi-Ming Chen
Department of Electronics Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan (ROC)
Mong-Fong Horng

Cite this paper

Lin, RX., Yang, HL. (2014). Apply the Dynamic N-gram to Extract the Keywords of Chinese News. In: Ali, M., Pan, JS., Chen, SM., Horng, MF. (eds) Modern Advances in Applied Intelligence. IEA/AIE 2014. Lecture Notes in Computer Science, vol 8482. Springer, Cham. https://doi.org/10.1007/978-3-319-07467-2_42

DOI: https://doi.org/10.1007/978-3-319-07467-2_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07466-5
Online ISBN: 978-3-319-07467-2
eBook Packages: Computer Science, Computer Science (R0)
