Automatic Acquisition of Chinese Words’ Property of Times
Words’ property of times is an important type of additional meaning that reflects the spirit of an era. People infer this temporal information from words through their own experience, but automatic recognition by computers remains difficult. This paper proposes a method for automatically recognizing the property of times based on a large-scale corpus, using TF-IDF and TF-IWF values to quantify Chinese words’ property of times. Experiments on 54 years of People’s Daily show that words’ TF-IDF values, aided by TF-IWF values, outperform raw word frequency. A Naïve Bayes classifier is also used for automatic acquisition of words’ property of times, and it achieves satisfactory results.
Keywords: Property of Times, Frequency, TF-IDF, TF-IWF, Naive Bayes
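To make the quantification idea concrete, the following is a minimal sketch of scoring words per time period with TF-IDF and TF-IWF. It is not the paper's implementation: the abstract does not give the exact formulas, so the sketch assumes each period (for example, one year of newspaper text) is treated as one document, and it uses common textbook definitions, IDF = log(number of periods / number of periods containing the word) and IWF = log(total tokens in the corpus / corpus frequency of the word). The function name `period_scores` and the toy tokens are hypothetical, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter


def period_scores(tokens_by_period):
    """Return {period: {word: (tf_idf, tf_iwf)}} for pre-segmented tokens.

    Assumption: each period is treated as a single document, and the
    IDF/IWF formulas are common textbook variants, not necessarily the
    ones used in the paper.
    """
    counts = {p: Counter(toks) for p, toks in tokens_by_period.items()}
    n_periods = len(counts)

    # Document frequency: number of periods in which each word occurs.
    doc_freq = Counter()
    # Corpus frequency: total occurrences of each word over all periods.
    corpus_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
        corpus_freq.update(c)
    corpus_total = sum(corpus_freq.values())

    scores = {}
    for period, c in counts.items():
        period_total = sum(c.values())
        scores[period] = {}
        for word, n in c.items():
            tf = n / period_total
            idf = math.log(n_periods / doc_freq[word])
            iwf = math.log(corpus_total / corpus_freq[word])
            scores[period][word] = (tf * idf, tf * iwf)
    return scores


# Toy data standing in for segmented newspaper text (hypothetical).
toy = {
    1958: ["steel", "steel", "people"],
    1998: ["internet", "people", "reform"],
}
scores = period_scores(toy)
```

In this toy input, "people" occurs in both periods, so its TF-IDF is zero in each, while "steel" occurs only in 1958 and receives a positive score there. TF-IWF still gives "people" a nonzero weight because it depends on corpus-wide frequency rather than on how many periods contain the word, which hints at why the two measures can complement each other.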