Automatic Acquisition of Chinese Words’ Property of Times

  • Liu Liu
  • Bin Li
  • Lijun Bu
  • Tian-tian Zhang
  • Xiaohe Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7717)

Abstract

Words’ property of times is an important type of additional meaning that reflects the spirit of an era. People perceive the temporal information carried by words through their own experience, but recognizing it automatically by computer is still difficult. This paper proposes a method for automatically recognizing the property of times based on a large-scale corpus, using TF-IDF and TF-IWF values to quantify Chinese words’ property of times. Experiments on 54 years of the People’s Daily show that words’ TF-IDF values, aided by TF-IWF values, outperform raw word frequency as a measure of this property. A Naïve Bayes classifier is also used for the automatic acquisition of words’ property of times and achieves satisfactory results.
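The abstract does not spell out the exact weighting formulas, so the following is only a minimal sketch under stated assumptions. Each year of the corpus is treated as one document when computing IDF, and IWF uses one common formulation: log(total tokens in the corpus / total occurrences of the word). The function names (year_counts, tf_idf, tf_iwf, peak_year) and the three-year toy corpus are hypothetical, not the authors’ code or data. The sketch shows why a word that appears in every year (人民) gets a TF-IDF of zero however frequent it is, while a word concentrated in one period (粮票) peaks in that period.

```python
import math
from collections import Counter


def year_counts(corpus_by_year):
    """Map each year to a Counter of its tokens (each year is one 'document' by assumption)."""
    return {year: Counter(tokens) for year, tokens in corpus_by_year.items()}


def tf_idf(counts):
    """TF-IDF per year: (count / tokens in that year) * log(number of years / years containing the word)."""
    n_years = len(counts)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each word counted once per year in which it occurs
    scores = {}
    for year, c in counts.items():
        total = sum(c.values())
        scores[year] = {w: (n / total) * math.log(n_years / doc_freq[w]) for w, n in c.items()}
    return scores


def tf_iwf(counts):
    """TF-IWF per year: (count / tokens in that year) * log(tokens in corpus / occurrences of the word)."""
    corpus_freq = Counter()
    for c in counts.values():
        corpus_freq.update(c)  # add this year's counts to the corpus-wide totals
    grand_total = sum(corpus_freq.values())
    scores = {}
    for year, c in counts.items():
        total = sum(c.values())
        scores[year] = {w: (n / total) * math.log(grand_total / corpus_freq[w]) for w, n in c.items()}
    return scores


def peak_year(scores, word):
    """Year in which `word` scores highest (years where it is absent count as 0.0)."""
    return max(scores, key=lambda year: scores[year].get(word, 0.0))


# Hypothetical pre-segmented toy corpus, not data from the paper.
corpus = {
    1960: ["人民", "粮票", "粮票", "公社"],
    1980: ["人民", "粮票", "改革"],
    2000: ["人民", "互联网", "互联网", "改革"],
}
counts = year_counts(corpus)
idf_scores = tf_idf(counts)
iwf_scores = tf_iwf(counts)

print(idf_scores[2000]["人民"])       # 0.0: occurs in every year, so log(3/3) = 0
print(peak_year(idf_scores, "粮票"))  # 1960
```

Choosing a whole year as the document unit is a design assumption; individual articles would be an equally plausible reading of the abstract and would give finer-grained document frequencies.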

Keywords

Property of Times; Frequency; TF-IDF; TF-IWF; Naive Bayes

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Liu Liu (1)
  • Bin Li (1, 2)
  • Lijun Bu (1)
  • Tian-tian Zhang (1)
  • Xiaohe Chen (1)
  1. Research Center of Language and Informatics, Nanjing Normal University, Nanjing, China
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
