Recognition and Extraction of Honorifics in Chinese Diachronic Corpora

  • Conference paper
Chinese Lexical Semantics (CLSW 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8922)

Abstract

In this paper, honorifics refer to names of official positions and titles of nobility or honor. They appear in written records from many periods and have great historical significance. This paper introduces a machine learning system that recognizes honorifics in diachronic corpora. The system is trained on a tagged corpus of four classic novels written in the Ming and Qing dynasties and is then used to automatically recognize and extract honorifics from pre-Qin classics, Tang-dynasty poems, and modern Chinese news. Experimental results show that the system achieves relatively good results on the pre-Qin classics and Tang-dynasty poems. This work is an attempt to improve the automatic recognition of honorifics in diachronic corpora, and the system can be a helpful tool for studies of the evolution of honorifics throughout Chinese history.

Corresponding author

Correspondence to Qin Lu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Xiong, D., Xu, J., Lu, Q., Lo, F. (2014). Recognition and Extraction of Honorifics in Chinese Diachronic Corpora. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_31

  • DOI: https://doi.org/10.1007/978-3-319-14331-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14330-9

  • Online ISBN: 978-3-319-14331-6

  • eBook Packages: Computer Science, Computer Science (R0)
