Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

  • Aaron L. -F. Han
  • Derek F. Wong
  • Lidia S. Chao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7912)

Abstract

This paper presents research on Chinese named entity recognition (CNER), covering person, organization and location names. Unlike conventional approaches, which typically focus on the algorithms used and say little about the CNER problem itself, this paper first studies the characteristics of Chinese and discusses different feature sets; it then reports promising comparative results obtained with optimized features and a concise model. It also analyzes how the various features and algorithms employed by other researchers perform. To facilitate further research, the paper provides formal definitions of the issues in CNER together with potential solutions. Following the SIGHAN bakeoffs, the experiments are performed in the closed track, but the problems of the open-track tasks are also discussed.

Keywords

Natural language processing; Chinese named entity recognition; Chinese characteristics; Features; Conditional random fields

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Aaron L. -F. Han (1)
  • Derek F. Wong (1)
  • Lidia S. Chao (1)
  1. Department of Computer and Information Science, University of Macau, Macau, China
