
Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

  • Conference paper
Language Processing and Intelligent Information Systems (IIS 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)


Abstract

This paper presents research on Chinese named entity recognition (CNER), covering person names, organization names and location names. Unlike conventional approaches, which usually describe the algorithms in detail but say little about the CNER problem itself, this paper first studies the characteristics of Chinese and discusses different feature sets; it then reports a promising comparison result obtained with the optimized features and a concise model. Furthermore, it analyzes the performance of the various features and algorithms employed by other researchers. To facilitate further research, the paper provides formal definitions of several issues in CNER together with potential solutions. Following the SIGHAN bakeoffs, the experiments are performed in the closed track, but the problems of the open-track tasks are also discussed.
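The abstract does not spell out the feature set. Purely as an illustration, the Python sketch below shows one common way to frame CNER as character-level sequence labeling for a CRF: each Chinese character is a token (Chinese text has no whitespace word boundaries), entities are encoded with a BIO tag scheme, and each character receives CRF++-style window features (character unigrams C-2 to C2 and bigrams C-1C0, C0C1). The span format, function names and feature names are hypothetical assumptions, not the paper's own method or feature templates.

```python
from typing import Dict, List, Tuple

# Hypothetical entity span format (an assumption, not from the paper):
# (start, end_exclusive, type) with type in {"PER", "ORG", "LOC"}.
Span = Tuple[int, int, str]


def bio_tags(sentence: str, spans: List[Span]) -> List[str]:
    """Label every character with B-TYPE, I-TYPE or O (character-level BIO scheme)."""
    tags = ["O"] * len(sentence)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def char_features(sentence: str, i: int) -> Dict[str, str]:
    """Window features for character i: unigrams C-2..C2 and bigrams C-1C0, C0C1."""

    def c(k: int) -> str:
        j = i + k
        # Positions outside the sentence are padded with boundary symbols.
        if j < 0:
            return "<S>"
        if j >= len(sentence):
            return "</S>"
        return sentence[j]

    feats = {f"C{k}": c(k) for k in range(-2, 3)}
    feats["C-1C0"] = c(-1) + c(0)
    feats["C0C1"] = c(0) + c(1)
    return feats


def sentence_to_instances(sentence: str, spans: List[Span]) -> List[Tuple[Dict[str, str], str]]:
    """Pair each character's feature dict with its gold BIO tag."""
    tags = bio_tags(sentence, spans)
    return [(char_features(sentence, i), tags[i]) for i in range(len(sentence))]


if __name__ == "__main__":
    # Toy example: person name 王小明 (chars 0-2) and organization 北京大学 (chars 4-7).
    sent = "王小明在北京大学工作"
    spans = [(0, 3, "PER"), (4, 8, "ORG")]
    for feats, tag in sentence_to_instances(sent, spans):
        print(feats["C0"], feats["C-1C0"], tag)
```

Per-character feature dictionaries in this shape can be fed, for example, to a linear-chain CRF toolkit such as sklearn-crfsuite. Because all features come from the characters of the training text alone, this kind of setup fits the SIGHAN closed track, where external resources such as name lists are not allowed.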

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Han, A.L.F., Wong, D.F., Chao, L.S. (2013). Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_8

  • DOI: https://doi.org/10.1007/978-3-642-38634-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38633-6

  • Online ISBN: 978-3-642-38634-3

  • eBook Packages: Computer Science, Computer Science (R0)