Abstract
This paper presents research on Chinese named entity recognition (CNER), covering person names, organization names and location names. Unlike conventional approaches, which tend to focus on the algorithms used and say little about the CNER problem itself, this paper first studies the characteristics of Chinese and discusses different feature sets; it then reports a promising comparison result obtained with optimized features and a concise model. The performance of various features and algorithms employed by other researchers is also analyzed. To facilitate further research, this paper provides formal definitions of the issues in CNER together with potential solutions. Following the SIGHAN bakeoffs, the experiments are performed in the closed track, but the problems of the open track tasks are also discussed.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, A.L.F., Wong, D.F., Chao, L.S. (2013). Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_8
DOI: https://doi.org/10.1007/978-3-642-38634-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science, Computer Science (R0)