Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

Han, Aaron L. -F.; Wong, Derek F.; Chao, Lidia S.

doi:10.1007/978-3-642-38634-3_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)

Included in the following conference series:

Intelligent Information Systems Symposium

Abstract

This paper presents research on Chinese named entity recognition (CNER), covering person, organization, and location names. Unlike conventional approaches, which tend to describe the algorithms used while saying little about the CNER problem itself, this paper first studies the characteristics of Chinese and discusses different feature sets; it then presents promising comparative results obtained with optimized features and a concise model. Furthermore, it analyzes the performance of the various features and algorithms employed by other researchers. To facilitate further research, the paper provides formal definitions of open issues in CNER together with potential solutions. Following the SIGHAN bakeoffs, the experiments are conducted in the closed track, although the problems posed by the open-track tasks are also discussed.
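To make "feature sets" concrete for a character-based CRF tagger, the sketch below builds a common style of character-window template: unigrams C[k] for offsets -2..2 and bigrams of adjacent characters inside that window. This is a generic illustration under stated assumptions, not the optimized feature set reported in the paper; the function names, the window size of 2, and the padding symbols "<S>" and "</S>" are all hypothetical choices.

```python
from typing import Dict, List


def char_features(sentence: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical character-window features for position i of a sentence.

    Emits unigram features C[k] for offsets -window..window and bigram
    features C[k]C[k+1] for adjacent pairs inside the window. Positions
    outside the sentence are padded with assumed boundary symbols.
    """
    def char_at(j: int) -> str:
        if j < 0:
            return "<S>"   # assumed left-boundary padding
        if j >= len(sentence):
            return "</S>"  # assumed right-boundary padding
        return sentence[j]

    feats: Dict[str, str] = {}
    # Character unigrams around the current position.
    for k in range(-window, window + 1):
        feats[f"C[{k}]"] = char_at(i + k)
    # Character bigrams formed by each adjacent pair in the window.
    for k in range(-window, window):
        feats[f"C[{k}]C[{k + 1}]"] = char_at(i + k) + char_at(i + k + 1)
    return feats


def sentence_features(sentence: str) -> List[Dict[str, str]]:
    """One feature dictionary per character, i.e. one per CRF position."""
    return [char_features(sentence, i) for i in range(len(sentence))]
```

For example, sentence_features("王小明在澳门大学") yields eight dictionaries that would be paired with eight character labels such as B-PER I-PER I-PER O B-ORG I-ORG I-ORG I-ORG (an illustrative BIO labeling). A list of per-token feature dictionaries is one input format accepted by CRF toolkits such as sklearn-crfsuite; richer templates (for example, surname or suffix lexicon lookups motivated by Chinese characteristics) would be added as further keys.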

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, China
Aaron L. -F. Han, Derek F. Wong & Lidia S. Chao

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Mieczysław A. Kłopotek, Jacek Koronacki, Małgorzata Marciniak & Agnieszka Mykowiecka
Institute of Computer Science, Polish Academy of Sciences, ul. Brzegi 55, 80-045, Gdańsk, Poland
Sławomir T. Wierzchoń

About this paper

Cite this paper

Han, A.L.F., Wong, D.F., Chao, L.S. (2013). Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_8

DOI: https://doi.org/10.1007/978-3-642-38634-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science, Computer Science (R0)