Cluster Computing, Volume 22, Supplement 3, pp 5195–5206

Named entity recognition based on conditional random fields

  • Shengli Song
  • Nan Zhang
  • Haitao Huang


Named entity recognition (NER) is a fundamental problem in many natural language processing applications, and its study is of great significance. Combining word segmentation and part-of-speech analysis, this paper proposes a new NER method based on conditional random fields that takes the granularity of candidate entities into account. Recognition granularity is divided into two levels: word-based and character-based. Features are extracted from the segmented text according to the feature templates that were trained in the training phase, and \(P(y \vert x)\) is then calculated to obtain the best result for the input sequence. The paper experimentally evaluates the algorithm at different granularities on a large-scale corpus, and the results show that the method is feasible and has research value.
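
The step of calculating \(P(y \vert x)\) and picking the best result can be illustrated with a minimal sketch of a linear-chain conditional random field at character-based granularity. Everything specific in it is an assumption made for illustration and is not taken from the paper: the B/I/O tag set, the two feature templates in `features`, the hand-set `weights` (a real system would learn them from a corpus), and the three-character input. The sketch scores a label sequence by summing the weights of its active features, normalizes by brute-force enumeration to obtain \(P(y \vert x)\), and uses Viterbi decoding to find the highest-scoring sequence.

```python
import itertools
import math

LABELS = ["B", "I", "O"]  # hypothetical tag set: entity begin, entity inside, outside


def features(prev_label, label, chars, t):
    """Hypothetical feature templates: current character and previous label."""
    return [f"char={chars[t]}|y={label}", f"prev={prev_label}|y={label}"]


def local_score(prev_label, label, chars, t, weights):
    """Sum of the weights of the features active at position t."""
    return sum(weights.get(f, 0.0) for f in features(prev_label, label, chars, t))


def score(labels, chars, weights):
    """Unnormalized log-score of a complete label sequence."""
    total, prev = 0.0, "<s>"
    for t, label in enumerate(labels):
        total += local_score(prev, label, chars, t, weights)
        prev = label
    return total


def conditional_probability(labels, chars, weights):
    """P(y|x) by enumerating every label sequence (only viable for tiny inputs)."""
    z = sum(math.exp(score(cand, chars, weights))
            for cand in itertools.product(LABELS, repeat=len(chars)))
    return math.exp(score(labels, chars, weights)) / z


def viterbi(chars, weights):
    """Highest-scoring label sequence, found by dynamic programming."""
    if not chars:
        return []
    best = [{y: local_score("<s>", y, chars, 0, weights) for y in LABELS}]
    back = []
    for t in range(1, len(chars)):
        scores, pointers = {}, {}
        for y in LABELS:
            p = max(LABELS,
                    key=lambda q: best[-1][q] + local_score(q, y, chars, t, weights))
            scores[y] = best[-1][p] + local_score(p, y, chars, t, weights)
            pointers[y] = p
        best.append(scores)
        back.append(pointers)
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy input and hand-set weights (assumptions, not trained values).
chars = list("去西安")
weights = {
    "char=去|y=O": 1.5,
    "char=西|y=B": 2.0,
    "char=安|y=I": 2.0,
    "prev=B|y=I": 1.0,
    "prev=O|y=I": -2.0,
    "prev=<s>|y=I": -2.0,
}

best_path = viterbi(chars, weights)
print(best_path, conditional_probability(best_path, chars, weights))
```

Brute-force normalization is used here only to keep the definition of \(P(y \vert x)\) visible; for realistic sequence lengths the normalizer would be computed with the forward algorithm instead.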


Named entity recognition, Conditional random fields, Graininess



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Software Engineering Institute, Xidian University, Xi’an, China
  2. School of Computer Science and Technology, Xidian University, Xi’an, China
