Named entity recognition based on conditional random fields

Cluster Computing

Abstract

Named entity recognition (NER) is a fundamental problem in many natural language processing applications, and its study is of great significance. Combining word segmentation and part-of-speech analysis, this paper proposes a new NER method based on conditional random fields (CRFs) that takes the granularity of candidate entities into account. The recognition granularity is divided into two levels: word-based and character-based. We extract features from the segmented text according to the feature templates learned during the training phase, and then calculate \(P(y \mid x)\) to obtain the most probable label sequence for the input sequence. The paper evaluates the algorithm experimentally at both granularities on a large-scale corpus, and the results indicate that the method is feasible and of research value.
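To make the role of \(P(y \mid x)\) concrete, the following is a minimal sketch of a linear-chain CRF, not the model described in the paper. The label set, the tokens, and the weights in `EMIT` and `TRANS` are hypothetical and hand-set for illustration; a trained model would learn such weights from an annotated corpus through its feature templates. The tokens may be words or single characters, depending on the chosen granularity.

```python
import itertools
import math

LABELS = ["O", "B-PER", "I-PER"]

# Hypothetical weights, hand-set for illustration only.
EMIT = {("Alice", "B-PER"): 2.0, ("Smith", "I-PER"): 2.0, ("wrote", "O"): 1.5}
TRANS = {("B-PER", "I-PER"): 1.0, ("O", "I-PER"): -3.0, ("O", "O"): 0.5}


def score(tokens, labels):
    """Unnormalised log-score: sum of emission and transition weights."""
    total = sum(EMIT.get((t, y), 0.0) for t, y in zip(tokens, labels))
    total += sum(TRANS.get(pair, 0.0) for pair in zip(labels, labels[1:]))
    return total


def log_partition(tokens):
    """log Z(x), computed with the forward algorithm."""
    alpha = {y: EMIT.get((tokens[0], y), 0.0) for y in LABELS}
    for tok in tokens[1:]:
        alpha = {
            y: EMIT.get((tok, y), 0.0) + math.log(sum(
                math.exp(alpha[p] + TRANS.get((p, y), 0.0)) for p in LABELS))
            for y in LABELS
        }
    return math.log(sum(math.exp(a) for a in alpha.values()))


def prob(tokens, labels):
    """P(y | x) = exp(score(x, y)) / Z(x)."""
    return math.exp(score(tokens, labels) - log_partition(tokens))


def viterbi(tokens):
    """Return the label sequence with the highest score for the tokens."""
    best = {y: (EMIT.get((tokens[0], y), 0.0), [y]) for y in LABELS}
    for tok in tokens[1:]:
        new = {}
        for y in LABELS:
            s, path = max(
                ((best[p][0] + TRANS.get((p, y), 0.0), best[p][1])
                 for p in LABELS),
                key=lambda c: c[0])
            new[y] = (s + EMIT.get((tok, y), 0.0), path + [y])
        best = new
    return max(best.values(), key=lambda c: c[0])[1]


def brute_force_total(tokens):
    """Sanity check: P(y | x) summed over all label sequences should be 1."""
    return sum(prob(tokens, ys)
               for ys in itertools.product(LABELS, repeat=len(tokens)))


tokens = ["Alice", "Smith", "wrote"]
best_labels = viterbi(tokens)
print(best_labels, round(prob(tokens, best_labels), 3))
```

Decoding with Viterbi avoids enumerating every label sequence, while the forward pass supplies the normaliser needed to turn a score into a probability. The brute-force helper is only practical for very short inputs and serves as a check on the two dynamic programs.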

Author information

Corresponding author

Correspondence to Shengli Song.

About this article

Cite this article

Song, S., Zhang, N. & Huang, H. Named entity recognition based on conditional random fields. Cluster Comput 22 (Suppl 3), 5195–5206 (2019). https://doi.org/10.1007/s10586-017-1146-3

  • DOI: https://doi.org/10.1007/s10586-017-1146-3
