Improving the Performance of a NER System by Post-processing and Voting

  • Asif Ekbal
  • Sivaji Bandyopadhyay
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5342)

Abstract

This paper reports the development of a named entity recognition (NER) system for Bengali that combines the outputs of three classifiers: Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM). The training set consists of approximately 250K wordforms and has been manually annotated with four major named entity (NE) tags: Person, Location, Organization and Miscellaneous. The classifiers use the contextual information of the words along with a variety of features that help in predicting the NE classes. Lexical context patterns, generated semi-automatically from an unlabeled corpus of 1 million wordforms, are used as features of the classifiers to improve their performance. In addition, we use the second-best tags of the classifiers and apply several heuristics to improve performance further. Finally, the classifiers are combined using a majority voting approach. Experimental results show the effectiveness of the proposed approach, with overall average recall, precision and f-score values of 90.78%, 87.35% and 89.03%, respectively. This is an improvement of 11.8% in f-score over the best-performing SVM-based baseline system and of 15.11% in f-score over the least-performing ME-based baseline system. The proposed system also outperforms the other existing Bengali NER system.
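The combination step and the reported f-score can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy tag sequences and the tie-breaking rule (falling back to one designated classifier when no tag wins a strict majority) are assumptions made for illustration, since the abstract does not specify how ties are resolved. The f-score function uses the standard harmonic mean of precision and recall.

```python
from collections import Counter


def majority_vote(tags_per_classifier, fallback_index=0):
    """Combine per-token tags from several classifiers by majority voting.

    tags_per_classifier: list of tag sequences, one per classifier,
    all of the same length.
    fallback_index: hypothetical tie-breaking rule (an assumption, not from
    the paper): when no tag has a strict majority, use the tag of this
    classifier.
    """
    lengths = {len(tags) for tags in tags_per_classifier}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")

    combined = []
    for token_tags in zip(*tags_per_classifier):
        top_tag, top_count = Counter(token_tags).most_common(1)[0]
        if top_count > len(token_tags) / 2:
            combined.append(top_tag)  # strict majority agrees
        else:
            combined.append(token_tags[fallback_index])  # no majority
    return combined


def f_score(precision, recall):
    """Balanced f-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy outputs for three tokens (illustrative only, not data from the paper).
svm_tags = ["PER", "O", "LOC"]
crf_tags = ["PER", "O", "ORG"]
me_tags = ["O", "O", "MISC"]

voted = majority_vote([svm_tags, crf_tags, me_tags])
reported_f = round(f_score(87.35, 90.78), 2)
```

With the precision and recall reported in the abstract (87.35% and 90.78%), `f_score` gives 89.03 after rounding, which matches the reported f-score. In the toy example the third token has three different tags, so the sketch falls back to the first classifier in the list.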

Keywords

Natural Language Processing; Named Entity Recognition; Maximum Entropy; Conditional Random Field; Support Vector Machine; Majority Voting

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Asif Ekbal (1)
  • Sivaji Bandyopadhyay (1)
  1. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
