Classifier Ensemble Selection Using Genetic Algorithm for Named Entity Recognition

Research on Language and Computation

Abstract

In this paper, we propose a classifier ensemble technique based on a genetic algorithm (GA) for named entity recognition (NER). We assume that classifiers built on different feature representations can be combined effectively using a GA to achieve better performance. The proposed approach also determines the appropriate ensemble method, i.e. either majority voting or weighted voting. A maximum entropy (ME) model is used as the base model to generate a number of different classifiers, each depending on a different representation of the available features. The proposed approach is evaluated on three leading Indian languages, namely Bengali, Hindi and Telugu. Evaluation yields recall, precision and F-measure values of 88.12%, 93.99% and 90.96%, respectively, for Bengali; 80.26%, 92.70% and 86.03% for Hindi; and 74.79%, 85.38% and 79.73% for Telugu. We also evaluate the proposed approach on the CoNLL-2003 benchmark English datasets, where it achieves recall, precision and F-measure values of 83.05%, 85.52% and 84.27%, respectively. For all the languages, the GA-based ensemble outperforms every individual classifier as well as two conventional baseline ensembles.
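The idea of searching for voting weights with a GA can be illustrated with a minimal sketch. It is not the authors' implementation: the chromosome encoding (one real-valued weight per classifier), the token-level accuracy fitness, the truncation selection, and all names (`weighted_vote`, `fitness`, `evolve`, the toy data) are assumptions made here for illustration. The abstract does not specify these details, and an NER system would more plausibly optimise F-measure.

```python
import random
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight for one token.

    With all weights equal this reduces to plain majority voting.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def fitness(weights, all_predictions, gold):
    """Token-level accuracy of the weighted ensemble (an assumed stand-in metric)."""
    hits = sum(weighted_vote(p, weights) == g for p, g in zip(all_predictions, gold))
    return hits / len(gold)


def evolve(all_predictions, gold, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for classifier weights with a very small GA (needs >= 2 classifiers)."""
    rng = random.Random(seed)
    n = len(all_predictions[0])

    def score(chromosome):
        return fitness(chromosome, all_predictions, gold)

    # Each chromosome is a list of n weights in [0, 1).
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: resample each weight with probability p_mut.
            child = [rng.random() if rng.random() < p_mut else w for w in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=score)


# Toy data (made up): each row holds the labels three classifiers gave one token.
toy_predictions = [
    ("PER", "PER", "O"),
    ("O", "LOC", "LOC"),
    ("ORG", "O", "O"),
    ("O", "O", "O"),
]
toy_gold = ["PER", "LOC", "ORG", "O"]

best = evolve(toy_predictions, toy_gold)
print(best, fitness(best, toy_predictions, toy_gold))
```

Restricting the weights to 0 or 1 would turn the same search into a selection of which classifiers take part in a majority vote, which is one way to read the abstract's choice between majority and weighted voting.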



Author information

Corresponding author

Correspondence to Asif Ekbal.

Additional information

Asif Ekbal and Sriparna Saha have equally contributed to this article.

About this article

Cite this article

Ekbal, A., Saha, S. Classifier Ensemble Selection Using Genetic Algorithm for Named Entity Recognition. Res on Lang and Comput 8, 73–99 (2010). https://doi.org/10.1007/s11168-010-9071-0


  • DOI: https://doi.org/10.1007/s11168-010-9071-0
