Classifier Ensemble Selection Using Genetic Algorithm for Named Entity Recognition

Ekbal, Asif; Saha, Sriparna

doi:10.1007/s11168-010-9071-0

Published: 28 December 2010

Volume 8, pages 73–99 (2010)

Research on Language and Computation

Abstract

In this paper, we propose a classifier ensemble technique based on a genetic algorithm (GA) for named entity recognition (NER). We assume that classifiers built on different feature representations can be combined effectively using a GA to achieve better performance. The proposed approach also selects the appropriate combination scheme, i.e. either majority voting or weighted voting. A maximum entropy (ME) model serves as the base learner, and a number of different classifiers are generated from it by varying the representation of the available features. The approach is evaluated on three leading Indian languages, namely Bengali, Hindi and Telugu. It yields recall, precision and F-measure values of 88.12, 93.99 and 90.96%, respectively, for Bengali; 80.26, 92.70 and 86.03% for Hindi; and 74.79, 85.38 and 79.73% for Telugu. On the CoNLL-2003 benchmark English datasets, it yields recall, precision and F-measure values of 83.05, 85.52 and 84.27%, respectively. For all the languages, the GA-based ensemble outperforms every individual classifier as well as two conventional baseline ensembles.
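
The abstract does not spell out how an ensemble is encoded or scored, so the following is only a minimal sketch of GA-based ensemble selection under stated assumptions, not the authors' implementation. It assumes a binary chromosome with one inclusion bit per classifier plus a final bit that chooses majority (0) or weighted (1) voting, uses each classifier's own accuracy as its voting weight, and uses token accuracy on held-out data as a stand-in fitness. The function names, the toy tag sequences and all GA settings are hypothetical.

```python
import random
from collections import Counter


def vote(predictions, weights):
    """Combine the labels predicted for one token by (weighted) voting."""
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    # Ties are broken by label name so the result is deterministic.
    return max(sorted(tally), key=lambda lab: tally[lab])


def accuracy(chromosome, outputs, gold, clf_weights):
    """Assumed fitness: token accuracy of the ensemble a chromosome encodes.

    chromosome[:-1] are inclusion bits, chromosome[-1] selects
    majority (0) or weighted (1) voting.
    """
    *mask, weighted = chromosome
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0  # an empty ensemble cannot label anything
    weights = [clf_weights[i] if weighted else 1.0 for i in chosen]
    correct = 0
    for t, g in enumerate(gold):
        correct += vote([outputs[i][t] for i in chosen], weights) == g
    return correct / len(gold)


def select_ensemble(outputs, gold, generations=30, pop_size=20, p_mut=0.1, seed=0):
    """Search for a good chromosome with a small elitist GA (hypothetical settings)."""
    rng = random.Random(seed)
    n = len(outputs)
    # Assumed weighting: each classifier's own accuracy on the held-out data.
    clf_weights = [sum(p == g for p, g in zip(out, gold)) / len(gold) for out in outputs]

    def fit(c):
        return accuracy(c, outputs, gold, clf_weights)

    # Start from every single-classifier ensemble, then fill up with random ones.
    pop = [tuple(int(i == j) for j in range(n)) + (0,) for i in range(n)]
    while len(pop) < pop_size:
        pop.append(tuple(rng.randint(0, 1) for _ in range(n + 1)))

    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        nxt = pop[:2]  # elitism: the two best chromosomes always survive
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the better half
            cut = rng.randint(1, n)                    # single-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)


# Toy held-out data: gold tags and the outputs of three hypothetical classifiers.
gold = ["PER", "O", "LOC", "O", "ORG", "O"]
outputs = [
    ["PER", "O", "O", "O", "ORG", "O"],
    ["PER", "O", "LOC", "O", "O", "O"],
    ["O", "O", "LOC", "O", "ORG", "LOC"],
]

if __name__ == "__main__":
    best, score = select_ensemble(outputs, gold)
    print("chromosome:", best, "fitness:", round(score, 3))
```

Because the initial population contains every single-classifier ensemble and the best chromosomes are carried over unchanged, the selected ensemble in this sketch can never score below the best individual classifier on the data used for fitness. In a real setting the fitness would be computed on development data that is kept separate from the final test set.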

Author information

Authors and Affiliations

Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Asif Ekbal & Sriparna Saha

Corresponding author

Correspondence to Asif Ekbal.

Additional information

Asif Ekbal and Sriparna Saha contributed equally to this article.

About this article

Cite this article

Ekbal, A., Saha, S. Classifier Ensemble Selection Using Genetic Algorithm for Named Entity Recognition. Res on Lang and Comput 8, 73–99 (2010). https://doi.org/10.1007/s11168-010-9071-0

Issue Date: March 2010
DOI: https://doi.org/10.1007/s11168-010-9071-0
