An Approach to the POS Tagging Problem Using Genetic Algorithms

Silva, Ana Paula; Silva, Arlindo; Rodrigues, Irene

doi:10.1007/978-3-319-11271-8_1

Part of the book series: Studies in Computational Intelligence (SCI, volume 577)

Included in the following conference series:

International Joint Conference on Computational Intelligence

Abstract

Automatic part-of-speech tagging is the process of assigning a part-of-speech (POS) tag to each word of a text. The words of a language are grouped into grammatical categories that represent the function they may have in a sentence. These grammatical classes (or categories) are usually called parts of speech. In most languages, however, a large number of words can be used in different ways and therefore have more than one possible part of speech. To choose the right tag for a particular word, a POS tagger must consider the parts of speech of the surrounding words, which may themselves have more than one possible tag. Solving the problem therefore requires a method for disambiguating each word’s set of possible tags. In this work, we modeled part-of-speech tagging as a combinatorial optimization problem, which we solved using a genetic algorithm. The search for the best combinatorial solution is guided by a set of disambiguation rules that we first discovered using a classification algorithm, which also includes a genetic algorithm. By using rules to disambiguate the tagging, we were able to generalize the context information present in the training tables adopted by approaches based on probabilistic data. We were also able to incorporate other types of information that help to identify a word’s grammatical class. The results obtained on two different corpora are among the best published.
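
The formulation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the encoding, the genetic operators, or the rule set, so everything below is an assumption made for illustration: the toy lexicon, the hand-written rule scores (standing in for the learned disambiguation rules), and the names `fitness` and `tag_sentence` are all hypothetical. An individual is one complete tagging of the sentence, and its fitness is the sum of the rule scores over adjacent tag pairs.

```python
import random

# Hypothetical toy lexicon: each word maps to its set of possible tags.
LEXICON = {
    "the": ["DET"],
    "dog": ["NOUN", "VERB"],
    "can": ["NOUN", "VERB", "AUX"],
    "run": ["NOUN", "VERB"],
}

# Hypothetical hand-written scores for (previous tag, tag) contexts. They stand
# in for disambiguation rules learned from a corpus; unlisted pairs score 0.
RULES = {
    ("<s>", "DET"): 1.0,
    ("DET", "NOUN"): 1.0,
    ("DET", "VERB"): -1.0,
    ("NOUN", "AUX"): 1.0,
    ("AUX", "VERB"): 1.0,
    ("NOUN", "VERB"): 0.5,
    ("NOUN", "NOUN"): 0.2,
}


def fitness(tags):
    """Sum the rule scores over every adjacent tag pair, starting at <s>."""
    context = ["<s>"] + tags[:-1]
    return sum(RULES.get(pair, 0.0) for pair in zip(context, tags))


def tag_sentence(words, pop_size=30, generations=50, mutation_rate=0.2, seed=0):
    """Evolve one tag per word. Assumes a sentence of at least two words."""
    rng = random.Random(seed)
    choices = [LEXICON[w] for w in words]

    def random_individual():
        return [rng.choice(options) for options in choices]

    def tournament(pop):
        # Pick the fittest of three randomly sampled individuals.
        return max(rng.sample(pop, 3), key=fitness)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        children = [best[:]]  # elitism: the best tagging always survives
        while len(children) < pop_size:
            mother, father = tournament(population), tournament(population)
            cut = rng.randrange(1, len(words))  # one-point crossover
            child = mother[:cut] + father[cut:]
            for i, options in enumerate(choices):
                if rng.random() < mutation_rate:
                    # Mutation only ever picks a tag allowed for this word.
                    child[i] = rng.choice(options)
            children.append(child)
        population = children
    return max(population, key=fitness)


sentence = ["the", "dog", "can", "run"]
best = tag_sentence(sentence)
print(list(zip(sentence, best)), fitness(best))
```

Because genes are aligned with word positions, crossover and mutation always yield a valid tagging, so no repair step is needed. With only twelve candidate taggings this toy instance could be solved by enumeration; the evolutionary search becomes relevant when sentences are long and many words are ambiguous.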

Author information

Authors and Affiliations

Escola Superior de Tecnologia do Instituto Politécnico de Castelo Branco, Castelo Branco, Portugal
Ana Paula Silva & Arlindo Silva
Universidade de Évora, Évora, Portugal
Irene Rodrigues


Corresponding author

Correspondence to Ana Paula Silva.

Editor information

Editors and Affiliations

University Paris-Est Créteil (UPEC), Créteil, France
Kurosh Madani
Departamento de Engenharia Informatica, University of Coimbra, Coimbra, Portugal
António Dourado Correia
Evolutionary Systems and Biomedical Engineering Lab, Instituto Superior Tecnico IST Systems and Robotics Institute, Lisboa, Portugal
Agostinho Rosa
Polytechnic Institute of Setúbal INSTICC, Setubal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Silva, A.P., Silva, A., Rodrigues, I. (2015). An Approach to the POS Tagging Problem Using Genetic Algorithms. In: Madani, K., Correia, A., Rosa, A., Filipe, J. (eds) Computational Intelligence. IJCCI 2012. Studies in Computational Intelligence, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-319-11271-8_1

DOI: https://doi.org/10.1007/978-3-319-11271-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11270-1
Online ISBN: 978-3-319-11271-8
eBook Packages: Engineering, Engineering (R0)
