An Approach to the POS Tagging Problem Using Genetic Algorithms

  • Conference paper
Computational Intelligence (IJCCI 2012)

Part of the book series: Studies in Computational Intelligence (SCI, volume 577)

Abstract

Automatic part-of-speech tagging is the process of assigning a part-of-speech (POS) tag to each word of a text. The words of a language are grouped into grammatical categories that represent the function they can have in a sentence. These grammatical classes (or categories) are usually called parts of speech. However, in most languages, a large number of words can be used in different ways and thus have more than one possible part of speech. To choose the right tag for a particular word, a POS tagger must consider the parts of speech of the surrounding words. The neighboring words may themselves have more than one possible tag. This means that, to solve the problem, we need a method to disambiguate each word’s set of possible tags. In this work, we model the part-of-speech tagging problem as a combinatorial optimization problem, which we solve using a genetic algorithm. The search for the best combinatorial solution is guided by a set of disambiguation rules that we first discovered using a classification algorithm, which also includes a genetic algorithm. By using rules to disambiguate the tagging, we were able to generalize the context information present in the training tables adopted by approaches based on probabilistic data. We were also able to incorporate other types of information that help to identify a word’s grammatical class. The results obtained on two different corpora are among the best published.
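
To make the combinatorial formulation concrete, the following is a minimal sketch of a genetic algorithm that searches over tag assignments for a sentence. It is not the authors' implementation. The toy `LEXICON`, the hand-written `RULE_SCORES` table (standing in for the learned disambiguation rules), the function names, and all parameter values are assumptions made for illustration only.

```python
import random

# Hypothetical toy lexicon: each word maps to its candidate tags.
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boats": ["NOUN", "VERB"],
}

# Hypothetical stand-in for the learned disambiguation rules:
# a hand-assigned score for each (previous tag, current tag) context.
RULE_SCORES = {
    ("<s>", "DET"): 2.0,
    ("DET", "ADJ"): 1.5,
    ("DET", "NOUN"): 2.0,
    ("ADJ", "NOUN"): 2.0,
    ("NOUN", "VERB"): 2.0,
    ("NOUN", "NOUN"): 0.5,
    ("VERB", "NOUN"): 1.5,
}


def fitness(tags):
    """Sum the rule scores along the tag sequence; unseen contexts score 0."""
    total = 0.0
    prev = "<s>"
    for tag in tags:
        total += RULE_SCORES.get((prev, tag), 0.0)
        prev = tag
    return total


def tag_sentence(words, generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    """Evolve one tag per word; an individual is a list of tags."""
    rng = random.Random(seed)
    candidates = [LEXICON[w] for w in words]

    # Seed the population with the "first candidate tag" baseline, so that
    # elitism guarantees the result is never worse than that baseline.
    population = [[c[0] for c in candidates]]
    population += [[rng.choice(c) for c in candidates]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        elite = max(population, key=fitness)
        next_population = [elite[:]]
        while len(next_population) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, len(words) - 1) if len(words) > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: resample a gene from that word's own candidate tags,
            # so every individual stays a valid tagging.
            for i, c in enumerate(candidates):
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(c)
            next_population.append(child)
        population = next_population

    return max(population, key=fitness)
```

Calling `tag_sentence(["the", "old", "man", "boats"])` returns one tag per word, each drawn from that word's candidate set. In this sketch the fitness only looks at the preceding tag; a rule-guided tagger as described in the abstract would score each candidate against richer context and word-level information.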



Corresponding author

Correspondence to Ana Paula Silva.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Silva, A.P., Silva, A., Rodrigues, I. (2015). An Approach to the POS Tagging Problem Using Genetic Algorithms. In: Madani, K., Correia, A., Rosa, A., Filipe, J. (eds) Computational Intelligence. IJCCI 2012. Studies in Computational Intelligence, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-319-11271-8_1

  • DOI: https://doi.org/10.1007/978-3-319-11271-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11270-1

  • Online ISBN: 978-3-319-11271-8

  • eBook Packages: Engineering, Engineering (R0)
