Comparing Two Markov Methods for Part-of-Speech Tagging of Portuguese

Kepler, Fábio N.; Finger, Marcelo

doi:10.1007/11874850_52

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4140)

Abstract

A wide variety of statistical methods have been applied to part-of-speech (PoS) tagging, the task of associating each word in a text with its corresponding PoS. Most of these methods analyse a small, fixed neighbourhood of words, imposing some form of Markov restriction. In this work we implement and compare a fixed-length hidden Markov model (HMM) with a variable-length Markov chain (VLMC); the latter is, in principle, capable of detecting long-distance dependencies. We show that the VLMC model is more accurate than the HMM, nearly as fast at tagging, and also fast to train. However, the VLMC method fails to capture truly long-distance dependencies, and we analyse the reasons for this behaviour.
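To make the contrast between a fixed-length and a variable-length context concrete, the following is a minimal Python sketch, not the implementation used in the paper. It assumes a toy tag set and toy training sequences, and it keeps a context when it was seen at least `min_count` times; that frequency threshold is a simplification standing in for the statistical pruning criterion of a real VLMC. All function names (`count_contexts`, `prune`, `longest_context`) are hypothetical.

```python
from collections import Counter, defaultdict


def count_contexts(tag_sequences, max_len):
    """Count next-tag occurrences after every context of length 0..max_len."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for i, tag in enumerate(tags):
            for k in range(0, min(i, max_len) + 1):
                context = tuple(tags[i - k:i])  # the k tags preceding position i
                counts[context][tag] += 1
    return counts


def prune(counts, min_count):
    """Toy pruning (an assumption): keep contexts seen at least min_count times.

    The empty context is always kept so that a fallback exists.
    """
    return {
        context: nexts
        for context, nexts in counts.items()
        if context == () or sum(nexts.values()) >= min_count
    }


def longest_context(history, contexts, max_len):
    """Return the longest suffix of the tag history that is a stored context."""
    for k in range(min(len(history), max_len), -1, -1):
        suffix = tuple(history[len(history) - k:])
        if suffix in contexts:
            return suffix
    return ()


# Toy training data (illustrative tags only, not from the Tycho Brahe corpus).
training = [
    ["DET", "N", "V", "DET", "N"],
    ["DET", "ADJ", "N", "V"],
    ["DET", "N", "V", "DET", "ADJ", "N"],
]

contexts = prune(count_contexts(training, max_len=3), min_count=2)

history = ["V", "DET", "N"]
used = longest_context(history, contexts, max_len=3)
print("context used:", used)
print("next-tag counts:", dict(contexts[used]))
```

A fixed-order model would always condition on the last k tags, whether or not that context was observed often enough to be reliable. Here the length of the conditioning context varies with the history, which is the property that in principle lets a VLMC reach further back when the data support it.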

Author information

Authors and Affiliations

Institute of Mathematics and Statistics, University of São Paulo (USP)
Fábio N. Kepler & Marcelo Finger

Editor information

Editors and Affiliations

Laboratório de Técnicas Inteligentes (LTI), Escola Politécnica (EP), Universidade de São Paulo (USP)
Jaime Simão Sichman
Dep. de Informática, Universidade de Lisboa, Campo Grande, 1749-016, Lisboa, Portugal
Helder Coelho
Institute of Mathematics and Computer Science, Department of Computer Science, University of São Paulo, Av. Trabalhador Sao-Carlense, 400, Centro, CP: 668, 13560-970, São Carlos, SP, Brazil
Solange Oliveira Rezende

Cite this paper

Kepler, F.N., Finger, M. (2006). Comparing Two Markov Methods for Part-of-Speech Tagging of Portuguese. In: Sichman, J.S., Coelho, H., Rezende, S.O. (eds) Advances in Artificial Intelligence - IBERAMIA-SBIA 2006. Lecture Notes in Computer Science, vol 4140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11874850_52

DOI: https://doi.org/10.1007/11874850_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45462-5
Online ISBN: 978-3-540-45464-9
eBook Packages: Computer Science, Computer Science (R0)