Improving the RACAI Neural Network MSD Tagger

Boroş, Tiberiu; Dumitrescu, Stefan Daniel

doi:10.1007/978-3-642-41013-0_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 383)

Included in the following conference series:

International Conference on Engineering Applications of Neural Networks

Abstract

Part-of-speech (POS) tagging is a key step in many natural language processing tasks: each word of a sentence is assigned a uniquely interpretable label, called a POS tag. Many methods have been proposed for this task, such as Hidden Markov Models, Conditional Random Fields and Maximum Entropy classifiers. These methods were primarily developed for English, which, compared to highly inflectional languages, has a relatively small tagset. A well-known method for labeling with large tagsets (whose tags are referred to as morpho-syntactic descriptors, or MSDs) is Tiered Tagging (Tufiş, 1999), (Tufiş and Dragomirescu, 2004). It works with a reduced tagset from which context-irrelevant features (e.g. gender information), which can be deduced through inflectional analysis of the word form, are stripped. In our previous work we presented an alternative to Tiered Tagging that performs multi-class classification with a feed-forward neural network. Our method has the advantage of not requiring the extensive linguistic knowledge that Tiered Tagging implies. We extend this work by testing our tool on Czech and by successfully experimenting with a genetic algorithm designed to find a better network topology.
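The abstract states only that a genetic algorithm was used to search for a better network topology; it does not say how topologies were encoded, mutated or scored. The following is a minimal, hypothetical sketch of that kind of search, assuming a genome that lists hidden-layer sizes, truncation selection with elitism, uniform crossover and clamped mutation. All names and bounds (`Genome`, `MIN_UNITS`, `MAX_LAYERS`, `evolve`, `toy_fitness`) are illustrative assumptions, not the authors' implementation, and `toy_fitness` is a made-up stand-in for what would in practice be the tagger's held-out accuracy after training each candidate network.

```python
import random
from typing import Callable, Tuple

# Hypothetical genome encoding: one integer per hidden layer, e.g. (150,) or (200, 80).
Genome = Tuple[int, ...]

MIN_UNITS, MAX_UNITS = 8, 512  # illustrative bounds on units per hidden layer
MAX_LAYERS = 2                 # illustrative cap on the number of hidden layers


def random_genome(rng: random.Random) -> Genome:
    """Draw a random topology: 1..MAX_LAYERS layers, each within the unit bounds."""
    n_layers = rng.randint(1, MAX_LAYERS)
    return tuple(rng.randint(MIN_UNITS, MAX_UNITS) for _ in range(n_layers))


def mutate(g: Genome, rng: random.Random, rate: float = 0.3) -> Genome:
    """Shift each layer size by up to +/-32 units with probability `rate`, then clamp."""
    out = []
    for units in g:
        if rng.random() < rate:
            units += rng.randint(-32, 32)
        out.append(min(MAX_UNITS, max(MIN_UNITS, units)))
    return tuple(out)


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: the child keeps parent a's depth and takes each
    layer size from a or b with equal probability (from a where b is shorter)."""
    return tuple(
        a[i] if i >= len(b) or rng.random() < 0.5 else b[i]
        for i in range(len(a))
    )


def evolve(
    fitness: Callable[[Genome], float],
    pop_size: int = 12,
    generations: int = 20,
    seed: int = 0,
) -> Tuple[Genome, float]:
    """Truncation selection with elitism; returns the best genome and its score."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 4]    # best quarter survives unchanged
        parents = ranked[: pop_size // 2]  # offspring are bred from the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(g: Genome) -> float:
    """Made-up placeholder score (NOT a result from the paper). In practice this would
    train a network with topology g and return its tagging accuracy on held-out data."""
    return -abs(sum(g) - 150) - 10 * (len(g) - 1)


if __name__ == "__main__":
    best, score = evolve(toy_fitness)
    print("best topology:", best, "score:", score)
```

Because the elite is copied unchanged into each new generation, the best score in the population can never decrease. Since every fitness evaluation in a real setting means training a full network, caching scores per genome would likely be worthwhile.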

References

Berger, A.L., Pietra, V.J.D., Pietra, S.A.D.: A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71 (1996)
Boros, T., Ion, R., Tufiş, D.: Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language. Accepted for publication in ACL, Sofia, Bulgaria (2013)
Brants, T.: TnT: a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231. Association for Computational Linguistics (2000)
Calzolari, N., Monachini, M. (eds.): Common Specifications and Notation for Lexicon Encoding and Preliminary Proposal for the Tagsets. MULTEXT Report (March 1995)
Ceausu, A.: Maximum entropy tiered tagging. In: Proceedings of the 11th ESSLLI Student Session, pp. 173–179 (2006)
Erjavec, T., Monachini, M. (eds.): Specifications and Notation for Lexicon Encoding. Deliverable D1.1 F. Multext-East Project COP-106 (1997)
Fischer, M.M., Leung, Y.: A genetic-algorithms based evolutionary computational neural network for modelling spatial interaction data. The Annals of Regional Science 32(3), 437–458 (1998)
Fiszelew, A., Britos, P., Ochoa, A., Merlino, H., Fernández, E., García-Martínez, R.: Finding optimal neural network architecture using genetic algorithms. Adv. Comput. Sci. Eng. Res. Comput. Sci. 27 (2007)
Lafferty, J., McCallum, A., Pereira, F.C.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001)
Marques, N.C., Lopes, G.P.: A neural network approach to part-of-speech tagging. In: Proceedings of the 2nd Meeting for Computational Processing of Spoken and Written Portuguese, pp. 21–22 (1996)
Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 133–142 (1996)
Samuelsson, C.: Morphological tagging based entirely on Bayesian inference. In: 9th Nordic Conference on Computational Linguistics (June 1993)
Schmid, H.: Part-of-speech tagging with neural networks. In: Proceedings of the 15th Conference on Computational Linguistics, vol. 1, pp. 172–176. Association for Computational Linguistics (August 1994)
Schaffer, J.D., Whitley, D., Eshelman, L.J.: Combinations of genetic algorithms and neural networks: A survey of the state of the art. In: International Workshop on Combinations of Genetic Algorithms and Neural Networks, COGANN 1992, pp. 1–37. IEEE (June 1992)
Tufiş, D., Barbu, A.M., Pătraşcu, V., Rotariu, G., Popescu, C.: Corpora and Corpus-Based Morpho-Lexical Processing. In: Recent Advances in Romanian Language Technology, pp. 35–56. Romanian Academy Publishing House (1997) ISBN 973-27-0626-0
Tufiş, D.: Tiered tagging and combined language models classifiers. In: Matoušek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds.) TSD 1999. LNCS (LNAI), vol. 1692, pp. 28–33. Springer, Heidelberg (1999)
Tufiş, D., Dragomirescu, L.: Tiered tagging revisited. In: Proceedings of the 4th LREC Conference (2004)

Author information

Authors and Affiliations

Research Institute for Artificial Intelligence “Mihai Drăgănescu”, Romanian Academy (RACAI), Romania
Tiberiu Boroş & Stefan Daniel Dumitrescu

Editor information

Editors and Affiliations

Department of Forestry & Management of the Environment and Natural Resources, Democritus University of Thrace, GR-68200, Orestiada, Hellas
Lazaros Iliadis
Frederick University of Cyprus, Cyprus
Harris Papadopoulos
Faculty of Engineering and Computing, Coventry University, UK
Chrisina Jayne

About this paper

Cite this paper

Boroş, T., Dumitrescu, S.D. (2013). Improving the RACAI Neural Network MSD Tagger. In: Iliadis, L., Papadopoulos, H., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2013. Communications in Computer and Information Science, vol 383. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41013-0_5

DOI: https://doi.org/10.1007/978-3-642-41013-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41012-3
Online ISBN: 978-3-642-41013-0
eBook Packages: Computer Science, Computer Science (R0)
