Abstract
Part-of-speech (POS) tagging is a key process for many natural language processing tasks, in which each word of a sentence is assigned a uniquely interpretable label (called a POS tag). Many methodologies have been proposed for this task, such as Hidden Markov Models, Conditional Random Fields, and Maximum Entropy classifiers. Such methods are primarily intended for English which, in comparison to highly inflectional languages, has a relatively small tagset inventory. One of the well-known methods for labeling with large tagsets (whose tags are referred to as morpho-syntactic descriptors or MSDs) is Tiered Tagging (Tufiş, 1999; Tufiş and Dragomirescu, 2006). It exploits a reduced set of tags from which context-irrelevant features (e.g. gender information), which can be deduced through inflectional analysis of the word form, are stripped. In our previous work we presented an alternative to Tiered Tagging, in which we performed multi-class classification with a feed-forward neural network. Our methodology has the advantage that it does not require the extensive linguistic knowledge implied by Tiered Tagging. We extend that work by testing our tool on Czech and by successfully experimenting with a genetic algorithm designed to find a better network topology.
References
Berger, A.L., Pietra, V.J.D., Pietra, S.A.D.: A maximum entropy approach to natural language processing. Computational Linguistics 22(1), 39–71 (1996)
Boros, T., Ion, R., Tufiş, D.: Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language. Accepted for publication in ACL, Sofia, Bulgaria (2013)
Brants, T.: TnT: a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231. Association for Computational Linguistics (2000)
Calzolari, N., Monachini, M. (eds.): Common Specifications and Notation for Lexicon Encoding and Preliminary Proposal for the Tagsets. MULTEXT Report (March 1995)
Ceausu, A.: Maximum entropy tiered tagging. In: Proceedings of the 11th ESSLLI Student Session, pp. 173–179 (2006)
Erjavec, T., Monachini, M. (eds.): Specifications and Notation for Lexicon Encoding. Deliverable D1.1 F. Multext-East Project COP-106 (1997)
Fischer, M.M., Leung, Y.: A genetic-algorithms based evolutionary computational neural network for modelling spatial interaction data. The Annals of Regional Science 32(3), 437–458 (1998)
Fiszelew, A., Britos, P., Ochoa, A., Merlino, H., Fernández, E., García-Martínez, R.: Finding optimal neural network architecture using genetic algorithms. Adv. Comput. Sci. Eng. Res. Comput. Sci. 27 (2007)
Lafferty, J., McCallum, A., Pereira, F.C.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001)
Marques, N.C., Lopes, G.P.: A neural network approach to part-of-speech tagging. In: Proceedings of the 2nd Meeting for Computational Processing of Spoken and Written Portuguese, pp. 21–22 (1996)
Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 133–142 (1996)
Samuelsson, C.: Morphological tagging based entirely on Bayesian inference. In: 9th Nordic Conference on Computational Linguistics (June 1993)
Schmid, H.: Part-of-speech tagging with neural networks. In: Proceedings of the 15th Conference on Computational Linguistics, vol. 1, pp. 172–176. Association for Computational Linguistics (August 1994)
Schaffer, J.D., Whitley, D., Eshelman, L.J.: Combinations of genetic algorithms and neural networks: A survey of the state of the art. In: International Workshop on Combinations of Genetic Algorithms and Neural Networks, COGANN 1992, pp. 1–37. IEEE (June 1992)
Tufiş, D., Barbu, A.M., Pătraşcu, V., Rotariu, G., Popescu, C.: Corpora and Corpus-Based Morpho-Lexical Processing. In: Recent Advances in Romanian Language Technology, pp. 35–56. Romanian Academy Publishing House (1997) ISBN 973-27-0626-0
Tufiş, D.: Tiered tagging and combined language models classifiers. In: Matoušek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds.) TSD 1999. LNCS (LNAI), vol. 1692, pp. 28–33. Springer, Heidelberg (1999)
Tufiş, D., Dragomirescu, L.: Tiered tagging revisited. In: Proceedings of the 4th LREC Conference (2004)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boroş, T., Dumitrescu, S.D. (2013). Improving the RACAI Neural Network MSD Tagger. In: Iliadis, L., Papadopoulos, H., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2013. Communications in Computer and Information Science, vol 383. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41013-0_5
DOI: https://doi.org/10.1007/978-3-642-41013-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41012-3
Online ISBN: 978-3-642-41013-0
eBook Packages: Computer Science, Computer Science (R0)