Abstract
Lexical and morphological ambiguities present a serious challenge in rule-based machine translation (RBMT). This chapter describes an approach to resolving morphologically ambiguous verb forms when a rule-based decision is not possible because of parsing or tagging errors. The rule-based core system has a set of rules that decide, based on context information, which verb form should be generated in the target language. However, if the parse tree is incorrect, part of the context information may be missing and the rules cannot make a safe decision. In this case, we use a classifier to assign a verb form. We tested the classifier on a set of four texts, increasing the proportion of correct verb forms in the translation from 78.68 % with the purely rule-based disambiguation to 95.11 % with the hybrid approach.
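The control flow described in the abstract can be illustrated with a minimal sketch: rules decide when the context is complete, and a classifier takes over when it is not. Everything in the code is an assumption made for illustration and is not the chapter's implementation: the feature names (`head_verb`, `same_subject`, `lemma`), the two labels `SS` and `DS`, and the trivial frequency-based fallback, which merely stands in for a trained classifier.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, object]


def rule_based_form(context: Features) -> Optional[str]:
    """Return a verb form if the context is complete, otherwise None.

    Hypothetical rule: the features and labels are illustrative only.
    """
    head = context.get("head_verb")          # may be missing after a parse error
    same_subject = context.get("same_subject")
    if head is None or same_subject is None:
        return None                          # rules cannot make a safe decision
    return "SS" if same_subject else "DS"


def train_fallback(examples: List[Tuple[Features, str]]) -> Callable[[Features], str]:
    """Build a toy classifier: most frequent label per lemma, else overall.

    This is a placeholder for a real learner, not the method of the chapter.
    """
    by_lemma: Dict[object, Counter] = {}
    overall: Counter = Counter()
    for features, label in examples:
        by_lemma.setdefault(features.get("lemma"), Counter())[label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]

    def classify(features: Features) -> str:
        counts = by_lemma.get(features.get("lemma"))
        return counts.most_common(1)[0][0] if counts else default

    return classify


def choose_verb_form(context: Features, classifier: Callable[[Features], str]) -> str:
    """Hybrid decision: prefer the rules, fall back to the classifier."""
    form = rule_based_form(context)
    return form if form is not None else classifier(context)


if __name__ == "__main__":
    training = [
        ({"lemma": "decir"}, "DS"),
        ({"lemma": "decir"}, "DS"),
        ({"lemma": "querer"}, "SS"),
    ]
    clf = train_fallback(training)
    # Complete context: the rule decides.
    print(choose_verb_form({"head_verb": "ver", "same_subject": True, "lemma": "decir"}, clf))
    # Incomplete context: the classifier decides.
    print(choose_verb_form({"lemma": "decir"}, clf))
```

The point of the sketch is the ordering: the classifier is consulted only when the rules abstain, so a correct parse is never overridden by a statistical guess.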
Notes
- 1.
- 2. Extracted from the Spanish part of Multilingual Central Repository 3.0 (Gonzalez-Agirre et al. 2012).
- 3. Extracted from the AnCora verb lexicon (Taulé et al. 2008).
- 4. Abbreviations used: \(\begin{array}{ll} \mbox{ Acc: accusative} & \mbox{ Add: additive (`too, also')} \\ \mbox{ Ag: agentive} & \mbox{ Ben: benefactive (`for')} \\ \mbox{ Con: connective (`and')} & \mbox{ Dir: directional} \\ \mbox{ DirE: direct evidentiality} & \mbox{ DS: different subject } \\ \mbox{ Gen: genitive} & \mbox{ Imp: imperative} \\ \mbox{ Inch: inchoative} & \mbox{ Loc: locative}\\ \mbox{ Neg: negation} & \mbox{ Obl: obligative} \\ \mbox{ Perf: perfect} & \mbox{ Poss: possessive} \\ \mbox{ Prog: progressive} & \mbox{ Pst: past} \\ \mbox{ Rflx: reflexive} & \mbox{ Sg: singular} \\ \mbox{ SS: same subject} & \mbox{ Top: topic}\\ \end{array}\)
- 5. Double marking of negation in (6.b): ama: negation particle in imperative clauses (‘don’t’), -chu: negation suffix, attached to the constituent in focus.
- 6.
- 7.
- 8. Note that, although IULA contains more than twice as many sentences as AnCora, the sentences in IULA are mostly short, simple sentences, without subordinated clauses.
- 9. In our previous setting with Naïve Bayes, we achieved only 81 % accuracy, but we had a smaller training set of only ∼ 7300 instances.
- 10. The first verb to the left or right that is not an auxiliary and with no conjunction or relative pronoun between them.
- 11.
- 12.
- 13.
- 14. The Spanish reflexive se is a device to render a transitive verb intransitive.
- 15. http://es.wikipedia.org/wiki/ retrieved 11.01.2014.
References
Adelaar, W.F.H., and P. Muysken. 2004. The languages of the Andes. Cambridge language surveys. Cambridge: Cambridge University Press.
Alegria, I., A. Casillas, A. Díaz de Ilarraza, J. Iguartua, G. Labaka, M. Lersundi, A. Mayor, and K. Sarasola. 2008. Mixing approaches to MT for Basque: Selecting the best output from RBMT, EBMT and SMT. In Proceedings of the MATMT2008 Workshop: Mixing Approaches to Machine Translation.
Chang, C.C., and C.J. Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2(3):27:1–27:27.
Cusihuamán, A.G. 2001. Gramática Quechua: Cuzco-Collao, 2nd ed. Serie Saber Andino. Lima: Ministerio de Educación.
Dedenbach-Salazar Sáenz, S., U. von Gleich, R. Hartmann, P. Masson, and C. Soto Ruiz. 2002. Rimaykullayki - Unterrichtsmaterialien zum Quechua Ayacuchano, 4th ed. Berlin: Dietrich Reimer Verlag GmbH.
Eisele, A., C. Federmann, H. Uszkoreit, H. Saint-Amand, M. Kay, M. Jellinghaus, S. Hunsicker, T. Herrmann, and Y. Chen. 2008. Hybrid machine translation architectures within and beyond the EuroMatrix project. In Proceedings of the European Machine Translation Conference EAMT, European Association for Machine Translation, 27–34.
España-Bonet, C., G. Labaka, A. Díaz de Ilarraza, L. Màrquez, and K. Sarasola. 2011. Hybrid machine translation guided by a rule-based system. In Proceedings of the 13th Machine Translation Summit, Xiamen, 554–561.
Gonzalez-Agirre, A., E. Laparra, and G. Rigau. 2012. Multilingual central repository version 3.0: Upgrading a very large lexical knowledge base. In Proceedings of the Sixth International Global WordNet Conference (GWC’12), Matsue.
Hunsicker, S., Y. Chen, and C. Federmann. 2012. Machine learning for hybrid machine translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montreal, 312–316.
Marimon, M., N. Seghezzi, and N. Bel. 2007. An open-source Lexicon for Spanish. Procesamiento del Lenguaje Natural 39:131–137.
Marimon, M., B. Fisas, N. Bel, B. Arias, S. Vázquez, J. Vivaldi, S. Torner, M. Villegas, and M. Lorente. 2012. The IULA treebank. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul.
Melero, M., A. Oliver, T. Badia, and T. Suñol. 2007. Dealing with bilingual divergences in MT using target language N-gram models. In Proceedings of the METIS-II Workshop: New Approaches to Machine Translation, Leuven, 19–26.
Oepen, S., E. Velldal, J.T. Lønning, P. Meurer, V. Rosén, and D. Flickinger. 2007. Towards hybrid quality-oriented machine translation. On linguistics and probabilities in MT. In Proceedings of Theoretical and Methodological Issues in Machine Translation, Skövde.
Rios, A., and A. Göhring. 2013. Machine learning disambiguation of Quechua verb morphology. In Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, 13–18.
Rudnick, A., and M. Gasser. 2013. Lexical selection for hybrid MT with sequence labeling. In Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, 102–108.
Sawaf, H. 2010. Arabic dialect handling in hybrid machine translation. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas.
Smith, J., and S. Clark. 2009. EBMT for SMT: A new EBMT-SMT hybrid. In Proceedings of the 3rd Workshop on Example-Based Machine Translation, 3–10.
Taulé, M., M.A. Martí, and M. Recasens. 2008. AnCora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech.
Valderrama Fernández, R., and C. Escalante Gutiérrez. 1982. Gregorio Condori Mamani: Autobiografía. Cuzco: Centro Bartolomé de las Casas.
Acknowledgements
This research is funded by the Swiss National Science Foundation under grant 100015_132219/1.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Rios, A., Göhring, A. (2016). Machine Learning Applied to Rule-Based Machine Translation. In: Costa-jussà, M., Rapp, R., Lambert, P., Eberle, K., Banchs, R., Babych, B. (eds) Hybrid Approaches to Machine Translation. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21311-8_5
DOI: https://doi.org/10.1007/978-3-319-21311-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21310-1
Online ISBN: 978-3-319-21311-8
eBook Packages: Computer Science (R0)