Machine Learning Applied to Rule-Based Machine Translation

Hybrid Approaches to Machine Translation

Abstract

Lexical and morphological ambiguities present a serious challenge in rule-based machine translation (RBMT). This chapter describes an approach to resolving morphologically ambiguous verb forms when a rule-based decision is not possible because of parsing or tagging errors. The rule-based core system uses a set of rules to decide, based on context information, which verb form should be generated in the target language. However, if the parse tree is incorrect, part of the context information may be missing and the rules cannot make a safe decision. In this case, we use a classifier to assign a verb form. We tested the classifier on a set of four texts, increasing the share of correct verb forms in the translation from 78.68 % with purely rule-based disambiguation to 95.11 % with the hybrid approach.
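The fallback logic described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the function names, the context keys (`head_verb`, `same_subject`), the labels `form_A` and `form_B`, and the toy rule are all hypothetical placeholders chosen for illustration. The sketch only shows the control flow, in which the rules decide when the context is complete and a classifier decides otherwise.

```python
from typing import Callable, Mapping, Optional, Tuple

# A context is a flat mapping of feature names to values (hypothetical format).
Context = Mapping[str, str]


def rule_based_form(context: Context) -> Optional[str]:
    """Toy stand-in for the rule component.

    Returns a verb form label, or None when the context is incomplete
    (for example because the parser failed to attach the head verb).
    """
    if context.get("head_verb") is None:
        return None  # no safe rule-based decision possible
    # Hypothetical rule: choose the label based on one context feature.
    return "form_A" if context.get("same_subject") == "yes" else "form_B"


def disambiguate(
    context: Context, classifier: Callable[[Context], str]
) -> Tuple[str, str]:
    """Try the rules first; fall back to the classifier if they abstain.

    Returns the chosen label together with the component that produced it.
    """
    form = rule_based_form(context)
    if form is not None:
        return form, "rule"
    return classifier(context), "classifier"


def majority_classifier(context: Context) -> str:
    """Placeholder classifier that always predicts one label."""
    return "form_B"
```

Calling `disambiguate({"head_verb": "ir", "same_subject": "yes"}, majority_classifier)` takes the rule path, while calling it with an empty context takes the classifier path. In a real system the placeholder classifier would be replaced by a trained model.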

Notes

  1. http://tiny.uzh.ch/xc.

  2. Extracted from the Spanish part of Multilingual Central Repository 3.0 (Gonzalez-Agirre et al. 2012).

  3. Extracted from the AnCora verb lexicon (Taulé et al. 2008).

  4. Abbreviations used: Acc: accusative; Add: additive (‘too, also’); Ag: agentive; Ben: benefactive (‘for’); Con: connective (‘and’); Dir: directional; DirE: direct evidentiality; DS: different subject; Gen: genitive; Imp: imperative; Inch: inchoative; Loc: locative; Neg: negation; Obl: obligative; Perf: perfect; Poss: possessive; Prog: progressive; Pst: past; Rflx: reflexive; Sg: singular; SS: same subject; Top: topic.

  5. Double marking of negation in (6.b): ama: negation particle in imperative clauses (‘don’t’), -chu: negation suffix, attached to the constituent in focus.

  6. http://clic.ub.edu/corpus/en/ancora.

  7. http://www.iula.upf.edu/recurs01_tbk_uk.htm.

  8. Note that, although IULA contains more than twice as many sentences as AnCora, the sentences in IULA are mostly short, simple sentences, without subordinated clauses.

  9. In our previous setting with Naïve Bayes, we achieved only 81 % accuracy, but we had a smaller training set of only ∼7300 instances.

  10. The first verb to the left or right that is not an auxiliary and with no conjunction or relative pronoun between them.

  11. http://www.camara-alemana.org.pe/Publicaciones/MIGEdiciones/2010MEMORIA2009.pdf.

  12. http://www.camara-alemana.org.pe/Publicaciones/MIGEdiciones/2010MEMORIA-JAHRESBERICHT2009x.pdf.

  13. http://www.inforesources.ch/pdf/focus08_1_s.pdf.

  14. The Spanish reflexive se is a device to render a transitive verb intransitive.

  15. http://es.wikipedia.org/wiki/ retrieved 11.01.2014.

References

  • Adelaar, W.F.H., and P. Muysken. 2004. The languages of the Andes. Cambridge language surveys. Cambridge: Cambridge University Press.

  • Alegria, I., A. Casillas, A. Díaz de Ilarraza, J. Igartua, G. Labaka, M. Lersundi, A. Mayor, and K. Sarasola. 2008. Mixing approaches to MT for Basque: Selecting the best output from RBMT, EBMT and SMT. In Proceedings of the MATMT2008 Workshop: Mixing Approaches to Machine Translation.

  • Chang, C.C., and C.J. Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2(3):27:1–27:27.

  • Cusihuamán, A.G. 2001. Gramática Quechua: Cuzco-Collao, 2nd ed. Serie Saber Andino. Lima: Ministerio de Educación.

  • Dedenbach-Salazar Sáenz, S., U. von Gleich, R. Hartmann, P. Masson, and C. Soto Ruiz. 2002. Rimaykullayki - Unterrichtsmaterialien zum Quechua Ayacuchano, 4th ed. Berlin: Dietrich Reimer Verlag GmbH.

  • Eisele, A., C. Federmann, H. Uszkoreit, H. Saint-Amand, M. Kay, M. Jellinghaus, S. Hunsicker, T. Herrmann, and Y. Chen. 2008. Hybrid machine translation architectures within and beyond the EuroMatrix project. In Proceedings of the European Machine Translation Conference EAMT, European Association for Machine Translation, 27–34.

  • España-Bonet, C., G. Labaka, A. Díaz de Ilarraza, L. Màrquez, and K. Sarasola. 2011. Hybrid machine translation guided by a rule-based system. In Proceedings of the 13th Machine Translation Summit, Xiamen, 554–561.

  • Gonzalez-Agirre, A., E. Laparra, and G. Rigau. 2012. Multilingual central repository version 3.0: Upgrading a very large lexical knowledge base. In Proceedings of the Sixth International Global WordNet Conference (GWC’12), Matsue.

  • Hunsicker, S., Y. Chen, and C. Federmann. 2012. Machine learning for hybrid machine translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montreal, 312–316.

  • Marimon, M., N. Seghezzi, and N. Bel. 2007. An open-source Lexicon for Spanish. Procesamiento del Lenguaje Natural 39:131–137.

  • Marimon, M., B. Fisas, N. Bel, B. Arias, S. Vázquez, J. Vivaldi, S. Torner, M. Villegas, and M. Lorente. 2012. The IULA treebank. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul.

  • Melero, M., A. Oliver, T. Badia, and T. Suñol. 2007. Dealing with bilingual divergences in MT using target language N-gram models. In Proceedings of the METIS-II Workshop: New Approaches to Machine Translation, Leuven, 19–26.

  • Oepen, S., E. Velldal, J.T. Lønning, P. Meurer, V. Rosén, and D. Flickinger. 2007. Towards hybrid quality-oriented machine translation. On linguistics and probabilities in MT. In Proceedings of Theoretical and Methodological Issues in Machine Translation, Skövde.

  • Rios, A., and A. Göhring. 2013. Machine learning disambiguation of Quechua verb morphology. In Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, 13–18.

  • Rudnick, A., and M. Gasser. 2013. Lexical selection for hybrid MT with sequence labeling. In Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, 102–108.

  • Sawaf, H. 2010. Arabic dialect handling in hybrid machine translation. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas.

  • Smith, J., and S. Clark. 2009. EBMT for SMT: A new EBMT-SMT hybrid. In Proceedings of the 3rd Workshop on Example-Based Machine Translation, 3–10.

  • Taulé, M., M.A. Martí, and M. Recasens. 2008. AnCora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech.

  • Valderrama Fernández, R., and C. Escalante Gutiérrez. 1982. Gregorio Condori Mamani: Autobiografía. Cuzco: Centro Bartolomé de las Casas.

Acknowledgements

This research is funded by the Swiss National Science Foundation under grant 100015_132219/1.

Author information

Correspondence to Annette Rios.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Rios, A., Göhring, A. (2016). Machine Learning Applied to Rule-Based Machine Translation. In: Costa-jussà, M., Rapp, R., Lambert, P., Eberle, K., Banchs, R., Babych, B. (eds) Hybrid Approaches to Machine Translation. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21311-8_5

  • DOI: https://doi.org/10.1007/978-3-319-21311-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21310-1

  • Online ISBN: 978-3-319-21311-8

  • eBook Packages: Computer Science, Computer Science (R0)
