Abstract
This survey chapter gives an overview of recent research in hybrid Machine Translation (MT). It sketches the main MT paradigms and describes their integration at increasing levels of depth, starting with system combination techniques and continuing with integration strategies led by rule-based and by statistical systems. System combination does not involve a hybrid architecture, since it only combines translation outputs; it can operate at different granularities, including the sentence, sub-sentential and graph levels. At a deeper level of integration, architectures guided by the rule-based approach use statistics to enrich the resources, the modules or the backbone of the system. Architectures guided by the statistical approach introduce rules either in pre-/post-processing or at an inner level, that is, by including rules or dictionaries in the core system. This overview of hybrid MT introduces, motivates and puts in context the subsequent chapters of this book.
Acknowledgements
This work has been partially funded by the Spanish Ministerio de Economía y Competitividad project TACARDI (TIN2012-38523-C02-00) and contract TEC2015-69266-P, and the Seventh Framework Program of the European Commission through the International Outgoing Fellowship Marie Curie Action (IMTraP-2011-29951).
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
España-Bonet, C., Costa-jussà, M.R. (2016). Hybrid Machine Translation Overview. In: Costa-jussà, M., Rapp, R., Lambert, P., Eberle, K., Banchs, R., Babych, B. (eds) Hybrid Approaches to Machine Translation. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21311-8_1
DOI: https://doi.org/10.1007/978-3-319-21311-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21310-1
Online ISBN: 978-3-319-21311-8
eBook Packages: Computer Science, Computer Science (R0)