Abstract
We describe BBN’s contribution to the machine translation (MT) task in the LoReHLT 2016 evaluation, focusing on the techniques used to build our Uyghur–English MT systems under low-resource conditions. In particular, we discuss the data selection process, morphological segmentation of the source, neural network feature models, and our use of a native informant and related-language resources. Our final submission for the evaluation was ranked first among all participants.
Notes
LDC catalogue number LDC2011T07.
Acknowledgements
This work was supported by DARPA/I2O under the LORELEI program. The views, opinions, and/or findings contained in this article are those of the author and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense.
Additional information
Work done while Hendra Setiawan was at Raytheon BBN Technologies.
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0113. The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. Distribution Statement ‘A’ (Approved for Public Release by DARPA on Aug 29, 2017 (DISTAR Approval #28392), Distribution Unlimited).
About this article
Cite this article
Setiawan, H., Huang, Z. & Zbib, R. BBN’s low-resource machine translation for the LoReHLT 2016 evaluation. Machine Translation 32, 45–57 (2018). https://doi.org/10.1007/s10590-017-9206-2