BBN’s low-resource machine translation for the LoReHLT 2016 evaluation

Machine Translation

Abstract

We describe BBN’s contribution to the machine translation (MT) task in the LoReHLT 2016 evaluation, focusing on the techniques and methodologies employed to build the Uyghur–English MT systems in low-resource conditions. In particular, we discuss the data selection process, morphological segmentation of the source, neural network feature models, and our use of a native informant and related language resources. Our final submission for the evaluation was ranked first among all participants.

Notes

  1. LDC catalogue number LDC2011T07.

  2. http://dict.yulghun.com.

Acknowledgements

This work was supported by DARPA/I2O under the LORELEI program. The views, opinions, and/or findings contained in this article are those of the author and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense.

Author information

Corresponding author

Correspondence to Rabih Zbib.

Additional information

Work done while Hendra Setiawan was at Raytheon BBN Technologies.

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-15-C-0113. The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. Distribution Statement ‘A’ (Approved for Public Release by DARPA on Aug 29, 2017 (DISTAR Approval #28392), Distribution Unlimited).

About this article

Cite this article

Setiawan, H., Huang, Z. & Zbib, R. BBN’s low-resource machine translation for the LoReHLT 2016 evaluation. Machine Translation 32, 45–57 (2018). https://doi.org/10.1007/s10590-017-9206-2

  • DOI: https://doi.org/10.1007/s10590-017-9206-2
