Learning Accurate Integer Transformer Machine-Translation Models

  • Original Research
  • Published in SN Computer Science

Abstract

We describe a method for training accurate Transformer machine-translation models to run inference using 8-bit integer (INT8) hardware matrix multipliers, as opposed to the more costly single-precision floating-point (FP32) hardware. Unlike previous work, which converted only 85 Transformer matrix multiplications to INT8, leaving 48 out of 133 of them in FP32 because of unacceptable accuracy loss, we convert them all to INT8 without compromising accuracy. Tested on the newstest2014 English-to-German translation task, our INT8 Transformer Base and Transformer Big models yield BLEU scores that are 99.3–100% relative to those of the corresponding FP32 models. Our approach converts all matrix-multiplication tensors from an existing FP32 model into INT8 tensors by automatically making range-precision trade-offs during training. To demonstrate the robustness of this approach, we also include results from INT6 Transformer models.
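To make the range-precision trade-off mentioned in the abstract concrete, here is a minimal sketch of symmetric INT8 "fake quantization" with a learnable clipping threshold trained through a straight-through estimator. This is a hypothetical PyTorch illustration of the general technique, not the paper's implementation; the class name, parameterization, and initial values are assumptions.

```python
import torch
import torch.nn as nn


class FakeQuantInt8(nn.Module):
    """Symmetric fake quantization with a learnable clipping threshold."""

    def __init__(self, init_threshold: float = 1.0, num_bits: int = 8):
        super().__init__()
        # Parameterize the threshold in log space so it stays positive.
        # A larger threshold widens the representable range but coarsens
        # the quantization step: the range-precision trade-off.
        self.log_t = nn.Parameter(torch.tensor(float(init_threshold)).log())
        self.qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.log_t.exp()                 # clipping threshold (range)
        scale = t / self.qmax                # step size (precision)
        x_clipped = torch.maximum(torch.minimum(x, t), -t)
        q = torch.round(x_clipped / scale)   # integer levels in [-qmax, qmax]
        x_q = q * scale                      # dequantized approximation
        # Straight-through estimator: the forward pass uses x_q; the backward
        # pass treats rounding as identity so weights and threshold keep training.
        return x_clipped + (x_q - x_clipped).detach()


# Usage: fake-quantize one operand of a matrix multiplication during training.
fq = FakeQuantInt8(init_threshold=3.0)
w = torch.randn(64, 64, requires_grad=True)
y = fq(w) @ torch.randn(64, 16)
y.sum().backward()                           # gradients reach both w and log_t
```

Wrapping the operands of each matrix multiplication with such a module during fine-tuning is one way to let training balance clipping error against rounding error, which is the kind of automatic trade-off the abstract refers to.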


Notes

  1. The threshold scalar may be limited to an integer power of 2, so that scaling an integer matrix by it reduces to an arithmetic shift operation (illustrated in the sketch below).
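A minimal numerical illustration of this footnote, assuming a typical fixed-point rescaling step rather than the paper's actual kernel: when the threshold (and hence the quantization scale) is an exact power of two, rescaling an integer accumulator by it is equivalent to an arithmetic right shift, so no integer multiplier is needed.

```python
import numpy as np

# INT32 accumulator values, as produced by an integer matrix multiplication.
acc = np.array([5120, -731, 40960], dtype=np.int32)

shift = 5  # scale = 2**5 = 32, i.e. a power-of-2 threshold

rescaled_by_divide = acc // 32       # conceptual rescale by the scale factor
rescaled_by_shift = acc >> shift     # same result via an arithmetic right shift

assert np.array_equal(rescaled_by_divide, rescaled_by_shift)
print(rescaled_by_shift)             # [ 160  -23 1280]
```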


Author information

Corresponding author

Correspondence to Ephrem Wu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, E. Learning Accurate Integer Transformer Machine-Translation Models. SN COMPUT. SCI. 2, 291 (2021). https://doi.org/10.1007/s42979-021-00688-4
