Learning Accurate Integer Transformer Machine-Translation Models

  • Original Research
  • Published in SN Computer Science

Abstract

We describe a method for training accurate Transformer machine-translation models to run inference using 8-bit integer (INT8) hardware matrix multipliers, as opposed to the more costly single-precision floating-point (FP32) hardware. Unlike previous work, which converted only 85 Transformer matrix multiplications to INT8, leaving 48 out of 133 of them in FP32 because of unacceptable accuracy loss, we convert them all to INT8 without compromising accuracy. Tested on the newstest2014 English-to-German translation task, our INT8 Transformer Base and Transformer Big models yield BLEU scores that are 99.3–100% relative to those of the corresponding FP32 models. Our approach converts all matrix-multiplication tensors from an existing FP32 model into INT8 tensors by automatically making range-precision trade-offs during training. To demonstrate the robustness of this approach, we also include results from INT6 Transformer models.
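To make the range-precision trade-off mentioned in the abstract concrete, here is a minimal sketch of symmetric INT8 "fake quantization" with a learnable clipping threshold trained through a straight-through estimator. This is a hypothetical PyTorch illustration of the general technique, not the paper's implementation; the class name, parameterization, and initial values are assumptions.

```python
import torch
import torch.nn as nn


class FakeQuantInt8(nn.Module):
    """Symmetric fake quantization with a learnable clipping threshold."""

    def __init__(self, init_threshold: float = 1.0, num_bits: int = 8):
        super().__init__()
        # Parameterize the threshold in log space so it stays positive.
        # A larger threshold widens the representable range but coarsens
        # the quantization step: the range-precision trade-off.
        self.log_t = nn.Parameter(torch.tensor(float(init_threshold)).log())
        self.qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.log_t.exp()                 # clipping threshold (range)
        scale = t / self.qmax                # step size (precision)
        x_clipped = torch.maximum(torch.minimum(x, t), -t)
        q = torch.round(x_clipped / scale)   # integer levels in [-qmax, qmax]
        x_q = q * scale                      # dequantized approximation
        # Straight-through estimator: the forward pass uses x_q; the backward
        # pass treats rounding as identity so weights and threshold keep training.
        return x_clipped + (x_q - x_clipped).detach()


# Usage: fake-quantize one operand of a matrix multiplication during training.
fq = FakeQuantInt8(init_threshold=3.0)
w = torch.randn(64, 64, requires_grad=True)
y = fq(w) @ torch.randn(64, 16)
y.sum().backward()                           # gradients reach both w and log_t
```

Wrapping the operands of each matrix multiplication with such a module during fine-tuning is one way to let training balance clipping error against rounding error, which is the kind of automatic trade-off the abstract refers to.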


Notes

  1. The threshold scalar may be limited to an integer power of 2, so that scaling an integer matrix by it reduces to an arithmetic shift operation (illustrated in the sketch below).
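A minimal numerical illustration of this footnote, assuming a typical fixed-point rescaling step rather than the paper's actual kernel: when the threshold (and hence the quantization scale) is an exact power of two, rescaling an integer accumulator by it is equivalent to an arithmetic right shift, so no integer multiplier is needed.

```python
import numpy as np

# INT32 accumulator values, as produced by an integer matrix multiplication.
acc = np.array([5120, -731, 40960], dtype=np.int32)

shift = 5  # scale = 2**5 = 32, i.e. a power-of-2 threshold

rescaled_by_divide = acc // 32       # conceptual rescale by the scale factor
rescaled_by_shift = acc >> shift     # same result via an arithmetic right shift

assert np.array_equal(rescaled_by_divide, rescaled_by_shift)
print(rescaled_by_shift)             # [ 160  -23 1280]
```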


Author information

Corresponding author

Correspondence to Ephrem Wu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, E. Learning Accurate Integer Transformer Machine-Translation Models. SN COMPUT. SCI. 2, 291 (2021). https://doi.org/10.1007/s42979-021-00688-4
