Efficient Transformer-Based Sentence Encoding for Sentence Pair Modelling

  • Conference paper
  • First Online:
Advances in Artificial Intelligence (Canadian AI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11489)

Abstract

Modelling a pair of sentences is important for many NLP tasks such as textual entailment (TE), paraphrase identification (PI), semantic relatedness (SR) and question answer pairing (QAP). Most sentence pair modelling work has looked only at the local context within a sentence to generate a distributed sentence representation, without considering the information found in the other sentence. The proposed attentive encoder uses the representation of one sentence, generated by a multi-head Transformer encoder, to guide the focus onto the most semantically relevant words of the other sentence using multi-branch attention. Evaluating this novel sentence encoder on the TE, PI, SR and QAP tasks shows notable improvements over the standard Transformer encoder as well as other current state-of-the-art models.
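As a rough illustration of the idea summarised above, the sketch below (PyTorch) encodes one sentence with a standard multi-head Transformer encoder, pools it into a guide vector, and uses that vector as the query of several parallel attention branches over the other sentence's token states, mixing the branch outputs with learned weights. The class name, mean pooling, hyper-parameters and the final feature concatenation are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossGuidedEncoder(nn.Module):
    """Sketch of an attentive sentence-pair encoder (assumed design).

    Sentence A is contextualised by a multi-head Transformer encoder; its
    mean-pooled vector then acts as the query in several parallel attention
    "branches" over sentence B's token states, and the branches are mixed
    by learned softmax weights.
    """

    def __init__(self, d_model=256, n_heads=4, n_branches=3, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # One multi-head attention module per branch.
        self.branches = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_branches)]
        )
        # Learned mixing weights over the branches.
        self.branch_logits = nn.Parameter(torch.zeros(n_branches))

    def forward(self, sent_a, sent_b):
        # sent_a, sent_b: (batch, seq_len, d_model) token embeddings.
        enc_a = self.encoder(sent_a)                 # contextualise sentence A
        guide = enc_a.mean(dim=1, keepdim=True)      # (batch, 1, d_model) guide vector
        enc_b = self.encoder(sent_b)                 # contextualise sentence B

        # Each branch attends over B's tokens, guided by A's representation.
        outs = [branch(guide, enc_b, enc_b)[0] for branch in self.branches]
        weights = F.softmax(self.branch_logits, dim=0)
        mixed = sum(w * o for w, o in zip(weights, outs))   # (batch, 1, d_model)
        return guide.squeeze(1), mixed.squeeze(1)


# Hypothetical usage: build pair features for a downstream classifier.
enc = CrossGuidedEncoder()
a = torch.randn(8, 20, 256)
b = torch.randn(8, 25, 256)
vec_a, vec_b = enc(a, b)
pair = torch.cat([vec_a, vec_b, torch.abs(vec_a - vec_b), vec_a * vec_b], dim=-1)
```

The guide-as-query step is the part that injects information from the other sentence; everything downstream of the mixed branch output would feed a task-specific classifier (TE, PI, SR or QAP) in the usual way.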



Author information


Correspondence to Mahtab Ahmed or Robert E. Mercer.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Ahmed, M., Mercer, R.E. (2019). Efficient Transformer-Based Sentence Encoding for Sentence Pair Modelling. In: Meurs, M.-J., Rudzicz, F. (eds) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_12


  • DOI: https://doi.org/10.1007/978-3-030-18305-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18304-2

  • Online ISBN: 978-3-030-18305-9

  • eBook Packages: Computer Science (R0)
