Abstract
Modelling a pair of sentences is important for many NLP tasks such as textual entailment (TE), paraphrase identification (PI), semantic relatedness (SR) and question-answer pairing (QAP). Most sentence-pair modelling work generates a distributed representation of each sentence from its local context alone, without considering the information available in the other sentence. The proposed attentive encoder instead uses the representation of one sentence, produced by a multi-head Transformer encoder, to guide multi-branch attention toward the most semantically relevant words of the other sentence. Evaluating this novel sentence encoder on the TE, PI, SR and QAP tasks shows notable improvements over the standard Transformer encoder as well as other current state-of-the-art models.
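To make the architecture concrete, below is a minimal sketch, in PyTorch, of the cross-sentence attentive encoding described above: each sentence is first encoded with a standard Transformer encoder, and the encoding of one sentence then guides multi-branch attention over the tokens of the other. The framework choice, the class and parameter names (AttentivePairEncoder, MultiBranchCrossAttention, the branch weights kappa), the mean-pooling step, and the exact branch mixing (softmax-weighted single-head branches, in the spirit of the weighted Transformer of Ahmed et al., 2017) are all illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiBranchCrossAttention(nn.Module):
    """Attend from one sentence's tokens to the other sentence's encoded
    states, mixing several attention branches with learned weights."""

    def __init__(self, d_model: int, n_branches: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
            for _ in range(n_branches)
        )
        # One learned mixing weight per branch, softmax-normalised at use.
        self.kappa = nn.Parameter(torch.zeros(n_branches))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, queries, guide):
        # queries: (B, Lq, d) tokens of the sentence being re-encoded
        # guide:   (B, Lg, d) Transformer output of the *other* sentence
        w = F.softmax(self.kappa, dim=0)
        mixed = sum(
            w[i] * branch(queries, guide, guide, need_weights=False)[0]
            for i, branch in enumerate(self.branches)
        )
        return self.norm(queries + mixed)  # residual + layer norm


class AttentivePairEncoder(nn.Module):
    """Encode each sentence with a standard Transformer encoder, then let
    each sentence's representation guide attention over the other one."""

    def __init__(self, vocab_size: int, d_model: int = 128,
                 n_heads: int = 4, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cross = MultiBranchCrossAttention(d_model)

    def _encode(self, tokens):
        pos = torch.arange(tokens.size(1), device=tokens.device)
        return self.encoder(self.embed(tokens) + self.pos(pos))

    def forward(self, sent_a, sent_b):
        # sent_a, sent_b: (B, L) token-id tensors
        enc_a, enc_b = self._encode(sent_a), self._encode(sent_b)
        # Each sentence attends to the other's encoding, then mean-pools.
        vec_a = self.cross(enc_a, enc_b).mean(dim=1)
        vec_b = self.cross(enc_b, enc_a).mean(dim=1)
        return vec_a, vec_b  # feed these to a task-specific classifier


model = AttentivePairEncoder(vocab_size=10_000)
a = torch.randint(0, 10_000, (2, 7))  # batch of 2 sentences, length 7
b = torch.randint(0, 10_000, (2, 9))
vec_a, vec_b = model(a, b)
print(vec_a.shape, vec_b.shape)  # torch.Size([2, 128]) twice
```

The pooled pair vectors vec_a and vec_b would then feed a task-specific classifier, e.g. a softmax over entailment labels for TE; the abstract does not specify the pooling or classification layers, so those parts are placeholders.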
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ahmed, M., Mercer, R.E. (2019). Efficient Transformer-Based Sentence Encoding for Sentence Pair Modelling. In: Meurs, M.-J., Rudzicz, F. (eds.) Advances in Artificial Intelligence. Canadian AI 2019. Lecture Notes in Computer Science, vol. 11489. Springer, Cham. https://doi.org/10.1007/978-3-030-18305-9_12
DOI: https://doi.org/10.1007/978-3-030-18305-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18304-2
Online ISBN: 978-3-030-18305-9
eBook Packages: Computer Science (R0)