Multi-turn Inference Matching Network for Natural Language Inference

  • Chunhua Liu
  • Shan Jiang
  • Hainan Yu
  • Dong Yu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11109)


Natural Language Inference (NLI) is a fundamental and challenging task in Natural Language Processing (NLP). Most existing methods apply only a one-pass inference process to a mixed matching feature, a concatenation of different matching features between a premise and a hypothesis. In this paper, we propose a new model, the Multi-turn Inference Matching Network (MIMN), that performs multi-turn inference over different matching features. In each turn, the model focuses on one particular matching feature rather than the mixed matching feature. To enhance the interaction between different matching features, a memory component stores the history inference information, and the inference of each turn is performed on the current matching feature together with the memory. We conduct experiments on three NLI datasets, and the results show that our model outperforms or matches the state of the art on all three.
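The multi-turn process described above can be sketched as a loop that reads one matching feature per turn and folds it into a memory vector. This is a minimal illustrative sketch, not the paper's exact equations: the GRU-style gated update, the dimensions, and the class `MultiTurnInference` are all assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultiTurnInference:
    """Hypothetical sketch of MIMN-style multi-turn inference:
    each turn consumes one matching feature and updates a memory
    that carries the history inference information."""

    def __init__(self, feat_dim, mem_dim):
        # Randomly initialized parameters for the sketch (no training).
        self.Wz = rng.standard_normal((mem_dim, feat_dim + mem_dim)) * 0.1
        self.Wh = rng.standard_normal((mem_dim, feat_dim + mem_dim)) * 0.1
        self.mem_dim = mem_dim

    def infer(self, matching_features):
        m = np.zeros(self.mem_dim)            # memory starts empty
        for f in matching_features:           # one inference turn per feature
            x = np.concatenate([f, m])        # current feature + memory
            z = sigmoid(self.Wz @ x)          # update gate
            h = np.tanh(self.Wh @ x)          # candidate inference state
            m = (1 - z) * m + z * h           # merge this turn into memory
        return m                              # final inference representation

# Three separate matching features (e.g. concatenation, difference,
# element-wise product of premise/hypothesis encodings) — placeholders here.
feats = [rng.standard_normal(8) for _ in range(3)]
model = MultiTurnInference(feat_dim=8, mem_dim=16)
out = model.infer(feats)
print(out.shape)
```

In a full model the final memory would be fed to a classifier over the three NLI labels; the point of the sketch is only that each turn sees a single matching feature plus the accumulated memory, rather than one concatenated feature in one pass.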


Natural Language Inference · Multi-turn inference · Memory mechanism



This work is funded by Beijing Advanced Innovation for Language Resources of BLCU, the Fundamental Research Funds for the Central Universities in BLCU (No. 17PT05) and Graduate Innovation Fund of BLCU (No. 18YCX010).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Beijing Language and Culture University, Beijing, China
  2. Beijing Advanced Innovation for Language Resources of BLCU, Beijing, China
