
ATNet: Answering Cloze-Style Questions via Intra-attention and Inter-attention

  • Chengzhen Fu
  • Yuntao Li
  • Yan Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11440)

Abstract

This paper proposes a novel framework, named ATNet, for answering cloze-style questions over documents. In the encoder phase, our model projects all contextual embeddings into multiple latent semantic spaces, with the representations in each space attending to a specific aspect of semantics. Long-term dependencies across the whole document are captured via the intra-attention module. A gate is produced to control the degree to which the retrieved dependency information is fused and the previous token embedding is exposed. Then, in the interaction phase, the context is aligned with the query across the different semantic spaces to aggregate information. Specifically, we compute inter-attention based on a sophisticated feature set. Experiments and ablation studies demonstrate the effectiveness of ATNet.
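To make the pipeline concrete, below is a minimal PyTorch sketch of the two attention steps the abstract describes: intra-attention over the document in several latent semantic spaces, with a gate that mixes the retrieved dependency information into the token embeddings, followed by inter-attention that aligns the context with the query. This is not the authors' implementation; all module names, dimensions, and the plain dot-product scoring (standing in for the paper's richer feature set) are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of gated intra-attention followed by
# inter-attention. Dimensions, names, and scoring functions are assumptions.
import torch
import torch.nn as nn


class GatedIntraAttention(nn.Module):
    """Self-attention over the document in several latent spaces, fused via a gate."""

    def __init__(self, d_model: int, n_spaces: int = 4):
        super().__init__()
        assert d_model % n_spaces == 0
        self.n_spaces, self.d_head = n_spaces, d_model // n_spaces
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Gate controlling how much dependency info is fused vs. token embedding exposed.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, ctx):                            # ctx: (batch, len, d_model)
        b, n, d = ctx.shape

        def split(x):                                   # -> (batch, spaces, len, d_head)
            return x.view(b, n, self.n_spaces, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(ctx)), split(self.k_proj(ctx)), split(self.v_proj(ctx))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attended = (scores.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, d)
        g = torch.sigmoid(self.gate(torch.cat([ctx, attended], dim=-1)))
        return g * attended + (1 - g) * ctx             # gated mix of the two sources


def inter_attention(ctx, query):
    """Align each context token with the query (dot-product scoring as a simple
    stand-in for the paper's feature set) and aggregate query information."""
    scores = ctx @ query.transpose(-2, -1)              # (batch, ctx_len, query_len)
    return scores.softmax(dim=-1) @ query               # query-aware context representation


# Toy usage: a 30-token document and a 6-token query, both 128-dimensional.
doc = torch.randn(2, 30, 128)
qry = torch.randn(2, 6, 128)
encoded = GatedIntraAttention(d_model=128)(doc)
aligned = inter_attention(encoded, qry)
print(aligned.shape)                                    # torch.Size([2, 30, 128])
```

In this sketch the sigmoid gate plays the role the abstract assigns to it: it decides, per dimension, how much of the attended dependency information is fused in and how much of the original token embedding is passed through.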

Keywords

Question answering · Intra-attention · Inter-attention

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions. This work was supported by NSFC under Grant No. 61532001 and by the MOE-ChinaMobile program under Grant No. MCM20170503.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Machine Intelligence, Peking University, Beijing, China
