
TAGNet: a tiny answer-guided network for conversational question generation

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Conversational Question Generation (CQG) aims to generate conversational questions from a given passage and the conversation history. Previous work on CQG presumes that the answer is a contiguous span of the passage and generates a question targeting it. However, this assumption limits the application scenarios, because answers in practical conversations are usually abstractive free-form text rather than extractive spans. In addition, most state-of-the-art CQG systems are built on pretrained language models with hundreds of millions of parameters, which poses challenges for real-life applications due to latency and capacity constraints. To address these problems, we introduce the Tiny Answer-Guided Network (TAGNet), a CQG model built on lightweight Bi-LSTM modules. TAGNet takes the target answer as an explicit input, which interacts with the passage and conversation history in the encoder and guides question generation through a gated attention mechanism in the decoder. In addition, we distill knowledge from a larger pretrained language model into our smaller network to trade off performance against efficiency. Experimental results show that TAGNet achieves performance comparable to large pretrained language models (retaining \(95.9\%\) of the teacher's performance) while using \(5.7\times\) fewer parameters and running \(10.4\times\) faster at inference. TAGNet also outperforms the previous best-performing model of similar parameter size by a large margin, and further analysis shows that it generates more answer-specific conversational questions.
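To make the answer-guided decoding concrete, the sketch below shows one way a gated attention mechanism can let a pooled answer representation modulate the decoder's context vector. This is an illustration under stated assumptions, not the authors' implementation: the module name `GatedAnswerAttention`, the tensor shapes, and the pooled `answer_summary` input are all hypothetical.

```python
# Hypothetical sketch of an answer-guided gated attention step in the decoder.
# Names, shapes, and wiring are illustrative assumptions, not TAGNet's actual code.
import torch
import torch.nn as nn


class GatedAnswerAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.attn = nn.Linear(2 * hidden_size, hidden_size)   # scores decoder state against encoder states
        self.v = nn.Linear(hidden_size, 1, bias=False)
        self.gate = nn.Linear(3 * hidden_size, hidden_size)   # answer-conditioned gate

    def forward(self, dec_state, enc_states, answer_summary):
        # dec_state:      (batch, hidden)          current decoder hidden state
        # enc_states:     (batch, src_len, hidden) encoder outputs for passage + history
        # answer_summary: (batch, hidden)          pooled representation of the target answer
        src_len = enc_states.size(1)
        query = dec_state.unsqueeze(1).expand(-1, src_len, -1)
        scores = self.v(torch.tanh(self.attn(torch.cat([query, enc_states], dim=-1))))
        weights = torch.softmax(scores, dim=1)                 # (batch, src_len, 1)
        context = (weights * enc_states).sum(dim=1)            # (batch, hidden)

        # Gate the attended context with the answer summary so that answer-relevant
        # information is emphasized when predicting the next question token.
        g = torch.sigmoid(self.gate(torch.cat([dec_state, context, answer_summary], dim=-1)))
        return g * context + (1.0 - g) * answer_summary


# Example usage with made-up sizes:
# layer = GatedAnswerAttention(hidden_size=256)
# out = layer(torch.randn(2, 256), torch.randn(2, 40, 256), torch.randn(2, 256))
```

The gate learns, per dimension, how much of the attended passage/history context versus the answer representation to pass on at each decoding step; the paper's actual formulation may differ.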


Data Availability Statement

The CoQA dataset supporting Tables 1–8 is available at https://stanfordnlp.github.io/coqa. The QuAC dataset supporting Table 9 is available at https://quac.ai/.

Notes

  1. B for the first token of the rationale sentence, I for the remaining tokens in that sentence, and O for all other tokens (see the toy example after these notes).

  2. This model achieves an F1 score of 74.4 on the CoQA development set.
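As a toy illustration of the B/I/O rationale-tagging scheme described in note 1 (the passage, tokenization, and rationale span below are invented for this example):

```python
# Toy example of the B/I/O rationale tagging described in note 1.
# The passage, tokenization, and rationale span are made up for illustration.
passage = ["The", "cat", "sat", "on", "the", "mat", ".", "It", "was", "tired", "."]
rationale = (7, 11)  # token span of the rationale sentence: "It was tired ."

tags = []
for i, _ in enumerate(passage):
    if i == rationale[0]:
        tags.append("B")   # first token of the rationale sentence
    elif rationale[0] < i < rationale[1]:
        tags.append("I")   # other tokens inside the rationale sentence
    else:
        tags.append("O")   # everything else
print(list(zip(passage, tags)))
```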


Acknowledgements

We thank the anonymous reviewers for their insightful feedback, which helped improve the paper. The research in this article is supported by the National Key Research and Development Project (2021YFF0901600), the National Science Foundation of China (U22B2059, 61976073, 62276083), the Shenzhen Foundational Research Funding (JCYJ20200109113441941), the Project of State Key Laboratory of Communication Content Cognition (A02101), and the Major Key Project of PCL (PCL2021A06). Ming Liu is the corresponding author.

Author information

Corresponding author

Correspondence to Ming Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Z., Zhu, H., Liu, M. et al. TAGNet: a tiny answer-guided network for conversational question generation. Int. J. Mach. Learn. & Cyber. 14, 1921–1932 (2023). https://doi.org/10.1007/s13042-022-01737-x

