Towards Human-Like Educational Question Generation with Large Language Models

Wang, Zichao; Valdez, Jakob; Basu Mallick, Debshila; Baraniuk, Richard G.

doi:10.1007/978-3-031-11644-5_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13355)

Included in the following conference series:

International Conference on Artificial Intelligence in Education

Abstract

We investigate the utility of large pretrained language models (PLMs) for automatic educational assessment question generation. While PLMs have shown increasing promise in a wide range of natural language applications, including question generation, they can generate unreliable and undesirable content. For high-stakes applications such as educational assessments, it is critical to ensure not only that the generated content is of high quality but also that it relates to the specific content being assessed. In this paper, we investigate the impact of various PLM prompting strategies on the quality of generated questions. We design a series of generation scenarios to compare these strategies and evaluate the generated questions via automatic metrics and manual examination. Through empirical evaluation, we identify the prompting strategy that is most likely to lead to high-quality generated questions. Finally, we demonstrate the promising educational utility of questions generated with this best strategy by presenting them together with human-authored questions to a subject matter expert who, despite their expertise, could not effectively distinguish between generated and human-authored questions.
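
The abstract does not spell out the prompt formats that were compared, so the following is only an illustrative sketch of what "varying the prompting strategy" could look like in practice, not the authors' implementation. The `Example` class, the `build_prompt` helper, the `Context:`/`Question:` labels, and the sample passages are all hypothetical; the sketch assumes a strategy is defined by whether an instruction is given and how many human-authored demonstrations precede the target passage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One human-authored demonstration: a source passage and its question."""
    context: str
    question: str


def build_prompt(target_context: str,
                 examples: List[Example],
                 instruction: Optional[str] = None) -> str:
    """Assemble a few-shot prompt that ends where the model should continue."""
    parts = []
    if instruction:
        parts.append(instruction.strip())
    for ex in examples:
        parts.append(
            f"Context: {ex.context.strip()}\nQuestion: {ex.question.strip()}"
        )
    # The final block leaves the question open for the model to complete.
    parts.append(f"Context: {target_context.strip()}\nQuestion:")
    return "\n\n".join(parts)


# Hypothetical demonstrations and target passage (not taken from the paper).
demos = [
    Example("Mitochondria produce most of a cell's ATP.",
            "Which organelle produces most of a cell's ATP?"),
    Example("Osmosis is the diffusion of water across a membrane.",
            "What is osmosis?"),
]
target = "Ribosomes translate messenger RNA into proteins."

# Two strategies that differ in instruction use and number of demonstrations.
zero_shot = build_prompt(target, [],
                         instruction="Write a quiz question about the context.")
two_shot = build_prompt(target, demos)
```

Each resulting string would be sent to a PLM, and the completions compared across strategies with whatever automatic metrics and manual checks the evaluation calls for.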

Z. Wang and J. Valdez contributed equally.

Notes

1. https://github.com/openstax/research-question-generation-gpt3

Acknowledgements

This work is supported by NSF grants 1842378, 1917713, and 2118706; ONR grant N0014-20-1-2534; AFOSR grant FA9550-18-1-0478; and a Vannevar Bush Faculty Fellowship, ONR grant N00014-18-1-2047. We thank Prof. Sandra Adams (Excelsior College), Prof. Tyler Rust (California State University), and Prof. Julie Dinh (Baruch College, CUNY) for contributing their subject matter and instructional expertise. We also thank the anonymous reviewers for their thoughtful feedback on the manuscript.

Author information

Authors and Affiliations

Rice University, Houston, USA
Zichao Wang, Jakob Valdez & Richard G. Baraniuk
OpenStax, Houston, USA
Debshila Basu Mallick & Richard G. Baraniuk

Corresponding author

Correspondence to Zichao Wang.

Editor information

Editors and Affiliations

Ateneo De Manila University, Quezon, Philippines
Maria Mercedes Rodrigo
Department of Computer Science, North Carolina State University, Raleigh, NC, USA
Noboru Matsuda
Durham University, Durham, UK
Alexandra I. Cristea
University of Leeds, Leeds, UK
Vania Dimitrova

About this paper

Cite this paper

Wang, Z., Valdez, J., Basu Mallick, D., Baraniuk, R.G. (2022). Towards Human-Like Educational Question Generation with Large Language Models. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_13

DOI: https://doi.org/10.1007/978-3-031-11644-5_13
Published: 27 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11643-8
Online ISBN: 978-3-031-11644-5
eBook Packages: Computer Science, Computer Science (R0)
