Abstract
Controllable text generation (CTG) by large language models has huge potential to transform education for teachers and students alike. Specifically, high-quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. Recent work in this domain has made progress with generation, but has not shown whether real teachers judge the generated questions to be sufficiently useful for the classroom setting, or whether the questions instead contain errors or pedagogically unhelpful content. We conduct a human evaluation with teachers to assess the quality and usefulness of outputs from combining CTG and question taxonomies (Bloom’s and a difficulty taxonomy). The results demonstrate that the generated questions are high quality and sufficiently useful, showing their promise for widespread use in the classroom setting.
Notes
1. [10] show that subject matter experts can’t distinguish between machine and human written questions, but state that a future direction is to assess CTG with teachers.
2. The passages, few-shot examples, prompt format, taxonomic level definitions, annotator demographics and raw results are available: https://tinyurl.com/y2hy8m4p.
3. Despite not being a teacher’s opinion, this is evaluated because we want to know the model’s success here without relying on automatic assessment.
References
Baidoo-Anu, D., Owusu Ansah, L.: Education in the Era of Generative Artificial Intelligence (AI): Understanding the Potential Benefits of ChatGPT in Promoting Teaching and Learning (2023). Available at SSRN 4337484
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics, 159–174 (1977)
Krathwohl, D.R.: A revision of Bloom’s taxonomy: An overview. Theory Practice 41(4), 212–218 (2002)
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., Neubig, G.: Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 55(9), 1–35 (2023)
Mulla, N., Gharpure, P.: Automatic question generation: a review of methodologies, datasets, evaluation metrics, and applications. Progress Artif. Intell., 1–32 (2023)
Ouyang, L., et al.: Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155 (2022)
Pérez, E.V., Santos, L.M.R., Pérez, M.J.V., de Castro Fernández, J.P., Martín, R.G.: Automatic classification of question difficulty level: Teachers’ estimation vs. students’ perception. In: 2012 Frontiers in Education Conference Proceedings, pp. 1–5. IEEE (2012)
Terwiesch, C.: Would Chat GPT Get a Wharton MBA? A Prediction Based on Its Performance in the Operations Management Course. Mack Institute for Innovation Management at the Wharton School, University of Pennsylvania (2023)
Wang, X., Fan, S., Houghton, J., Wang, L.: Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs. arXiv preprint arXiv:2205.00355 (2022)
Wang, Z., Valdez, J., Basu Mallick, D., Baraniuk, R.G.: Towards human-like educational question generation with large language models. In: Artificial Intelligence in Education: 23rd International Conference, AIED 2022, Durham, UK, 2022, Proceedings, Part I, pp. 153–166. Springer International Publishing, Cham (2022)
Zhang, H., Song, H., Li, S., Zhou, M., Song, D.: A survey of controllable text generation using transformer-based pre-trained language models. arXiv preprint arXiv:2201.05337 (2022)
Acknowledgements
We’d like to thank Mitacs for their grant for this project, and CIFAR for their continued support. We are grateful to both the annotators for their time and the anonymous reviewers for their valuable feedback.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Elkins, S., Kochmar, E., Serban, I., Cheung, J.C.K. (2023). How Useful Are Educational Questions Generated by Large Language Models?. In: Wang, N., Rebolledo-Mendez, G., Dimitrova, V., Matsuda, N., Santos, O.C. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2023. Communications in Computer and Information Science, vol 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_83
DOI: https://doi.org/10.1007/978-3-031-36336-8_83
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36335-1
Online ISBN: 978-3-031-36336-8
eBook Packages: Computer Science, Computer Science (R0)
