Abstract
Video content analysis is important for researchers in technology-enhanced learning. A common starting point is to transcribe video into text so that a coding scheme can be applied to group the text into key themes. However, manual coding is demanding and requires considerable time and effort from human annotators. This study therefore explores whether Generative Pre-trained Transformer 3 (GPT-3) models can automate the coding of text data, comparing them against baseline classical machine learning approaches on a dataset manually coded for the orchestration actions of six teachers in classroom collaborative learning sessions. A fine-tuned GPT-3 (curie) model outperformed the classical approaches (F1 score of 0.87) and reached a Cohen’s kappa of 0.77, indicating moderate agreement between manual and machine coding. The study also brings out the limitations of our text transcripts and highlights the importance of multimodal observations that capture the context of orchestration actions.
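The two agreement measures named in the abstract can be illustrated with a minimal sketch. The code labels and the toy label sequences below are hypothetical and are not taken from the study's dataset or coding scheme. The abstract also does not state how F1 was averaged across codes, so macro averaging is an assumption made here for illustration only.

```python
from collections import Counter


def cohen_kappa(manual, machine):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two label sequences."""
    if len(manual) != len(machine) or not manual:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: share of items on which both coders agree.
    p_o = sum(a == b for a, b in zip(manual, machine)) / n
    # Chance agreement: computed from each coder's label frequencies.
    manual_counts = Counter(manual)
    machine_counts = Counter(machine)
    p_e = sum(manual_counts[c] * machine_counts[c] for c in manual_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


def f1_for_label(manual, machine, label):
    """F1 for one code, treating the manual coding as ground truth."""
    pairs = list(zip(manual, machine))
    tp = sum(a == label and b == label for a, b in pairs)
    fp = sum(a != label and b == label for a, b in pairs)
    fn = sum(a == label and b != label for a, b in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(manual, machine):
    """Unweighted mean of per-code F1 (assumed averaging scheme)."""
    labels = sorted(set(manual) | set(machine))
    return sum(f1_for_label(manual, machine, l) for l in labels) / len(labels)


# Hypothetical codes for six transcript segments (illustrative only).
manual = ["explain", "explain", "monitor", "monitor", "question", "question"]
machine = ["explain", "explain", "monitor", "question", "question", "question"]

kappa = cohen_kappa(manual, machine)
f1 = macro_f1(manual, machine)
print(f"kappa={kappa:.2f} macro_f1={f1:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually lower than the share of matching labels and is reported alongside F1 when machine coding is compared with manual coding.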
I. Amarasinghe is now with the National Education Lab AI, Radboud University, the Netherlands.
Acknowledgments
This work has been partially funded by AEI/10.13039/501100011033 (PID2020-112584RB-C33) and (PLAWB00322) and the Department of Research and Universities of the Government of Catalonia (SGR 00930). DHL (Serra Hunter) also acknowledges the support by ICREA under the ICREA Academia programme. IA acknowledges the support by National Education Lab AI, Radboud University.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Amarasinghe, I., Marques, F., Ortiz-Beltrán, A., Hernández-Leo, D. (2023). Generative Pre-trained Transformers for Coding Text Data? An Analysis with Classroom Orchestration Data. In: Viberg, O., Jivet, I., Muñoz-Merino, P., Perifanou, M., Papathoma, T. (eds) Responsive and Sustainable Educational Futures. EC-TEL 2023. Lecture Notes in Computer Science, vol 14200. Springer, Cham. https://doi.org/10.1007/978-3-031-42682-7_3
DOI: https://doi.org/10.1007/978-3-031-42682-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42681-0
Online ISBN: 978-3-031-42682-7
eBook Packages: Computer Science, Computer Science (R0)