Generative Pre-trained Transformers for Coding Text Data? An Analysis with Classroom Orchestration Data

Amarasinghe, Ishari; Marques, Francielle; Ortiz-Beltrán, Ariel; Hernández-Leo, Davinia

doi:10.1007/978-3-031-42682-7_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14200)

Included in the following conference series:

European Conference on Technology Enhanced Learning

Abstract

Video content analysis is important for researchers in technology-enhanced learning. A common starting point is to transcribe the video into text so that a coding scheme can be applied to group the text into key themes. However, manual coding is demanding and requires considerable time and effort from human annotators. This study therefore explores whether Generative Pre-trained Transformer 3 (GPT-3) models can automate the coding of text data, comparing them against baseline classical machine learning approaches on a dataset manually coded for the orchestration actions of six teachers in classroom collaborative learning sessions. The findings showed that a fine-tuned GPT-3 (curie) model outperformed classical approaches (F1 score of 0.87) and reached a Cohen’s kappa of 0.77, indicating moderate agreement between manual and machine coding. The study also exposes the limitations of our text transcripts and highlights the importance of multimodal observations that capture the context of orchestration actions.
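The two metrics reported above, F1 score and Cohen's kappa, can be illustrated with a minimal sketch. The code labels and the ten toy utterances below are hypothetical and are not taken from the study's dataset or coding scheme; the abstract also does not say which F1 averaging was used, so the support-weighted average here is an assumption. Cohen's kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(manual, machine):
    """Cohen's kappa between two label sequences: (p_o - p_e) / (1 - p_e)."""
    if len(manual) != len(machine) or not manual:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: share of items on which both coders agree.
    p_o = sum(a == b for a, b in zip(manual, machine)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    manual_counts = Counter(manual)
    machine_counts = Counter(machine)
    p_e = sum(manual_counts[c] * machine_counts[c] for c in manual_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def f1_for_label(manual, machine, label):
    """F1 for one label, treating the manual codes as ground truth."""
    pairs = list(zip(manual, machine))
    tp = sum(a == label and b == label for a, b in pairs)
    fp = sum(a != label and b == label for a, b in pairs)
    fn = sum(a == label and b != label for a, b in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(manual, machine):
    """Per-label F1 averaged by each label's frequency in the manual codes."""
    support = Counter(manual)
    total = sum(f1_for_label(manual, machine, c) * k for c, k in support.items())
    return total / len(manual)


# Hypothetical codes for ten utterances (illustration only).
manual = ["explain", "explain", "monitor", "monitor", "other",
          "other", "explain", "monitor", "other", "explain"]
machine = ["explain", "explain", "monitor", "other", "other",
           "other", "explain", "monitor", "monitor", "explain"]

print("kappa:", round(cohen_kappa(manual, machine), 3))
print("weighted F1:", round(weighted_f1(manual, machine), 3))
```

In this toy example the two coders agree on 8 of 10 items (p_o = 0.8), chance agreement is p_e = 0.34, and kappa is therefore 0.46 / 0.66, roughly 0.70. Kappa is lower than raw agreement because it discounts the matches expected by chance.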

I. Amarasinghe is now with the National Education Lab AI, Radboud University, the Netherlands.

Acknowledgments

This work has been partially funded by AEI/10.13039/501100011033 (PID2020-112584RB-C33) and (PLAWB00322) and the Department of Research and Universities of the Government of Catalonia (SGR 00930). DHL (Serra Hunter) also acknowledges the support by ICREA under the ICREA Academia programme. IA acknowledges the support by National Education Lab AI, Radboud University.

Author information

Authors and Affiliations

ICT Department, Universitat Pompeu Fabra, Barcelona, Spain
Ishari Amarasinghe, Francielle Marques, Ariel Ortiz-Beltrán & Davinia Hernández-Leo

Corresponding author

Correspondence to Ishari Amarasinghe.

Editor information

Editors and Affiliations

KTH Royal Institute of Technology, Stockholm, Sweden
Olga Viberg
Goethe University Frankfurt, Frankfurt am Main, Germany
Ioana Jivet
Universidad Carlos III de Madrid, Madrid, Spain
Pedro J. Muñoz-Merino
University of Macedonia, Thessaloniki, Greece
Maria Perifanou
CODE University of Applied Sciences, Berlin, Germany
Tina Papathoma

About this paper

Cite this paper

Amarasinghe, I., Marques, F., Ortiz-Beltrán, A., Hernández-Leo, D. (2023). Generative Pre-trained Transformers for Coding Text Data? An Analysis with Classroom Orchestration Data. In: Viberg, O., Jivet, I., Muñoz-Merino, P., Perifanou, M., Papathoma, T. (eds) Responsive and Sustainable Educational Futures. EC-TEL 2023. Lecture Notes in Computer Science, vol 14200. Springer, Cham. https://doi.org/10.1007/978-3-031-42682-7_3

DOI: https://doi.org/10.1007/978-3-031-42682-7_3
Published: 28 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42681-0
Online ISBN: 978-3-031-42682-7
eBook Packages: Computer Science, Computer Science (R0)
