Generative Pre-trained Transformers for Coding Text Data? An Analysis with Classroom Orchestration Data

  • Conference paper
Responsive and Sustainable Educational Futures (EC-TEL 2023)

Abstract

Video content analysis is important for researchers in technology-enhanced learning. A common starting point is to transcribe video into text so that a coding scheme can be applied to group the text into key themes. However, manual coding is demanding and requires considerable time and effort from human annotators. This study therefore explores the possibility of using Generative Pre-trained Transformer 3 (GPT-3) models to automate the coding of text data, compared with baseline classical machine learning approaches, using a dataset manually coded for the orchestration actions of six teachers in classroom collaborative learning sessions. Our findings show that a fine-tuned GPT-3 (curie) model outperformed the classical approaches (F1 score of 0.87) and reached a Cohen’s kappa of 0.77, which indicates moderate agreement between manual and machine coding. The study also brings out the limitations of our text transcripts and highlights the importance of multimodal observations that capture the context of orchestration actions.
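The two agreement measures named in the abstract, the F1 score and Cohen's kappa, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the label names and the six example utterances are hypothetical, and macro averaging of F1 is an assumption, since the abstract does not state which averaging was used.

```python
from collections import Counter


def cohens_kappa(manual, machine):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(manual) != len(machine) or not manual:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(manual)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(manual, machine)) / n
    # Expected agreement: chance overlap implied by each coder's label frequencies.
    manual_counts = Counter(manual)
    machine_counts = Counter(machine)
    expected = sum(manual_counts[c] * machine_counts[c] for c in manual_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


def macro_f1(manual, machine):
    """Unweighted mean of per-label F1, treating the manual codes as ground truth."""
    labels = sorted(set(manual) | set(machine))
    scores = []
    for label in labels:
        pairs = list(zip(manual, machine))
        tp = sum(a == label and b == label for a, b in pairs)
        fp = sum(a != label and b == label for a, b in pairs)
        fn = sum(a == label and b != label for a, b in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical codes for six transcript utterances (not the paper's scheme).
manual = ["explain", "explain", "monitor", "monitor", "question", "question"]
machine = ["explain", "explain", "monitor", "question", "question", "question"]

print(f"kappa = {cohens_kappa(manual, machine):.2f}")
print(f"macro F1 = {macro_f1(manual, machine):.2f}")
```

Kappa discounts the agreement that two coders would reach by chance given how often each uses every label, which is why it can be noticeably lower than raw accuracy or F1 on the same predictions.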

I. Amarasinghe is now with the National Education Lab AI, Radboud University, the Netherlands.

Acknowledgments

This work has been partially funded by AEI/10.13039/501100011033 (PID2020-112584RB-C33) and (PLAWB00322) and the Department of Research and Universities of the Government of Catalonia (SGR 00930). DHL (Serra Hunter) also acknowledges the support of ICREA under the ICREA Academia programme. IA acknowledges the support of the National Education Lab AI, Radboud University.

Author information

Corresponding author

Correspondence to Ishari Amarasinghe.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Amarasinghe, I., Marques, F., Ortiz-Beltrán, A., Hernández-Leo, D. (2023). Generative Pre-trained Transformers for Coding Text Data? An Analysis with Classroom Orchestration Data. In: Viberg, O., Jivet, I., Muñoz-Merino, P., Perifanou, M., Papathoma, T. (eds) Responsive and Sustainable Educational Futures. EC-TEL 2023. Lecture Notes in Computer Science, vol 14200. Springer, Cham. https://doi.org/10.1007/978-3-031-42682-7_3

  • DOI: https://doi.org/10.1007/978-3-031-42682-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42681-0

  • Online ISBN: 978-3-031-42682-7

  • eBook Packages: Computer Science, Computer Science (R0)
