Use of Training, Validation, and Test Sets for Developing Automated Classifiers in Quantitative Ethnography

  • Seung B. Lee
  • Xiaofan Gui
  • Megan Manquen
  • Eric R. Hamilton
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1112)


Using automated classifiers to code discourse data enables researchers to carry out analyses on large datasets. This paper presents a detailed example of applying training, validation, and test sets, a practice common in machine learning, to the development of automated classifiers for quantitative ethnography research. The method was applied to two dispositional constructs. Within one cycle of the process, reliable and valid automated classifiers were developed for Social Disposition. The automated coding scheme for Inclusive Disposition, however, was rejected during the validation stage because of overfitting. Nonetheless, the results demonstrate the potential of preclassified datasets to make the automation process more efficient and effective.
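The general workflow named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the split proportions, the keyword patterns, the function names, and the choice of Cohen's kappa as the agreement statistic are all assumptions made for illustration. The idea is that a pattern-based classifier is refined on the training set only, then checked against human codes on the validation set, and the test set is held back for a final check.

```python
import random
import re


def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle items and split them into training, validation, and test lists.

    The 60/20/20 proportions are an assumption, not taken from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def regex_classifier(patterns):
    """Return a function that codes a text as 1 if any pattern matches, else 0."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]

    def classify(text):
        return int(any(c.search(text) for c in compiled))

    return classify


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two equal-length lists of binary (0/1) codes."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    p_a = sum(codes_a) / n
    p_b = sum(codes_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both raters used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical utterances paired with hypothetical human codes.
    data = [
        ("Thanks for sharing your video", 1),
        ("The circuit needs a resistor", 0),
        ("Nice to meet you all", 1),
        ("Upload the file tomorrow", 0),
        ("Thank you, that was great", 1),
    ]
    train, val, test = split_dataset(data)
    # Patterns would be refined by inspecting the training set only.
    classify = regex_classifier([r"\bthank", r"\bnice to meet"])
    for name, subset in (("validation", val), ("test", test)):
        human = [code for _, code in subset]
        machine = [classify(text) for text, _ in subset]
        print(name, cohens_kappa(human, machine))
```

A large drop in agreement between the training set and the validation set is one sign that the patterns have been fitted too closely to the training data, which is the kind of overfitting the abstract reports for Inclusive Disposition.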


Keywords: Qualitative coding; Automated classifiers; Quantitative ethnography



The authors gratefully acknowledge funding support from the US National Science Foundation for the work this paper reports. Views appearing in this paper do not reflect those of the funding agency.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Seung B. Lee (corresponding author)
  • Xiaofan Gui
  • Megan Manquen
  • Eric R. Hamilton

All authors: Pepperdine University, Malibu, USA
