Abstract
Automated measurement of student engagement equips educators with valuable insights, helping them achieve educational program objectives and tailor their approach to individual students. Engagement measurement requires a detailed analysis of students' behavioral and affective states over precise timescales. A range of current techniques have engineered sequential and spatiotemporal models, including recurrent neural networks, temporal convolutional networks, three-dimensional convolutional neural networks, and transformers, to measure engagement from video data. These models are trained to incorporate the sequential/temporal order of behavioral and affective states into the video analysis and to output a level of engagement. Drawing upon the definition of engagement in educational psychology, this paper questions the necessity of incorporating the order of behavioral and affective states into engagement measurement. Non-sequential bag-of-words-based models are developed to analyze behavioral and affective features extracted from videos and to output engagement levels. The non-sequential models analyze only the occurrence of behavioral and affective states, not the order in which they occur. Experimental results indicate that the proposed non-sequential approach outperforms state-of-the-art sequential engagement measurement approaches. On the IIITB Online SE dataset, the proposed approach significantly improved engagement level classification accuracy, by 22% and 26% compared to the recurrent neural network and the temporal convolutional network, respectively. On the DAiSEE dataset, it also improved minority class recall and achieved a classification accuracy as high as 0.6658. In another experiment, models trained on shuffled versions of the datasets, in which the behavioral and affective states within video samples were randomly permuted, performed consistently with those trained on the original, unshuffled datasets. These observations reinforce the notion that the order in which affective and behavioral states occur does not impact engagement measurement.
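The core idea of the non-sequential representation can be sketched as a normalized histogram of discretized per-frame states, which is invariant to any temporal permutation of the frames. The function name and the five-state encoding below are illustrative assumptions, not the paper's actual feature set:

```python
import numpy as np

def bag_of_states(frame_states, n_states):
    """Count how often each discretized behavioral/affective state
    occurs in a video, ignoring the order in which frames appear."""
    counts = np.bincount(frame_states, minlength=n_states)
    return counts / max(len(frame_states), 1)  # normalized histogram

# Hypothetical per-frame state indices for one 120-frame video clip
rng = np.random.default_rng(0)
states = rng.integers(0, 5, size=120)

original = bag_of_states(states, n_states=5)
shuffled = bag_of_states(rng.permutation(states), n_states=5)

# Shuffling the frames leaves the bag-of-states representation unchanged,
# mirroring the paper's shuffled-dataset experiment
assert np.allclose(original, shuffled)
```

Because the representation depends only on state occurrence counts, a classifier trained on it cannot exploit temporal order, which is exactly the property the shuffled-dataset experiment probes.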
Data availability
The datasets analyzed during the current study are publicly available in the following repository: https://people.iith.ac.in/vineethnb/resources/daisee/index.html.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
AA performed the experiments for the proposed method, wrote the paper, and revised it for major and minor revisions. CT collected the data and performed the experiments for the previous works. DBJ and SSK supervised the work and provided the dataset and computation infrastructure.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose and declare no competing interests.
Additional information
Communicated by J. Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abedi, A., Thomas, C., Jayagopi, D.B. et al. Bag of states: a non-sequential approach to video-based engagement measurement. Multimedia Systems 30, 47 (2024). https://doi.org/10.1007/s00530-023-01244-1