Abstract
Automated measurement of student engagement equips educators with valuable insights, helping them achieve educational program objectives and tailor their approach to individual students. Engagement measurement requires a detailed analysis of students' behavioral and affective states over precise timescales. A range of current techniques have engineered sequential and spatiotemporal models, including recurrent neural networks, temporal convolutional networks, three-dimensional convolutional neural networks, and transformers, to measure engagement from video data. These models are trained to incorporate the sequential/temporal order of behavioral and affective states into the video analysis and to output a level of engagement. Drawing upon the definition of engagement in educational psychology, this paper questions the necessity of incorporating the order of behavioral and affective states into engagement measurement. Non-sequential bag-of-words-based models are developed to analyze behavioral and affective features extracted from videos and to output engagement levels. The non-sequential models analyze only the occurrence of behavioral and affective states, not the order in which they occur. Experimental results indicate that the proposed non-sequential approach outperforms state-of-the-art sequential engagement measurement approaches. On the IIITB Online SE dataset, the proposed approach significantly improved engagement level classification accuracy, by 22% and 26% compared to the recurrent neural network and the temporal convolutional network, respectively. On the DAiSEE dataset, it also improved minority class recall and achieved a classification accuracy as high as 0.6658. In another experiment, models trained on shuffled versions of the datasets, in which the behavioral and affective states within video samples were randomly permuted, performed consistently with those trained on the original, unshuffled datasets. These observations reinforce the notion that the order in which affective and behavioral states occur does not impact engagement measurement.
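The core idea of the non-sequential representation can be sketched as a normalized histogram of discretized per-frame states, which is invariant to any temporal permutation of the frames. The function name and the five-state encoding below are illustrative assumptions, not the paper's actual feature set:

```python
import numpy as np

def bag_of_states(frame_states, n_states):
    """Count how often each discretized behavioral/affective state
    occurs in a video, ignoring the order in which frames appear."""
    counts = np.bincount(frame_states, minlength=n_states)
    return counts / max(len(frame_states), 1)  # normalized histogram

# Hypothetical per-frame state indices for one 120-frame video clip
rng = np.random.default_rng(0)
states = rng.integers(0, 5, size=120)

original = bag_of_states(states, n_states=5)
shuffled = bag_of_states(rng.permutation(states), n_states=5)

# Shuffling the frames leaves the bag-of-states representation unchanged,
# mirroring the paper's shuffled-dataset experiment
assert np.allclose(original, shuffled)
```

Because the representation depends only on state occurrence counts, a classifier trained on it cannot exploit temporal order, which is exactly the property the shuffled-dataset experiment probes.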
Data availability
The datasets analyzed during the current study are publicly available in the following repository: https://people.iith.ac.in/vineethnb/resources/daisee/index.html.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
AA performed the experiments for the proposed method, wrote the paper, and revised it for major and minor revisions. CT collected the data and performed the experiments for the previous works. DBJ and SSK supervised the work and provided the dataset and computation infrastructure.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose and declare no competing interests.
Additional information
Communicated by J. Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abedi, A., Thomas, C., Jayagopi, D.B. et al. Bag of states: a non-sequential approach to video-based engagement measurement. Multimedia Systems 30, 47 (2024). https://doi.org/10.1007/s00530-023-01244-1