Abstract
With the paradigm shift from traditional classroom instruction to distance learning systems, recent advances in artificial intelligence research and applications, including machine learning and deep neural networks, have increasingly been leveraged to incorporate engagement state analysis into the distance learning process. Recent automatic engagement estimation approaches employ several modalities, such as video, audio, and biological or neuro-sensing signals, as the source input to be analyzed. In this paper, we provide a literature review of engagement estimation, covering datasets and algorithms as well as discussion and evaluation. First, we present engagement datasets, both publicly available ones and proprietary sources built for specific concerns. We then describe the engagement measurement methodologies widely used in the literature and the state-of-the-art algorithms that automate the estimation. The advantages and limitations of these algorithms are briefly discussed and summarized in a benchmark of the modalities and datasets used. Additionally, we extend this literature review with insights into the practical use of automatic engagement estimation in real educational processes, which is crucial to improving distance learning. Finally, we review the remaining challenges for robust engagement estimation as part of efforts to improve distance learning quality and performance by taking personalized engagement into account.
© 2021 Springer Nature Switzerland AG
Cite this paper
Karimah, S.N., Hasegawa, S. (2021). Automatic Engagement Recognition for Distance Learning Systems: A Literature Study of Engagement Datasets and Methods. In: Schmorrow, D.D., Fidopiastis, C.M. (eds.) Augmented Cognition. HCII 2021. Lecture Notes in Computer Science, vol. 12776. Springer, Cham. https://doi.org/10.1007/978-3-030-78114-9_19
Print ISBN: 978-3-030-78113-2
Online ISBN: 978-3-030-78114-9