Bag of Deep Features for Instructor Activity Recognition in Lecture Room

  • Nudrat Nida
  • Muhammad Haroon Yousaf
  • Aun Irtaza
  • Sergio A. Velastin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)


This research explores contextual visual information in the lecture room to help an instructor assess the effectiveness of a delivered lecture. The objective is to enable a self-evaluation mechanism through which instructors can improve lecture productivity by understanding their own activities. A teacher's effectiveness has a strong impact on students' performance and on their academic and professional success. Therefore, lecture evaluation can contribute significantly to improving academic quality and governance. In this paper, we propose a vision-based framework that recognizes the activities of the instructor for self-evaluation of delivered lectures. The proposed approach uses motion templates of instructor activities and describes them through a Bag-of-Deep-Features (BoDF) representation. Deep spatio-temporal features extracted from the motion templates are used to compile a visual vocabulary. The visual vocabulary for instructor activity recognition is quantized to optimize the learning model. A Support Vector Machine classifier is used to generate the model and predict the instructor activities. We evaluated the proposed scheme on a self-captured lecture room dataset, IAVID-1. Eight instructor activities (pointing towards the student, pointing towards the board or screen, idle, interacting, sitting, walking, using a mobile phone and using a laptop) are recognized with 85.41% accuracy. As a result, the proposed framework enables instructor activity recognition without human intervention.
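The vocabulary and quantization steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the plain k-means clustering, the first-k initialization and the toy 2-D descriptors are all assumptions made for illustration, and the deep feature extraction from motion templates is taken as already done.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary entry closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: dist(descriptor, vocabulary[k]))


def build_vocabulary(descriptors, k, iterations=10):
    """Cluster descriptors into k visual words with plain k-means.

    Assumption: the first k descriptors seed the centres, which keeps the
    example deterministic; a real system would use a better initialization.
    """
    centres = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_word(d, centres)].append(d)
        for i, group in enumerate(groups):
            if group:  # keep the old centre if no descriptor was assigned
                centres[i] = [sum(col) / len(group) for col in zip(*group)]
    return centres


def encode(descriptors, vocabulary):
    """Quantize descriptors and return an L1-normalized word histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Toy stand-ins for deep features pooled from training motion templates.
training = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
vocabulary = build_vocabulary(training, k=2)

# Features of one hypothetical video, encoded as a fixed-length histogram.
video = [(0.1, 0.2), (9.0, 9.0), (11.0, 10.0), (10.0, 12.0)]
histogram = encode(video, vocabulary)
```

Each video is thereby reduced to one fixed-length histogram regardless of how many descriptors it produced. Those histograms, paired with activity labels, are what a Support Vector Machine classifier (for example `sklearn.svm.SVC`, if scikit-learn is available) would be trained on.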


Keywords: Human activity recognition; Instructor activity recognition; Motion templates; Academic quality assurance



Sergio A. Velastin has received funding from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, el Ministerio de Economía, Industria y Competitividad (COFUND2014-51509), el Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Engineering, University of Engineering and Technology, Taxila, Pakistan
  2. Department of Computer Science, University of Engineering and Technology, Taxila, Pakistan
  3. Department of Computer Science, Applied Artificial Intelligence Research Group, University Carlos III de Madrid, Madrid, Spain
  4. Cortexica Vision Systems Ltd., London, UK
  5. School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
