Emotional Video Scene Retrieval Using Multilayer Convolutional Network

  • Hiroki Nomiya
  • Shota Sakaue
  • Mitsuaki Maeda
  • Teruhisa Hochin
Part of the Studies in Computational Intelligence book series (SCI, volume 695)


In order to retrieve impressive scenes from a video database, we propose a scene retrieval method based on facial expression recognition (FER). The proposed method is intended to retrieve interesting scenes from lifelog videos. When an impressive event occurs, a characteristic facial expression is likely to be observed on the face of a person in the video. Precisely recognizing that person's facial expression is therefore essential for impressive scene retrieval. In this paper, we construct accurate FER models by introducing a learning framework based on a multilayer convolutional network, using a number of facial features defined as positional relations between facial feature points. The effectiveness of the proposed method is evaluated through an experiment retrieving emotional scenes from a lifelog video database.
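The abstract describes facial features defined by positional relations between facial feature points, which are then fed to a multilayer network. As a rough illustration only (the paper's exact feature definitions and network architecture are not given in this abstract), one simple positional-relation feature is the set of pairwise Euclidean distances between detected landmark points, passed through a small fully connected stack with weight matrices and bias vectors:

```python
import numpy as np

def pairwise_distance_features(landmarks):
    """Positional-relation features: Euclidean distances between every
    pair of facial feature points. This is a hypothetical feature
    definition, not necessarily the one used in the paper."""
    n = len(landmarks)
    feats = []
    for i in range(n):
        for j in range(i + 1, n):
            feats.append(np.linalg.norm(landmarks[i] - landmarks[j]))
    return np.array(feats)

def forward(x, weights, biases):
    """Forward pass through a small fully connected stack (a stand-in
    for the multilayer network): ReLU hidden layers, then a softmax
    over facial expression classes."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, W @ a + b)   # hidden layer + ReLU
    z = weights[-1] @ a + biases[-1]     # output logits
    z -= z.max()                         # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()      # softmax over classes
    return p

# Toy example: 5 landmark points in 2-D give C(5, 2) = 10 features.
rng = np.random.default_rng(0)
landmarks = rng.random((5, 2))
x = pairwise_distance_features(landmarks)
print(x.shape)  # (10,)

# Randomly initialized 2-layer network over 4 hypothetical expression classes.
weights = [rng.standard_normal((8, 10)), rng.standard_normal((4, 8))]
biases = [np.zeros(8), np.zeros(4)]
p = forward(x, weights, biases)
print(p.sum())  # 1.0 (probabilities over classes)
```

A distance-based feature has the convenient property of being invariant to face translation in the image, which is one plausible reason to encode positional relations rather than raw landmark coordinates.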


Keywords: Facial Expression · Facial Feature · Facial Expression Recognition · Video Database · Bias Vector



This research is supported by Japan Society for the Promotion of Science, Grant-in-Aid for Young Scientists (B), 15K15993.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Hiroki Nomiya (1)
  • Shota Sakaue (1)
  • Mitsuaki Maeda (1)
  • Teruhisa Hochin (1)

  1. Department of Information Science, Kyoto Institute of Technology, Kyoto, Japan
