
Towards robust automatic affective classification of images using facial expressions for practical applications

  • Published in: Multimedia Tools and Applications

Abstract

Affect is an important feature of multimedia content and conveys valuable information for multimedia indexing and retrieval. Most existing studies of affective content analysis are limited to low-level features or mid-level representations and are generally criticized for their inability to bridge the gap between low-level features and high-level human affective perception. The facial expressions of subjects in images carry important semantic information that can substantially influence human affective perception, yet they have seldom been investigated for affective classification of facial images in practical applications. This paper presents an automatic image emotion detector (IED) for affective classification of practical (non-laboratory) data using facial expressions, where many "real-world" challenges are present, including pose, illumination, and size variations. The proposed method is novel in that its framework is designed specifically to overcome these challenges using multi-view versions of face and fiducial point detectors together with a combination of point-based texture and geometric features. Performance comparisons across several key parameters of the relevant algorithms are conducted to find the settings that yield high accuracy and fast computation. A comprehensive set of experiments with existing and new datasets shows that the method is robust to pose variations, fast, suitable for large-scale data, and as accurate on laboratory-based data as the method with state-of-the-art performance. The proposed method was also applied to affective classification of images from the British Broadcasting Corporation (BBC) in a task typical of a practical application, providing some valuable insights.
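The pipeline outlined in the abstract (face detection, fiducial point localization, point-based texture combined with geometric features, and classification) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: it uses OpenCV's frontal Haar cascade in place of the multi-view detectors, uniform local binary patterns from scikit-image as the point-based texture descriptor, pairwise landmark distances as a simple geometric descriptor, and a hypothetical caller-supplied landmark model `localize_fiducial_points` together with a pre-trained classifier `clf`.

import cv2
import numpy as np
from skimage.feature import local_binary_pattern

# Frontal-face Haar cascade shipped with OpenCV; the paper uses multi-view
# face and fiducial point detectors, which are not reproduced here.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_features(gray_face, points):
    """Concatenate a uniform-LBP histogram around each fiducial point
    (point-based texture) with pairwise point distances (geometry)."""
    feats = []
    for (x, y) in points:
        x, y = int(round(x)), int(round(y))
        patch = gray_face[max(0, y - 16):y + 16, max(0, x - 16):x + 16]
        lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        feats.append(hist)
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    feats.append(dists[np.triu_indices(len(points), k=1)])
    return np.concatenate(feats)

def classify_faces(img_bgr, localize_fiducial_points, clf):
    """Detect faces, extract features, and predict one emotion label per face.
    `localize_fiducial_points` (a landmark model, e.g. ASM-style) and `clf`
    (a trained classifier such as scikit-learn's SVC) are assumed to be
    supplied by the caller."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1,
                                           minNeighbors=5)
    labels = []
    for (x, y, w, h) in faces:
        face = cv2.resize(gray[y:y + h, x:x + w], (128, 128))
        points = localize_fiducial_points(face)
        labels.append(clf.predict([extract_features(face, points)])[0])
    return labels

Training, under these assumptions, would consist of running extract_features over labelled face images and fitting the classifier; the sketch is only meant to show how point-based texture and geometric features can be concatenated into a single vector before classification.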




Notes

  1. http://www.youtube.com/yt/press/statistics.html.

  2. The dataset is not freely available; academic institutions interested in working with it should contact jana.eggink@bbc.co.uk. License agreements may be available for collaborative work between the BBC and individual universities.


Acknowledgments

This work was funded by the British Broadcasting Corporation, the Australian Smart Services CRC, and the National Natural Science Foundation of China (Grant Nos. 61402362 and 61402363).

Author information


Corresponding author

Correspondence to Ligang Zhang.


About this article


Cite this article

Zhang, L., Tjondronegoro, D., Chandran, V. et al. Towards robust automatic affective classification of images using facial expressions for practical applications. Multimed Tools Appl 75, 4669–4695 (2016). https://doi.org/10.1007/s11042-015-2497-5

