Abstract
Affect is an important feature of multimedia content and conveys valuable information for multimedia indexing and retrieval. Most existing studies of affective content analysis are limited to low-level features or mid-level representations, and are generally criticized for their inability to bridge the gap between low-level features and high-level human affective perception. The facial expressions of subjects in images carry important semantic information that can substantially influence human affective perception, but they have seldom been investigated for affective classification of facial images in practical applications. This paper presents an automatic image emotion detector (IED) for affective classification of practical (or non-laboratory) data using facial expressions, where many “real-world” challenges are present, including variations in pose, illumination, and size. The proposed method is novel: its framework is designed specifically to overcome these challenges using multi-view versions of face and fiducial point detectors, and a combination of point-based texture and geometry features. Performance comparisons of several key parameters of the relevant algorithms are conducted to find settings that give high accuracy and fast computation. A comprehensive set of experiments with existing and new datasets shows that the method is effective despite pose variations, fast and appropriate for large-scale data, and as accurate as the state-of-the-art method on laboratory-based data. The proposed method was also applied to affective classification of images from the British Broadcasting Corporation (BBC) in a task typical of a practical application, providing some valuable insights.
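As a rough illustration of what "a combination of point-based texture and geometry" could look like in practice, the sketch below builds one feature vector from a grayscale image and a set of fiducial points. It is not the paper's implementation: the abstract does not name the texture descriptor, so a basic 8-neighbour local binary pattern is assumed here, and the function names `lbp_code`, `geometry_features`, and `point_features` are hypothetical.

```python
import math
from itertools import combinations


def lbp_code(image, row, col):
    """8-neighbour local binary pattern code at (row, col), in 0..255.

    `image` is a list of rows of grayscale values; the point must not lie
    on the image border, because all eight neighbours are read.
    """
    center = image[row][col]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def geometry_features(points):
    """Pairwise point distances, divided by the largest distance.

    Dividing by the largest distance makes the values independent of
    face size (an assumed normalisation, chosen for illustration).
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    scale = max(dists, default=0.0) or 1.0  # avoid division by zero
    return [d / scale for d in dists]


def point_features(image, points):
    """Concatenate per-point texture codes with pairwise geometry."""
    texture = [lbp_code(image, r, c) / 255.0 for r, c in points]
    return texture + geometry_features(points)
```

In a full pipeline the points would come from a fiducial point detector run on a detected face, and the resulting vector would be passed to a classifier; both steps are outside this sketch.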
Notes
The dataset is not freely available. Academic institutions interested in working with it should contact jana.eggink@bbc.co.uk; license agreements might be available for collaborative work between the BBC and individual universities.
Acknowledgments
This work is funded by the British Broadcasting Corporation, the Australian Smart Services CRC, and the National Natural Science Foundation of China (Grant Nos. 61402362 and 61402363).
Cite this article
Zhang, L., Tjondronegoro, D., Chandran, V. et al. Towards robust automatic affective classification of images using facial expressions for practical applications. Multimed Tools Appl 75, 4669–4695 (2016). https://doi.org/10.1007/s11042-015-2497-5
DOI: https://doi.org/10.1007/s11042-015-2497-5