Abstract
Access to audio-visual databases that contain enough variety and are richly annotated is essential for assessing the performance of algorithms in affective computing applications, which require emotion recognition from face and/or speech data. Most databases available today have been recorded under tightly controlled environments, are mostly acted, and do not contain speech data. We first present a semi-automatic method that can extract audio-visual facial video clips from movies and TV programs in any language. The method is based on automatic detection and tracking of faces in a movie until the face is occluded or a scene cut occurs. We also created a video-based database, named BAUM-2, which consists of annotated audio-visual facial clips in several languages. The collected clips simulate real-world conditions by containing various head poses, illumination conditions, accessories, temporary occlusions, and subjects with a wide range of ages. The proposed semi-automatic affective clip extraction method can easily be used to extend the database with clips in other languages. We also created an image-based facial expression database, named BAUM-2i, from the peak frames of the video clips. Baseline image-based and video-based facial expression recognition results using state-of-the-art features and classifiers indicate that facial expression recognition under tough and close-to-natural conditions is quite challenging.
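The clip-extraction idea in the abstract (follow a face until it is occluded or a scene cut occurs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the per-frame signals `face_visible` and `scene_cut` have already been produced by some upstream face tracker and cut detector. The `Frame` type, the `extract_clips` function, and the `min_length` threshold are hypothetical names introduced only for illustration. For simplicity the sketch closes a clip on any loss of the face, whereas the database described above also contains clips with temporary occlusions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Frame:
    """Per-frame signals assumed to come from upstream detectors (hypothetical)."""
    face_visible: bool  # a tracked face was found in this frame
    scene_cut: bool     # a scene cut was detected at the start of this frame


def extract_clips(frames: Iterable[Frame], min_length: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs, end exclusive, one per face clip.

    A clip grows while the face stays visible. It is closed when the face
    is lost (for example, occluded) or when a scene cut occurs. Clips
    shorter than min_length frames are discarded (an assumed threshold).
    """
    clips: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the currently open clip
    count = 0     # number of frames seen so far

    for i, frame in enumerate(frames):
        count = i + 1
        # A scene cut ends any open clip before this frame is considered.
        if frame.scene_cut and start is not None:
            if i - start >= min_length:
                clips.append((start, i))
            start = None
        if frame.face_visible:
            if start is None:
                start = i  # open a new clip at this frame
        elif start is not None:
            # Face lost: close the open clip at this frame.
            if i - start >= min_length:
                clips.append((start, i))
            start = None

    # Close a clip that is still open when the video ends.
    if start is not None and count - start >= min_length:
        clips.append((start, count))
    return clips


# Example: 4 visible frames, 1 frame without a face, 2 visible frames,
# then a scene cut followed by 4 more visible frames.
example = (
    [Frame(True, False)] * 4
    + [Frame(False, False)]
    + [Frame(True, False)] * 2
    + [Frame(True, True)]
    + [Frame(True, False)] * 3
)
print(extract_clips(example))
```

In this example the two-frame segment before the scene cut is dropped because it is shorter than the assumed `min_length`. In a semi-automatic workflow, candidate intervals like these would still be reviewed and annotated by people.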
Acknowledgments
Portions of the research in this paper use the MMI-FacialExpression Database collected by M. Pantic and her group (www.mmifacedb.com).
Additional information
This work was supported by the Turkish Scientific and Technical Research Council (TUBITAK) under project 110E056.
About this article
Cite this article
Eroglu Erdem, C., Turan, C. & Aydin, Z. BAUM-2: a multilingual audio-visual affective face database. Multimed Tools Appl 74, 7429–7459 (2015). https://doi.org/10.1007/s11042-014-1986-2
DOI: https://doi.org/10.1007/s11042-014-1986-2