
BAUM-2: a multilingual audio-visual affective face database


Abstract

Access to audio-visual databases that contain sufficient variety and are richly annotated is essential for assessing the performance of algorithms in affective computing applications, which require emotion recognition from face and/or speech data. Most databases available today were recorded in tightly controlled environments, are mostly acted, and do not contain speech data. We first present a semi-automatic method that can extract audio-visual facial video clips from movies and TV programs in any language. The method is based on automatic detection and tracking of a face in a movie until the face is occluded or a scene cut occurs. Using this method, we created a video-based database, named BAUM-2, which consists of annotated audio-visual facial clips in several languages. The collected clips simulate real-world conditions by containing various head poses, illumination conditions, accessories, temporary occlusions, and subjects with a wide range of ages. The proposed semi-automatic affective clip extraction method can easily be used to extend the database with clips in other languages. We also created an image-based facial expression database, named BAUM-2i, from the peak frames of the video clips. Baseline image- and video-based facial expression recognition results using state-of-the-art features and classifiers indicate that facial expression recognition under tough, close-to-natural conditions remains quite challenging.
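The clip extraction method is only summarized above. As a rough illustration of the detect-and-track loop it describes, the sketch below segments a video into candidate face clips that end when the face is lost (e.g., occluded) or an abrupt scene cut is flagged. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses OpenCV's stock Haar-cascade face detector in place of the paper's detector, a simple histogram-correlation heuristic for scene cuts, and an illustrative `cut_threshold` value.

```python
import cv2

def extract_face_clips(video_path, cut_threshold=0.5):
    """Return (start_frame, end_frame) spans during which a face is
    continuously detected; a span ends when detection is lost
    (e.g., occlusion) or an abrupt scene cut is flagged."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    clips, start, prev_hist, idx = [], None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Scene-cut heuristic: low correlation between consecutive
        # grayscale histograms signals an abrupt shot change.
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        cv2.normalize(hist, hist)
        is_cut = (prev_hist is not None and
                  cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
                  < cut_threshold)
        prev_hist = hist
        # A face present with no cut means the current track continues.
        faces = detector.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
        if len(faces) > 0 and not is_cut:
            if start is None:
                start = idx
        elif start is not None:
            clips.append((start, idx - 1))  # close the track
            start = None
        idx += 1
    if start is not None:
        clips.append((start, idx - 1))
    cap.release()
    return clips

# Example: candidate clips from one movie file (path is illustrative).
print(extract_face_clips("movie.mp4"))
```

In keeping with the semi-automatic design described in the abstract, spans found this way would only be candidate clips; a human annotator would still screen and emotionally label the extracted segments.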



Acknowledgments

Portions of the research in this paper use the MMI Facial Expression Database collected by M. Pantic and her group (www.mmifacedb.com).

Author information


Corresponding author

Correspondence to Cigdem Eroglu Erdem.

Additional information

This work was supported by the Turkish Scientific and Technical Research Council (TUBITAK) under project 110E056.


About this article


Cite this article

Eroglu Erdem, C., Turan, C. & Aydin, Z. BAUM-2: a multilingual audio-visual affective face database. Multimed Tools Appl 74, 7429–7459 (2015). https://doi.org/10.1007/s11042-014-1986-2

