BAUM-2: a multilingual audio-visual affective face database


Access to audio-visual databases that contain sufficient variety and are richly annotated is essential for assessing the performance of algorithms in affective computing applications that require emotion recognition from face and/or speech data. Most databases available today have been recorded in tightly controlled environments, are mostly acted, and do not contain speech data. We first present a semi-automatic method that can extract audio-visual facial video clips from movies and TV programs in any language. The method is based on automatic detection and tracking of faces in a movie until the face is occluded or a scene cut occurs. We also created a video-based database, named BAUM-2, which consists of annotated audio-visual facial clips in several languages. The collected clips simulate real-world conditions by containing various head poses, illumination conditions, accessories, temporary occlusions and subjects with a wide range of ages. The proposed semi-automatic affective clip extraction method can easily be used to extend the database with clips in other languages. We also created an image-based facial expression database, named BAUM-2i, from the peak frames of the video clips. Baseline image- and video-based facial expression recognition results using state-of-the-art features and classifiers indicate that facial expression recognition under tough, close-to-natural conditions is quite challenging.
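The abstract only outlines the clip-extraction rule: follow a face until it is lost or the shot changes. The following is a minimal sketch of that segmentation step, not the authors' implementation. It assumes face detection and tracking have already produced a per-frame boolean (the hypothetical input `face_tracked`), and it substitutes a simple color-histogram-difference heuristic for whatever shot-boundary detector the authors actually used. The `threshold` and `min_len` parameters are hypothetical. In a semi-automatic pipeline, the candidate clips returned here are what a human would then review and annotate.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass(frozen=True)
class Clip:
    """A candidate facial clip as an inclusive range of frame indices."""
    start: int
    end: int


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_scene_cuts(histograms: Sequence[Sequence[float]], threshold: float = 0.5) -> Set[int]:
    """Return indices of frames that start a new shot.

    Stand-in heuristic (assumption, not from the paper): a cut is declared when
    the histogram difference between consecutive frames exceeds a hypothetical
    threshold.
    """
    cuts: Set[int] = set()
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            cuts.add(i)
    return cuts


def extract_face_clips(face_tracked: Sequence[bool], scene_cuts: Set[int],
                       min_len: int = 10) -> List[Clip]:
    """Split a frame sequence into clips in which a face is continuously tracked.

    A clip ends when the tracker loses the face (e.g. occlusion) or when a scene
    cut occurs; clips shorter than the hypothetical `min_len` frames are dropped.
    """
    clips: List[Clip] = []
    start: Optional[int] = None
    for i, tracked in enumerate(face_tracked):
        # Close the open clip when the face is lost or a new shot begins.
        if start is not None and (not tracked or i in scene_cuts):
            if i - start >= min_len:
                clips.append(Clip(start, i - 1))
            start = None
        # Open a new clip when a face is (re)acquired, including right after a cut.
        if tracked and start is None:
            start = i
    # Flush a clip that runs to the last frame.
    if start is not None and len(face_tracked) - start >= min_len:
        clips.append(Clip(start, len(face_tracked) - 1))
    return clips


if __name__ == "__main__":
    tracked = [True] * 5 + [False] * 2 + [True] * 6
    print(extract_face_clips(tracked, {9}, min_len=2))
    # → [Clip(start=0, end=4), Clip(start=7, end=8), Clip(start=9, end=12)]
```

Closing a clip at a scene cut and immediately reopening it at the same frame means a face that stays visible across a cut yields two separate clips. This matches the stopping rule stated in the abstract, since a cut usually changes viewpoint or even the person on screen.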


Portions of the research in this paper use the MMI-FacialExpression Database collected by M. Pantic and her group (

Author information



Corresponding author

Correspondence to Cigdem Eroglu Erdem.

Additional information

This work was supported by the Turkish Scientific and Technical Research Council (TUBITAK) under project 110E056.

About this article

Cite this article

Eroglu Erdem, C., Turan, C. & Aydin, Z. BAUM-2: a multilingual audio-visual affective face database. Multimed Tools Appl 74, 7429–7459 (2015).


  • Facial expression recognition
  • Affective database
  • Audio-visual affective database