Multimedia Tools and Applications, Volume 74, Issue 18, pp 7429–7459

BAUM-2: a multilingual audio-visual affective face database

  • Cigdem Eroglu Erdem (Email author)
  • Cigdem Turan
  • Zafer Aydin


Access to audio-visual databases that contain enough variety and are richly annotated is essential for assessing the performance of algorithms in affective computing applications, which require emotion recognition from face and/or speech data. Most databases available today have been recorded in tightly controlled environments, are mostly acted, and do not contain speech data. We first present a semi-automatic method that can extract audio-visual facial video clips from movies and TV programs in any language. The method is based on automatic detection and tracking of faces in a movie until the face is occluded or a scene cut occurs. We also created a video-based database, named BAUM-2, which consists of annotated audio-visual facial clips in several languages. The collected clips simulate real-world conditions by containing various head poses, illumination conditions, accessories, temporary occlusions, and subjects with a wide range of ages. The proposed semi-automatic affective clip extraction method can easily be used to extend the database with clips in other languages. We also created an image-based facial expression database from the peak frames of the video clips, named BAUM-2i. Baseline image-based and video-based facial expression recognition results using state-of-the-art features and classifiers indicate that facial expression recognition under tough and close-to-natural conditions is quite challenging.
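
The clip-boundary rule described above (follow a face until it is occluded or a scene cut occurs) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which face detector, tracker, or scene-cut measure is used, so the sketch assumes these have already been computed per frame. The function name `extract_clip_ranges`, the inputs `face_visible` and `cut_score`, and the parameters `cut_threshold` and `min_length` are all hypothetical.

```python
from typing import Iterable, List, Tuple


def extract_clip_ranges(
    face_visible: Iterable[bool],
    cut_score: Iterable[float],
    cut_threshold: float = 0.5,
    min_length: int = 2,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges of candidate facial clips.

    face_visible[i]: assumed per-frame flag from some face detector/tracker.
    cut_score[i]:    assumed dissimilarity between frame i-1 and frame i;
                     a value above cut_threshold is treated as a scene cut.
    """
    visible = list(face_visible)
    scores = list(cut_score)
    if len(visible) != len(scores):
        raise ValueError("inputs must have one entry per frame")

    clips: List[Tuple[int, int]] = []
    start = None  # first frame of the clip currently being tracked
    for i, (seen, score) in enumerate(zip(visible, scores)):
        scene_cut = score > cut_threshold
        # Close the open clip when the face is lost or a scene cut occurs.
        if start is not None and (scene_cut or not seen):
            if i - start >= min_length:
                clips.append((start, i - 1))
            start = None
        # Open a new clip as soon as a face is visible again.
        if seen and start is None:
            start = i
    # Close a clip that is still open at the end of the video.
    if start is not None and len(visible) - start >= min_length:
        clips.append((start, len(visible) - 1))
    return clips


if __name__ == "__main__":
    visible = [False, True, True, True, False, True, True, True, True, True]
    scores = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0]
    print(extract_clip_ranges(visible, scores))  # [(1, 3), (5, 6), (7, 9)]
```

In this toy run the first clip ends when the face disappears at frame 4, and the second ends at the assumed scene cut at frame 7, where a new clip starts immediately because a face is still visible. The frame ranges are only candidates for extraction; annotation of the resulting clips is a separate step.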


Facial expression recognition; Affective database; Audio-visual affective database



Portions of the research in this paper use the MMI-FacialExpression Database collected by M. Pantic and her group (



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Cigdem Eroglu Erdem (1, Email author)
  • Cigdem Turan (1)
  • Zafer Aydin (1)
  1. Department of Electrical and Electronics Engineering, Bahcesehir University, Istanbul, Turkey
