Multimodal Database of Emotional Speech, Video and Gestures

  • Tomasz Sapiński
  • Dorota Kamińska
  • Adam Pelikant
  • Cagri Ozcinar
  • Egils Avots
  • Gholamreza Anbarjafari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11188)


People express emotions through different modalities. Integrating verbal and non-verbal communication channels creates a system in which the message is easier to understand. Expanding the focus to several forms of expression can facilitate research on emotion recognition as well as human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpus contains recordings registered in studio conditions, acted out by 16 professional actors (8 male and 8 female). The data is labeled with six basic emotion categories, following Ekman’s classification. To check the quality of the performances, all recordings are evaluated by experts and volunteers. The database is available to the academic community and may be useful for research on audio-visual emotion recognition.


Keywords: Multimodal database, Emotions, Speech, Video, Gestures



The authors would like to thank Michał Wasażnik (psychologist), who helped create the experimental protocol. This work is supported by the Estonian Research Council Grant (PUT638), the Scientific and Technological Research Council of Turkey (TÜBİTAK) (Proje 1001 - 116E097), the Estonian-Polish Joint Research Project, and the Estonian Centre of Excellence in IT (EXCITE) funded by the European Regional Development Fund. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan XP GPU used for this research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Tomasz Sapiński (1)
  • Dorota Kamińska (1, corresponding author)
  • Adam Pelikant (1)
  • Cagri Ozcinar (2)
  • Egils Avots (3)
  • Gholamreza Anbarjafari (3)

  1. Inst of Mechatronics and Info Sys, Lodz University of Technology, Łódź, Poland
  2. Computer Science and Statistics, Trinity College Dublin, Dublin 2, Ireland
  3. iCV Research Lab, Institute of Technology, University of Tartu, Tartu, Estonia
