CCPR 2016: Pattern Recognition, pp. 667–678

MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016

  • Ya Li
  • Jianhua Tao
  • Björn Schuller
  • Shiguang Shan
  • Dongmei Jiang
  • Jia Jia
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 663)

Abstract

Emotion recognition is a significant research field in pattern recognition and artificial intelligence. The Multimodal Emotion Recognition Challenge (MEC) is part of the 2016 Chinese Conference on Pattern Recognition (CCPR). The goal of this competition is to compare multimedia processing and machine learning methods for multimodal emotion recognition. The challenge also aims to provide a common benchmark data set, to bring together the audio and video emotion recognition communities, and to promote research in multimodal emotion recognition. The data used in this challenge is the Chinese Natural Audio-Visual Emotion Database (CHEAVD), which was selected from Chinese movies and TV programs. The discrete emotion labels were annotated by four experienced assistants. Three sub-challenges are defined: audio, video, and multimodal emotion recognition. This paper introduces the baseline audio and visual features, as well as the recognition results obtained with Random Forests.
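
To make the baseline concrete, below is a minimal sketch of the kind of pipeline the abstract names: pre-extracted, fixed-length feature vectors per clip (e.g., openSMILE-style audio functionals) classified with a Random Forest, scored by macro-averaged recall. The feature dimensionality, label set, and synthetic data are illustrative assumptions, not the challenge's actual configuration.

```python
# A minimal sketch of a Random-Forest emotion-recognition baseline.
# The dimensionality, label set, and random data below are illustrative
# assumptions standing in for real per-clip feature vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)

# Stand-ins for pre-extracted utterance-level feature vectors
# (one row per clip) and discrete emotion labels.
n_train, n_test, n_dims = 400, 100, 988          # all assumed values
emotions = ["angry", "happy", "neutral", "sad"]  # assumed label set

X_train = rng.normal(size=(n_train, n_dims))
y_train = rng.choice(emotions, size=n_train)
X_test = rng.normal(size=(n_test, n_dims))
y_test = rng.choice(emotions, size=n_test)

# Random Forest classifier, as named in the abstract.
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Macro-averaged (unweighted) recall is a common metric for
# class-imbalanced emotion corpora.
print("macro recall:", recall_score(y_test, y_pred, average="macro"))
```

For the multimodal sub-challenge, the same recipe applies after fusing the modalities, for instance by concatenating the audio and video feature vectors for each clip before training.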

Keywords

Audio-visual corpus · Features · Multimodal fusion · Challenge · Emotion · Affective computing

Notes

Acknowledgement

This work is supported by the National High-Tech Research and Development Program of China (863 Program) (No. 2015AA016305), the National Natural Science Foundation of China (NSFC) (No. 61305003, No. 61425017), the Strategic Priority Research Program of the CAS (Grant XDB02080006), and partly supported by the Major Program for the National Social Science Fund of China (13&ZD189).

We thank the data providers for their kind permission to make their data available for non-commercial, scientific use. Due to space limitations, the providers’ information is available at http://www.speakit.cn/. The corpus can be freely obtained from ChineseLDC, http://www.chineseldc.org.


Copyright information

© Springer Nature Singapore Pte Ltd. 2016

Authors and Affiliations

  • Ya Li (1)
  • Jianhua Tao (1, 2)
  • Björn Schuller (3, 4, 5)
  • Shiguang Shan (6)
  • Dongmei Jiang (7)
  • Jia Jia (8)

  1. Institute of Automation, Chinese Academy of Sciences, Beijing, People’s Republic of China
  2. University of Chinese Academy of Sciences, Beijing, People’s Republic of China
  3. Chair of Complex and Intelligent Systems, University of Passau, Passau, Germany
  4. Department of Computing, Imperial College London, London, UK
  5. Harbin Institute of Technology, Harbin, People’s Republic of China
  6. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People’s Republic of China
  7. Northwestern Polytechnical University, Xi’an, People’s Republic of China
  8. Tsinghua University, Beijing, People’s Republic of China