CCPR 2016: Pattern Recognition, pp. 667-678
MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016
Abstract
Emotion recognition is a significant research field within pattern recognition and artificial intelligence. The Multimodal Emotion Recognition Challenge (MEC) is part of the 2016 Chinese Conference on Pattern Recognition (CCPR). The goal of the competition is to compare multimedia processing and machine learning methods for multimodal emotion recognition. The challenge also aims to provide a common benchmark data set, to bring together the audio and video emotion recognition communities, and to promote research in multimodal emotion recognition. The data used in the challenge come from the Chinese Natural Audio-Visual Emotion Database (CHEAVD), which is selected from Chinese movies and TV programs. The discrete emotion labels were annotated by four experienced assistants. Three sub-challenges are defined: audio, video, and multimodal emotion recognition. This paper introduces the baseline audio and visual features and the recognition results obtained with Random Forests.
Keywords
Audio-visual corpus, Features, Multimodal fusion, Challenge, Emotion, Affective computing
Acknowledgement
This work is supported by the National High-Tech Research and Development Program of China (863 Program) (No. 2015AA016305), the National Natural Science Foundation of China (NSFC) (No. 61305003, No. 61425017), the Strategic Priority Research Program of the CAS (Grant XDB02080006), and partly supported by the Major Program for the National Social Science Fund of China (13 & ZD189).
We thank the data providers for their kind permission to make their data available for non-commercial, scientific use. Due to space limitations, the providers' information is available at http://www.speakit.cn/. The corpus can be obtained free of charge from ChineseLDC, http://www.chineseldc.org.