Abstract
The importance of context-aware computing in understanding social signals has given rise to an emerging domain called social signal processing (SSP). SSP depends heavily on comprehensive multimodal databases containing descriptors of social context and behavior, such as the situational environment and the roles and gender of the human participants. In recent papers the SSP community has emphasized that current research lacks adequate data, largely because the acquisition and annotation of large multimodal datasets are time- and resource-consuming for researchers. This paper surveys existing work in this area and distills the key aspects of, and clear directions for, managing multimodal behavior data. It reviews a number of existing databases, describes their most important characteristics, and outlines the principal tools and methods used for capturing and managing social behavior signals. Summarizing the relevant findings, it also addresses open issues and proposes fundamental topics for future research.
Čereković, A. An insight into multimodal databases for social signal processing: acquisition, efforts, and directions. Artif Intell Rev 42, 663–692 (2014). https://doi.org/10.1007/s10462-012-9334-2