An insight into multimodal databases for social signal processing: acquisition, efforts, and directions

  • Published in: Artificial Intelligence Review

An Erratum to this article was published on 19 October 2014

Abstract

The importance of context-aware computing in understanding social signals has given rise to a new emerging domain called social signal processing (SSP). SSP depends heavily on the existence of comprehensive multimodal databases containing descriptors of social context and behavior, such as the situational environment and the roles and gender of the human participants. In recent work the SSP community has emphasized that current research lacks adequate data, largely because the acquisition and annotation of large multimodal datasets are time- and resource-consuming. This paper collects the existing work in this scope and delivers the key aspects of, and clear directions for, managing multimodal behavior data. It reviews some of the existing databases, describes their most important characteristics, and outlines the principal tools and methods used for capturing and managing social behavior signals. Summarizing the relevant findings, it also addresses open issues and proposes fundamental topics to be investigated in future research.
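
Purely as an illustrative sketch, and not part of the survey itself, the social-context descriptors mentioned above could be organized per recording session roughly as follows. This is a minimal Python example; every class and field name is hypothetical rather than taken from any particular corpus:

    from dataclasses import dataclass, field
    from typing import List

    # Hypothetical schema for one session in a multimodal behavior corpus.
    # Real corpora define their own (often much richer) annotation schemes.

    @dataclass
    class Participant:
        participant_id: str
        gender: str   # e.g. "female", "male", "undisclosed"
        role: str     # social role in the session, e.g. "project manager"

    @dataclass
    class SessionDescriptor:
        session_id: str
        environment: str  # situational context, e.g. "meeting room"
        modalities: List[str] = field(default_factory=list)  # e.g. ["audio", "video"]
        participants: List[Participant] = field(default_factory=list)

    # Example: a dyadic meeting recorded with audio and video.
    session = SessionDescriptor(
        session_id="S001",
        environment="meeting room",
        modalities=["audio", "video"],
        participants=[
            Participant("P1", "female", "project manager"),
            Participant("P2", "male", "designer"),
        ],
    )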

Author information

Corresponding author

Correspondence to A. Čereković.

About this article

Cite this article

Čereković, A. An insight into multimodal databases for social signal processing: acquisition, efforts, and directions. Artif Intell Rev 42, 663–692 (2014). https://doi.org/10.1007/s10462-012-9334-2
