Multimodal Integration of Dynamic Audio–Visual Cues in the Communication of Agreement and Disagreement


Abstract

Recent research has stressed the importance of using multimodal and dynamic features to investigate the role of nonverbal behavior in social perception. This paper examines the influence of low-level visual and auditory cues on the communication of agreement, disagreement, and the ability to convince others. In addition, we investigate whether the judgment of these attitudes depends on ratings of socio-emotional dimensions such as dominance, arousal, and valence. The material we used consisted of audio–video excerpts, taken from political discussions, representing statements of agreement and disagreement as well as neutral utterances. Each excerpt was rated on six dimensions (agreement, disagreement, dominance, valence, arousal, and convincing power) in three rating conditions: audio-only, video-only, and audio–video. We extracted low-level dynamic visual features using optical flow. Auditory features consisted of pitch measurements, vocal intensity, and articulation rate. Results show that judges were able to distinguish statements of disagreement from agreement and neutral utterances on the basis of nonverbal cues alone, particularly when both auditory and visual information were present. Visual features were more influential when presented along with auditory features. Perceivers mainly used changes in pitch and the maximum speed of vertical movements to infer agreement and disagreement, and targets appeared more convincing when they showed consistent and rapid movements on the vertical plane. The effect of nonverbal features on ratings of agreement and disagreement was completely mediated by ratings of dominance, valence, and arousal, indicating that the impact of low-level audio–visual features on the perception of agreement and disagreement depends on the perception of fundamental socio-emotional dimensions.


Notes

  1. Note that time information can be included in a categorical coding paradigm, for example, by using time-based sampling methods (Altmann 1974; Martin and Bateson 1993), in which behavior is recorded periodically at regular intervals (as in instantaneous sampling) or during short time windows (as in one–zero sampling).

  2. One rater had to be removed from the video-only condition due to technical problems during the presentation of stimuli.

  3. Huynh-Feldt correction was used because the assumption of sphericity was not met.

  4. Raw judgements were made on single scales for agreement/disagreement, dominance/submission, and positive/negative valence.

  5. The significance of the indirect effects is tested using the “product-of-coefficients” approach and the “distribution of the product” approach (MacKinnon et al. 2004). In the latter approach, an empirical approximation of the distribution of the indirect effect is built using bootstrapping and is used to create confidence intervals for the indirect effect (also described in Preacher and Hayes 2004); a minimal sketch of this bootstrap procedure follows these notes. This method is recommended over the Sobel test (Sobel 1982) or the causal steps approach (Baron and Kenny 1986) because it has higher statistical power while keeping the rate of Type I error within reasonable limits (MacKinnon et al. 2004).

  6. Perceived convincing power refers to the impression by perceivers that the statement of the target is convincing, not to the impression that the target, as a person, has a lot of convincing power. In this sense, it relates to the performance of the speaker rather than to his personality or to his social role.
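
The sketch below illustrates the bootstrap procedure referred to in Note 5: resample cases with replacement, re-estimate the a path (predictor to mediator) and b path (mediator to outcome, controlling for the predictor) on each resample, and take percentile confidence limits for the product a·b. It is a minimal illustration under our own naming conventions, not the authors' analysis code.

import numpy as np

def bootstrap_indirect_effect(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect a*b in simple mediation.

    a: slope of mediator m regressed on predictor x; b: slope of outcome y
    regressed on m, controlling for x (MacKinnon et al. 2004; Preacher and
    Hayes 2004).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)               # resample cases with replacement
        xs, ms, ys = x[idx], m[idx], y[idx]
        a = np.polyfit(xs, ms, 1)[0]              # m = a*x + intercept
        X = np.column_stack([ms, xs, np.ones(n)])
        b = np.linalg.lstsq(X, ys, rcond=None)[0][0]  # y = b*m + c'*x + intercept
        est[i] = a * b
    lo, hi = np.percentile(est, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return est.mean(), (lo, hi)

The indirect effect is deemed significant at level alpha when the resulting interval excludes zero.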

References

  • Altmann, J. (1974). Observational study of behavior: Sampling methods. Behaviour, 49, 227–267.

  • Archer, D., & Akert, R. M. (1977). Words and everything else: Verbal and nonverbal cues in social interpretation. Journal of Personality and Social Psychology, 35(6), 443–449.

  • Argyle, M. (1988). Bodily communication. London: Routledge.

  • Argyle, M., & Dean, J. (1965). Eye-contact, distance and affiliation. Sociometry, 28(3), 289–304.

  • Ay, N., Flack, J. C., & Krakauer, D. C. (2007). Robustness and complexity co-constructed in multimodal signalling networks. Philosophical Transactions of the Royal Society B, 362, 441–447.

  • Bachorowski, J.-A., & Owren, M. J. (1995). Vocal expression of emotion: Acoustic properties of speech are associated with emotional intensity and context. Psychological Science, 6(4), 219–224.

  • Bachorowski, J.-A., & Owren, M. J. (2001). Not all laughs are alike: Voiced but not unvoiced laughter readily elicits positive affect. Psychological Science, 12(3), 252–257.

  • Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M., & Szeliski, R. (2011). A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1), 1–31.

  • Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.

  • Bänziger, T., Mortillaro, M., & Scherer, K. R. (2012). Introducing the Geneva multimodal expression corpus for experimental research on emotion perception. Emotion, 12(5), 1161–1179.

  • Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

  • Barrett, H. C., Todd, P. M., Miller, G. F., & Blythe, P. W. (2005). Accurate judgments of intention from motion cues alone: A cross-cultural study. Evolution and Human Behavior, 26(4), 313–331.

  • Blake, R., & Shiffrar, M. (2007). Perception of human motion. Annual Review of Psychology, 58, 47–73.

  • Blakemore, S.-J., & Decety, J. (2001). From the perception of action to the understanding of intention. Nature Reviews Neuroscience, 2(8), 561–567.

  • Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345.

  • Borkenau, P., Mauer, N., Riemann, R., Spinath, F. M., & Angleitner, A. (2004). Thin slices of behavior as cues of personality and intelligence. Journal of Personality and Social Psychology, 86(4), 599–614.

  • Bousmalis, K., Mehu, M., & Pantic, M. (2013). Towards the automatic detection of spontaneous agreement and disagreement based on nonverbal behavior: A survey of related cues, databases, and tools. Image and Vision Computing, 31, 203–221.

  • Brown, W. M., Cronk, L., Grochow, K., Jacobson, A., Liu, C. K., Popovic, Z., et al. (2005). Dance reveals symmetry especially in young men. Nature, 438(7071), 1148–1150.

  • Brown, W. M., Palameta, B., & Moore, C. (2003). Are there nonverbal cues to commitment? An exploratory study using the zero-acquaintance video presentation paradigm. Evolutionary Psychology, 1, 42–69.

  • Brunswik, E. (1956). Perception and the representative design of psychological experiments. Berkeley: University of California Press.

  • Bull, P. E. (1987). Posture and gesture. Oxford: Pergamon Press.

  • Burns, K. L., & Beier, E. G. (1973). Significance of vocal and visual channels in the decoding of emotional meaning. Journal of Communication, 23(1), 118–130.

  • Camras, L. A. (1980). Children’s understanding of facial expressions used during conflict encounters. Child Development, 51(3), 879–885.

  • Castellano, G., Mortillaro, M., Camurri, A., Volpe, G., & Scherer, K. (2008). Automated analysis of body movement in emotionally expressive piano performances. Music Perception, 26(2), 103–119.

  • Cohn, J. F., Schmidt, K., Gross, R., & Ekman, P. (2002). Individual differences in facial expression: Stability over time, relation to self-reported emotion, and ability to inform person identification. In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (pp. 491–496). IEEE Computer Society.

  • Collignon, O., Girard, S., Gosselin, F., Roy, S., Saint-Amour, D., Lassonde, M., et al. (2008). Audio–visual integration of emotion expression. Brain Research, 1242, 126–135.

  • Dael, N., Mortillaro, M., & Scherer, K. R. (2012). The body action and posture coding system (BAP): Development and reliability. Journal of Nonverbal Behavior, 36(2), 97–121.

  • Darwin, C. (1872). The expression of the emotions in man and animals. London: John Murray.

  • de Gelder, B., & Vroomen, J. (2000). The perception of emotions by ear and by eye. Cognition and Emotion, 14(3), 289–311.

  • de Jong, N., & Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods, 41(2), 385–390.

  • Edinger, J. A., & Patterson, M. L. (1983). Nonverbal involvement and social control. Psychological Bulletin, 93(1), 30–56.

  • Eibl-Eibesfeldt, I. (1989). Human ethology. New York: Aldine De Gruyter.

  • Ekman, P. (1983). Emotions revealed: Recognizing faces and feelings to improve communication and emotional life. New York: Henry Holt and Company.

  • Ekman, P. (1985). Telling lies: Clues to deceit in the market place, marriage, and politics. New York: Norton.

  • Ekman, P., Friesen, W. V., O’Sullivan, M., & Scherer, K. (1980). Relative importance of face, body, and speech in judgments of personality and affect. Journal of Personality and Social Psychology, 38(2), 270–277.

  • Ekman, P., & Oster, H. (1979). Facial expressions of emotion. Annual Review of Psychology, 30, 527–554.

  • Fiske, S. T., Cuddy, A. J. C., & Glick, P. (2007). Universal dimensions of social cognition: Warmth and competence. Trends in Cognitive Sciences, 11(2), 77–83.

  • Fontaine, J. R. J., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotions is not two-dimensional. Psychological Science, 18(12), 1050–1057.

  • Frodi, A. M., Lamb, M. E., Leavitt, L. A., & Donovan, W. L. (1978). Fathers’ and mothers’ responses to infant smiles and cries. Infant Behavior and Development, 1, 187–198.

  • Funder, D. C., & Sneed, C. D. (1993). Behavioral manifestations of personality: An ecological approach to judgmental accuracy. Journal of Personality and Social Psychology, 64(3), 479–490.

  • Gale, A., Kingsley, E., Brookes, S., & Smith, D. (1978). Cortical arousal and social intimacy in the human female under different conditions of eye contact. Behavioural Processes, 3, 271–275.

  • Germesin, S., & Wilson, T. (2009). Agreement detection in multiparty conversation. In Proceedings of the 2009 International Conference on Multimodal Interfaces (pp. 7–14).

  • Gesn, P. R., & Ickes, W. (1999). The development of meaning contexts for empathic accuracy: Channel and sequence effects. Journal of Personality and Social Psychology, 77(4), 746–761.

  • Ghazanfar, A. A., & Logothetis, N. K. (2003). Facial expressions linked to monkey calls. Nature, 423, 937–938.

  • Gibson, J. J. (1950). The perception of the visual world. Boston: Houghton Mifflin.

  • Gifford, R. (1981). Sociability: Traits, settings, and interactions. Journal of Personality and Social Psychology, 41(2), 340–347.

  • Gifford, R. (1994). A lens-mapping framework for understanding the encoding and decoding of interpersonal dispositions in nonverbal behavior. Journal of Personality and Social Psychology, 66(2), 398–412.

  • Goldenthal, P., Johnston, R. E., & Kraut, R. E. (1981). Smiling, appeasement, and the silent bared-teeth display. Ethology and Sociobiology, 2, 127–133.

  • Grafe, T. U., & Wanger, T. C. (2007). Multimodal signaling in male and female foot-flagging frogs (Staurois guttatus, Ranidae): An alerting function of calling. Ethology, 113(8), 772–781.

  • Grammer, K. (1989). Human courtship behavior: Biological basis and cognitive processing. In A. E. Rasa, C. Vogel, & E. Voland (Eds.), The sociobiology of sexual and reproductive strategies (pp. 147–169). London: Chapman & Hall.

  • Grammer, K. (1993). Signale der Liebe: Die biologischen Gesetze der Partnerschaft. Hamburg: Hoffmann und Campe.

  • Grammer, K., Honda, M., Juette, A., & Schmitt, A. (1999). Fuzziness of nonverbal courtship communication unblurred by motion energy detection. Journal of Personality and Social Psychology, 77(3), 509–524.

  • Gross, M. M., Crane, E. A., & Fredrickson, B. L. (2010). Methodology for assessing bodily expression of emotion. Journal of Nonverbal Behavior, 34, 223–248.

  • Hadar, U., Steiner, T. J., & Rose, F. C. (1985). Head movement during listening turns in conversation. Journal of Nonverbal Behavior, 9(4), 214–228.

  • Hall, E. T. (1968). Proxemics. Current Anthropology, 9(2/3), 83–108.

  • Hall, J. A., Coats, E. J., & LeBeau, L. S. (2005). Nonverbal behavior and the vertical dimension of social relations: A meta-analysis. Psychological Bulletin, 131(6), 898–924.

  • Hall, J. A., & Schmid Mast, M. (2007). Sources of accuracy in the empathic accuracy paradigm. Emotion, 7(2), 438–446.

  • Hess, U., Blairy, S., & Kleck, R. E. (2000). The influence of facial emotion displays, gender, and ethnicity on judgments of dominance and affiliation. Journal of Nonverbal Behavior, 24(4), 265–283.

  • Hillard, D., Ostendorf, M., & Shriberg, E. (2003). Detection of agreement vs. disagreement in meetings: Training with unlabeled data. In Proceedings of HLT-NAACL 2003, companion volume: Short papers (Vol. 2, pp. 34–36).

  • Horn, B. K. P., & Schunck, B. G. (1981). Determining optical flow. Artificial Intelligence, 17, 185–203.

  • Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35, 73–101.

  • Johnstone, R. A. (1996). Multiple displays in animal communication: “Backup signals” and “multiple messages”. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 351(1337), 329–338.

  • Keller, E., & Tschacher, W. (2007). Prosodic and gestural expression of interactional agreement. In A. Esposito (Ed.), Lecture Notes in Computer Science (Vol. 4775, pp. 85–98). Berlin/Heidelberg: Springer.

  • Knutson, B. (1996). Facial expressions of emotion influence interpersonal trait inferences. Journal of Nonverbal Behavior, 20(3), 165–182.

  • Koppensteiner, M., & Grammer, K. (2010). Motion patterns in political speech and their influence on personality ratings. Journal of Research in Personality, 44(3), 374–379.

  • Kreifelts, B., Ethofer, T., Grodd, W., Erb, M., & Wildgruber, D. (2007). Audiovisual integration of emotional signals in voice and face: An event-related fMRI study. NeuroImage, 37(4), 1445–1456.

  • Lehner, P. N. (1996). Handbook of ethological methods (2nd ed.). Cambridge, UK: Cambridge University Press.

  • MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128.

  • Martin, P., & Bateson, P. (1993). Measuring behaviour: An introductory guide (2nd ed.). Cambridge: Cambridge University Press.

  • Mazur, A. (2005). Biosociology of dominance and deference. Lanham, MD: Rowman & Littlefield.

  • Mazur, A., & Booth, A. (1998). Testosterone and dominance in men. Behavioral and Brain Sciences, 21(3), 353–363.

  • McArthur, L. Z., & Baron, R. M. (1983). Toward an ecological theory of social perception. Psychological Review, 90(3), 215–238.

  • Mehrabian, A. (1969). Significance of posture and position in the communication of attitude and status relationships. Psychological Bulletin, 71(5), 359–372.

  • Mehrabian, A. (1971). Silent messages. Belmont, CA: Wadsworth.

  • Mehrabian, A. (1996). Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4), 261–292.

  • Mehrabian, A., & Ferris, S. R. (1967). Inference of attitudes from nonverbal communication in two channels. Journal of Consulting Psychology, 31(3), 248–252.

  • Mehu, M., & Dunbar, R. I. M. (2008). Naturalistic observations of smiling and laughter in human group interactions. Behaviour, 145, 1747–1780.

  • Mehu, M., & Scherer, K. R. (2012). A psycho-ethological approach to social signal processing. Cognitive Processing, 13(Suppl. 2), 397–414.

  • Montepare, J. M., & Dobish, H. (2003). The contribution of emotion perceptions and their overgeneralizations to trait impressions. Journal of Nonverbal Behavior, 27(4), 237–254.

  • Montepare, J. M., & Zebrowitz-McArthur, L. (1988). Impressions of people created by age-related qualities of their gaits. Journal of Personality and Social Psychology, 55(4), 547–556.

  • Moore, M. M. (1985). Nonverbal courtship patterns in women: Context and consequences. Ethology and Sociobiology, 6(4), 237–247.

  • Owren, M. J., Rendall, D., & Ryan, M. J. (2010). Redefining animal signaling: Influence versus information in communication. Biology and Philosophy, 25(5), 755–780.

  • Parker, G. A. (1974). Assessment strategy and the evolution of fighting behavior. Journal of Theoretical Biology, 47, 223–243.

  • Parr, L. A. (2004). Perceptual biases for multimodal cues in chimpanzee (Pan troglodytes) affect recognition. Animal Cognition, 7(3), 171–178.

  • Partan, S. R., & Marler, P. (2005). Issues in the classification of multimodal communication signals. The American Naturalist, 166(2), 231–245.

  • Patterson, M. L. (1982). A sequential functional model of nonverbal exchange. Psychological Review, 89(3), 231–249.

  • Patterson, M. L., Churchill, M. E., Burger, G. K., & Powell, J. L. (1992). Verbal and nonverbal modality effects on impressions of political candidates: Analysis from the 1984 presidential debates. Communication Monographs, 59(3), 231–242.

  • Poggi, I., D’Errico, F., & Vincze, L. (2011). Agreement and its multimodal communication in debates: A qualitative analysis. Cognitive Computation, 3(3), 466–479.

  • Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B., & Crommelinck, M. (2000). The time-course of intermodal binding between seeing and hearing affective information. NeuroReport, 11(6), 1329–1333.

  • Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, and Computers, 36(4), 717–731.

  • Puts, D. A., Gaulin, S. J. C., & Verdolini, K. (2006). Dominance and the evolution of sexual dimorphism in human voice pitch. Evolution and Human Behavior, 27(4), 283–296.

  • Roberts, J. A., Taylor, P. W., & Uetz, G. W. (2007). Consequences of complex signaling: Predator detection of multimodal cues. Behavioral Ecology, 18(1), 236–240.

  • Roth, W.-M., & Tobin, K. (2010). Solidarity and conflict: Aligned and misaligned prosody as a transactional resource in intra- and intercultural communication involving power differences. Cultural Studies of Science Education, 5(4), 807–847.

  • Rowe, C. (1999). Receiver psychology and the evolution of multicomponent signals. Animal Behaviour, 58, 921–931.

  • Rowe, C. (2002). Sound improves visual discrimination learning in avian predators. Proceedings of the Royal Society of London. Series B: Biological Sciences, 269(1498), 1353–1357.

  • Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178.

  • Sayette, M. A., Cohn, J. F., Wertz, J. M., Perrott, M. A., & Parrott, D. J. (2001). A psychometric evaluation of the facial action coding system for assessing spontaneous expression. Journal of Nonverbal Behavior, 25, 167–185.

  • Scherer, K. R. (1978). Personality inference from voice quality: The loud voice of extroversion. European Journal of Social Psychology, 8(4), 467–487.

  • Scherer, K. R. (1992). What does facial expression express? In K. T. Strongman (Ed.), International review of studies of emotion (pp. 139–165). Chichester, UK: Wiley.

  • Scherer, K. R. (2009). The dynamic architecture of emotion: Evidence for the component process model. Cognition and Emotion, 23(7), 1307–1351.

  • Scherer, K. R., & Ellgring, H. (2007). Multimodal expression of emotion: Affect programs or componential appraisal patterns? Emotion, 7(1), 158–171.

  • Scherer, K. R., Scherer, U., Hall, J. A., & Rosenthal, R. (1977). Differential attribution of personality based on multi-channel presentation of verbal and nonverbal cues. Psychological Research, 39, 221–247.

  • Schubert, J. N. (1986). Human vocalizations in agonistic political encounters. Social Science Information, 25(2), 475–492.

  • Simpson, J. A., Gangestad, S. W., & Biek, M. (1993). Personality and nonverbal social behavior: An ethological perspective of relationship initiation. Journal of Experimental Social Psychology, 29(5), 434–461.

  • Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhart (Ed.), Sociological methodology 1982 (pp. 159–186). San Francisco: Jossey-Bass.

  • Tiedens, L. Z., & Fragale, A. R. (2003). Power moves: Complementarity in dominant and submissive nonverbal behavior. Journal of Personality and Social Psychology, 84(3), 558–568.

  • Todorov, A., Said, C. P., Engell, A. D., & Oosterhof, N. N. (2008). Understanding evaluation of faces on social dimensions. Trends in Cognitive Sciences, 12(12), 455–460.

  • van Kampen, H. S., & Bolhuis, J. J. (1993). Interaction between auditory and visual learning during filial imprinting. Animal Behaviour, 45(3), 623–625.

  • Wallbott, H. G. (1998). Bodily expression of emotion. European Journal of Social Psychology, 28(6), 879–896.

  • Weisfeld, G. E., & Beresford, J. M. (1982). Erectness of posture as an indicator of dominance or success in humans. Motivation and Emotion, 6(2), 113–131.

  • Werlberger, M., Pock, T., & Bischof, H. (2010). Motion estimation with non-local total variation regularization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2464–2471).

  • Wiggins, J. S. (1979). A psychological taxonomy of trait-descriptive terms: The interpersonal domain. Journal of Personality and Social Psychology, 37(3), 395–412.

  • Xu, Y. (2005). ProsodyPro.praat. Retrieved from http://www.phon.ucl.ac.uk/home/yi/ProsodyPro/.

  • Zaki, J., Bolger, N., & Ochsner, K. (2009). Unpacking the informational bases of empathic accuracy. Emotion, 9(4), 478–487.


Acknowledgments

This research was funded by the European Network of Excellence SSPNet (Grant agreement No. 231287). We would like to thank Konstantinos Bousmalis and Maja Pantic for their collaboration in the rating study. Thanks also go to the Swiss Centre for Affective Sciences, which provided support in the data analysis and the writing of the manuscript.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Correspondence to Marc Mehu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 114 kb)

Appendices

Appendix 1: Definitions of Socio-Emotional Dimensions

Participants were given some time prior to the study to become familiar with the definitions of the dimensions under study. They also kept the definitions with them in case they needed them during the rating session. Definitions:

  • Agreement: an attitude that reflects harmony or accordance in opinion or feeling.

  • Disagreement: an attitude that reflects a lack of consensus or approval about an opinion or feeling.

  • Dominance: a disposition or a tendency to exert influence and control over other individuals and life events in general.

  • Submissiveness: a disposition or a tendency to be influenced and controlled by other individuals or by events.

  • Positive/negative valence: refers to the pleasant or unpleasant character of the attitude expressed in the excerpt.

  • Emotional arousal: refers to the degree of physical and physiological activation associated with the emotional state of the individual.

  • Convincing power: refers to the extent to which the statement can provoke a change of opinion, feeling, or attitude in an interlocutor (see Note 6).

Appendix 2: Method for the Estimation of Optical Flow

Optical flow estimation techniques rely on the fact that the local image appearance around a particular location does not change drastically between two consecutive frames. The local image appearance is generally described by the pixel intensity values around the point under consideration or by the image gradient around this point. Most optical flow estimation techniques find the optical flow by minimizing a function that comprises two terms: (1) a term that measures how well the second image is reconstructed by transforming the first image using the flow field and (2) a term that promotes the selection of “simple” optical flow fields. In particular, the functions generally have the form of a minimization over the flow field U (Horn and Schunck 1981):

$$\min_{U} \; D(I_{0}, I_{1}, U) + \lambda \, R(U).$$

Herein, \(D(I_{0}, I_{1}, U)\) is an error function that measures the difference between image \(I_{1}\) and image \(I_{0}\) transformed by flow field \(U\); for instance, a sum of squared errors may be employed. The function \(R(U)\) is a regularizer that promotes smoothness of the optical flow field \(U\), i.e., it penalizes solutions in which neighbouring locations have drastically different flow estimates. The scalar \(\lambda\) is a free parameter that weighs the regularizer against the error function.
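
To make the structure of this objective concrete, the sketch below evaluates \(D + \lambda R\) for a given candidate flow field, using a sum of squared errors as the data term and squared differences between neighbouring flow vectors as the regularizer, in the spirit of Horn and Schunck (1981). It is an illustrative sketch, not the estimator used in this study, and the function name and parameters are ours.

import numpy as np
from scipy.ndimage import map_coordinates

def flow_objective(I0, I1, U, lam=0.1):
    """Value of D(I0, I1, U) + lam * R(U) for a candidate flow field.

    U has shape (2, H, W): U[0] is the x-displacement, U[1] the y-displacement.
    D is a sum of squared errors between I0 and I1 warped by U; R penalizes
    squared differences between neighbouring flow vectors.
    """
    H, W = I0.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # Data term: sample I1 at the positions the flow points to (bilinear warp).
    warped = map_coordinates(I1, [ys + U[1], xs + U[0]], order=1, mode='nearest')
    data = np.sum((I0 - warped) ** 2)
    # Regularizer: squared differences between adjacent flow vectors,
    # for both flow components and both image axes.
    smooth = sum(np.sum(np.diff(U[c], axis=ax) ** 2) for c in (0, 1) for ax in (0, 1))
    return data + lam * smooth

A full estimator would minimize this quantity over \(U\), typically in a coarse-to-fine scheme; the method used in this study instead combines an absolute-difference data term with a weighted, Huber-smoothed total variation regularizer, as described next.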

In this study, we employed an optical flow estimation method that uses a sum of absolute differences as error function (Werlberger et al. 2010):

$$D(I_{0}, I_{1}, U) = \iint \left| I_{0}(x, y) - I_{1}\left(x + U_{x}(x, y),\; y + U_{y}(x, y)\right) \right| dx \, dy,$$

where the integrals sum the errors over the entire image. In the equation, the optical flow at location \((x, y)\) is represented by the x-component \(U_{x}(x, y)\) and the y-component \(U_{y}(x, y)\) of the optical flow vector. This decomposition of the optical flow vector is illustrated in panel B of Fig. 2.

The regularizer is implemented by a function that measures the (weighted) total variation of the optical flow field:

$$R(U) = \iiiint W(x, y, x', y') \left[ \left| U_{x}(x, y) - U_{x}(x', y') \right|_{\epsilon} + \left| U_{y}(x, y) - U_{y}(x', y') \right|_{\epsilon} \right] dx \, dy \, dx' \, dy',$$

where again the integrals are over the entire image. Here, \(\epsilon\) is a small constant and \(|q|_{\epsilon}\) denotes the so-called Huber loss (Huber 1964):

$$\left| q \right|_{\epsilon} = \begin{cases} q^{2}/(2\epsilon) & \text{if } \left| q \right| \le \epsilon, \\ \left| q \right| - \epsilon/2 & \text{otherwise.} \end{cases}$$
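
For reference, the same loss can be written as a small Python utility (our own illustration, not part of the published pipeline):

import numpy as np

def huber(q, eps=1e-2):
    """Huber loss |q|_eps (Huber 1964): quadratic near zero, linear in the tails."""
    q = np.abs(q)
    return np.where(q <= eps, q ** 2 / (2 * eps), q - eps / 2)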

In the regularizer \(R(U)\), the weights \(W(x, y, x', y')\) are computed from the spatial distance between points \((x, y)\) and \((x', y')\) and from the pixel intensity differences between those points. Specifically, the weights are large for nearby points with a similar color, but small for points that are far away or that have a dissimilar color (much like an anisotropic filter). As a result, the regularizer promotes optical flow fields that are locally smooth, but smoothness is not promoted across edges in the image (as edges suggest a different object with a potentially different optical flow).

Hence, the advantage of the optical flow estimator we used is that it promotes smoothness, whilst allowing for discontinuities in the optical flow field near edges in the image. The optical flow estimator outlined above is presently one of the best-performing algorithms on a widely used benchmark data set for evaluation of optical flow algorithms (Werlberger et al. 2010; Baker et al. 2011).
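
The published features were computed with the estimator above. As a rough, readily available substitute for experimentation, OpenCV's Farnebäck dense optical flow can be used to derive comparable movement features; the sketch below computes one feature in the spirit of the paper's vertical-movement measures (the per-clip maximum of the mean absolute vertical flow). The estimator choice, parameters, and feature definition here are illustrative assumptions, not the authors' pipeline.

import cv2
import numpy as np

def max_vertical_speed(video_path):
    """Per-clip maximum of the mean vertical flow magnitude (pixels/frame)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise IOError("cannot read " + video_path)
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    speeds = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback flow; returned array has shape (H, W, 2),
        # with [..., 0] the horizontal and [..., 1] the vertical component.
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        speeds.append(np.abs(flow[..., 1]).mean())
        prev = gray
    cap.release()
    return max(speeds) if speeds else 0.0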


About this article

Cite this article

Mehu, M., van der Maaten, L. Multimodal Integration of Dynamic Audio–Visual Cues in the Communication of Agreement and Disagreement. J Nonverbal Behav 38, 569–597 (2014). https://doi.org/10.1007/s10919-014-0192-2
