Audiovisual Integration of Face–Voice Gender Studied Using “Morphed Videos”

  • Rebecca Watson
  • Ian Charest
  • Julien Rouger
  • Christoph Casper
  • Marianne Latinus
  • Pascal Belin


Both the face and the voice provide us with not only linguistic information but also a wealth of paralinguistic information, including gender cues. However, the way in which we integrate these two sources in our perception of gender has remained largely unexplored. In the present study, we used a bimodal perception paradigm in which varying degrees of incongruence were created between facial and vocal information within audiovisual stimuli. In general, participants were able to combine both sources of information: the perceived gender of the face was influenced by that of the voice, and vice versa. However, in conditions that directed attention to one modality, participants were unable to ignore the gender of the voice, even when instructed to do so. Overall, our results point to a larger role of the voice in gender perception when more controlled visual stimuli are used.





Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Rebecca Watson (1)
  • Ian Charest (2)
  • Julien Rouger (3)
  • Christoph Casper (4)
  • Marianne Latinus (1)
  • Pascal Belin (1, 5)

  1. Voice Neurocognition Laboratory, Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
  2. MRC Cognition and Brain Sciences Unit, Cambridge, UK
  3. Brain Innovation BrainVoyager, Maastricht, The Netherlands
  4. Department of Business Administration and Human Resource Management, University of Cologne, Cologne, Germany
  5. International Laboratories for Brain, Music and Sound (BRAMS), Université de Montréal & McGill University, Quebec, Canada
