Volume 66, Issue 4, pp. 574-583

Spatial frequency requirements for audiovisual speech perception

Abstract

Spatial frequency band-pass and low-pass filtered images of a talker were used in an audiovisual speech-in-noise task. Three experiments tested subjects’ use of information contained in the different filter bands with center frequencies ranging from 2.7 to 44.1 cycles/face (c/face). Experiment 1 demonstrated that information from a broad range of spatial frequencies enhanced auditory intelligibility. The frequency bands differed in the degree of enhancement, with a peak being observed in a mid-range band (11-c/face center frequency). Experiment 2 showed that this pattern was not influenced by viewing distance and, thus, that the results are best interpreted in object spatial frequency, rather than in retinal coordinates. Experiment 3 showed that low-pass filtered images could produce performance equivalent to that produced by unfiltered images. These experiments are consistent with the hypothesis that high spatial resolution information is not necessary for audiovisual speech perception and that a limited range of the spatial frequency spectrum is sufficient.
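
The distinction between object and retinal spatial frequency in Experiment 2 can be illustrated with a short calculation. The sketch below is not taken from the article: the face width, the viewing distances, and the function names are assumptions chosen only for illustration. It shows that a band fixed at 11 c/face corresponds to a different retinal frequency (cycles per degree of visual angle) at each viewing distance, which is why an effect that is stable across distance is more naturally described in c/face.

```python
import math

def visual_angle_deg(face_width_cm, distance_cm):
    """Visual angle (degrees) subtended by a face of the given width."""
    return 2 * math.degrees(math.atan(face_width_cm / (2 * distance_cm)))

def cycles_per_degree(cycles_per_face, face_width_cm, distance_cm):
    """Convert object spatial frequency (c/face) to retinal frequency (c/deg)."""
    return cycles_per_face / visual_angle_deg(face_width_cm, distance_cm)

# Assumed values for illustration only; not reported in the abstract.
FACE_WIDTH_CM = 15.0
DISTANCES_CM = (50, 100, 200)

for d in DISTANCES_CM:
    cpd = cycles_per_degree(11.0, FACE_WIDTH_CM, d)
    print(f"{d} cm: 11 c/face is about {cpd:.2f} c/deg")
```

Under these assumptions, roughly doubling the viewing distance roughly doubles the retinal frequency of the same c/face band, while its object spatial frequency is unchanged.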

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada, the NIH, the National Institute on Deafness and Other Communication Disorders (Grant DC-05774), and the Communication Dynamics Project, ATR Human Information Science Laboratories, Kyoto, Japan.
Note—This article was accepted by the previous editorial team, headed by Neil Macmillan.