Introduction

The use of face personal protective equipment (F-PPE) is one of the main global countermeasures aimed at reducing human-to-human transmission of SARS-CoV-2, involving both the general population and front‐line healthcare staff working with patients affected by coronavirus disease (COVID-19) [1, 2]. Used as a barrier against respiratory droplets, F-PPE conceals the mouth and may hinder the production and transmission of speech sounds, posing an obstacle to verbal communication. We assessed the effect of common types and combinations of F-PPE on speech intelligibility in quiet and in a simulated noisy environment.

Methods

Speech intelligibility was measured according to the principles of the Speech Intelligibility Index (SII) standard [3] and correlated with speech audiometry results of normal hearing subjects. For a given speech-in-noise condition, the SII is calculated from the levels of the speech and environmental noise frequency spectra, and from the listener’s hearing threshold. The model output is highly correlated with the total speech information available to the listener [3]. The tests were carried out in a sound-treated room equipped with Ambisonics spatial audio facilities consisting of a hemispherical array of 24 broadband-flat frequency response loudspeakers (Ambisonic 130, Audio Design, Cavriago, RE, Italy). Ambisonics is a multichannel audio technology allowing for reproduction of audio signals containing complete spatial information, which we used here to emulate a diffuse noise centered on the listening position. Spectral analysis was performed with 1/3 octave band filters with a calibrated sound level meter (XL2 Sound Level Meter, NTi Audio, Schaan, Liechtenstein). The speech signal consisted of 5 phonetically balanced lists of 20 spondaic words extracted from the standard Italian speech audiometry material [4], merged into a single utterance. Words were separated by 230 ms, corresponding to a speed of 74 words per minute. The signal was delivered at 70 dB SPL, an intensity that is comfortably loud for a normal hearing listener. Average band importance functions from the ANSI standard were used to weight the speech signal [3].
Twelve experimental conditions were tested: (1) uncovered loudspeaker, (2) surgical mask (Byd, Los Angeles, CA), (3) face shield (Univet, Brescia, Italy), (4) surgical mask combined with face shield, (5) FFP2 mask (Droair, Dromex, Johannesburg, South Africa), (6) FFP2 mask combined with face shield, (7) FFP2 ventilated mask (Oxy Line, Pabianice, Poland), (8) FFP2 ventilated mask combined with surgical mask, (9) FFP2 ventilated mask combined with face shield, (10) FFP3 ventilated mask (Bls Group, Milano, Italy), (11) FFP3 ventilated mask combined with surgical mask, (12) FFP3 ventilated mask combined with face shield (Fig. 1). Room background noise and diffuse, multi-talker babble noise were recorded. The noise was increased in steps of 5 dB from mild environmental noise (25 dB SPL) to loud environmental noise (70 dB SPL, i.e. signal-to-noise ratio = 0 dB). The hearing threshold parameters in the SII model were set to normal hearing values. To assess reproducibility and repeatability, three trials were conducted for each condition, adjusting the F-PPE donning each time, and the collected data were divided into three samples for each condition. One-way analysis of variance (ANOVA) was then performed to test differences between trials and samples. Speech recognition of ten normal hearing adult volunteers was tested in the same conditions, according to clinical speech audiometry standard principles [4, 5]. Correlation between SII-modeled and subjective data was tested with Spearman’s rho.
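
The SII principle described above (band importance weighting of band audibility) can be illustrated with a minimal Python sketch. It is a deliberately simplified form, not the full ANSI procedure: it omits spread of masking and level distortion, maps each band's signal-to-noise ratio linearly from [-15, +15] dB onto [0, 1], and treats the hearing threshold as a single internal noise floor. The function names, the four-band layout, the importance weights and all level values are hypothetical and chosen only for illustration; they are not the measured data or the ANSI band importance table.

```python
def band_audibility(speech_db, noise_db, threshold_db=0.0):
    """Simplified band audibility in [0, 1] (assumption: linear SNR mapping)."""
    # Effective disturbance: the larger of the external noise and the
    # listener's internal noise floor (a simplification of the standard).
    disturbance_db = max(noise_db, threshold_db)
    snr_db = speech_db - disturbance_db
    # SNR of -15 dB or less -> 0, +15 dB or more -> 1, linear in between.
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def simplified_sii(speech_bands_db, noise_bands_db, importance, threshold_db=0.0):
    """Importance-weighted sum of band audibilities (weights are normalized)."""
    if not (len(speech_bands_db) == len(noise_bands_db) == len(importance)):
        raise ValueError("all band lists must have the same length")
    total = float(sum(importance))
    return sum(
        (w / total) * band_audibility(s, n, threshold_db)
        for s, n, w in zip(speech_bands_db, noise_bands_db, importance)
    )


# Hypothetical 4-band example (illustrative values, not measured data).
speech_open = [60.0, 60.0, 55.0, 50.0]       # uncovered source, band levels in dB
mask_loss = [0.0, 5.0, 10.0, 10.0]           # assumed high-frequency attenuation
speech_masked = [s - a for s, a in zip(speech_open, mask_loss)]
importance = [0.2, 0.3, 0.3, 0.2]            # assumed band importance weights

# Noise swept from 25 to 70 dB in 5 dB steps, as in the protocol above.
noise_levels = list(range(25, 75, 5))
for noise in noise_levels:
    noise_bands = [float(noise)] * len(speech_open)
    sii_open = simplified_sii(speech_open, noise_bands, importance)
    sii_masked = simplified_sii(speech_masked, noise_bands, importance)
    print(f"noise {noise} dB: open {sii_open:.2f}, masked {sii_masked:.2f}")
```

In this simplified form, a high-frequency attenuation changes the index only at noise levels where the affected bands lie between fully audible and fully masked, which is the qualitative behavior expected of a smooth SII function.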

Fig. 1

a Face personal protective equipment (F-PPE) used for the test: ventilated FFP2 (upper left), ventilated FFP3 (lower left), face shield (middle), FFP2 (upper right), surgical mask (lower right), b testing set-up

Results

Spectral analysis of the signal in quiet shows no relevant differences for the F-PPE conditions in the range 160–1000 Hz, but a loss of up to 9.7 dB (with the FFP2 ventilated mask combined with the face shield) in the range 1250–2000 Hz, up to 18.3 dB at 2500–4000 Hz and up to 16.8 dB at 5000–8000 Hz (both with the FFP3 ventilated mask combined with face shield) (Fig. 2). SII results gathered from the model (Fig. 3) display only a small gap from the uncovered loudspeaker condition in quiet and in loud noise (ceiling effect), but the gaps are prominent at moderate noise levels, ranging from 6.4% at 50 dB SPL for the surgical mask condition to 25.0% at 40 dB SPL for the FFP3 ventilated mask combined with face shield condition. ANOVA confirmed the repeatability and reproducibility of measurements for each test condition. The speech recognition function obtained for normal hearing subjects demonstrates larger gaps between uncovered and masked conditions, as high as 23.3% at 70 dB SPL for the surgical mask, 49.0% at 60 dB SPL for the face shield, 38.0% at 65 dB SPL for the surgical mask combined with face shield, 23.3% at 70 dB SPL for the FFP2 mask, 60.5% at 65 dB SPL for the FFP2 mask combined with face shield, 44.0% at 60 dB SPL for the FFP2 ventilated mask, 51.3% at 65 dB SPL for the FFP2 ventilated mask combined with surgical mask, 64.0% at 60 dB SPL for the FFP2 ventilated mask combined with face shield, 40.7% at 60 dB SPL for the FFP3 ventilated mask, 44.0% at 60 dB SPL for the FFP3 ventilated mask combined with surgical mask, and 69.0% at 60 dB SPL for the FFP3 ventilated mask combined with face shield (Fig. 3). Spearman’s correlation between SII-modeled and subjective data is high (> 0.85).
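
As an illustration of the rank correlation reported above, the following Python sketch computes Spearman's rho as the Pearson correlation of average ranks. It is a generic textbook formulation under stated assumptions, not the authors' analysis code; the function names and the two short data series are hypothetical and do not reproduce the study's values.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired values for one condition across noise levels
# (illustrative only): modeled SII and subjective recognition score in %.
sii_modeled = [0.95, 0.90, 0.78, 0.60, 0.41, 0.22]
recognition = [100.0, 100.0, 98.0, 90.0, 55.0, 10.0]
print(round(spearman_rho(sii_modeled, recognition), 3))
```

Because the coefficient depends only on ranks, it captures the monotonic agreement between the smooth SII function and the steeper subjective psychometric function even when their absolute values differ.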

Fig. 2

Full spectral analysis of speech signal from the uncovered loudspeaker and main frequency band level differences for face masks and face shield, alone and combined, in quiet

Fig. 3

Speech recognition of normal hearing subjects (solid line) and Speech Intelligibility Index (SII, dashed line) for face masks and face shield, alone and combined, at different levels of diffuse speech noise (dotted lines, reference values for the uncovered loudspeaker conditions)

Discussion

Given the gravity of the pandemic, wearing F-PPE has become an action against SARS-CoV-2 on a global scale. Beyond healthcare staff protection needs, the evidence of benefit combined with the low risk of harm supports mask wearing by the general public [2]. However, the effects of F-PPE on speech production and transmission, and hence on intelligibility, need attention. A number of objective measures for predicting speech intelligibility in quiet and noisy conditions have been proposed [6]. SII was developed with simple signal degradations in mind, e.g. linear filtering and additive noise. The speech transmission index (STI) is another standardized model, which evaluates the reduction in modulation depth of a specifically designed signal to include the effects of reverberant speech and room acoustics. Both SII and STI outputs range from 0 to 1, i.e. from bad (< 0.3) to excellent (> 0.7) speech intelligibility. SII has been validated for several speech perception tests, and average band importance functions are available for generic speech signals. Many developments of the standard SII and STI have been recently proposed to improve prediction of speech intelligibility [6]. The characteristics of the signal may have influenced the difference between the average SII model and the subjective psychometric function, the latter being steeply sloping at higher noise levels. Palmiero et al. [7] measured the effects of different types of F-PPE on speech intelligibility and speech-in-noise with STI. In 2016, the use of F-PPE was usually limited to the healthcare setting. They reported that signal attenuation caused by F-PPE affects mainly the high frequencies, with a loss depending on F-PPE type, in agreement with our results. This effect is the main cause of the reduction in speech intelligibility due to the use of F-PPE. The thickness of the mask and the acoustic properties of the materials themselves are all contributing factors to signal degradation.
STI deviation from no-mask condition was 3–4% for surgical masks, 13–17% for N95 masks (similar to FFP2) and 42–45% for elastomeric half-mask air-purifying respirators (EAPRs, a category of F-PPE not included in our study). Despite the different design and set-up, there is an overall agreement in speech intelligibility decline for surgical masks and N95/FFP2 masks, either in quiet or in noise, between the results of Palmiero et al. [7] and our measurements.

The negative impact of wearing personal protective equipment on communication during the pandemic is the topic of many anecdotal reports and of the paper by Hampton et al. [8]. Speech perception scores were found to be significantly impaired when a human speaker was wearing F-PPE and the background noise was set to loud levels, simulating an operating theatre scenario. Although they better depict a real-world condition, subjective measures based on sentence perception are exposed to a wider range of results, e.g. due to the listeners’ high interindividual variability in speech-in-noise performance, to the speaker’s Lombard effect (the involuntary tendency to raise voice levels to enhance speech intelligibility in noise), and to wide fluctuations in the signal-to-noise ratio [8].

SII has not been validated for predicting speech intelligibility in case of modulated maskers or diffuse noise, although it has been used for studies considering fluctuating noise as well [6]. Despite this limitation, the SII is widely used and considered a reliable objective measure for speech intelligibility. Common surgical masks, which are used by the larger part of the population, are responsible for a loss of more than 20% of speech intelligibility when the signal level equals the background noise level. Front-line healthcare staff may experience significant communication issues because of F-PPE. Wearing advanced F-PPE (e.g. FFP3 masks combined with a face shield) and working in noisy environments (e.g. the intensive care unit) can cause a speech intelligibility reduction of almost 70% [7]. In the SII and STI models, which are characterized by a smooth psychometric function, the effect is evident at moderate levels of environmental noise. This condition can be more detrimental to people affected by hearing loss. It should also be noted that our results do not take into account the effects of wearing F-PPE on speech articulation, since this effect cannot be simulated by using a loudspeaker (nor a head-and-torso mannequin). Besides the reduction of speech intelligibility, it should also be pointed out that (1) F-PPE precludes lip-reading, which provides crucial cues to speech understanding in challenging background-noise conditions, especially for people affected by hearing loss [9], and (2) COVID-19 social distancing measures may require an increased space between talkers [10], which leads to a detrimental decrease in the signal-to-noise ratio. Indeed, as sound intensity falls in inverse proportion to the square of the distance, every doubling of the space between signal source and listener corresponds to a loss of about 6 dB.
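
The 6 dB figure follows from the inverse-square law for a point source in a free field, where the level change between two distances is 20·log10(d2/d1). The short Python sketch below works through that arithmetic; the function name and the example distances are hypothetical, and the free-field assumption ignores room reflections, which in real rooms usually make the loss smaller.

```python
from math import log10


def level_drop_db(d1_m, d2_m):
    """Level drop in dB when moving from distance d1 to d2 from a point source.

    Assumes free-field, inverse-square spreading: intensity ~ 1 / d**2,
    so the drop is 10*log10((d2/d1)**2) = 20*log10(d2/d1).
    """
    if d1_m <= 0 or d2_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * log10(d2_m / d1_m)


# Illustrative distances in metres (not taken from the study).
for near, far in [(0.5, 1.0), (1.0, 2.0), (1.0, 4.0)]:
    print(f"{near} m -> {far} m: {level_drop_db(near, far):.1f} dB")
```

With the noise level unchanged, the same number of decibels is subtracted from the signal-to-noise ratio at the listener's position.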

Conclusion

The use of F-PPE is one of the main global countermeasures aimed at reducing human-to-human transmission of SARS-CoV-2. F-PPE may cause significant speech intelligibility issues. Further research is needed to study the impact of F-PPE on verbal communication in the general population and in specific at-risk groups, e.g. healthcare workers, school-aged children, and people affected by voice and hearing disorders.