The COVID-19 pandemic has led to a dramatic increase in the use of face masks. Face masks can alter both the acoustic properties of the speech signal and the speaker's speech patterns, with undesirable effects on automatic speech recognition systems and on forensic speaker recognition and identification systems. This is because masks introduce both intrinsic and extrinsic variability into the audio signal. Moreover, their filtering effect varies with the type of mask used. In this paper we explore how different masks affect the performance of an automatic speaker recognition system that uses Mel Frequency Cepstral Coefficients to characterise the voices and Support Vector Machines to perform the classification task. The results show that masks only slightly affect classification performance. The effect depends on the type of mask, though not in the expected way: results with FFP2 masks are better than those with surgical masks. An increase in speech intensity was found with the FFP2 mask, which is related to the increased vocal effort made to counteract the effects of hearing loss.
- Automatic speaker recognition
- Acoustic features
- Face mask
- Forensic acoustics
In Spanish, the first two formants, F1 and F2, are the only ones needed to distinguish one vowel sound from another. This is due to the relationship between the location of the formants in the spectrogram and the position of the organs involved in articulation.
According to Delgado-Romero, “a control sample is one that belongs to a known subject, while a recovered sample is anonymous, i.e. the identity of the person who carried it out is not known”.
The corpus repository and the ASR system are available at: https://tinyurl.com/8h8dteuu.
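The kind of pipeline described in the abstract (MFCC features fed to a Support Vector Machine) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the repository linked above. Everything here is an assumption made for illustration: the MFCC computation is written directly with NumPy and SciPy, the frame sizes and filter counts are arbitrary, the RBF kernel and its parameters are a guess, and the "speakers" are synthetic harmonic tones rather than real recordings, so the example only shows how the pieces fit together.

```python
import numpy as np
from scipy.fft import dct
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc(signal, sr, n_mfcc=13, n_fft=512, hop=256, n_mels=26):
    """Return an (n_frames, n_mfcc) matrix of MFCCs (illustrative parameter values)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrum per frame
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    log_mel = np.log(mel_energy + 1e-10)                # small offset avoids log(0)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]


def synth_utterance(f0, sr, rng, seconds=0.5):
    """Hypothetical stand-in for a recording: a harmonic tone plus noise."""
    t = np.arange(int(sr * seconds)) / sr
    harmonics = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 6))
    return harmonics + 0.05 * rng.standard_normal(t.size)


def utterance_vector(signal, sr):
    """Summarise an utterance as the mean MFCC vector over its frames."""
    return mfcc(signal, sr).mean(axis=0)


sr = 16000
rng = np.random.default_rng(0)
speakers = {"spk_a": 110.0, "spk_b": 180.0, "spk_c": 240.0}  # hypothetical labels and pitches

X, y = [], []
for name, f0 in speakers.items():
    for _ in range(10):
        jittered_f0 = f0 * (1 + 0.02 * rng.standard_normal())
        X.append(utterance_vector(synth_utterance(jittered_f0, sr, rng), sr))
        y.append(name)
X, y = np.array(X), np.array(y)

# Assumed classifier settings: standardised features, RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X[::2], y[::2])             # even-indexed utterances for training
predicted = clf.predict(X[1::2])    # odd-indexed utterances for testing
```

To probe a mask effect with a setup like this, one would train on unmasked utterances and compare the predictions for masked and unmasked test utterances of the same speakers.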
Atcherson, S.R., et al.: The effect of conventional and transparent surgical masks on speech understanding in individuals with and without hearing loss. J. Am. Acad. Audiol. 28, 58–67 (2017)
Audacity Team: Audacity (R): Free audio editor and recorder [computer application] (2022). www.audacityteam.org/
Boersma, P., Weenink, D.: Praat: doing phonetics by computer [computer program] (version 6.2.10) (2009). www.praat.org. Accessed 17 Mar 2022
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011). https://doi.org/10.1145/1961189.1961199
Coniam, D.: The impact of wearing a face mask in a high-stakes oral examination: an exploratory post-SARS study in Hong Kong. Lang. Assess. Q.: Int. J. 2, 235–261 (2005)
Corey, R.M., Jones, U., Singer, A.C.: Acoustic effects of medical, cloth, and transparent face masks on speech signals. J. Acoust. Soc. Am. 148, 2371–2375 (2020). https://doi.org/10.1121/10.0002279
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1023/A:1022627411411
Delgado-Romero, C.: La Identificación de Locutores en el Ámbito Forense (in Spanish). Ph.D. thesis, Departamento de Comunicación y Publicidad II. Facultad de Ciencias de la Información. Universidad Complutense de Madrid. España (2001)
Deller, J.R., Proakis, J.G., Hansen, J.H.L.: Discrete-Time Processing of Speech Signals. Institute of Electrical and Electronics Engineers, New York (2015)
ENFSI: Forensic speech and audio analysis working group terms of reference for forensic speaker analysis. European Network of Forensic Science Institutes, pp. 1–4 (2008)
Leu, F.Y., Lin, G.L.: An MFCC-based speaker identification system. In: IEEE 31st International Conference on Advanced Information Networking and Applications, AINA, pp. 1055–1062. Institute of Electrical and Electronics Engineers Inc. (2017). https://doi.org/10.1109/AINA.2017.130
Maher, R.C.: Principles of Forensic Audio Analysis. Springer, Heidelberg (2018). https://doi.org/10.1007/978-3-319-99453-6
McFee, B., et al.: librosa/librosa: 0.9.1 (2022). https://doi.org/10.5281/zenodo.6097378
McFee, B., et al.: Librosa: audio and music signal analysis in python. In: Proceedings of the 14th Python in Science Conference, pp. 18–24 (2015). https://doi.org/10.25080/majora-7b98e3ed-003
Mendel, L.L., Gardino, J.A., Atcherson, S.R.: Speech understanding using surgical masks: a problem in health care? J. Am. Acad. Audiol. 19, 686–695 (2008)
Nguyen, D.D., et al.: Acoustic voice characteristics with and without wearing a facemask. Sci. Rep. 11, 1–11 (2021). https://doi.org/10.1038/s41598-021-85130-8
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pörschmann, C., Lübeck, T., Arend, J.M.: Impact of face masks on voice radiation. J. Acoust. Soc. Am. 148, 3663–3670 (2020). https://doi.org/10.1121/10.0002853
Radonovich, L.J., Jr., Yanke, R., Cheng, J., Bender, B.: Diminished speech intelligibility associated with certain types of respirators worn by healthcare workers. J. Occup. Environ. Hyg. 7, 63–70 (2009)
Randazzo, M., Koenig, L.L., Priefer, R.: The effect of face masks on the intelligibility of unpredictable sentences. In: Proceedings of Meetings on Acoustics, vol. 42 (2020). https://doi.org/10.1121/2.0001374
Rao, K.S., Vuppala, A.K.: Speech Processing in Mobile Environments. SECE, Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-319-03116-3
Ratha, N.K., Connell, J.H., Bolle, R.M.: Enhancing security and privacy in biometrics-based authentication systems. IBM Syst. J. 40(3), 614–634 (2001)
Ribeiro, V., Dassie-Leite, A.P., Pereira, E.C., Santos, A.D.N., Martins, P., de Irineu, R.: Effect of wearing a face mask on vocal self-perception during a pandemic. J. Voice (2020)
Saeidi, R., Huhtakallio, I., Alku, P.: Analysis of face mask effect on speaker recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 08, pp. 1800–1804 (2016). https://doi.org/10.21437/Interspeech.2016-518
Saeidi, R., Niemi, T., Karppelin, H., Pohjalainen, J., Kinnunen, T., Alku, P.: Speaker recognition for speech under face cover. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2015-January, pp. 1012–1016 (2015). https://doi.org/10.21437/interspeech.2015-275
Saleem, S., Subhan, F., Naseer, N., Bais, A., Imtiaz, A.: Forensic speaker recognition: a new method based on extracting accent and language information from short utterances. Forensic Sci. Int.: Digital Invest. 34, 300982 (2020)
Sánchez-López, D.: Análisis acústico y sonográfico de la vocal /a/ para su aplicación en el ámbito de las ciencias forenses (2016). https://tinyurl.com/h5ncwpv. (in Spanish)
Wainer, J., Fonseca, P.: How to tune the RBF SVM hyperparameters? An empirical evaluation of 18 search algorithms. Artif. Intell. Rev. 54, 4771–4797 (2021)
Wu, Z., Evans, N., Kinnunen, T., Yamagishi, J., Alegre, F., Li, H.: Spoofing and countermeasures for speaker verification: a survey. Speech Commun. 66, 130–153 (2015)
This research was supported by the Research Grants Program of the Universidad de Alcalá. We acknowledge the valuable counsel and resources provided by G. A. Acha Ruiz, and we thank the Department of Forensic Acoustics of the “Comisaría General de Policía Científica” for access to the LOCUPOL database sentences.
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Bogdanel, G., Belghazi-Mohamed, N., Gómez-Moreno, H., Lafuente-Arroyo, S. (2022). Study on the Effect of Face Masks on Forensic Speaker Recognition. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15776-9
Online ISBN: 978-3-031-15777-6