Forensic Speaker Recognition, pp 469-503
Helping the Forensic Research Institute of the French Gendarmerie to Identify a Suspect in the Presence of Voice Disguise or Voice Forgery
Abstract
In forensic speaker recognition, voice disguise is of particular interest. Most criminals try to disguise their voice before making a malicious call or a terrorist threat. Their aim is to change their voice quality in order to falsify their identity (voice disguise) or to mimic the voice of another person (voice forgery). This chapter analyses two kinds of disguise: the first is the deliberate transformation of the voice by non-electronic means; the second is the deliberate conversion of the voice by electronic means. For both kinds of disguise, our analysis of voice transformation combines an acoustic approach, which we use to measure specific changes in speech, with an automatic approach to detect voice disguise. Four forms of disguise, considered the most common, are studied: high-pitched voice, low-pitched voice, a hand over the mouth, and pinched nostrils. The speakers who recorded the database were required to remain audible and intelligible. The acoustic analysis of specific features reveals some differences according to the form of disguise, while in the automatic experiment the best way we found to detect a voice disguise is the Support Vector Machine (SVM) technique, which reaches an AUC (area under the curve) of 0.79. Voice conversion techniques are also proposed and applied in two forensic scenarios: first, the imitation of a politician from an Internet recording; and second, the reversal of a voice disguise. Several tests, based on objective and subjective measurements, are proposed to evaluate the relevance of the results. The best conversion is obtained with GMM-ALISP voice conversion.
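To make the reported AUC figure concrete, the sketch below shows one common way such a value can be computed from detector scores: as the fraction of (disguised, normal) sample pairs in which the disguised sample receives the higher score, with ties counted as half. This is an illustration only, not the chapter's evaluation code; the function name `roc_auc`, the label convention (1 = disguised, 0 = normal) and the toy scores are assumptions made for the example.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Assumed convention (not from the chapter): label 1 marks a disguised
    sample, label 0 a normal one, and a higher score means "more likely
    disguised".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # disguised sample ranked above normal one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores, invented purely for illustration.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

Under this reading, an AUC of 0.79 means that a randomly chosen disguised sample outscores a randomly chosen normal sample in about 79% of pairs, while 0.5 corresponds to chance and 1.0 to perfect separation.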
Keywords
Speech Rate, Speaker Recognition, Speaker Identification, Target Speaker, Voice Conversion