Helping the Forensic Research Institute of the French Gendarmerie to Identify a Suspect in the Presence of Voice Disguise or Voice Forgery

Abstract

In forensic speaker recognition, voice disguise is of particular interest. Most criminals try to disguise their voice before making a malicious call or a terrorist threat. Their aim is either to alter their voice quality in order to conceal their identity (voice disguise) or to mimic the voice of another person (voice forgery). This chapter analyses two kinds of deliberate disguise: the transformation of the voice by non-electronic means, and the conversion of the voice by electronic means. For both kinds, our analysis combines an acoustic approach, used to measure specific changes in speech, with an automatic approach, used to detect voice disguise. Four disguises considered the most common are studied: a high-pitched voice, a low-pitched voice, a hand over the mouth, and pinched nostrils. The speakers who recorded the database were required to remain audible and intelligible. The acoustic analysis of specific features reveals differences according to the form of disguise, while in the automatic experiments the best detector of voice disguise was a Support Vector Machine (SVM) classifier, which reached an area under the curve (AUC) of 0.79. Voice conversion techniques are also proposed and applied in two forensic scenarios: first, the imitation of a politician from an Internet recording; and second, the application of voice conversion to reverse a voice disguise. Several tests, based on objective and subjective measurements, are proposed to evaluate the relevance of the results. The best conversion is obtained with GMM-ALISP voice conversion.
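
To make the reported AUC of 0.79 concrete, the following is a minimal sketch of how an area under the ROC curve can be computed from detector scores. It is not the chapter's implementation: the function name `roc_auc`, the label convention (1 = disguised, 0 = normal) and the example scores are all hypothetical, chosen only for illustration. The AUC is the probability that a randomly chosen disguised sample receives a higher score than a randomly chosen normal one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 for a disguised sample, 0 for a normal one (assumed convention).
    scores: detector outputs, where a higher score means "more likely disguised".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # disguised sample ranked above normal sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three disguised and three normal recordings.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level detection and 1.0 to perfect separation, so a value of 0.79 indicates a detector that ranks disguised samples above normal ones in roughly four out of five such pairs.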

Keywords

Speech rate; Speaker recognition; Speaker identification; Target speaker; Voice conversion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Gendarmerie Operational Unit, Gendarmerie Nationale, Saint Jean d’Angely, France
  2. CNRS-LTCI, Telecom ParisTech, Paris, France
