Vowel Inherent Spectral Change in Forensic Voice Comparison

Part of the Modern Acoustics and Signal Processing book series (MASP)


The onset + offset model of vowel inherent spectral change has been found to be effective for vowel-phoneme identification, and has not been outperformed by more sophisticated parametric-curve models. This suggests that if only simple cues, such as initial and final formant values, are necessary for signaling phoneme identity, then speakers may have considerable freedom in the exact path taken between the initial and final formant values. If the constraints on formant trajectories are relatively lax with respect to vowel-phoneme identity, then the details of formant trajectories may contain considerable information with respect to speaker identity. Differences in physiology and idiosyncrasies in the use of motor commands may lead different individuals to produce different formant trajectories between the beginning and end of the same vowel phoneme. If within-speaker variability is substantially smaller than between-speaker variability, then formant trajectories may be effective features for forensic voice comparison. This chapter reviews a number of forensic-voice-comparison studies that have used different procedures to extract information from formant trajectories. It concludes that information extracted from formant trajectories can lead to a high degree of validity in forensic voice comparison (at least under controlled conditions), and that a whole-trajectory approach based on parametric curves outperforms an onset + offset model.
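To make the contrast between the two kinds of model concrete, here is a minimal Python sketch. It assumes that a formant trajectory has already been measured at equally spaced points; the function names and the F2 values are hypothetical and for illustration only. The onset + offset model keeps just the first and last values. The whole-trajectory model here uses the first few discrete cosine transform (DCT-II) coefficients, which is one common parametric-curve representation (polynomial fits are another). It is not presented as the exact procedure used in the studies reviewed.

```python
import math
from typing import List, Sequence, Tuple


def onset_offset_features(track: Sequence[float]) -> Tuple[float, float]:
    """Onset + offset model: keep only the first and last formant values (Hz)."""
    if len(track) < 2:
        raise ValueError("need at least two measurement points")
    return track[0], track[-1]


def dct_features(track: Sequence[float], n_coeffs: int = 3) -> List[float]:
    """Whole-trajectory model (sketch): the first n_coeffs DCT-II coefficients.

    Each coefficient is scaled by 1/N, so coefficient 0 equals the mean of the
    track; coefficient 1 reflects the overall rise or fall, and coefficient 2
    the curvature of the path between onset and offset.
    """
    n = len(track)
    if n == 0 or n_coeffs > n:
        raise ValueError("n_coeffs must not exceed the number of points")
    coeffs = []
    for k in range(n_coeffs):
        total = sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                    for i, x in enumerate(track))
        coeffs.append(total / n)
    return coeffs


# Hypothetical F2 tracks (Hz) sampled at five equally spaced points,
# sharing the same onset and offset but taking different paths.
straight = [1200.0, 1400.0, 1600.0, 1800.0, 2000.0]
curved = [1200.0, 1300.0, 1400.0, 1700.0, 2000.0]

if __name__ == "__main__":
    print(onset_offset_features(straight), onset_offset_features(curved))
    print([round(c, 1) for c in dct_features(straight)])
    print([round(c, 1) for c in dct_features(curved)])
```

Both hypothetical tracks start at 1200 Hz and end at 2000 Hz, so their onset + offset features are identical. Their DCT coefficients differ: the straight-line track has a curvature coefficient of essentially zero, while the track that rises slowly at first and then quickly does not. This is the kind of path detail that, according to the reasoning above, may carry speaker-specific information.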





Abbreviations

Cllr: Log-likelihood-ratio cost
DCT: Discrete cosine transform
DNA: Deoxyribonucleic acid
DTW: Dynamic time warping
F1: First formant
F2: Second formant
F3: Third formant
LPC: Linear predictive coding
LR: Likelihood ratio
MFCC: Mel-frequency cepstral coefficient
MVKD: Multivariate kernel density
VISC: Vowel inherent spectral change
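The log-likelihood-ratio cost listed above is a metric widely used to assess the validity of forensic likelihood-ratio systems. The following minimal Python sketch implements the usual formulation; the function name and the test values are hypothetical. For each same-speaker test pair the penalty is log2(1 + 1/LR), for each different-speaker pair it is log2(1 + LR), and Cllr is the mean of the two averages.

```python
import math
from typing import Sequence


def cllr(same_speaker_lrs: Sequence[float],
         different_speaker_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost of a set of test likelihood ratios (sketch).

    Same-speaker LRs should be large and different-speaker LRs small; each
    LR is penalised according to how strongly it points the wrong way.
    """
    if len(same_speaker_lrs) == 0 or len(different_speaker_lrs) == 0:
        raise ValueError("need at least one same-speaker and one different-speaker LR")
    if any(lr <= 0 for lr in list(same_speaker_lrs) + list(different_speaker_lrs)):
        raise ValueError("likelihood ratios must be positive")
    # Average penalty over same-speaker pairs: small when LR >> 1.
    ss = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    # Average penalty over different-speaker pairs: small when LR << 1.
    ds = sum(math.log2(1 + lr) for lr in different_speaker_lrs) / len(different_speaker_lrs)
    return 0.5 * (ss + ds)


if __name__ == "__main__":
    print(cllr([1.0], [1.0]))                          # uninformative system
    print(cllr([1000.0, 500.0], [0.001, 0.01]))        # well-performing system
    print(cllr([0.1], [10.0]))                         # misleading system
```

A system that always outputs LR = 1 scores exactly 1. Lower values indicate better validity, and values above 1 indicate likelihood ratios that are, on average, worse than uninformative.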



Thanks to Philip Rose, Peter F. Assmann, and Stephen A. Zahorian for comments on earlier versions of this chapter. The writing of this chapter was supported by the Australian Research Council, the Australian Federal Police, New South Wales Police, Queensland Police, the National Institute of Forensic Science, the Australasian Speech Science and Technology Association, and the Guardia Civil via Linkage Project LP100200142. Unless otherwise explicitly attributed, the opinions expressed herein are those of the author and do not necessarily represent the policies or opinions of any of the above-mentioned organizations or individuals.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, Australia
