Abstract
The onset + offset model of vowel inherent spectral change has been found to be effective for vowel-phoneme identification, and not to be outperformed by more sophisticated parametric-curve models. This suggests that if only simple cues such as initial and final formant values are necessary for signaling phoneme identity, then speakers may have considerable freedom in the exact path taken between the initial and final formant values. If the constraints on formant trajectories are relatively lax with respect to vowel-phoneme identity, then with respect to speaker identity there may be considerable information contained in the details of formant trajectories. Differences in physiology and idiosyncrasies in the use of motor commands may mean that different individuals produce different formant trajectories between the beginning and end of the same vowel phoneme. If within-speaker variability is substantially smaller than between-speaker variability then formant trajectories may be effective features for forensic voice comparison. This chapter reviews a number of forensic-voice-comparison studies which have used different procedures to extract information from formant trajectories. It concludes that information extracted from formant trajectories can lead to a high degree of validity in forensic voice comparison (at least under controlled conditions), and that a whole trajectory approach based on parametric curves outperforms an onset + offset model.
Abbreviations
- Cllr: Log-likelihood-ratio cost
- DCT: Discrete cosine transform
- DNA: Deoxyribonucleic acid
- DTW: Dynamic time warping
- F1: First formant
- F2: Second formant
- F3: Third formant
- LPC: Linear predictive coding
- LR: Likelihood ratio
- MFCC: Mel-frequency cepstral coefficient
- MVKD: Multivariate kernel density
- VISC: Vowel inherent spectral change
References
Aitken, C.G.G., Lucy, D.: Evaluation of trace evidence in the form of multivariate data. Appl. Stat 54, 109–122 (2004). doi:10.1111/j.1467-9876.2004.02031.x
Aitken, C.G.G., Roberts, P., Jackson, G.: Fundamentals of Probability and Statistical Evidence in Criminal Proceedings: Guidance for Judges, Lawyers, Forensic Scientists and Expert Witnesses. Royal Statistical Society, London (2010)
Aitken, C.G.G., Taroni, F.: Statistics and the evaluation of evidence for forensic scientists. Wiley, Chichester (2004)
Balding, D.J.: Weight of evidence for forensic DNA profiles. Wiley, Chichester (2005)
Berger, C.E.H., Buckleton, J., Champod, C., Evett, I.W., Jackson, G.: Evidence evaluation: A response to the court of appeal judgment in R v T. Sci. Justice 51, 43–49 (2011). doi:10.1016/j.scijus.2011.03.005
Brümmer, N., Burget, L., Cernocký, J.H., Glembek, O., Grézl, F., Karafiát, M., van Leeuwen, D.A., Matejka, P., Schwarz, P., Strasheim, A.: Fusion of heterogenous speaker recognition systems in the STBU submission for the NIST SRE 2006. IEEE. Trans. Audio. Speech. Lang. Process 15, 2072–2084 (2007). doi:10.1109/TASL.2007.902870
Brümmer, N., du Preez, J.: Application independent evaluation of speaker detection. Comput. Speech. Lang 20, 230–275 (2006). doi:10.1016/j.csl.2005.08.001
Buckleton, J.: A framework for interpreting evidence. In: Buckleton, J., Triggs, C.M., Walsh, S.J. (eds.) Forensic DNA Evidence Interpretation. Boca Raton, FL: CRC, pp. 27–63 (2005)
Champod, C., Meuwly, D.: The inference of identity in forensic speaker recognition. Speech. Commun 31, 193–203 (2000). doi:10.1016/S0167-6393(99)00078-3
Enzinger, E.: Characterising formant tracks in Viennese diphthongs for forensic speaker comparison. In: Proceedings of the 39th Audio Engineering Society Conference—Audio Forensics: Practices and Challenges, Hillerød, Denmark. Audio Engineering Society, New York, pp. 47–52 (2010)
Eriksson, E.J., Cepeda, L.F., Rodman, R.D., McAllister, D.F., Bitzer, D., Arroway, P.: Cross-language speaker identification using spectral moments. In: Branderud, P., Traunmüller, H. (eds.) Proceedings of FONETIK 2004: The XVIIth Swedish Phonetics Conference. Stockholm, Sweden: Department of Linguistics, Stockholm University, pp. 76–79 (2004a)
Eriksson, E.J., Cepeda, L.F., Rodman, R.D., Sullivan, K.P.H., McAllister, D.F., Bitzer, D., Arroway, P.: Robustness of spectral moments: A study using voice imitations. In: Cassidy, S., Cox, F., Mannell, R., Palethorpe. (eds.) Proceedings of the 10th Australian International Conference on Speech Sciences & Technology. Australian Speech Science & Technology Association, Canberra, pp. 259–264 (2004b)
Evett, I.W.: Towards a uniform framework for reporting opinions in forensic science case-work. Sci. Justice 38, 198–202 (1998). doi:10.1016/S1355-0306(98)72105-7
Evett, I.W.: Evaluation and professionalism. Sci. Justice 49, 159–160 (2009). doi:10.1016/j.scijus.2009.07.001
Evett, I.W., Buckleton, J.S.: Statistical analysis of STR data. In: Carracedo, A., Brinkmann, B., Bär, W. (eds.) Advances in Forensic Haemogenetics, vol. 6, pp. 79–86. Springer, Heidelberg (1996)
Evett, I.W., and other signatories: Expressing evaluative opinions: A position statement. Sci. Justice 51, 1–2 (2011). doi:10.1016/j.scijus.2011.01.002
Foreman, L.A., Champod, C., Evett, I.W., Lambert, J.A., Pope, S.: Interpreting DNA evidence: A review. Int. Stat. J 71, 473–495 (2003). doi:10.1111/j.1751-5823.2003.tb00207.x
Goldstein, U.G.: Speaker-identifying features based on formant tracks. J. Acoust. Soc. Am 59, 176–182 (1976). doi:10.1121/1.380837
González-Rodríguez, J., Drygajlo, A., Ramos-Castro, D., García-Gomar, M., Ortega-García, J.: Robust estimation, interpretation and assessment of likelihood ratios in forensic speaker recognition. Comput. Speech. Lang 20, 331–355 (2006). doi:10.1016/j.csl.2005.08.005
González-Rodríguez, J., Ramos, D.: Forensic automatic speaker classification in the coming paradigm shift. In: Müller, C. (ed.) Speaker Classification I: Selected Projects. Springer-Verlag, Berlin, pp. 205–217 (2007)
González-Rodríguez, J., Rose, P., Ramos, D., Toledano, D.T., Ortega-García, J.: Emulating DNA: Rigorous quantification of evidential weight in transparent and testable forensic speaker recognition. IEEE. Trans. Audio. Speech. Lang. Process 15, 2104–2115 (2007). doi:10.1109/TASL.2007.902747
Gottfried, M., Miller, J.D., Meyer, D.J.: Three approaches to the classification of American English diphthongs. J. Phonetics 21, 205–229 (1993)
Greisbach, R., Esser, O., Weinstock, C.: Speaker identification by formant contours. In: Braun, A., Köster, J.-P. (eds.) Studies in Forensic Phonetics, pp. 49–55. Wissenschaftlicher, Trier, Germany (1995)
Guillemin, B.J., Watson, C.: Impact of the GSM mobile phone network on the speech signal: Some preliminary findings. Int. J. Speech, Lang. Law 15, 193–218 (2008). doi:10.1558/ijsll.v15i2.193
Harrington, J.: An acoustic analysis of happy-tensing in the Queen’s Christmas broadcasts. J. Phonetics 34, 439–457 (2006). doi:10.1016/j.wocn.2005.08.001
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009)
Hillenbrand, J.M., Clark, M.J., Nearey, T.M.: Effect of consonant environment on vowel formant patterns. J. Acoust. Soc. Am. 109, 748–763 (2001). doi:10.1121/1.1337959
Ingram, J.C.L., Prandolini, R., Ong, S.: Formant trajectories as indices of speaker identification. Forensic. Linguist. Int. J. Speech. Lang. Law 3, 129–145 (1996)
Jessen, M.: Forensic phonetics. Lang. Linguist. Compass 2, 671–711 (2008). doi:10.1111/j.1749-818x.2008.00066.x
Kasuya, H., Tan, X., Yang, C.-S.: Voice source and vocal tract characteristics associated with speaker individuality. In: Proceedings of the 3rd International Conference on Spoken-Language Processing, Yokohama, pp. 1459–1462 (1994)
Kinoshita, Y., Osanai, T.: Within speaker variation in diphthongal dynamics: What can we compare? In: Warren, P., Watson, C.I. (eds.) Proceedings of the 11th Australasian International Conference on Speech Science & Technology, Auckland, New Zealand. Australia: Australasian Speech Science & Technology Association, Canberra, pp. 112–117 (2006)
Kuhn, T.S.: The Structure of Scientific Revolutions. University of Chicago Press, Chicago (1962)
Lehiste, I., Peterson, G.E.: Transitions, glides, and diphthongs. J. Acoust. Soc. Am 33, 268–277 (1961). doi:10.1121/1.1908681
Lucy, D.: Introduction to Statistics for Forensic Scientists. Wiley, Chichester (2005)
McDougall, K.: Speaker-specific formant dynamics: an experiment on Australian English /aɪ/. Int. J. Speech. Lang. Law 11, 103–130 (2004)
McDougall, K.: Dynamic features of speech and the characterization of speakers. Int. J. Speech. Lang. Law 13, 89–126 (2006)
McDougall, K., Nolan, F.: Discrimination of speakers using the formant dynamics of /u/ in British English. In: Trouvain, J., Barry, W.J. (eds.) Proceedings of the 16th International Congress on Phonetic Sciences, Saarbrücken. Saarbrücken, Germany, pp. 1825–1828 (2007)
Meuwly, D.: Reconnaissance de locuteurs en sciences forensiques: l’apport d’une approche automatique. Dissertation, University of Lausanne, Switzerland (2001)
Morrison, G.S.: Forensic voice comparison using likelihood ratios based on polynomial curves fitted to the formant trajectories of Australian English /aɪ/. Int. J. Speech. Lang. Law 15, 247–264 (2008). doi:10.1558/ijsll.v15i2.249
Morrison, G.S.: Comments on Coulthard & Johnson’s (2007) portrayal of the likelihood-ratio framework. Aust. J. Forensic. Sci 41, 155–161 (2009a). doi:10.1080/00450610903147701
Morrison, G.S.: Forensic voice comparison and the paradigm shift. Sci. Justice 49, 298–308 (2009b). doi:10.1016/j.scijus.2009.09.002
Morrison, G.S.: Likelihood-ratio forensic voice comparison using parametric representations of the formant trajectories of diphthongs. J. Acoust. Soc. Am. 125, 2387–2397 (2009c). doi:10.1121/1.3081384
Morrison, G.S.: Forensic voice comparison. In: Freckelton, I., Selby, H. (eds.) Expert Evidence (Ch. 99). Sydney, Australia: Thomson Reuters (2010)
Morrison, G.S.: A comparison of procedures for the calculation of forensic likelihood ratios from acoustic-phonetic data: Multivariate kernel density (MVKD) versus Gaussian mixture model—universal background model (GMM-UBM). Speech. Commun 53, 242–256 (2011a). doi:10.1016/j.specom.2010.09.005
Morrison, G.S.: Measuring the validity and reliability of forensic likelihood-ratio systems. Sci. Justice 51, 91–98 (2011b). doi:10.1016/j.scijus.2011.03.002
Morrison, G.S.: Static and dynamic approaches to understanding vowel perception. In: Morrison, G.S., Assmann, P.F. (eds.) Theories of vowel inherent spectral change (ch. 3). Springer Verlag, Heidelberg (2013a)
Morrison, G.S.: The likelihood-ratio framework and forensic evidence in court: A response to R v T. Int. J. Evid. Proof 16, 1–29 (2012b). http://vathek.org/doi/abs/10.1350/ijep.2012.16.1.390
Morrison, G.S.: Tutorial on logistic regression calibration and fusion: Converting a score to a likelihood ratio. Aus. J. Forensic Sci. online 31 Oct 2012 (2012c). doi:10.1080/00450618.2012.733025
Morrison, G.S., Kinoshita, Y.: Automatic-type calibration of traditionally derived likelihood ratios: Forensic analysis of Australian English /o/ formant trajectories. In: Proceedings of Interspeech Incorporating SST, International Speech Communication Association, pp. 1501–1504 (2008)
Morrison, G.S., Ochoa, F., Thiruvaran, T.: Database selection for forensic voice comparison. In: Proceedings of Odyssey 2012: The Language and Speaker Recognition Workshop, Singapore International Speech Communication Association, pp. 62–77 (2012)
Morrison, G.S., Thiruvaran, T., Epps, J.: Estimating the precision of the likelihood-ratio output of a forensic-voice-comparison system. In: Proceedings of Odyssey 2010: The Language and Speaker Recognition Workshop, Brno, Czech Republic. International Speech Communication Association (2010)
Nearey, T.M., Assmann, P.F.: Modeling the role of vowel inherent spectral change in vowel identification. J. Acoust. Soc. Am 80, 1297–1308 (1986). doi:10.1121/1.394433
Nolan, F.: Speaker recognition and forensic phonetics. In: Hardcastle, W.J., Laver, J. (eds.) The Handbook of Phonetic Sciences, pp. 744–767. Blackwell, Oxford (1997)
Pigeon, S., Druyts, P., Verlinde, P.: Applying logistic regression to the fusion of the NIST’99 1-speaker submissions. Digital Signal Process 10, 237–248 (2000). doi:10.1006/dspr.1999.0358
Ramos Castro, D.: Forensic evaluation of the evidence using automatic speaker recognition systems. Dissertation, Universidad Autónoma de Madrid, Madrid, Spain (2007)
Robertson, B., Vignaux, G.A.: Interpreting Evidence. Wiley, Chichester (1995)
Rodman, R., McAllister, D., Bitzer, D., Cepeda, L., Abbitt, P.: Forensic speaker identification based on spectral moments. Int. J. Speech. Lang. Law 9, 22–43 (2002)
Rose, P.: Forensic Speaker Identification. Taylor & Francis, London (2002)
Rose, P.: The technical comparison of forensic voice samples. In: Freckelton, I., Selby, H. (eds.) Expert Evidence (Ch. 99). Sydney, Australia: Thomson Lawbook (2003)
Rose, P.: Forensic speaker recognition at the beginning of the twenty-first century: An overview and a demonstration. Aust. J. Forensic. Sci 37, 49–72 (2005). doi:10.1080/00450610509410616
Rose, P.: Technical forensic speaker recognition: Evaluation, types and testing of evidence. Comput. Speech. Lang 20, 159–191 (2006). doi:10.1016/j.csl.2005.07.003
Rose, P., Kinoshita, Y., Alderman, T.: Realistic extrinsic forensic speaker discrimination with the diphthong /aɪ/. In: Warren, P., Watson, C.I. (eds.) Proceedings of the 11th Australasian International Conference on Speech Science & Technology, Auckland, New Zealand. Canberra, Australia: Australasian Speech Science & Technology Association, pp. 329–334 (2006)
Rose, P., Morrison, G.S.: A response to the UK position statement on forensic speaker comparison. Int. J. Speech. Lang. Law 16, 139–163 (2009). doi:10.1558/ijsll.v16i1.139
Saks, M.J., Koehler, J.J.: The coming paradigm shift in forensic identification science. Science 309, 892–895 (2005). doi:10.1126/science.1111565
Sambur, M.R.: Selection of acoustic features for speaker identification. IEEE. Trans. Acoust. Speech. Signal. Process 23, 176–182 (1975). doi:10.1109/TASSP.1975.1162664
Taitechawat, S., Foulkes, P.: Discrimination of speakers using tone and formant dynamics in Thai. In: Lee, W.-S., Zee, E. (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China. Hong Kong: Organizers of ICPhS XVII at the Department of Chinese, Translation and Linguistics, City University of Hong Kong, pp. 1975–1981 (2011)
van Leeuwen, D.A., Brümmer, N.: An introduction to application-independent evaluation of speaker recognition systems. In: Müller, C. (ed.) Speaker Classification I: Selected Projects, pp. 330–353. Springer-Verlag, Berlin (2007)
Watson, C., Harrington, J.: Acoustic evidence of dynamic formant trajectories in Australian English vowels. J. Acoust. Soc. Am. 106, 458–468 (1999). doi:10.1121/1.427069
Zahorian, S.A., Jagharghi, A.J.: Speaker normalization of static and dynamic vowel spectral features. J. Acoust. Soc. Am 90, 67–75 (1991). doi:10.1121/1.402350
Zahorian, S.A., Jagharghi, A.J.: Spectral-shape features versus formants as acoustic correlates for vowels. J. Acoust. Soc. Am 94, 1966–1982 (1993). doi:10.1121/1.407520
Zhang, C., Morrison, G.S., Thiruvaran, T.: Forensic voice comparison using Chinese /iau/. In: Lee, W.-S., Zee, E. (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China. Hong Kong: Organizers of ICPhS XVII at the Department of Chinese, Translation and Linguistics, City University of Hong Kong, pp. 2280–2283 (2011)
Zuo, D., Mok, P.P.K.: Formant dynamics of /ua/ in the speech of Mandarin-Shanghainese bilingual identical twins. In: Lee, W.-S., Zee, E. (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China. Hong Kong: Organizers of ICPhS XVII at the Department of Chinese, Translation and Linguistics, City University of Hong Kong, pp. 2332–2335 (2011)
Acknowledgments
Thanks to Philip Rose, Peter F. Assmann, and Stephen A. Zahorian for comments on earlier versions of this chapter. The writing of this chapter was supported by the Australian Research Council, the Australian Federal Police, New South Wales Police, Queensland Police, the National Institute of Forensic Science, the Australasian Speech Science and Technology Association, and the Guardia Civil via Linkage Project LP100200142. Unless otherwise explicitly attributed, the opinions expressed herein are those of the author and do not necessarily represent the policies or opinions of any of the above mentioned organizations or individuals.
Appendix: Interpretation of Tippett Plots
A graphical method for presenting the results of running a likelihood-ratio forensic-comparison system on a set of test data is a Tippett plot. Tippett plots were introduced in Meuwly (2001) (inspired by the work of C. F. Tippett and by Evett and Buckleton 1996), and are now a standard method for presenting results in likelihood-ratio forensic-voice-comparison research. Tippett plots provide more detailed information about the results than is available from a summary measure such as Cllr. This appendix is an extract from Morrison (2010, Sect. 99.930) and provides a guide to the interpretation of Tippett plots.
Figures 10, 11, and 12 provide a series of Tippett plots drawn on the basis of hypothetical sets of output from forensic-comparison systems. The lines rising to the right represent the results from same-speaker comparisons in the test set: they show the cumulative proportion of log likelihood ratios less than or equal to the value indicated on the x axis. The lines rising to the left represent the results from different-speaker comparisons in the test set: they show the cumulative proportion of log likelihood ratios greater than or equal to the value indicated on the x axis. (Some authors draw both same-speaker and different-speaker lines as the cumulative proportion of log likelihood ratios greater than or equal to the value indicated on the x axis.) In these hypothetical results the same-speaker and different-speaker lines are symmetrical and cross at a log likelihood ratio of zero; this need not be the case for real test results.
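The construction of the two lines described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used to draw the figures: the function name tippett_curves, the evaluation grid, and the log-likelihood-ratio values are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def tippett_curves(same_llrs, diff_llrs, grid):
    """Return the two cumulative-proportion curves of a Tippett plot.

    same_llrs: log likelihood ratios from same-speaker comparisons
    diff_llrs: log likelihood ratios from different-speaker comparisons
    grid:      x-axis values at which the curves are evaluated
    """
    same = np.asarray(same_llrs, dtype=float)
    diff = np.asarray(diff_llrs, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Same-speaker line: proportion of values <= x (rises to the right).
    same_curve = (same[None, :] <= grid[:, None]).mean(axis=1)
    # Different-speaker line: proportion of values >= x (rises to the left).
    diff_curve = (diff[None, :] >= grid[:, None]).mean(axis=1)
    return same_curve, diff_curve

# Hypothetical test results, invented purely for illustration.
same = [2.0, 1.0, 0.5, -0.5]
diff = [-2.0, -1.0, -0.5, 0.5]
grid = [-3.0, 0.0, 3.0]

same_curve, diff_curve = tippett_curves(same, diff, grid)
print(same_curve)  # proportions of same-speaker values at or below each x
print(diff_curve)  # proportions of different-speaker values at or above each x
```

In this invented data set one same-speaker comparison and one different-speaker comparison support the contrary-to-fact hypothesis, so each curve takes the value 0.25 at a log likelihood ratio of zero.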
An ideal forensic-comparison system should produce a large positive log likelihood ratio for a same-origin comparison, and a large negative log likelihood ratio for a different-origin comparison. Large-magnitude log likelihood ratios which support the consistent-with-fact hypothesis are better than small-magnitude log likelihood ratios which support the consistent-with-fact hypothesis. Log likelihood ratios which support the contrary-to-fact hypothesis are bad, and the larger their magnitude the worse they are. Therefore, in Tippett plots the further apart the same-speaker and different-speaker lines (the further to the right the same-speaker line and the further to the left the different-speaker line) the better the results. The results presented in the Tippett plot in Fig. 11 are therefore better than those presented in the Tippett plot in Fig. 10.
Note, however, that (consistent with the Cllr metric) log-likelihood-ratio results which support contrary-to-fact hypotheses are of greater concern than whether the consistent-with-fact log-likelihood-ratio results are relatively small or large: a system which minimizes support for contrary-to-fact hypotheses is preferable even if this leads to a reduction in its strength of support for consistent-with-fact hypotheses. The results presented in the Tippett plot in Fig. 12 are therefore also better than those presented in the Tippett plot in Fig. 10.
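The asymmetry described above can be seen by computing the log-likelihood-ratio cost directly. The sketch below uses the standard definition of the metric following Brümmer and du Preez (2006); the function name cllr and the input values are hypothetical, and the inputs are assumed to be base-10 log likelihood ratios, which the appendix does not specify.

```python
import numpy as np

def cllr(same_log10_lrs, diff_log10_lrs):
    """Log-likelihood-ratio cost for a set of test results.

    Inputs are assumed to be base-10 log likelihood ratios.
    """
    same_lr = 10.0 ** np.asarray(same_log10_lrs, dtype=float)
    diff_lr = 10.0 ** np.asarray(diff_log10_lrs, dtype=float)
    # Same-speaker comparisons are penalised for small likelihood ratios.
    same_penalty = np.mean(np.log2(1.0 + 1.0 / same_lr))
    # Different-speaker comparisons are penalised for large likelihood ratios.
    diff_penalty = np.mean(np.log2(1.0 + diff_lr))
    return 0.5 * (same_penalty + diff_penalty)

# Hypothetical systems, invented purely for illustration.
uninformative = cllr([0.0], [0.0])    # every likelihood ratio equals 1
consistent = cllr([2.0], [-2.0])      # supports the consistent-with-fact hypothesis
misleading = cllr([-2.0], [2.0])      # supports the contrary-to-fact hypothesis

print(uninformative, consistent, misleading)
```

With these invented values the uninformative system scores exactly 1, the consistent-with-fact results score close to 0, and the contrary-to-fact results of the same magnitude score well above 1, which is the sense in which misleading results are penalised more heavily than weak ones.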
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Morrison, G.S. (2013). Vowel Inherent Spectral Change in Forensic Voice Comparison. In: Morrison, G., Assmann, P. (eds) Vowel Inherent Spectral Change. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14209-3_11
DOI: https://doi.org/10.1007/978-3-642-14209-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14208-6
Online ISBN: 978-3-642-14209-3
eBook Packages: Engineering, Engineering (R0)