Vowel Inherent Spectral Change in Forensic Voice Comparison

Morrison, Geoffrey Stewart

doi:10.1007/978-3-642-14209-3_11

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

The onset + offset model of vowel inherent spectral change has been found to be effective for vowel-phoneme identification, and not to be outperformed by more sophisticated parametric-curve models. This suggests that if only simple cues such as initial and final formant values are necessary for signaling phoneme identity, then speakers may have considerable freedom in the exact path taken between the initial and final formant values. If the constraints on formant trajectories are relatively lax with respect to vowel-phoneme identity, then with respect to speaker identity there may be considerable information contained in the details of formant trajectories. Differences in physiology and idiosyncrasies in the use of motor commands may mean that different individuals produce different formant trajectories between the beginning and end of the same vowel phoneme. If within-speaker variability is substantially smaller than between-speaker variability then formant trajectories may be effective features for forensic voice comparison. This chapter reviews a number of forensic-voice-comparison studies which have used different procedures to extract information from formant trajectories. It concludes that information extracted from formant trajectories can lead to a high degree of validity in forensic voice comparison (at least under controlled conditions), and that a whole trajectory approach based on parametric curves outperforms an onset + offset model.
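
As a rough illustration of the contrast drawn in the abstract, the following Python sketch reduces two hypothetical F2 trajectories to onset + offset values and to the first few DCT coefficients. The trajectories, the function names, and the choice of three coefficients are assumptions made for illustration only; the studies reviewed in the chapter use their own measurement and curve-fitting procedures.

```python
import numpy as np

def onset_offset(track):
    """Onset + offset model: keep only the first and last formant values."""
    track = np.asarray(track, dtype=float)
    return np.array([track[0], track[-1]])

def dct_coefficients(track, n_coef=3):
    """First n_coef DCT-II coefficients of a trajectory, scaled by 1/N.

    With this scaling the zeroth coefficient is the mean of the track.
    """
    track = np.asarray(track, dtype=float)
    n = track.size
    k = np.arange(n_coef)[:, None]            # coefficient index
    pos = (np.arange(n)[None, :] + 0.5) / n   # sample midpoints on [0, 1]
    basis = np.cos(np.pi * k * pos)           # cosine basis functions
    return basis @ track / n

# Hypothetical F2 tracks (Hz): identical endpoints, different paths between them.
t = np.linspace(0.0, 1.0, 21)
straight = 1200.0 + 800.0 * t
curved = 1200.0 + 800.0 * t**2

oo_straight, oo_curved = onset_offset(straight), onset_offset(curved)
dct_straight, dct_curved = dct_coefficients(straight), dct_coefficients(curved)
```

The two tracks share onset and offset values, so their onset + offset features are identical, whereas their DCT coefficients differ because those coefficients encode the path taken between the endpoints.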

Abbreviations

C_llr: Log-likelihood-ratio cost
DCT: Discrete cosine transform
DNA: Deoxyribonucleic acid
DTW: Dynamic time warping
F1: First formant
F2: Second formant
F3: Third formant
LPC: Linear predictive coding
LR: Likelihood ratio
MFCC: Mel-frequency cepstral coefficient
MVKD: Multivariate kernel density
VISC: Vowel inherent spectral change

References

Aitken, C.G.G., Lucy, D.: Evaluation of trace evidence in the form of multivariate data. Appl. Stat 54, 109–122 (2004). doi:10.1111/j.1467-9876.2004.02031.x
Aitken, C.G.G., Roberts, P., Jackson, G.: Fundamentals of probability and statistical evidence in criminal proceedings: Guidance for judges, lawyers, forensic scientists and expert witnesses. Royal Statistical Society, London (2010)
Aitken, C.G.G., Taroni, F.: Statistics and the evaluation of evidence for forensic scientists. Wiley, Chichester (2004)
Balding, D.J.: Weight of evidence for forensic DNA profiles. Wiley, Chichester (2005)
Berger, C.E.H., Buckleton, J., Champod, C., Evett, I.W., Jackson, G.: Evidence evaluation: A response to the court of appeal judgment in R v T. Sci. Justice 51, 43–49 (2011). doi:10.1016/j.scijus.2011.03.005
Brümmer, N., Burget, L., Cernocký, J.H., Glembek, O., Grézl, F., Karafiát, M., van Leeuwen, D.A., Matejka, P., Schwarz, P., Strasheim, A.: Fusion of heterogenous speaker recognition systems in the STBU submission for the NIST SRE 2006. IEEE. Trans. Audio. Speech. Lang. Process 15, 2072–2084 (2007). doi:10.1109/TASL.2007.902870
Brümmer, N., du Preez, J.: Application independent evaluation of speaker detection. Comput. Speech. Lang 20, 230–275 (2006). doi:10.1016/j.csl.2005.08.001
Buckleton, J.: A framework for interpreting evidence. In: Buckleton, J., Triggs, C.M., Walsh, S.J. (eds.) Forensic DNA Evidence Interpretation. Boca Raton, FL: CRC, pp. 27–63 (2005)
Champod, C., Meuwly, D.: The inference of identity in forensic speaker recognition. Speech. Commun 31, 193–203 (2000). doi:10.1016/S0167-6393(99)00078-3
Enzinger, E.: Characterising formant tracks in Viennese diphthongs for forensic speaker comparison. In: Proceedings of the 39th Audio Engineering Society Conference—Audio Forensics: Practices and Challenges, Hillerød, Denmark. Audio Engineering Society, New York, pp. 47–52 (2010)
Eriksson, E.J., Cepeda, L.F., Rodman, R.D., McAllister, D.F., Bitzer, D., Arroway, P.: Cross-language speaker identification using spectral moments. In: Branderud, P., Traunmüller, H. (eds.) Proceedings of FONETIK 2004: The XVIIth Swedish Phonetics Conference. Stockholm, Sweden: Department of Linguistics, Stockholm University, pp. 76–79 (2004a)
Eriksson, E.J., Cepeda, L.F., Rodman, R.D., Sullivan, K.P.H., McAllister, D.F., Bitzer, D., Arroway, P.: Robustness of spectral moments: A study using voice imitations. In: Cassidy, S., Cox, F., Mannell, R., Palethorpe. (eds.) Proceedings of the 10th Australian International Conference on Speech Sciences & Technology. Australian Speech Science & Technology Association, Canberra, pp. 259–264 (2004b)
Evett, I.W.: Towards a uniform framework for reporting opinions in forensic science case-work. Sci. Justice 38, 198–202 (1998). doi:10.1016/S1355-0306(98)72105-7
Evett, I.W.: Evaluation and professionalism. Sci. Justice 49, 159–160 (2009). doi:10.1016/j.scijus.2009.07.001
Evett, I.W., Buckleton, J.S.: Statistical analysis of STR data. In: Carracedo, A., Brinkmann, B., Bär, W. (eds.) Advances in Forensic Haemogenetics, vol. 6, pp. 79–86. Springer, Heidelberg (1996)
Evett, I.W., and other signatories: Expressing evaluative opinions: A position statement. Sci. Justice 51, 1–2 (2011). doi:10.1016/j.scijus.2011.01.002
Foreman, L.A., Champod, C., Evett, I.W., Lambert, J.A., Pope, S.: Interpreting DNA evidence: A review. Int. Stat. Rev. 71, 473–495 (2003). doi:10.1111/j.1751-5823.2003.tb00207.x
Goldstein, U.G.: Speaker-identifying features based on formant tracks. J. Acoust. Soc. Am 59, 176–182 (1976). doi:10.1121/1.380837
González-Rodríguez, J., Drygajlo, A., Ramos-Castro, D., García-Gomar, M., Ortega-García, J.: Robust estimation, interpretation and assessment of likelihood ratios in forensic speaker recognition. Comput. Speech. Lang 20, 331–355 (2006). doi:10.1016/j.csl.2005.08.005
González-Rodríguez, J., Ramos, D.: Forensic automatic speaker classification in the coming paradigm shift. In: Müller, C. (ed.) Speaker Classification I: Selected Projects. Springer-Verlag, Berlin, pp. 205–217 (2007)
González-Rodríguez, J., Rose, P., Ramos, D., Toledano, D.T., Ortega-García, J.: Emulating DNA: Rigorous quantification of evidential weight in transparent and testable forensic speaker recognition. IEEE. Trans. Audio. Speech. Lang. Process 15, 2104–2115 (2007). doi:10.1109/TASL.2007.902747
Gottfried, M., Miller, J.D., Meyer, D.J.: Three approaches to the classification of American English diphthongs. J. Phonetics 21, 205–229 (1993)
Greisbach, R., Esser, O., Weinstock, C.: Speaker identification by formant contours. In: Braun, A., Köster, J.-P. (eds.) Studies in Forensic Phonetics, pp. 49–55. Wissenschaftlicher, Trier, Germany (1995)
Guillemin, B.J., Watson, C.: Impact of the GSM mobile phone network on the speech signal: Some preliminary findings. Int. J. Speech. Lang. Law 15, 193–218 (2008). doi:10.1558/ijsll.v15i2.193
Harrington, J.: An acoustic analysis of happy-tensing in the Queen’s Christmas broadcasts. J. Phonetics 34, 439–457 (2006). doi:10.1016/j.wocn.2005.08.001
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009)
Hillenbrand, J.M., Clark, M.J., Nearey, T.M.: Effect of consonant environment on vowel formant patterns. J. Acoust. Soc. Am. 109, 748–763 (2001). doi:10.1121/1.1337959
Ingram, J.C.L., Prandolini, R., Ong, S.: Formant trajectories as indices of speaker identification. Forensic. Linguist. Int. J. Speech. Lang. Law 3, 129–145 (1996)
Jessen, M.: Forensic phonetics. Lang. Linguist. Compass 2, 671–711 (2008). doi:10.1111/j.1749-818x.2008.00066.x
Kasuya, H., Tan, X., Yang, C.-S.: Voice source and vocal tract characteristics associated with speaker individuality. In: Proceedings of the 3rd International Conference on Spoken-Language Processing, Yokohama, pp. 1459–1462 (1994)
Kinoshita, Y., Osanai, T.: Within speaker variation in diphthongal dynamics: What can we compare? In: Warren, P., Watson, C.I. (eds.) Proceedings of the 11th Australasian International Conference on Speech Science & Technology, Auckland, New Zealand. Canberra, Australia: Australasian Speech Science & Technology Association, pp. 112–117 (2006)
Kuhn, T.S.: The Structure of Scientific Revolutions. University of Chicago Press, Chicago (1962)
Lehiste, I., Peterson, G.E.: Transitions, glides, and diphthongs. J. Acoust. Soc. Am 33, 268–277 (1961). doi:10.1121/1.1908681
Lucy, D.: Introduction to Statistics for Forensic Scientists. Wiley, Chichester (2005)
McDougall, K.: Speaker-specific formant dynamics: an experiment on Australian English /aɪ/. Int. J. Speech. Lang. Law 11, 103–130 (2004)
McDougall, K.: Dynamic features of speech and the characterization of speakers. Int. J. Speech. Lang. Law 13, 89–126 (2006)
McDougall, K., Nolan, F.: Discrimination of speakers using the formant dynamics of /u/ in British English. In: Trouvain, J., Barry, W.J. (eds.) Proceedings of the 16th International Congress on Phonetic Sciences, Saarbrücken. Saarbrücken, Germany, pp. 1825–1828 (2007)
Meuwly, D.: Reconnaissance de locuteurs en sciences forensiques: l’apport d’une approche automatique. Dissertation, University of Lausanne, Switzerland (2001)
Morrison, G.S.: Forensic voice comparison using likelihood ratios based on polynomial curves fitted to the formant trajectories of Australian English /aɪ/. Int. J. Speech. Lang. Law 15, 247–264 (2008). doi:10.1558/ijsll.v15i2.249
Morrison, G.S.: Comments on Coulthard & Johnson’s (2007) portrayal of the likelihood-ratio framework. Aust. J. Forensic. Sci 41, 155–161 (2009a). doi:10.1080/00450610903147701
Morrison, G.S.: Forensic voice comparison and the paradigm shift. Sci. Justice 49, 298–308 (2009b). doi:10.1016/j.scijus.2009.09.002
Morrison, G.S.: Likelihood-ratio forensic voice comparison using parametric representations of the formant trajectories of diphthongs. J. Acoust. Soc. Am. 125, 2387–2397 (2009c). doi:10.1121/1.3081384
Morrison, G.S.: Forensic voice comparison. In: Freckelton, I., Selby, H. (eds.) Expert Evidence (Ch. 99). Sydney, Australia: Thomson Reuters (2010)
Morrison, G.S.: A comparison of procedures for the calculation of forensic likelihood ratios from acoustic-phonetic data: Multivariate kernel density (MVKD) versus Gaussian mixture model–universal background model (GMM-UBM). Speech. Commun 53, 242–256 (2011a). doi:10.1016/j.specom.2010.09.005
Morrison, G.S.: Measuring the validity and reliability of forensic likelihood-ratio systems. Sci. Justice 51, 91–98 (2011b). doi:10.1016/j.scijus.2011.03.002
Morrison, G.S.: Static and dynamic approaches to understanding vowel perception. In: Morrison, G.S., Assmann, P.F. (eds.) Theories of vowel inherent spectral change (ch. 3). Springer Verlag, Heidelberg (2013a)
Morrison, G.S.: The likelihood-ratio framework and forensic evidence in court: A response to R v T. Int. J. Evid. Proof 16, 1–29 (2012b). http://vathek.org/doi/abs/10.1350/ijep.2012.16.1.390
Morrison, G.S.: Tutorial on logistic regression calibration and fusion: Converting a score to a likelihood ratio. Aust. J. Forensic. Sci, online 31 Oct 2012 (2012c). doi:10.1080/00450618.2012.733025
Morrison, G.S., Kinoshita, Y.: Automatic-type calibration of traditionally derived likelihood ratios: Forensic analysis of Australian English /o/ formant trajectories. In: Proceedings of Interspeech Incorporating SST, International Speech Communication Association, pp. 1501–1504 (2008)
Morrison, G.S., Ochoa, F., Thiruvaran, T.: Database selection for forensic voice comparison. In: Proceedings of Odyssey 2012: The Language and Speaker Recognition Workshop, Singapore. International Speech Communication Association, pp. 62–77 (2012)
Morrison, G.S., Thiruvaran, T., Epps, J.: Estimating the precision of the likelihood-ratio output of a forensic-voice-comparison system. In: Proceedings of Odyssey 2010: The Language and Speaker Recognition Workshop, Brno, Czech Republic. International Speech Communication Association (2010)
Nearey, T.M., Assmann, P.F.: Modeling the role of vowel inherent spectral change in vowel identification. J. Acoust. Soc. Am 80, 1297–1308 (1986). doi:10.1121/1.394433
Nolan, F.: Speaker recognition and forensic phonetics. In: Hardcastle, W.J., Laver, J. (eds.) The Handbook of Phonetic Sciences, pp. 744–767. Blackwell, Oxford (1997)
Pigeon, S., Druyts, P., Verlinde, P.: Applying logistic regression to the fusion of the NIST’99 1-speaker submissions. Digital Signal Process 10, 237–248 (2000). doi:10.1006/dspr.1999.0358
Ramos Castro, D.: Forensic evaluation of the evidence using automatic speaker recognition systems. Dissertation, Universidad Autónoma de Madrid, Madrid, Spain (2007)
Robertson, B., Vignaux, G.A.: Interpreting Evidence. Wiley, Chichester (1995)
Rodman, R., McAllister, D., Bitzer, D., Cepeda, L., Abbitt, P.: Forensic speaker identification based on spectral moments. Int. J. Speech. Lang. Law 9, 22–43 (2002)
Rose, P.: Forensic Speaker Identification. Taylor & Francis, London (2002)
Rose, P.: The technical comparison of forensic voice samples. In: Freckelton, I., Selby, H. (eds.) Expert Evidence (Ch. 99). Sydney, Australia: Thomson Lawbook (2003)
Rose, P.: Forensic speaker recognition at the beginning of the twenty-first century: An overview and a demonstration. Aust. J. Forensic. Sci 37, 49–72 (2005). doi:10.1080/00450610509410616
Rose, P.: Technical forensic speaker recognition: Evaluation, types and testing of evidence. Comput. Speech. Lang 20, 159–191 (2006). doi:10.1016/j.csl.2005.07.003
Rose, P., Kinoshita, Y., Alderman, T.: Realistic extrinsic forensic speaker discrimination with the diphthong /aɪ/. In: Warren, P., Watson, C.I. (eds.) Proceedings of the 11th Australasian International Conference on Speech Science & Technology, Auckland, New Zealand. Canberra, Australia: Australasian Speech Science & Technology Association, pp. 329–334 (2006)
Rose, P., Morrison, G.S.: A response to the UK position statement on forensic speaker comparison. Int. J. Speech. Lang. Law 16, 139–163 (2009). doi:10.1558/ijsll.v16i1.139
Saks, M.J., Koehler, J.J.: The coming paradigm shift in forensic identification science. Science 309, 892–895 (2005). doi:10.1126/science.1111565
Sambur, M.R.: Selection of acoustic features for speaker identification. IEEE. Trans. Acoust. Speech. Signal. Process 23, 176–182 (1975). doi:10.1109/TASSP.1975.1162664
Taitechawat, S., Foulkes, P.: Discrimination of speakers using tone and formant dynamics in Thai. In: Lee, W.-S., Zee, E. (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China. Hong Kong: Organizers of ICPhS XVII at the Department of Chinese, Translation and Linguistics, City University of Hong Kong, pp. 1975–1981 (2011)
van Leeuwen, D.A., Brümmer, N.: An introduction to application-independent evaluation of speaker recognition systems. In: Müller, C. (ed.) Speaker Classification I: Selected Projects, pp. 330–353. Springer-Verlag, Berlin (2007)
Watson, C., Harrington, J.: Acoustic evidence of dynamic formant trajectories in Australian English vowels. J. Acoust. Soc. Am. 106, 458–468 (1999). doi:10.1121/1.427069
Zahorian, S.A., Jagharghi, A.J.: Speaker normalization of static and dynamic vowel spectral features. J. Acoust. Soc. Am 90, 67–75 (1991). doi:10.1121/1.402350
Zahorian, S.A., Jagharghi, A.J.: Spectral-shape features versus formants as acoustic correlates for vowels. J. Acoust. Soc. Am 94, 1966–1982 (1993). doi:10.1121/1.407520
Zhang, C., Morrison, G.S., Thiruvaran, T.: Forensic voice comparison using Chinese /iau/. In: Lee, W.-S., Zee, E. (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China. Hong Kong: Organizers of ICPhS XVII at the Department of Chinese, Translation and Linguistics, City University of Hong Kong, pp. 2280–2283 (2011)
Zuo, D., Mok, P.P.K.: Formant dynamics of /ua/ in the speech of Mandarin-Shanghainese bilingual identical twins. In: Lee, W.-S., Zee, E. (eds.) Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China. Hong Kong: Organizers of ICPhS XVII at the Department of Chinese, Translation and Linguistics, City University of Hong Kong, pp. 2332–2335 (2011)

Acknowledgments

Thanks to Philip Rose, Peter F. Assmann, and Stephen A. Zahorian for comments on earlier versions of this chapter. The writing of this chapter was supported by the Australian Research Council, the Australian Federal Police, New South Wales Police, Queensland Police, the National Institute of Forensic Science, the Australasian Speech Science and Technology Association, and the Guardia Civil via Linkage Project LP100200142. Unless otherwise explicitly attributed, the opinions expressed herein are those of the author and do not necessarily represent the policies or opinions of any of the above mentioned organizations or individuals.

Author information

Authors and Affiliations

Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales, Sydney, Australia
Geoffrey Stewart Morrison

Corresponding author

Correspondence to Geoffrey Stewart Morrison.

Editor information

Editors and Affiliations

School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, New South Wales, 2052, Australia
Geoffrey Stewart Morrison
School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX, 75083, USA
Peter F. Assmann

Appendix: Interpretation of Tippett Plots

A Tippett plot is a graphical method for presenting the results of running a likelihood-ratio forensic-comparison system on a set of test data. Tippett plots were introduced in Meuwly (2001) (inspired by the work of C. F. Tippett and by Evett and Buckleton 1996), and are now a standard method for presenting results in likelihood-ratio forensic-voice-comparison research. Tippett plots provide more detailed information about the results than is available from a summary measure such as C_llr. This appendix is an extract from Morrison (2010, Sect. 99.930) and provides a guide to the interpretation of Tippett plots.

Figures 10, 11, and 12 provide a series of Tippett plots drawn on the basis of hypothetical sets of output from forensic-comparison systems. The lines rising to the right represent the results from same-speaker comparisons in the test set: they show the cumulative proportion of log likelihood ratios less than or equal to the value indicated on the x axis. The lines rising to the left represent the results from different-speaker comparisons in the test set: they show the cumulative proportion of log likelihood ratios greater than or equal to the value indicated on the x axis. (Some authors draw both same-speaker and different-speaker lines as the cumulative proportion of log likelihood ratios greater than or equal to the value indicated on the x axis.) In these hypothetical results the same-speaker and different-speaker lines are symmetrical and cross at a log likelihood ratio of zero; this need not be the case for real test results.
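
The two cumulative proportions described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `tippett_curves`, the log likelihood ratio values, and the evaluation grid are all hypothetical and are not taken from the chapter's figures.

```python
import numpy as np

def tippett_curves(log_lr_same, log_lr_diff, grid):
    """Return the same-speaker and different-speaker Tippett lines on `grid`."""
    same = np.asarray(log_lr_same, dtype=float)
    diff = np.asarray(log_lr_diff, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Same-speaker line: proportion of log LRs <= x (rises to the right).
    same_curve = (same[None, :] <= grid[:, None]).mean(axis=1)
    # Different-speaker line: proportion of log LRs >= x (rises to the left).
    diff_curve = (diff[None, :] >= grid[:, None]).mean(axis=1)
    return same_curve, diff_curve

# Hypothetical log10 likelihood ratios from a small test set.
same = [-0.5, 0.5, 1.0, 2.0]
diff = [-2.0, -1.0, -0.5, 0.5]
grid = np.array([-2.0, 0.0, 2.0])

same_curve, diff_curve = tippett_curves(same, diff, grid)
```

In this made-up example the two lines are mirror images and both take the value 0.25 at a log likelihood ratio of zero, which is the proportion of comparisons supporting the contrary-to-fact hypothesis in each set.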

An ideal forensic-comparison system should produce a large positive log likelihood ratio for a same-origin comparison, and a large negative log likelihood ratio for a different-origin comparison. Large-magnitude log likelihood ratios which support the consistent-with-fact hypothesis are better than small-magnitude log likelihood ratios which support the consistent-with-fact hypothesis. Log likelihood ratios which support the contrary-to-fact hypothesis are bad, and the larger their magnitude the worse they are. Therefore, in Tippett plots the further apart the same-speaker and different-speaker lines (the further to the right the same-speaker line and the further to the left the different-speaker line) the better the results. The results presented in the Tippett plot in Fig. 11 are therefore better than those presented in the Tippett plot in Fig. 10.

Note, however, that (consistent with the C_llr metric) log-likelihood-ratio results which support contrary-to-fact hypotheses are of greater concern than whether the consistent-with-fact log-likelihood-ratio results are relatively small or large. A system which minimizes support for contrary-to-fact hypotheses is preferable even if this leads to a reduction in its strength of support for consistent-with-fact hypotheses. The results presented in the Tippett plot in Fig. 12 are therefore also better than those presented in the Tippett plot in Fig. 10.
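
The asymmetry described above can be made concrete with a small Python sketch of the log-likelihood-ratio cost. It assumes the usual definition of C_llr (Brümmer and du Preez 2006): half the sum of the mean of log2(1 + 1/LR) over same-speaker comparisons and the mean of log2(1 + LR) over different-speaker comparisons. The function name and the likelihood-ratio values are hypothetical and chosen only to illustrate the point.

```python
import numpy as np

def cllr(lr_same, lr_diff):
    """Log-likelihood-ratio cost from likelihood ratios (not log LRs)."""
    lr_same = np.asarray(lr_same, dtype=float)
    lr_diff = np.asarray(lr_diff, dtype=float)
    # Same-speaker comparisons are penalized when the LR is small.
    penalty_same = np.mean(np.log2(1.0 + 1.0 / lr_same))
    # Different-speaker comparisons are penalized when the LR is large.
    penalty_diff = np.mean(np.log2(1.0 + lr_diff))
    return 0.5 * (penalty_same + penalty_diff)

# Hypothetical system A: modest LRs, all supporting the consistent-with-fact hypothesis.
cautious = cllr(lr_same=[10.0, 10.0], lr_diff=[0.1, 0.1])

# Hypothetical system B: very strong LRs, but half support the contrary-to-fact hypothesis.
overconfident = cllr(lr_same=[1000.0, 0.001], lr_diff=[0.001, 1000.0])

# A system that always outputs LR = 1 (no information) scores exactly 1.
uninformative = cllr(lr_same=[1.0], lr_diff=[1.0])
```

Under these assumptions the cautious system scores well below 1 and the overconfident system scores well above 1, because a single large-magnitude contrary-to-fact likelihood ratio costs far more than is gained from a large-magnitude consistent-with-fact one.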

Cite this chapter

Morrison, G.S. (2013). Vowel Inherent Spectral Change in Forensic Voice Comparison. In: Morrison, G., Assmann, P. (eds) Vowel Inherent Spectral Change. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14209-3_11

DOI: https://doi.org/10.1007/978-3-642-14209-3_11
Published: 14 December 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14208-6
Online ISBN: 978-3-642-14209-3
eBook Packages: Engineering, Engineering (R0)