Speaker Verification Using Spectral and Durational Segmental Characteristics

  • Elena Bulgakova
  • Aleksei Sholohov
  • Natalia Tomashenko
  • Yuri Matveev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9319)


This paper reports some of the results obtained by fusing human-assisted speaker verification methods based on formant features and on phone-duration statistics. Experiments on a database of spontaneous speech show that using segmental durational characteristics improves verification performance. This demonstrates that these features are applicable to the speaker verification task.
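The abstract does not say which fusion rule is used. The sketch below is therefore only one common way to fuse two subsystems at the score level. It assumes a weighted sum of z-normalized scores from a formant-based and a duration-based subsystem, and it measures the result by equal error rate (EER). The names `znorm`, `fuse` and `eer`, the weight `w`, normalizing over the trial list itself, and the example scores are all hypothetical illustrations. None of them is the authors' method or data.

```python
from statistics import mean, pstdev


def znorm(scores):
    """Z-normalize scores to zero mean and unit variance.
    Assumption: statistics are taken over the given trial list itself,
    not over a separate cohort as a real system would normally do."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        raise ValueError("cannot z-normalize constant scores")
    return [(s - mu) / sigma for s in scores]


def fuse(formant_scores, duration_scores, w=0.5):
    """Hypothetical linear score fusion: w * formant + (1 - w) * duration."""
    if len(formant_scores) != len(duration_scores):
        raise ValueError("both subsystems must score the same trials")
    zf, zd = znorm(formant_scores), znorm(duration_scores)
    return [w * a + (1 - w) * b for a, b in zip(zf, zd)]


def eer(scores, labels):
    """Approximate the equal error rate by sweeping thresholds over the
    observed scores. labels: 1 = target (same speaker), 0 = non-target."""
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # Miss: target trial scored below the threshold.
        fr = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t) / n_tar
        # False alarm: non-target trial scored at or above the threshold.
        fa = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t) / n_non
        # Keep the threshold where the two error rates are closest.
        if abs(fr - fa) < best_gap:
            best_gap, best_eer = abs(fr - fa), (fr + fa) / 2
    return best_eer


if __name__ == "__main__":
    # Made-up scores for eight trials; not data from the paper.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    formant = [1.2, 0.4, 0.9, -0.1, 0.3, -0.6, 0.5, -1.0]
    duration = [0.8, 1.1, -0.2, 0.7, -0.4, 0.2, -0.9, -0.3]
    print("formant EER: ", eer(formant, labels))
    print("duration EER:", eer(duration, labels))
    print("fused EER:   ", eer(fuse(formant, duration, w=0.5), labels))
```

A weighted sum is the simplest fusion rule. In practice `w` would be tuned on held-out development trials, or replaced by a trained combiner such as logistic regression.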


Keywords: Spectral formant features · Segmental durations · Speaker verification



This work was financially supported by the Government of the Russian Federation, Grant 074-U01.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Elena Bulgakova (1)
  • Aleksei Sholohov (1)
  • Natalia Tomashenko (1, 2)
  • Yuri Matveev (1, 2)
  1. ITMO University, St. Petersburg, Russia
  2. Speech Technology Center, St. Petersburg, Russia