Speaker recognition regardless of context and language on a fixed set of competitors

  • Applied Problems
Pattern Recognition and Image Analysis

Abstract

The problem of recognizing a speaker from a given set of speakers, for any language and any context, is considered. A database of Russian numerals containing speech segments from 216 men and 177 women, each of whom spoke from 400 to 800 words, is used for recognition. Speech was recorded with different types of microphones in different rooms at the natural noise level. Recognition is based on solutions of the inverse problem of finding the voice excitation pulse shape for each pitch period from the known speech segment. The pulse shape is defined as the inverse Fourier transform of the regularized ratio of the speech signal spectra on the open-glottis and closed-glottis intervals. Recognition uses ten parameters: the pitch period, the duration of the open-glottis interval, the times at which the source amplitude is maximum, minimum, or zero, the amplitude ratio for the minimum and maximum source pulses, three coefficients of the decomposition of the source function by the principal component method, and the vowel duration. With this procedure, for an utterance of a word containing one vowel, the false rejection rate (FRR) for men is 1.7–5.4% and the false acceptance rate (FAR) is 5.4–7.1%; for women, FRR = 2–5.2% and FAR = 5.2–6.3%. The recognition error decreases as the number of vowels in the speech signal increases. At 10 vowels, FRR = 0.05–0.2% and FAR = 0.07–0.8% for men, and FRR = 0.09–0.2% and FAR = 0.17–2.1% for women.
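
The pulse-shape definition in the abstract (an inverse Fourier transform of a regularized spectral ratio) can be illustrated with a minimal sketch. The abstract does not state which regularizer is used, so the code below assumes the standard Tikhonov form, S_open · conj(S_closed) / (|S_closed|² + alpha), with a constant alpha. The function name `regularized_pulse`, the parameter `alpha`, and the zero-padding choice are hypothetical and are not taken from the paper.

```python
import numpy as np

def regularized_pulse(open_seg, closed_seg, alpha=1e-3, n_fft=None):
    """Estimate an excitation pulse as the inverse FFT of a regularized
    ratio of two spectra (illustrative Tikhonov-style sketch only).

    open_seg   : speech samples on the open-glottis interval
    closed_seg : speech samples on the closed-glottis interval
    alpha      : assumed constant regularization parameter (> 0)
    """
    open_seg = np.asarray(open_seg, dtype=float)
    closed_seg = np.asarray(closed_seg, dtype=float)
    if n_fft is None:
        # Zero-pad both segments to a common length before transforming.
        n_fft = len(open_seg) + len(closed_seg)
    s_open = np.fft.rfft(open_seg, n_fft)
    s_closed = np.fft.rfft(closed_seg, n_fft)
    # Plain division s_open / s_closed is unstable where |s_closed| is
    # small; adding alpha to the denominator bounds the result.
    ratio = s_open * np.conj(s_closed) / (np.abs(s_closed) ** 2 + alpha)
    return np.fft.irfft(ratio, n_fft)

# Synthetic check: a known pulse g filtered by a decaying response h.
g = np.hanning(6)
h = 0.5 ** np.arange(8)
estimate = regularized_pulse(np.convolve(g, h), h, alpha=1e-12)
```

In this synthetic case the first six samples of `estimate` should closely match `g` when `alpha` is very small. Larger values of `alpha` trade accuracy for stability when the closed-interval spectrum has weak components, which is the usual motivation for regularizing such a ratio.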

Author information

Corresponding author

Correspondence to A. S. Leonov.

Additional information

Viktor Nikolaevich Sorokin. Born in 1938. Graduated from the Moscow Aviation Institute in 1963. Senior Research Fellow at the Institute for Information Transmission Problems of the Russian Academy of Sciences, Doctor of Physical and Mathematical Sciences. Author of three monographs (Theory of Speech Production, 1985; Speech Synthesis, 1992; Speech Processes, 2012) and about 150 scientific papers. Scientific interests: the theory of speech production, automatic speech and speaker recognition, and speech synthesis.

Aleksandr Sergeevich Leonov. Born in 1948. Graduated from Moscow State University in 1972. Received his candidate's degree in 1975 and his doctoral degree in 1988. Professor at the National Research Nuclear University (MEPhI). Scientific interests: mathematical physics, mathematical modeling, and methods for solving inverse and ill-posed problems. Author of three books and more than 140 scientific papers.

Vladimir Grigor’evich Trunov. Born in 1947. Graduated from the Moscow Institute of Physics and Technology in 1970. Defended his thesis in 1986. Senior Research Fellow at the Institute for Information Transmission Problems of the Russian Academy of Sciences. Author of more than 80 scientific papers. Scientific interests: pattern recognition, probability theory, mathematical statistics, and mathematical modeling of bioelectric processes.

About this article

Cite this article

Sorokin, V.N., Leonov, A.S. & Trunov, V.G. Speaker recognition regardless of context and language on a fixed set of competitors. Pattern Recognit. Image Anal. 26, 450–459 (2016). https://doi.org/10.1134/S105466181602022X

  • DOI: https://doi.org/10.1134/S105466181602022X
