Speaker recognition regardless of context and language on a fixed set of competitors

Sorokin, V. N.; Leonov, A. S.; Trunov, V. G.

doi:10.1134/S105466181602022X

Applied Problems
Published: 05 June 2016

Volume 26, pages 450–459, (2016)

Pattern Recognition and Image Analysis

V. N. Sorokin¹,
A. S. Leonov² &
V. G. Trunov¹

Abstract

The problem of recognizing a speaker from a given set of speakers, for any language and any context, is considered. A database of Russian numerals that contains speech segments from 216 men and 177 women, each of whom spoke from 400 to 800 words, is used for recognition. Speech was recorded with different types of microphones in different rooms at the natural noise level. Recognition is based on solutions of the inverse problem of finding the voice excitation pulse shape for each pitch period from a known speech segment. The pulse shape is defined as the inverse Fourier transform of the regularized ratio of the speech signal spectra on the intervals of the open and closed glottis. Recognition uses ten parameters: the pitch period; the duration of the open glottis interval; the times at which the source amplitude is maximum, minimum, or zero; the amplitude ratio for the minimum and maximum source pulses; three coefficients of the decomposition of the source function by the principal component method; and the vowel duration. With this procedure, for an utterance of a word that contains one vowel, the false rejection rate (FRR) for men is 1.7–5.4% and the false acceptance rate (FAR) is 5.4–7.1%; for women, FRR = 2–5.2% and FAR = 5.2–6.3%. The recognition error decreases as the number of vowels in the speech signal increases. At 10 vowels, for men FRR = 0.05–0.2% and FAR = 0.07–0.8%, and for women FRR = 0.09–0.2% and FAR = 0.17–2.1%.
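
The pulse-shape definition in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the regularization, so the code below assumes a Tikhonov-type damped ratio, and the function name `regularized_spectral_ratio`, the parameter `alpha`, and the synthetic signals are all hypothetical choices made for illustration. It is not the authors' implementation.

```python
import numpy as np

def regularized_spectral_ratio(open_seg, closed_seg, alpha=1e-3, n_fft=None):
    """Estimate a source pulse from open- and closed-glottis segments.

    Assumption (not stated in the abstract): the regularized ratio is
    S_open * conj(S_closed) / (|S_closed|^2 + alpha * max|S_closed|^2),
    i.e. a Tikhonov-type damped division of the two spectra.
    """
    open_seg = np.asarray(open_seg, dtype=float)
    closed_seg = np.asarray(closed_seg, dtype=float)
    if n_fft is None:
        # Zero-pad so that circular convolution matches linear convolution.
        n_fft = len(open_seg) + len(closed_seg)
    s_open = np.fft.rfft(open_seg, n_fft)
    s_closed = np.fft.rfft(closed_seg, n_fft)
    power = np.abs(s_closed) ** 2
    # alpha is scaled by the peak power so that it is dimensionless.
    ratio = s_open * np.conj(s_closed) / (power + alpha * power.max())
    # The inverse Fourier transform of the ratio is the pulse estimate.
    return np.fft.irfft(ratio, n_fft)

# Synthetic check: a damped resonance stands in for the closed-glottis
# segment, and the open-glottis segment is that resonance driven by a pulse.
n = np.arange(64)
closed = np.exp(-0.1 * n) * np.cos(0.3 * n)
pulse = np.hanning(16)
opened = np.convolve(pulse, closed)
estimate = regularized_spectral_ratio(opened, closed, alpha=1e-8)
print(np.max(np.abs(estimate[:16] - pulse)))
```

On noise-free synthetic data a very small `alpha` recovers the pulse almost exactly. With recorded speech a larger value trades resolution for stability, which is the usual motivation for regularizing an ill-posed division of spectra.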

Author information

Authors and Affiliations

Institute for Information Transmission Problems, Russian Academy of Sciences, per. Bolshoi Karetnyi 19, Moscow, 127994, Russia
V. N. Sorokin & V. G. Trunov
National Research Nuclear University MEPhI, Kashirskoye sh. 31, Moscow, 115409, Russia
A. S. Leonov

Corresponding author

Correspondence to A. S. Leonov.

Additional information

Viktor Nikolaevich Sorokin. Born in 1938. Graduated from the Moscow Aviation Institute in 1963. Senior Research Fellow at the Institute for Information Transmission Problems of the Russian Academy of Sciences, Doctor of Physical and Mathematical Sciences. Author of three monographs (Theory of Speech Production, 1985; Speech Synthesis, 1992; Speech Processes, 2012) and about 150 scientific papers. Scientific interests: the theory of speech production, automatic speech and speaker recognition, and speech synthesis.

Aleksandr Sergeevich Leonov. Born in 1948. Graduated from Moscow State University in 1972. Received his candidate's degree in 1975 and his doctoral degree in 1988. Professor at the National Research Nuclear University (MEPhI). Scientific interests: mathematical physics, mathematical modeling, and methods for solving inverse and ill-posed problems. Author of three books and more than 140 scientific papers.

Vladimir Grigor’evich Trunov. Born in 1947. Graduated from the Moscow Institute of Physics and Technology in 1970. Defended his thesis in 1986. Senior Research Fellow at the Institute for Information Transmission Problems of the Russian Academy of Sciences. Author of more than 80 scientific papers. Scientific interests: pattern recognition, probability theory, mathematical statistics, and mathematical modeling of bioelectric processes.

About this article

Cite this article

Sorokin, V.N., Leonov, A.S. & Trunov, V.G. Speaker recognition regardless of context and language on a fixed set of competitors. Pattern Recognit. Image Anal. 26, 450–459 (2016). https://doi.org/10.1134/S105466181602022X

Received: 20 September 2014
Published: 05 June 2016
Issue Date: April 2016
DOI: https://doi.org/10.1134/S105466181602022X
