Multisource Speech Analysis for Speaker Recognition

Sorokin, V. N.; Leonov, A. S.

doi:10.1134/S1054661818040260

Applied Problems
Published: 27 April 2019

Volume 29, pages 181–193 (2019)

Pattern Recognition and Image Analysis

V. N. Sorokin¹ and A. S. Leonov²

Abstract

Speaker recognition performance obtained with various voice-source models is compared on a comprehensive speech database. The inverse problem of recovering the voice source from vowel segments of speech is solved using a special speech-production model together with several voice-source models: the A-source, a piecewise-linear source, a nonparametric source, and a source found by the spectral ratio method. In the first stage, we select the source pulses for which the relative residual between each segmented pulse and its theoretical analog, computed with the speech-production model, is less than 0.25. For the selected pulses, a posteriori estimates of the error in determining them are computed, and the final selection is made: only pulses whose a posteriori error estimate is below the accepted level of 0.3 are kept for the recognition procedure. In the parameter space of each source model, a statistical model is built for every speaker, and recognition is then performed. For speaker recognition from a single vowel, the mean error is approximately 66% for the piecewise-linear source, 61% for the spectral ratio method, and 33% for the A-source.
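The two-stage pulse selection and the per-speaker modeling described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the relative residual is assumed to use the Euclidean (L2) norm; the speech-production model and the a posteriori error estimator are passed in as hypothetical callables (`model_fn`, `error_fn`); and a diagonal Gaussian per speaker is an assumed stand-in for the statistical model, which the abstract does not specify. Only the thresholds 0.25 and 0.3 come from the source.

```python
import numpy as np

# Thresholds quoted in the abstract.
RESIDUAL_LEVEL = 0.25  # stage 1: maximum relative residual of the model fit
ERROR_LEVEL = 0.3      # stage 2: maximum a posteriori error estimate


def relative_residual(observed, modeled):
    """Relative residual ||observed - modeled|| / ||observed||.

    The Euclidean (L2) norm is an assumption; the abstract does not name the norm.
    """
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    return float(np.linalg.norm(observed - modeled) / np.linalg.norm(observed))


def select_pulses(pulses, model_fn, error_fn):
    """Two-stage pulse selection.

    model_fn(pulse) -> theoretical analog from the speech-production model
    error_fn(pulse) -> a posteriori error estimate for the recovered pulse
    Both callables are hypothetical hooks standing in for the authors' models.
    """
    # Stage 1: keep pulses that the speech-production model reproduces well.
    stage1 = [p for p in pulses
              if relative_residual(p, model_fn(p)) < RESIDUAL_LEVEL]
    # Stage 2: keep only pulses whose estimated determination error is small.
    return [p for p in stage1 if error_fn(p) < ERROR_LEVEL]


def fit_speaker_models(features_by_speaker, var_floor=1e-6):
    """Fit one diagonal Gaussian per speaker in the source-parameter space.

    features_by_speaker maps speaker id -> array of shape (n_pulses, n_params).
    The diagonal Gaussian is an assumed stand-in for the unspecified model.
    """
    models = {}
    for speaker, X in features_by_speaker.items():
        X = np.asarray(X, dtype=float)
        # Variance floor prevents division by zero for near-constant parameters.
        models[speaker] = (X.mean(axis=0), X.var(axis=0) + var_floor)
    return models


def log_likelihood(X, mean, var):
    """Total log-likelihood of the rows of X under a diagonal Gaussian."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var)))


def recognize(X, models):
    """Return the speaker whose model gives the test pulses the highest likelihood."""
    return max(models, key=lambda s: log_likelihood(X, *models[s]))


# Usage on toy data (two hypothetical speakers, two source parameters each):
models = fit_speaker_models({
    "A": [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]],
    "B": [[5.0, 5.0], [5.1, 4.9], [4.9, 5.1]],
})
print(recognize([[0.05, 0.0]], models))  # A
```

The diagonal Gaussian keeps the sketch short; a Gaussian mixture fitted with the EM algorithm would be a natural, richer alternative for the per-speaker model.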

Author information

Authors and Affiliations

Institute for Information Transmission Problems, Russian Academy of Sciences, Bol’shoi Karetnyi per. 19, Moscow, 127994, Russia
V. N. Sorokin
National Research Nuclear University MEPhI, Kashirskoe sh. 31, Moscow, 115409, Russia
A. S. Leonov

Corresponding authors

Correspondence to V. N. Sorokin or A. S. Leonov.

Additional information

Viktor Nikolaevich Sorokin. Born in 1938. Graduated from the Moscow Aviation Institute in 1963. Senior Research Fellow at the Institute for Information Transmission Problems of the Russian Academy of Sciences, Doctor of Physical and Mathematical Sciences. Author of three monographs (Theory of Speech Production, 1985; Speech Synthesis, 1992; Speech Processes, 2012) and more than 150 scientific papers. Scientific interests: the theory of speech production, automatic speech and speaker recognition, and speech synthesis.

Aleksandr Sergeevich Leonov. Born in 1948. Graduated from Moscow State University in 1972. Received candidate’s degree in 1975 and doctoral degree in 1988. Professor of the National Research Nuclear University (MEPhI). Scientific interests: mathematical physics, mathematical modeling, and methods for solving inverse and ill-posed problems. Author of three books and more than 140 scientific papers.

About this article

Cite this article

Sorokin, V.N., Leonov, A.S. Multisource Speech Analysis for Speaker Recognition. Pattern Recognit. Image Anal. 29, 181–193 (2019). https://doi.org/10.1134/S1054661818040260

Received: 16 May 2018
Published: 27 April 2019
Issue Date: January 2019
DOI: https://doi.org/10.1134/S1054661818040260
