Multisource Speech Analysis for Speaker Recognition

  • Applied Problems
Pattern Recognition and Image Analysis

Abstract

Speaker recognition performance is compared across several voice-source models on a comprehensive speech database. The inverse problem of recovering the source from vowel segments of speech is solved using a special speech-production model together with four voice-source models: the A-source, a piecewise-linear source, a nonparametric source, and a source found by the spectral relation method. In the first stage, we select the pulses for which the relative residual between the segmented pulse and its theoretical analog, computed with the speech-production model, is less than 0.25. For the selected pulses, a posteriori estimates of the determination error are computed, and the final selection is made: only pulses whose a posteriori error estimate is below the accepted level of 0.3 are kept for recognition. In the parameter space found for each source model, a statistical model is built for each speaker, and recognition is performed. For speaker recognition from a single vowel, the mean error is approximately 66% for the piecewise-linear source, 61% for the spectral relation method, and 33% for the A-source.
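The two-stage pulse selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The thresholds 0.25 and 0.3 come from the abstract; everything else is an assumption made for illustration: the relative residual is taken in the Euclidean norm, the a posteriori error estimate is supplied as a precomputed number, and the names `relative_residual`, `select_pulses`, and the dictionary keys are hypothetical.

```python
import numpy as np

RESIDUAL_LIMIT = 0.25  # first-stage threshold stated in the abstract
ERROR_LIMIT = 0.30     # second-stage threshold stated in the abstract


def relative_residual(segmented, modeled):
    """Relative residual between a segmented pulse and its model analog.

    Assumption: the Euclidean norm is used; the abstract does not name the norm.
    """
    segmented = np.asarray(segmented, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    denom = np.linalg.norm(segmented)
    if denom == 0.0:
        return float("inf")  # an all-zero pulse can never pass the first stage
    return float(np.linalg.norm(segmented - modeled) / denom)


def select_pulses(pulses):
    """Keep only the pulses that pass both selection stages.

    Each pulse is a dict with hypothetical keys:
      'segmented'      - samples of the pulse segmented from the signal
      'modeled'        - samples of its theoretical analog
      'error_estimate' - precomputed a posteriori error estimate
    """
    selected = []
    for pulse in pulses:
        # Stage 1: reject pulses whose model analog fits poorly.
        if relative_residual(pulse["segmented"], pulse["modeled"]) >= RESIDUAL_LIMIT:
            continue
        # Stage 2: reject pulses whose a posteriori error estimate is too large.
        if pulse["error_estimate"] >= ERROR_LIMIT:
            continue
        selected.append(pulse)
    return selected
```

In this sketch a pulse reaches the second stage only if it passes the first, which mirrors the order given in the abstract. How the a posteriori estimate itself is computed is outside the scope of the sketch.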


Author information

Corresponding authors

Correspondence to V. N. Sorokin or A. S. Leonov.

Additional information

Viktor Nikolaevich Sorokin. Born in 1938. Graduated from the Moscow Aviation Institute in 1963. Senior Research Fellow at the Institute for Information Transmission Problems of the Russian Academy of Sciences, Doctor of Physical and Mathematical Sciences. Author of three monographs (Theory of Speech Production, 1985; Speech Synthesis, 1992; Speech Processes, 2012) and more than 150 scientific papers. Scientific interests: the theory of speech production, automatic speech and speaker recognition, and speech synthesis.

Aleksandr Sergeevich Leonov. Born in 1948. Graduated from Moscow State University in 1972. Received a candidate’s degree in 1975 and a doctoral degree in 1988. Professor at the National Research Nuclear University (MEPhI). Scientific interests: mathematical physics, mathematical modeling, and methods for solving inverse and ill-posed problems. Author of three books and more than 140 scientific papers.

Cite this article

Sorokin, V.N., Leonov, A.S. Multisource Speech Analysis for Speaker Recognition. Pattern Recognit. Image Anal. 29, 181–193 (2019). https://doi.org/10.1134/S1054661818040260

  • DOI: https://doi.org/10.1134/S1054661818040260
