Abstract
This paper proposes a new method for calculating the observation-confidence value used in the modified adaptive Gaussian mixture model (GMM) framework for speaker verification. First, an adaptive version of the multiple low-rank representation method is proposed. It uses a weighted decomposition that incorporates prior information about the speech/non-speech content, both to obtain the enhanced speech and to estimate the frame signal-to-noise ratio (SNR) values. Then, a simple sigmoid function converts the frame SNR values into observation-confidence values. To evaluate the accuracy of the system, we use utterances from the Korean movie You Came From The Stars. The experimental results show that, under noisy conditions, the proposed approach achieves higher accuracy than well-known baseline methods such as the GMM-based universal background model, the GMM supervector-based support vector machine (SVM), the i-vector-based SVM, and the sparse representation.
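The SNR-to-confidence step can be illustrated with a minimal sketch. The abstract states only that a simple sigmoid is used, so the logistic form, the `slope` and `midpoint_db` parameters and their default values, and the function names below are assumptions chosen for illustration, not the authors' settings. The per-frame speech and noise powers are assumed to come from some enhancement stage, such as the low-rank decomposition described above.

```python
import math


def frame_snr_db(speech_power, noise_power, eps=1e-12):
    """Per-frame SNR in dB from frame-level speech and noise power estimates.

    eps guards against division by zero and log of zero (an assumption of
    this sketch, not part of the published method).
    """
    return [
        10.0 * math.log10((s + eps) / (n + eps))
        for s, n in zip(speech_power, noise_power)
    ]


def observation_confidence(snr_db, slope=0.5, midpoint_db=5.0):
    """Map frame SNR values (dB) to confidences in (0, 1).

    Uses a logistic sigmoid; slope and midpoint_db are hypothetical
    tuning parameters, not values taken from the paper.
    """
    return [
        1.0 / (1.0 + math.exp(-slope * (x - midpoint_db)))
        for x in snr_db
    ]


# Example: three frames with decreasing speech power over constant noise.
snr = frame_snr_db([10.0, 1.0, 0.1], [1.0, 1.0, 1.0])
conf = observation_confidence(snr)
```

In this sketch, frames with an SNR well above `midpoint_db` receive a confidence close to 1 and noisy frames a confidence close to 0, so a model that weights observations by confidence would rely less on the noisy frames.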
Acknowledgements
This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-R2718-16-0011) supervised by the IITP (Institute for Information & communications Technology Promotion).
Cite this article
Trinh, T.D., Ma, X., Kim, J.Y. et al. Enhanced speaker verification using an adaptive multiple low-rank representation based on the modified adaptive Gaussian mixture model framework. Cluster Comput 20, 2333–2347 (2017). https://doi.org/10.1007/s10586-017-1051-9
DOI: https://doi.org/10.1007/s10586-017-1051-9