Prosodic Features Based Text-dependent Speaker Recognition with Short Utterance

Zhang, Jianwu; He, Jianchao; Wu, Zhendong; Li, Ping

doi:10.1007/978-981-10-0356-1_57

Part of the book series: Communications in Computer and Information Science (CCIS, volume 575)

Included in the following conference series:

International Symposium on Computational Intelligence and Intelligent Systems

Abstract

Over the past several years, Gaussian mixture models (GMMs) have been the dominant modeling approach in text-independent speaker recognition. However, the recognition accuracy of these models declines as utterances become shorter. At present, Mel-frequency cepstral coefficients (MFCCs) are generally used to characterize the properties of the vocal tract and are widely applied in speech recognition. In addition, prosodic features, such as pitch and formants, are generally considered to describe glottal characteristics. However, the performance of these approaches remains unsatisfactory. In text-dependent speaker verification systems with short utterances, prosodic features can, in theory, help improve recognition results. To optimize the performance of speaker verification systems under the adapted GMM-UBM framework, we adopt a variant speaker verification system based on prosodic features, in which a dual-judgment mechanism integrates vocal tract features with prosodic features. Experimental results show that the proposed system achieves better recognition results.
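
The abstract does not specify how the dual-judgment mechanism combines the two feature streams. The following is only a minimal sketch under stated assumptions: each stream (MFCC and prosodic) is scored as an average log-likelihood ratio between a speaker GMM and a UBM, and a claim is accepted only if both scores exceed their thresholds. The function names, the diagonal-covariance form, the AND rule, and the threshold values are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    """
    x = frames[:, None, :]                                            # (T, 1, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (M,)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=2)     # (T, M)
    log_comp = np.log(weights) + log_norm + log_exp                   # (T, M)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.sum(np.exp(log_comp - peak), axis=1))
    return float(np.mean(per_frame))

def llr(frames, speaker, ubm):
    """GMM-UBM score: speaker-model log-likelihood minus UBM log-likelihood."""
    return gmm_avg_loglik(frames, *speaker) - gmm_avg_loglik(frames, *ubm)

def dual_judgment(mfcc_llr, prosodic_llr,
                  mfcc_threshold=0.0, prosodic_threshold=0.0):
    """Assumed decision rule: accept only if BOTH streams pass their threshold."""
    return mfcc_llr > mfcc_threshold and prosodic_llr > prosodic_threshold

# Toy single-component models (illustrative values only, not real data).
ubm = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
speaker = (np.array([1.0]), np.array([[3.0, 3.0]]), np.array([[1.0, 1.0]]))
frames = np.full((5, 2), 3.0)

score = llr(frames, speaker, ubm)
accepted = dual_judgment(score, prosodic_llr=1.0)
```

In practice the speaker model would be MAP-adapted from the UBM and the thresholds tuned on held-out data; other fusion rules (for example, a weighted sum of the two scores) are equally plausible readings of "dual judgment".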

Author information

Authors and Affiliations

School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China
Jianwu Zhang, Jianchao He & Zhendong Wu
School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, China
Ping Li

Corresponding author

Correspondence to Jianchao He.

Editor information

Editors and Affiliations

College of Mathematics and Informatics, The South China Agricultural University, Guangzhou, China
Kangshun Li
School of Computer Science, Guangzhou University, Guangzhou, China
Jin Li
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
Yong Liu
Dept. of Informatics, University of Salerno, Fisciano, Italy
Aniello Castiglione

About this paper

Cite this paper

Zhang, J., He, J., Wu, Z., Li, P. (2016). Prosodic Features Based Text-dependent Speaker Recognition with Short Utterance. In: Li, K., Li, J., Liu, Y., Castiglione, A. (eds) Computational Intelligence and Intelligent Systems. ISICA 2015. Communications in Computer and Information Science, vol 575. Springer, Singapore. https://doi.org/10.1007/978-981-10-0356-1_57

DOI: https://doi.org/10.1007/978-981-10-0356-1_57
Published: 19 January 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0355-4
Online ISBN: 978-981-10-0356-1
eBook Packages: Computer Science (R0)
