Abstract
A novel text-independent speaker identification system is proposed. In the proposed system, 12th-order perceptual linear predictive (PLP) cepstral coefficients and their delta coefficients, computed over a span of five frames, are extracted from speech that has been segmented by pitch synchronous analysis. The Fisher ratios of these original coefficients are then calculated, and the coefficients with the largest Fisher ratios are selected to form the 13-dimensional feature vectors of the speakers. Gaussian mixture models are used to model the speakers. Experimental results show that the identification accuracy of the proposed system is clearly higher than that of systems based on conventional coefficients such as linear predictive cepstral coefficients and Mel-frequency cepstral coefficients.
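The Fisher-ratio selection step can be illustrated with a minimal sketch. The abstract does not give the formula, so the code assumes one common definition: for each coefficient, the variance of the per-speaker means divided by the average per-speaker variance. The function names `fisher_ratios` and `select_top`, the toy data, and the choice of population variance are illustrative assumptions, not the authors' implementation.

```python
from statistics import fmean, pvariance


def fisher_ratios(frames_by_speaker):
    """Return one Fisher ratio per feature dimension.

    frames_by_speaker maps a speaker label to a list of feature vectors
    (all of equal length). Assumed definition (not taken from the paper):
    between-speaker variance of the means / mean within-speaker variance.
    """
    dims = len(next(iter(frames_by_speaker.values()))[0])
    ratios = []
    for d in range(dims):
        # One column of values per speaker for dimension d.
        columns = [[frame[d] for frame in frames]
                   for frames in frames_by_speaker.values()]
        between = pvariance([fmean(col) for col in columns])
        within = fmean([pvariance(col) for col in columns])
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def select_top(ratios, k):
    """Indices of the k largest ratios, in ascending index order."""
    ranked = sorted(range(len(ratios)), key=lambda d: ratios[d], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): dimension 0 separates the two speakers,
# dimension 1 mostly varies within each speaker.
toy = {
    "speaker_a": [[1.0, 0.0], [1.2, 1.0], [0.8, -1.0]],
    "speaker_b": [[3.0, 0.1], [3.2, 1.1], [2.8, -0.9]],
}
ratios = fisher_ratios(toy)
selected = select_top(ratios, 1)
```

In the setting described in the abstract, the input vectors would hold the PLP cepstral coefficients together with their delta coefficients, and `select_top(ratios, 13)` would give the indices of the 13 retained dimensions.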
Cite this article
Zeng, Y., Wu, Z. Combination of pitch synchronous analysis and fisher criterion for speaker identification. J. Electron.(China) 24, 828–834 (2007). https://doi.org/10.1007/s11767-007-0034-z