Abstract
This paper proposes a robust pitch extractor for Automatic Speech Recognition, based on selecting pitch lines from a tonegram (a representation of the energy of each candidate pitch at every time frame). First, the tonegram and its maximum-energy regions are extracted, and a Dynamic Time Warping algorithm finds the most energetic trajectories, or pitch lines, within these regions. A second stage estimates the tonegram of the most energetic lines by applying Computational Auditory Scene Analysis rules that reject and group octave-related lines. The mean pitch of the speaker is then estimated, and the final pitch is obtained by rejecting lines that lie too far from this mean pitch. The proposed pitch extractor is evaluated in a novel way: by the word accuracy of a Missing Data recognizer on the Aurora-2 database.
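To make the idea of a "pitch line" concrete, the following is a minimal sketch of finding the single most energetic trajectory through a tonegram. It is not the authors' algorithm: it swaps in a plain Viterbi-style dynamic-programming search, and the function name `best_pitch_line`, the `max_jump` continuity constraint, and the toy tonegram are all assumptions made for illustration.

```python
def best_pitch_line(tonegram, max_jump=1):
    """Return one pitch-bin index per frame, maximising summed energy.

    tonegram: list of frames, each a list of energies per pitch bin.
    max_jump: hypothetical limit on the bin change between adjacent frames.
    """
    n_frames = len(tonegram)
    n_bins = len(tonegram[0])
    score = [list(tonegram[0])]   # best accumulated energy ending at each bin
    back = [[0] * n_bins]         # best predecessor bin for each (frame, bin)
    for t in range(1, n_frames):
        row_score, row_back = [], []
        for p in range(n_bins):
            lo = max(0, p - max_jump)
            hi = min(n_bins, p + max_jump + 1)
            # Pick the best reachable predecessor in the previous frame.
            prev = max(range(lo, hi), key=lambda q: score[t - 1][q])
            row_score.append(score[t - 1][prev] + tonegram[t][p])
            row_back.append(prev)
        score.append(row_score)
        back.append(row_back)
    # Backtrack from the best final bin to recover the whole line.
    path = [max(range(n_bins), key=lambda q: score[-1][q])]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy tonegram (4 frames x 5 pitch bins, made-up values): a smooth line of
# energy 1.0 plus an isolated 1.5 peak that a frame-wise argmax would pick.
toy = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.5],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
line = best_pitch_line(toy)
print(line)  # [1, 2, 2, 3]
```

In this toy case the continuity constraint keeps the trajectory on the smooth line and ignores the isolated peak in the second frame. The octave grouping and mean-pitch rejection stages described in the abstract are not covered by this sketch.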
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Morales-Cordovilla, J.A., Cabañas-Molero, P., Peinado, A.M., Sánchez, V. (2012). A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_21
DOI: https://doi.org/10.1007/978-3-642-35292-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)