A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition

Morales-Cordovilla, Juan A.; Cabañas-Molero, Pablo; Peinado, Antonio M.; Sánchez, Victoria

doi:10.1007/978-3-642-35292-8_21

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This paper proposes a robust pitch extractor for Automatic Speech Recognition, based on selecting pitch lines from a tonegram (a representation of the energy of each candidate pitch at each time frame). First, the tonegram and its maximum-energy regions are extracted, and a Dynamic Time Warping (DTW) algorithm finds the most energetic trajectories, or pitch lines, within these regions. A second stage estimates the tonegram of the most energetic lines by applying Computational Auditory Scene Analysis (CASA) rules, which reject and group octave-related lines. The mean pitch of the speaker is then estimated, and the final pitch is obtained by rejecting the lines that deviate from this mean pitch. The proposed pitch extractor is evaluated in a novel way: by means of the word accuracy of a Missing Data recognizer on the Aurora-2 database.
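
As an illustration of the first stage only, the following minimal sketch finds a single most energetic trajectory through a toy tonegram with a dynamic-programming search. It is not the authors' algorithm, whose details the abstract does not give: the function name `most_energetic_line`, the `max_jump` continuity constraint, and the toy data are all assumptions made for this example.

```python
def most_energetic_line(tonegram, max_jump=1):
    """Return one pitch-bin index per frame along the highest-energy path.

    tonegram: list of frames, each a list of energies per pitch bin.
    max_jump: hypothetical limit on how many bins the line may move
              between consecutive frames (an assumption of this sketch).
    """
    n_frames = len(tonegram)
    n_bins = len(tonegram[0])
    score = [list(tonegram[0])]   # best accumulated energy ending at each bin
    back = [[0] * n_bins]         # best predecessor bin for each (frame, bin)
    for t in range(1, n_frames):
        row_score, row_back = [], []
        for b in range(n_bins):
            lo = max(0, b - max_jump)
            hi = min(n_bins, b + max_jump + 1)
            # Pick the reachable predecessor with the largest accumulated energy.
            prev = max(range(lo, hi), key=lambda k: score[t - 1][k])
            row_score.append(score[t - 1][prev] + tonegram[t][b])
            row_back.append(prev)
        score.append(row_score)
        back.append(row_back)
    # Backtrack from the best final bin to recover the whole line.
    b = max(range(n_bins), key=lambda k: score[-1][k])
    path = [b]
    for t in range(n_frames - 1, 0, -1):
        b = back[t][b]
        path.append(b)
    return path[::-1]


# Toy tonegram: 4 frames x 5 pitch bins, with a ridge drifting upwards
# and one isolated distractor peak in frame 1, bin 4.
toy = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.9],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
line = most_energetic_line(toy)
print(line)
```

The continuity constraint is what keeps the isolated peak out of the line: it cannot be joined to the ridge without a jump larger than `max_jump`. Extracting several lines, grouping octave-related ones, and filtering by the speaker's mean pitch, as the abstract describes, would need further steps not sketched here.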

Author information

Authors and Affiliations

Dept. of Teoría de la Señal Telemática y Comunicaciones, Universidad de Granada, Spain
Juan A. Morales-Cordovilla, Antonio M. Peinado & Victoria Sánchez
Dept. of Ingeniería de la Telecomunicación, Universidad de Jaén, Spain
Pablo Cabañas-Molero

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid. C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Doroteo Torre Toledano
Centro Politécnico Superior, Edificio Ada Byron, C/ María de Luna nº 1, 50018, Zaragoza, Spain
Alfonso Ortega Giménez
Universidade de Aveiro, Campus Universitário Aveiro, 3810-193, Aveiro, Portugal
António Teixeira
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Joaquín González Rodríguez
E.T.S.I.Telecomunicacion, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain
Luis Hernández Gómez & Rubén San Segundo Hernández
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Daniel Ramos Castro

About this paper

Cite this paper

Morales-Cordovilla, J.A., Cabañas-Molero, P., Peinado, A.M., Sánchez, V. (2012). A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_21

DOI: https://doi.org/10.1007/978-3-642-35292-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)