A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This paper proposes a robust pitch extractor for Automatic Speech Recognition that selects pitch lines from a tonegram (a representation of the energy of each candidate pitch at each time frame). First, the tonegram and its maximum-energy regions are extracted, and a Dynamic Time Warping algorithm finds the most energetic trajectories, or pitch lines, within these regions. A second stage estimates the tonegram of the most energetic lines by applying Computational Auditory Scene Analysis rules that reject and group octave-related lines. The mean pitch of the speaker is then estimated, and the final pitch is obtained by rejecting lines that deviate from this mean pitch. The proposed pitch extractor is evaluated in a novel way: by the word accuracy of a Missing Data recognizer on the Aurora-2 database.
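The idea of tracing an energetic trajectory through a tonegram can be illustrated with a small dynamic-programming sketch. This is not the authors' algorithm: the abstract does not give the DTW constraints, so the function name `most_energetic_line`, the `max_jump` continuity constraint, and the toy tonegram below are all assumptions made for illustration. The sketch finds a single path (one pitch bin per frame) that maximizes the summed energy while moving at most `max_jump` bins between consecutive frames.

```python
from typing import List


def most_energetic_line(tonegram: List[List[float]], max_jump: int = 1) -> List[int]:
    """Return one pitch-bin index per frame, maximizing the summed energy.

    tonegram[t][b] is the energy of pitch candidate b at frame t (assumed layout).
    max_jump is a hypothetical continuity constraint: the path may move at most
    this many bins between consecutive frames.
    """
    if not tonegram:
        return []
    n_bins = len(tonegram[0])

    # score[b]: best accumulated energy of any allowed path ending at bin b.
    score = list(tonegram[0])
    # back[t][b]: bin chosen in the previous frame for the best path ending at b.
    back: List[List[int]] = []

    for frame in tonegram[1:]:
        new_score = []
        pointers = []
        for b in range(n_bins):
            lo = max(0, b - max_jump)
            hi = min(n_bins - 1, b + max_jump)
            # Best predecessor among the bins reachable from b.
            prev = max(range(lo, hi + 1), key=lambda p: score[p])
            new_score.append(score[prev] + frame[b])
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final bin to recover the whole line.
    b = max(range(n_bins), key=lambda i: score[i])
    path = [b]
    for pointers in reversed(back):
        b = pointers[b]
        path.append(b)
    path.reverse()
    return path


# Toy tonegram: 4 frames x 4 pitch bins (made-up values).
toy = [
    [0.1, 0.9, 0.1, 0.0],
    [0.1, 0.2, 0.8, 0.0],
    [0.0, 0.1, 0.2, 0.9],
    [0.0, 0.1, 0.7, 0.3],
]
line = most_energetic_line(toy, max_jump=1)
```

A full extractor in the spirit of the abstract would extract several such lines, apply grouping rules to octave-related ones, and discard lines that deviate from the speaker's mean pitch; those stages are omitted here because the abstract does not specify them.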

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Morales-Cordovilla, J.A., Cabañas-Molero, P., Peinado, A.M., Sánchez, V. (2012). A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_21

  • DOI: https://doi.org/10.1007/978-3-642-35292-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35291-1

  • Online ISBN: 978-3-642-35292-8

  • eBook Packages: Computer Science, Computer Science (R0)