Speech Fragment Decoding Techniques Using Silent Pause Detection

  • Conference paper
Pattern Recognition (CCPR 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 321)

Abstract

Silent pauses occur frequently in spontaneous speech and tend to degrade the performance of typical speech recognizers. This paper proposes a fragment decoding method that uses silent pause detection to improve recognizer performance. The method automatically detects silent pauses and cuts a long utterance into speech fragments. At the decoding stage, instead of being skipped, these silent fragments are decoded separately. The final transcription of the whole utterance is derived from the corresponding fragment results. A further improvement reduces the run time spent on decoding. Because accurate word boundaries are introduced, misrecognition at silent frames is reduced. Recognition experiments on monologue speech in the tourism domain show that the proposed method outperforms the traditional frame-skipping method.
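
The pipeline the abstract outlines (detect silent pauses, cut the utterance into fragments, decode each fragment, join the results) can be illustrated with a minimal sketch. The abstract does not say how pauses are detected or how fragments are decoded, so everything here is an assumption made for illustration: a simple short-time energy threshold stands in for the paper's detector, `decode_fragment` is a hypothetical callback standing in for a real recognizer, and the names `frame_energies`, `split_at_silent_pauses`, `threshold` and `min_pause_frames` are invented for this example.

```python
from typing import Callable, List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (trailing partial frame dropped)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def split_at_silent_pauses(
    energies: Sequence[float], threshold: float, min_pause_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of speech fragments, end exclusive.

    Assumption: a run of at least `min_pause_frames` consecutive frames with
    energy below `threshold` counts as a silent pause and closes the current
    fragment. Shorter low-energy runs stay inside the fragment.
    """
    fragments: List[Tuple[int, int]] = []
    start = None        # first frame of the fragment being built
    last_speech = None  # most recent frame at or above the threshold
    silent_run = 0      # length of the current low-energy run
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                fragments.append((start, last_speech + 1))
                start = None
                silent_run = 0
    if start is not None:
        fragments.append((start, last_speech + 1))
    return fragments


def decode_by_fragments(
    energies: Sequence[float],
    decode_fragment: Callable[[int, int], str],  # hypothetical recognizer hook
    threshold: float,
    min_pause_frames: int,
) -> str:
    """Decode each fragment on its own and join the fragment transcriptions."""
    fragments = split_at_silent_pauses(energies, threshold, min_pause_frames)
    return " ".join(decode_fragment(start, end) for start, end in fragments)
```

In this sketch a single low-energy frame between two speech frames does not split the fragment; only a pause of `min_pause_frames` or longer does. A real system would likely tune both parameters on held-out data and might use a model-based detector instead of raw energy.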



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, Z., Liu, W., Jiang, W., Hu, P., Chen, M. (2012). Speech Fragment Decoding Techniques Using Silent Pause Detection. In: Liu, CL., Zhang, C., Wang, L. (eds) Pattern Recognition. CCPR 2012. Communications in Computer and Information Science, vol 321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33506-8_71

  • DOI: https://doi.org/10.1007/978-3-642-33506-8_71

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33505-1

  • Online ISBN: 978-3-642-33506-8

  • eBook Packages: Computer Science, Computer Science (R0)
