Speech Fragment Decoding Techniques Using Silent Pause Detection

Yang, Zhanlei; Liu, Wenju; Jiang, Wei; Hu, Pengfei; Chen, Mingming

doi:10.1007/978-3-642-33506-8_71

Part of the book series: Communications in Computer and Information Science (CCIS, volume 321)

Included in the following conference series:

Chinese Conference on Pattern Recognition

Abstract

Silent pauses occur frequently in spontaneous speech, and they tend to degrade the performance of typical speech recognizers. This paper proposes a fragment decoding method that uses silent pause detection to improve recognizer performance. The method automatically detects silent pauses and cuts a long utterance into speech fragments. At the decoding stage, instead of silent frames simply being skipped, these fragments are decoded separately, and the final transcription of the whole utterance is derived from the corresponding fragment results. A further improvement reduces the run time spent on decoding. Because accurate word boundaries are introduced, misrecognition at silent frames is reduced. Recognition experiments on monologue speech in the tourism domain show that the proposed method outperforms the traditional frame-skipping method.
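The pipeline outlined in the abstract (detect silent pauses, cut the utterance into fragments, decode each fragment, join the results) can be illustrated with a minimal sketch. The abstract does not say how pauses are detected or how fragment results are combined, so everything below is an assumption made for illustration: the energy-threshold detector, the plain concatenation of fragment outputs, and the hypothetical names `frame_energies`, `split_on_silent_pauses`, `decode_by_fragments`, `threshold` and `min_pause_frames`. The `decode` argument is a stand-in for a real recognizer.

```python
from typing import Callable, List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (assumed feature)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def split_on_silent_pauses(
    energies: Sequence[float],
    threshold: float,
    min_pause_frames: int,
) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of speech fragments.

    Assumption: a run of at least `min_pause_frames` consecutive frames with
    energy below `threshold` counts as a silent pause and becomes a cut point.
    Shorter low-energy runs stay inside the surrounding fragment.
    """
    fragments = []
    frag_start = None   # first frame of the fragment being built
    last_speech = None  # most recent frame at or above the threshold
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if frag_start is None:
                frag_start = i
            last_speech = i
        elif frag_start is not None and i - last_speech >= min_pause_frames:
            # The pause is long enough: close the fragment at the last speech frame.
            fragments.append((frag_start, last_speech + 1))
            frag_start = None
    if frag_start is not None:
        fragments.append((frag_start, last_speech + 1))
    return fragments


def decode_by_fragments(
    frames: Sequence,
    fragments: Sequence[Tuple[int, int]],
    decode: Callable[[Sequence], List[str]],
) -> List[str]:
    """Decode each fragment on its own and concatenate the word sequences."""
    words: List[str] = []
    for start, end in fragments:
        words.extend(decode(frames[start:end]))
    return words
```

With per-frame energies `[0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0]`, a threshold of `0.5` and `min_pause_frames=3`, the splitter returns the fragments `(2, 6)` and `(9, 11)`: the single quiet frame at index 4 is too short to count as a pause, while the three quiet frames at indices 6 to 8 trigger a cut. A real system would tune both parameters on held-out data.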

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, 100190, Beijing, China
Zhanlei Yang, Wenju Liu, Wei Jiang, Pengfei Hu & Mingming Chen

Editor information

Editors and Affiliations

Institute of Automation, National Laboratory of Pattern Recognition, Chinese Academy of Sciences, No.95, Zhongguancun East Road, 100190, Beijing, China
Cheng-Lin Liu
Department of Automation, Tsinghua University, Haidian District, 100084, Beijing, China
Changshui Zhang
Institute of Automation, National Laboratory of Pattern Recognition, Chinese Academy of Sciences, 100190, Beijing, China
Liang Wang

About this paper

Cite this paper

Yang, Z., Liu, W., Jiang, W., Hu, P., Chen, M. (2012). Speech Fragment Decoding Techniques Using Silent Pause Detection. In: Liu, CL., Zhang, C., Wang, L. (eds) Pattern Recognition. CCPR 2012. Communications in Computer and Information Science, vol 321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33506-8_71

DOI: https://doi.org/10.1007/978-3-642-33506-8_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33505-1
Online ISBN: 978-3-642-33506-8
eBook Packages: Computer Science, Computer Science (R0)
