Punctuation Prediction for Chinese Spoken Sentence Based on Model Combination

  • Xiao Chen
  • Dengfeng Ke
  • Bo Xu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 279)

Abstract

Punctuation prediction is important for automatic speech recognition (ASR): it greatly improves the readability of transcripts and the user experience, and it facilitates downstream natural language processing tasks. In this study, we develop a model-combination approach for recovering punctuation in Chinese spoken sentences. Our approach models the relationship between punctuation and sentence using several different sentence representations, and the resulting models are combined by a multi-layer perceptron to predict the punctuation mark (period, question mark, or exclamation mark). Unlike previous studies, our approach is designed to use global lexical information rather than only local information. Compared with the baseline, the proposed method yields absolute improvements of 10.0% in unweighted accuracy and 4.9% in weighted accuracy, reaching a final unweighted accuracy of 86.9% and a weighted accuracy of 92.4%.
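The abstract reports both unweighted and weighted accuracy without defining them. The sketch below assumes the common convention in which weighted accuracy is the overall fraction of correctly labeled sentences and unweighted accuracy is the mean of the per-class recalls; the paper itself may define them differently. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(gold, pred):
    """Overall fraction of sentences whose predicted mark matches the gold mark."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def unweighted_accuracy(gold, pred):
    """Mean of per-class recalls, so every punctuation class counts equally."""
    totals = Counter(gold)                                   # sentences per gold class
    hits = Counter(g for g, p in zip(gold, pred) if g == p)  # correct ones per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: 8 periods and 2 question marks, one question mislabeled.
gold = ["period"] * 8 + ["question"] * 2
pred = ["period"] * 8 + ["period", "question"]

print(weighted_accuracy(gold, pred))    # 9 of 10 sentences correct
print(unweighted_accuracy(gold, pred))  # mean of recalls 8/8 and 1/2
```

Under this assumed definition, a frequent class such as the period dominates weighted accuracy, while unweighted accuracy penalizes errors on rarer marks more heavily, which would be consistent with the two figures differing in the abstract.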

Keywords

Punctuation prediction; Model combine; Global lexical information

Notes

Acknowledgments

This work is supported by the National Program on Key Basic Research Project (973 Program) under Grant 2013CB329302 and by the National Natural Science Foundation of China under Grant 61103152.


Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Interactive Digital Media Technology Research Center (IDMTech), Institute of Automation, Chinese Academy of Sciences, Beijing, People’s Republic of China