Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System

  • Olga Khomitsevich
  • Pavel Chistikov
  • Tatiana Krivosheeva
  • Natalia Epimakhova
  • Irina Chernykh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9319)


We propose a system for automatic punctuation prediction in recognized speech using prosodic, word and grammatical features. An SVM classifier is trained on prosodic features, and a CRF classifier is trained on a large text dataset using word-based features. The probabilities produced by the two classifiers are then fused to reach a joint decision on comma and period placement, and a second classification pass detects question marks. Training the two classifiers separately enables us to avoid data sparseness for the lexical classifier and to increase the overall robustness of the system. This approach works well for Russian and could be applied to other inflected languages. The system was tested on different speech styles. On manual transcripts, we achieved an F-score of 50–71% for periods, 46–66% for commas, 19–47% for question marks, and 77–87% for “mark/no mark” classification. The results for recognizer output are 46–66% for periods, 43–60% for commas, 10–38% for question marks, and 64–80% for “mark/no mark”.
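The two-pass scheme described above can be illustrated with a minimal sketch. The abstract does not state how the probabilities are fused or how the second pass is applied, so everything here is an assumption for illustration: the log-linear interpolation, the `weight` and `threshold` parameters, the label names, and the idea that the second pass relabels predicted periods as question marks. The input probabilities stand in for the outputs of the prosodic (SVM) and lexical (CRF) classifiers, which are not implemented here.

```python
import math

# Hypothetical label set for the first pass (names are illustrative).
LABELS = ("none", "comma", "period")


def fuse(prosodic, lexical, weight=0.5):
    """Fuse two probability distributions over LABELS.

    Assumption: log-linear interpolation with a single weight; the
    paper's actual fusion rule may differ.
    """
    scores = {}
    for label in LABELS:
        p = max(prosodic[label], 1e-12)  # guard against log(0)
        q = max(lexical[label], 1e-12)
        scores[label] = weight * math.log(p) + (1.0 - weight) * math.log(q)
    # Renormalise so the fused scores sum to 1.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def first_pass(prosodic_seq, lexical_seq, weight=0.5):
    """Pick the most probable fused label at each word boundary."""
    decisions = []
    for p, q in zip(prosodic_seq, lexical_seq):
        fused = fuse(p, q, weight)
        decisions.append(max(fused, key=fused.get))
    return decisions


def second_pass(decisions, question_probs, threshold=0.5):
    """Relabel predicted periods as question marks.

    Assumption: a separate question classifier supplies one probability
    per boundary, and only boundaries labelled "period" are revisited.
    """
    return [
        "question" if d == "period" and qp >= threshold else d
        for d, qp in zip(decisions, question_probs)
    ]


# Toy inputs: one distribution per word boundary from each classifier.
prosodic_seq = [
    {"none": 0.8, "comma": 0.1, "period": 0.1},
    {"none": 0.2, "comma": 0.3, "period": 0.5},
    {"none": 0.1, "comma": 0.2, "period": 0.7},
]
lexical_seq = [
    {"none": 0.9, "comma": 0.05, "period": 0.05},
    {"none": 0.3, "comma": 0.6, "period": 0.1},
    {"none": 0.1, "comma": 0.1, "period": 0.8},
]
marks = first_pass(prosodic_seq, lexical_seq)
final = second_pass(marks, question_probs=[0.9, 0.9, 0.7])
```

In this toy example the second boundary shows why fusion matters: the prosodic scores alone favour a period, while the fused scores favour a comma. Keeping the two classifiers separate also means the lexical model can be trained on text alone, which is the motivation the abstract gives for avoiding data sparseness.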


Keywords: Punctuation prediction · Sentence boundary detection · Speech recognition · Conditional Random Fields · Support Vector Machine · Russian



The work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008, and by the Government of the Russian Federation, Grant 074-U01.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Olga Khomitsevich (1, 2)
  • Pavel Chistikov (1)
  • Tatiana Krivosheeva (3)
  • Natalia Epimakhova (3)
  • Irina Chernykh (2, 3)

  1. Speech Technology Center, Saint-Petersburg, Russia
  2. ITMO University, Saint-Petersburg, Russia
  3. STC-Innovations Ltd, Saint-Petersburg, Russia
