Restoring Punctuation and Capitalization Using Transformer Models

  • Andris Vāravs
  • Askars Salimbajevs
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11171)

Abstract

Restoring punctuation and capitalization in the output of an automatic speech recognition (ASR) system greatly improves readability and widens the range of downstream applications. We present a Transformer-based method for restoring punctuation and capitalization for Latvian and English, following the established approach of using neural machine translation (NMT) models. NMT methods pose a challenge here, as the length of the predicted sequence does not always match the length of the input sequence. We offer two solutions to this problem: simply cutting or padding the target sequence by force, and a more sophisticated attention alignment-based method. Our approach reaches new state-of-the-art results for Latvian and competitive results for English.
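
The simpler of the two length-mismatch fixes, cutting or padding the predicted sequence by force, might look roughly like the sketch below. The abstract gives no implementation details, so everything here is an assumption made for illustration: the per-word label scheme ("U" or "L" for case, followed by an optional punctuation mark), the neutral padding label, and the function names force_length and apply_labels are all hypothetical and are not taken from the paper.

```python
from typing import List

# Hypothetical label scheme (an assumption, not from the paper):
# first character is "U" (capitalize) or "L" (leave as is),
# any remaining characters are punctuation to append to the word.
NEUTRAL = "L"  # "leave the word unchanged, add no punctuation"


def force_length(labels: List[str], n: int, pad: str = NEUTRAL) -> List[str]:
    """Cut or pad a predicted label sequence so it has exactly n items."""
    return labels[:n] + [pad] * max(0, n - len(labels))


def apply_labels(words: List[str], labels: List[str]) -> str:
    """Apply one label per word and join the result into a string."""
    out = []
    for word, label in zip(words, labels):
        if label.startswith("U"):
            word = word[:1].upper() + word[1:]
        out.append(word + label[1:])
    return " ".join(out)


words = ["hello", "world", "how", "are", "you"]

# Prediction is too short: pad with the neutral label.
short = force_length(["U", "L.", "U"], len(words))
print(apply_labels(words, short))

# Prediction is too long: cut the surplus labels.
long_ = force_length(["U", "L.", "U", "L", "L?", "L", "L"], len(words))
print(apply_labels(words, long_))
```

Forcing the length this way guarantees one label per input word, but any labels lost to cutting, or replaced by neutral padding, are simply dropped, which is presumably why a more sophisticated alignment-based alternative is also offered.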

Keywords

Speech recognition, Punctuation restoration, Capitalization restoration, Transformer

Notes

Acknowledgements

The research has been supported by the European Regional Development Fund within the project “Neural Network Modelling for Inflected Natural Languages” No. 1.1.1.1/16/A/215.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Tilde, Riga, Latvia
