Restoring Punctuation and Capitalization Using Transformer Models

  • Conference paper
Statistical Language and Speech Processing (SLSP 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11171)

Abstract

Restoring punctuation and capitalization in the output of an automatic speech recognition (ASR) system greatly improves readability and widens the range of possible downstream applications. We present a Transformer-based method for restoring punctuation and capitalization for Latvian and English, following the established approach of using neural machine translation (NMT) models. NMT methods pose a challenge here, as the length of the predicted sequence does not always match the length of the input sequence. We offer two solutions to this problem: simply forcing the target sequence to the input length by cutting or padding it, and a more sophisticated method based on attention alignment. Our approach reaches new state-of-the-art results for Latvian and competitive results for English.
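
The simpler of the two solutions, forcing the predicted sequence to the input length, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that the model emits one label per input word, where the first character encodes case (`U` for a capitalized first letter, `l` for lowercase) and any remaining characters are punctuation to append. The label scheme and the names `PAD_LABEL`, `force_length` and `apply_labels` are hypothetical.

```python
from typing import List

# Hypothetical neutral label: lowercase word, no punctuation.
PAD_LABEL = "l"


def force_length(labels: List[str], target_len: int,
                 pad_label: str = PAD_LABEL) -> List[str]:
    """Cut or pad a predicted label sequence to exactly target_len items."""
    if len(labels) >= target_len:
        # Prediction is too long (or exact): drop the surplus labels.
        return labels[:target_len]
    # Prediction is too short: fill the tail with the neutral label.
    return labels + [pad_label] * (target_len - len(labels))


def apply_labels(words: List[str], labels: List[str]) -> str:
    """Apply one label per word after forcing the lengths to match.

    Assumes lowercase input words and non-empty labels.
    """
    fixed = force_length(labels, len(words))
    out = []
    for word, label in zip(words, fixed):
        case, punct = label[0], label[1:]
        if case == "U":
            word = word.capitalize()
        out.append(word + punct)
    return " ".join(out)


if __name__ == "__main__":
    words = ["hello", "world", "how", "are", "you"]
    # Seven labels for five words: the last two are cut.
    print(apply_labels(words, ["U", "l.", "U", "l", "l?", "l", "l"]))
    # Two labels for five words: the last three words get the pad label.
    print(apply_labels(words, ["U", "l."]))
```

Cutting and padding guarantee a one-to-one mapping between words and labels, but any label that has shifted position before the cut or pad point stays misaligned, which is the weakness an alignment-based method is meant to address.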

Notes

  1. Europarl results were not in the print version of the paper, but they can be found at https://github.com/ottokart/punctuator2.

Acknowledgements

The research has been supported by the European Regional Development Fund within the project “Neural Network Modelling for Inflected Natural Languages” No. 1.1.1.1/16/A/215.

Author information

Correspondence to Andris Vāravs.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Vāravs, A., Salimbajevs, A. (2018). Restoring Punctuation and Capitalization Using Transformer Models. In: Dutoit, T., Martín-Vide, C., Pironkov, G. (eds) Statistical Language and Speech Processing. SLSP 2018. Lecture Notes in Computer Science, vol 11171. Springer, Cham. https://doi.org/10.1007/978-3-030-00810-9_9

  • DOI: https://doi.org/10.1007/978-3-030-00810-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00809-3

  • Online ISBN: 978-3-030-00810-9

  • eBook Packages: Computer Science (R0)
