Punctuation Prediction in Vietnamese ASRs Using Transformer-Based Models

Bui, Viet The; Tran, Oanh Thi

doi:10.1007/978-3-030-89363-7_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13032)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Punctuation prediction is the task of inserting punctuation marks such as periods, commas, and exclamation marks at the appropriate positions in text transcribed by automatic speech recognition (ASR) systems. Restoring punctuation improves readability and the performance of many downstream tasks. While most related studies target widely used languages such as English and Chinese, very little work has been done for low-resource languages. To stimulate research on these languages, this paper aims to improve the quality of punctuation prediction for Vietnamese ASR. Specifically, we propose a method based on recent general-purpose pre-trained language models (LMs) such as BERT and ELECTRA. The benefit of these models is that they can be fine-tuned effectively on the punctuation prediction task even when only a small amount of training data is available. To further enhance performance, we also propose a simple yet effective technique that provides more context when predicting punctuation marks for the leftmost and rightmost words in each segment. Experimental results on public benchmark datasets are promising: the proposed architecture improves prediction performance by a large margin and yields a new state-of-the-art result on these datasets. Specifically, we achieve \(F_1\) scores of 71.49% and 80.38% on the Novel and Newspaper public datasets, respectively.
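
The abstract does not spell out how the task is encoded or how extra context is supplied to boundary words, so the following is only a minimal sketch under stated assumptions. It treats punctuation prediction as per-word sequence labeling, where each word is tagged with the mark that follows it, and it uses overlapping windows as one plausible way to give the leftmost and rightmost words of a segment more context. The label names, the helper functions `to_labels` and `overlapping_segments`, and the windowing scheme are all hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical label set; the paper's actual tag inventory is not given here.
PUNCT = {".": "PERIOD", ",": "COMMA", "?": "QUESTION", "!": "EXCLAM"}


def to_labels(tokens: List[str]) -> List[Tuple[str, str]]:
    """Turn punctuated tokens into (word, label) pairs.

    Each word is lowercased (mimicking raw ASR output) and labeled with
    the punctuation mark that follows it, or "O" if none follows.
    """
    pairs: List[Tuple[str, str]] = []
    for tok in tokens:
        if tok in PUNCT:
            if pairs:  # attach the mark to the preceding word
                word, _ = pairs[-1]
                pairs[-1] = (word, PUNCT[tok])
        else:
            pairs.append((tok.lower(), "O"))
    return pairs


def overlapping_segments(
    n_words: int, size: int, context: int
) -> List[Tuple[int, int, int, int]]:
    """Split n_words into segments of `size` words with extra context.

    Returns (start, end, keep_start, keep_end) tuples: the model would see
    words[start:end], but only predictions for words[keep_start:keep_end]
    are kept, so boundary words are predicted with neighbors on both sides
    wherever neighbors exist.
    """
    segments = []
    for keep_start in range(0, n_words, size):
        keep_end = min(keep_start + size, n_words)
        start = max(0, keep_start - context)
        end = min(n_words, keep_end + context)
        segments.append((start, end, keep_start, keep_end))
    return segments


if __name__ == "__main__":
    sentence = ["Xin", "chào", ",", "bạn", "khỏe", "không", "?"]
    print(to_labels(sentence))
    print(overlapping_segments(n_words=10, size=4, context=2))
```

In such a setup, the (word, label) pairs would serve as training targets for a token-classification head on top of a pre-trained encoder, and the kept ranges of consecutive windows tile the transcript without gaps or overlaps.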

Author information

Authors and Affiliations

FPT School of Business and Technology, FPT University, Hanoi, Vietnam
Viet The Bui
International School, Vietnam National University, Hanoi, Hanoi, Vietnam
Oanh Thi Tran

Corresponding author

Correspondence to Oanh Thi Tran.

Editor information

Editors and Affiliations

MIMOS Berhad, Kuala Lumpur, Malaysia
Duc Nghia Pham
Sirindhorn International Institute of Science and Technology, Thammasat University, Mueang Pathum Thani, Thailand
Thanaruk Theeramunkong
Data61, CSIRO, Brisbane, QLD, Australia
Guido Governatori
Department of Philosophy, Tsinghua University, Beijing, China
Fenrong Liu

About this paper

Cite this paper

Bui, V.T., Tran, O.T. (2021). Punctuation Prediction in Vietnamese ASRs Using Transformer-Based Models. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol 13032. Springer, Cham. https://doi.org/10.1007/978-3-030-89363-7_15

DOI: https://doi.org/10.1007/978-3-030-89363-7_15
Published: 01 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89362-0
Online ISBN: 978-3-030-89363-7
eBook Packages: Computer Science, Computer Science (R0)
