Abstract
Punctuation prediction is the task of inserting punctuation marks such as periods, commas, and exclamation marks at the appropriate positions in text transcribed by automatic speech recognition (ASR) systems. Restoring punctuation improves readability and the performance of many downstream tasks. While most related studies target widely studied languages such as English and Chinese, very little work has been done for low-resource languages. To stimulate research on these languages, this paper aims to improve the quality of punctuation prediction for Vietnamese ASR systems. Specifically, we propose a method based on recent general-purpose pre-trained language models (LMs) such as BERT and ELECTRA. The benefit of these models is that they can be fine-tuned effectively for punctuation prediction even when only a small amount of training data is available. To further improve performance, we also propose a simple yet effective technique that provides more context when predicting punctuation marks for the leftmost and rightmost words of each segment. Experimental results on public benchmark datasets are promising: the proposed architecture improves prediction performance by a large margin and yields a new state-of-the-art result on these datasets. In particular, we achieve \(F_1\) scores of 71.49% and 80.38% on the Novel and Newspaper public datasets, respectively.
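As an illustration only, the task can be framed as labelling each word with the punctuation mark that follows it, and the edge-context idea can be sketched as overlapping segments in which only the predictions for the central words are kept. The abstract does not specify the actual mechanism, so the label names, the function names `to_labels` and `overlapping_segments`, and the `size` and `context` parameters below are all assumptions rather than the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical label set; the paper's actual tag inventory is not given in the abstract.
PUNCT = {".": "PERIOD", ",": "COMMA", "?": "QUESTION", "!": "EXCLAM"}


def to_labels(text: str) -> Tuple[List[str], List[str]]:
    """Convert punctuated text into (words, labels).

    Each label names the punctuation mark that follows the word, or "O" for none.
    Words are lowercased to mimic unpunctuated ASR output.
    """
    words, labels = [], []
    for tok in text.split():
        mark = tok[-1] if tok[-1] in PUNCT else None
        word = tok[:-1] if mark else tok
        if not word:  # skip a stand-alone punctuation token
            continue
        words.append(word.lower())
        labels.append(PUNCT[mark] if mark else "O")
    return words, labels


def overlapping_segments(n_words: int, size: int, context: int) -> List[Tuple[int, int, int, int]]:
    """Split n_words into segments of `size` words with extra context on each side.

    Returns (start, end, keep_start, keep_end) tuples: a model would read
    words[start:end], but only predictions for words[keep_start:keep_end]
    are kept, so every kept word has neighbours on both sides where possible.
    """
    segments = []
    for keep_start in range(0, n_words, size):
        keep_end = min(keep_start + size, n_words)
        start = max(0, keep_start - context)
        end = min(n_words, keep_end + context)
        segments.append((start, end, keep_start, keep_end))
    return segments


words, labels = to_labels("Hello, world. How are you?")
segments = overlapping_segments(n_words=10, size=4, context=2)
```

In this sketch the kept ranges tile the input exactly once, while the read ranges overlap by up to `context` words on each side; a fine-tuned token classifier would then be applied to each read range.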
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bui, V.T., Tran, O.T. (2021). Punctuation Prediction in Vietnamese ASRs Using Transformer-Based Models. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol 13032. Springer, Cham. https://doi.org/10.1007/978-3-030-89363-7_15
DOI: https://doi.org/10.1007/978-3-030-89363-7_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89362-0
Online ISBN: 978-3-030-89363-7
eBook Packages: Computer Science (R0)