
Evaluation on Noise Reduction in Subtitle Generator for Videos

  • Conference paper
  • Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS 2022)

Abstract

Watching movies and videos on the internet now serves many needs, including learning, entertainment, and research. Artificial intelligence is widely applied to translation and speech recognition, and these fields continue to develop in many research directions, such as speech-to-text applications that operate on specific audio files. However, existing studies usually focus on improving processing speed and the accuracy of converting the words in an audio file into text, not on clarifying the voice within the audio file so that it can be recognized easily and accurately. Likewise, no tool automatically creates subtitles for videos for free; subtitles must be created manually by setting timestamps and adding text, which is time-consuming for long movies or videos. This study therefore proposes a new approach that combines audio processing for noise reduction and noise removal with audio-to-text recognition to build a tool that generates subtitles automatically with high accuracy. The results are experimental, intended to open a research direction that can be developed into viable applications for generating video subtitles without manual work, with an accuracy of about 80%.
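The pipeline the abstract describes (denoise the audio, transcribe it, then emit timed subtitles) ends in a subtitle-assembly step. A minimal sketch of that final step is shown below, assuming the earlier denoising and speech-to-text stages (not shown) have produced a list of `(start, end, text)` segments; the segment data here is a hypothetical placeholder, not output from the authors' system.

```python
# Sketch of the subtitle-assembly step: turn transcribed segments
# (start/end in seconds, text) into a SubRip (.srt) document.
# In the full pipeline these segments would come from a noise-reduction
# pass followed by an ASR engine, which are not shown here.

def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Assemble numbered SRT cues from (start, end, text) triples."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

# Placeholder segments standing in for real ASR output.
demo = [(0.0, 2.5, "Hello and welcome."),
        (2.5, 5.0, "This is an auto-generated subtitle.")]
print(to_srt(demo))
```

The cue numbering and `HH:MM:SS,mmm` timestamps follow the SubRip convention, which most video players accept directly.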


Notes

  1. https://github.com/alexkay/spek, accessed on 10 March 2022.

  2. https://pypi.org/project/PyQt5/.


Author information

Corresponding author

Correspondence to Dien Thanh Tran.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, H.T., Thanh, T.N.L., Ngoc, T.L., Le, A.D., Tran, D.T. (2022). Evaluation on Noise Reduction in Subtitle Generator for Videos. In: Barolli, L. (eds) Innovative Mobile and Internet Services in Ubiquitous Computing. IMIS 2022. Lecture Notes in Networks and Systems, vol 496. Springer, Cham. https://doi.org/10.1007/978-3-031-08819-3_14
