Abstract
Virtual assistants and human-like robots are among the most attractive technologies today, and they depend heavily on communication. Verbal communication is the most comfortable form, which creates the need for voice recognition systems. Recently, researchers have focused on developing voice recognition systems with the help of machine learning and deep learning algorithms, and this requires a large amount of audio data. In this research, we discuss the process from collecting audio files from different speakers to representing them in numeric form, so that machine learning or deep learning algorithms can be applied to them. This process has its own requirements and a specific format; instructing the speakers, organizing the speeches, and preparing the scripts are the most important steps. Everything described in this work is based on our own experience of working with speech data. The dataset for this work contains exactly 10,992 speech samples covering 1000 unique words from more than 50 speakers.
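As a rough illustration of what representing a recording in numeric form can look like, the sketch below reads a WAV file into a list of floating-point samples using only the Python standard library. It is not the chapter's pipeline: the 16-bit mono PCM format, the 16 kHz sample rate, and the helper names `wav_to_samples` and `write_test_tone` are assumptions made for this example, and the synthetic tone stands in for a real speaker recording.

```python
import array
import math
import os
import sys
import tempfile
import wave


def wav_to_samples(path):
    """Read a 16-bit mono PCM WAV file; return (sample_rate, samples in [-1, 1))."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Scale integers to floats so downstream models see values near [-1, 1].
    return rate, [s / 32768.0 for s in pcm]


def write_test_tone(path, rate=16000, num_samples=1600, freq=440.0):
    """Write a short sine tone (hypothetical stand-in for a recorded word)."""
    pcm = array.array(
        "h",
        (int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
         for i in range(num_samples)),
    )
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_test_tone(demo_path)
    demo_rate, demo_samples = wav_to_samples(demo_path)
    print(demo_rate, len(demo_samples))
```

In practice, a sample list like this would typically be passed to a feature extraction step before training; the choice of features is outside the scope of this sketch.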
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saiful Islam Badhon, S.M., Rupon, F.R., Habibur Rahaman, M., Abujar, S. (2021). A Language-Independent Speech Data Collection and Preprocessing Technique. In: Hassanien, A.E., Bhattacharyya, S., Chakrabati, S., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 1300. Springer, Singapore. https://doi.org/10.1007/978-981-33-4367-2_9
DOI: https://doi.org/10.1007/978-981-33-4367-2_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4366-5
Online ISBN: 978-981-33-4367-2
eBook Packages: Intelligent Technologies and Robotics (R0)