A Language-Independent Speech Data Collection and Preprocessing Technique

  • Conference paper
Emerging Technologies in Data Mining and Information Security

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1300))

Abstract

Virtual assistants and human-like robots have become some of the most attractive technologies of recent years, and they depend heavily on communication, of which verbal communication is the most comfortable form. This creates the need for voice recognition systems. Researchers are therefore focusing on developing voice recognition systems with machine learning and deep learning algorithms, and such systems require large amounts of audio data. In this research, we discuss the full pipeline, from collecting audio files from different speakers to representing them in numeric form so that machine learning or deep learning algorithms can be applied. The process has specific requirements: instructing the speakers, organizing the recorded speeches, and preparing the recording scripts are the most important steps. Everything described in this work is based on our own experience with speech data. The resulting dataset contains exactly 10,992 speech recordings of 1,000 unique words from more than 50 speakers.
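The final step mentioned in the abstract, representing recorded speech numerically so that machine learning or deep learning models can consume it, is commonly done with spectral features such as MFCCs. The abstract does not name the exact features or tools used in the paper, so the following is only a minimal sketch under assumed choices: WAV recordings, the librosa library, and illustrative file names and parameters.

```python
# Minimal sketch (assumed tooling, not the paper's confirmed method):
# convert one recorded utterance into a numeric feature matrix.
import librosa
import numpy as np

def wav_to_features(path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load a speech recording and return an (n_mfcc, n_frames) MFCC matrix."""
    signal, sr = librosa.load(path, sr=sr)      # resample to a fixed sampling rate
    signal, _ = librosa.effects.trim(signal)    # drop leading/trailing silence
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)

# Illustrative usage: average over time to get one fixed-length vector per
# utterance, so recordings of different durations can feed a classifier.
# features = wav_to_features("speaker01/word_0001.wav")   # hypothetical path
# vector = features.mean(axis=1)
```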

Author information

Corresponding author

Correspondence to S. M. Saiful Islam Badhon.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saiful Islam Badhon, S.M., Rupon, F.R., Habibur Rahaman, M., Abujar, S. (2021). A Language-Independent Speech Data Collection and Preprocessing Technique. In: Hassanien, A.E., Bhattacharyya, S., Chakrabati, S., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 1300. Springer, Singapore. https://doi.org/10.1007/978-981-33-4367-2_9
