A Language-Independent Speech Data Collection and Preprocessing Technique

  • Conference paper
Emerging Technologies in Data Mining and Information Security

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1300))

Abstract

Virtual assistants and human-like robots have become some of the most attractive technologies of recent years, and they depend heavily on communication, of which verbal communication is the most comfortable form. This creates the need for voice recognition systems. Researchers are therefore focusing on developing voice recognition systems with machine learning and deep learning algorithms, and such systems require large amounts of audio data. In this research, we discuss the full pipeline, from collecting audio files from different speakers to representing them in numeric form so that machine learning or deep learning algorithms can be applied. The process has specific requirements: instructing the speakers, organizing the recorded speeches, and preparing the recording scripts are the most important steps. Everything described in this work is based on our own experience with speech data. The resulting dataset contains exactly 10,992 speech recordings of 1,000 unique words from more than 50 speakers.
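The final step mentioned in the abstract, representing recorded speech numerically so that machine learning or deep learning models can consume it, is commonly done with spectral features such as MFCCs. The abstract does not name the exact features or tools used in the paper, so the following is only a minimal sketch under assumed choices: WAV recordings, the librosa library, and illustrative file names and parameters.

```python
# Minimal sketch (assumed tooling, not the paper's confirmed method):
# convert one recorded utterance into a numeric feature matrix.
import librosa
import numpy as np

def wav_to_features(path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load a speech recording and return an (n_mfcc, n_frames) MFCC matrix."""
    signal, sr = librosa.load(path, sr=sr)      # resample to a fixed sampling rate
    signal, _ = librosa.effects.trim(signal)    # drop leading/trailing silence
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)

# Illustrative usage: average over time to get one fixed-length vector per
# utterance, so recordings of different durations can feed a classifier.
# features = wav_to_features("speaker01/word_0001.wav")   # hypothetical path
# vector = features.mean(axis=1)
```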

Author information

Corresponding author

Correspondence to S. M. Saiful Islam Badhon.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saiful Islam Badhon, S.M., Rupon, F.R., Habibur Rahaman, M., Abujar, S. (2021). A Language-Independent Speech Data Collection and Preprocessing Technique. In: Hassanien, A.E., Bhattacharyya, S., Chakrabati, S., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 1300. Springer, Singapore. https://doi.org/10.1007/978-981-33-4367-2_9
