
Deep Learning-Based Automatic Speaker Recognition Using Self-Organized Feature Mapping

  • Conference paper
  • First Online:
High Performance Computing, Smart Devices and Networks (CHSN 2022)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1087)


Abstract

Automatic speaker recognition (ASR) plays a major role in many applications, including forensics, dictionary learning, voice verification, and biometric systems. The performance of these applications depends on the efficiency of the ASR system. However, conventional ASR systems were built on standard machine learning algorithms and achieved only low recognition performance. This work therefore focuses on the development of a deep learning-based ASR system. First, voice features are extracted using Mel-frequency cepstral coefficients (MFCC), which capture the spectral properties of the voice samples. A self-organized feature map (SOFM) is then applied to reduce the number of features, selecting the best ones based on Euclidean similarity between features. Next, a deep learning convolutional neural network (DLCNN) model is trained on these features, forming the feature database. Finally, a test voice sample is passed to the trained DLCNN model, which recognizes the speaker's identity. Simulations carried out in Anaconda (TensorFlow) show that the proposed ASR-Net system achieves superior recognition performance compared to conventional systems.
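
The abstract outlines a four-stage pipeline (MFCC extraction, SOFM-based feature reduction, DLCNN training, and recognition). The sketch below is a minimal, illustrative reading of that pipeline rather than the authors' implementation: the SOM grid size, the "best feature" selection rule, the DLCNN architecture, and all names and hyperparameters are assumptions, using librosa for MFCCs, minisom for the SOFM, and tf.keras for the CNN.

# Minimal sketch of the MFCC -> SOFM -> DLCNN pipeline described in the abstract.
# All shapes, grid sizes, and layer choices are illustrative assumptions.
import numpy as np
import librosa                      # MFCC extraction
import tensorflow as tf             # DLCNN classifier
from minisom import MiniSom         # self-organized feature map (SOFM)

N_MFCC = 20  # number of cepstral coefficients per frame (assumed)

def extract_mfcc(path, sr=16000):
    """Return a mean MFCC vector for one utterance (assumed frame pooling)."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                 # (n_mfcc,)

def sofm_select(X, grid=(4, 4), iters=1000):
    """Group MFCC dimensions by Euclidean similarity on a small SOM and keep one
    representative column per occupied unit (one reading of the abstract's
    'best feature' selection; the exact rule is not spelled out there)."""
    cols = X.T                                   # treat each feature as a sample
    som = MiniSom(grid[0], grid[1], cols.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(cols, iters)
    keep, seen = [], set()
    for j, col in enumerate(cols):
        unit = som.winner(col)                   # best-matching unit (Euclidean)
        if unit not in seen:
            seen.add(unit)
            keep.append(j)
    return X[:, keep], keep

def build_dlcnn(n_features, n_speakers):
    """A small 1-D CNN classifier; the paper's actual DLCNN layout is not given."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features, 1)),
        tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(n_speakers, activation="softmax"),
    ])

# Hypothetical usage, assuming `wav_paths` and integer speaker `labels` exist:
# X = np.stack([extract_mfcc(p) for p in wav_paths])
# X_sel, keep = sofm_select(X)
# model = build_dlcnn(X_sel.shape[1], n_speakers=len(set(labels)))
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(X_sel[..., np.newaxis], np.array(labels), epochs=20)
# Recognition: np.argmax(model.predict(x_test[np.newaxis, :, np.newaxis])) gives the speaker index.

In this reading the SOM operates on feature columns rather than utterances, so "reducing the number of available features" becomes keeping one representative MFCC dimension per occupied SOM unit; other interpretations (e.g., quantizing utterance vectors to best-matching units) are equally consistent with the abstract.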



Author information


Corresponding author

Correspondence to C. V. P. R. Prasad.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Preethi, K., Prasad, C.V.P.R. (2024). Deep Learning-Based Automatic Speaker Recognition Using Self-Organized Feature Mapping. In: Malhotra, R., Sumalatha, L., Yassin, S.M.W., Patgiri, R., Muppalaneni, N.B. (eds) High Performance Computing, Smart Devices and Networks. CHSN 2022. Lecture Notes in Electrical Engineering, vol 1087. Springer, Singapore. https://doi.org/10.1007/978-981-99-6690-5_10


  • DOI: https://doi.org/10.1007/978-981-99-6690-5_10

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6689-9

  • Online ISBN: 978-981-99-6690-5

  • eBook Packages: Computer Science, Computer Science (R0)
