Abstract
Automatic speaker recognition (ASR) plays a major role in many applications, including forensics, dictionary learning, voice verification, and biometric systems. The performance of these applications depends on the efficiency of the ASR system. However, conventional ASR systems were developed using standard machine learning algorithms, which resulted in low recognition performance. Therefore, this work focuses on the development of a deep learning-based ASR system. Initially, voice features are extracted using Mel-frequency cepstral coefficients (MFCC), which capture the spectral properties of the voice samples. Then, a self-organized feature map (SOFM) is applied to reduce the number of features by selecting the best ones according to the Euclidean similarity between features. Next, a deep learning convolutional neural network (DLCNN) model is trained on the selected features to form the feature database. Finally, a test voice sample is applied to the trained DLCNN model, which identifies the speaker. Simulations carried out in Anaconda (TensorFlow) showed that the proposed ASR-Net system achieved superior recognition performance compared to conventional systems.
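The abstract does not say how the SOFM stage is configured, so the following is only a minimal sketch of the general idea, not the chapter's implementation. It assumes the input is a list of frame-level feature vectors (for example, MFCC frames) and that "reducing the features" means summarising them with a small set of map prototypes chosen by Euclidean distance. The function names (`euclidean`, `best_matching_unit`, `train_sofm`), the 1-D map layout, and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, vector):
    """Index of the map node whose weight vector is closest to `vector`."""
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], vector))


def train_sofm(frames, n_nodes=4, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D self-organized feature map on a list of feature vectors.

    Returns `n_nodes` prototype vectors, a reduced summary of `frames`.
    The map size, learning rate, and decay schedule are assumptions.
    """
    rng = random.Random(seed)
    # Initialise node weights from randomly chosen input frames.
    weights = [list(rng.choice(frames)) for _ in range(n_nodes)]
    sigma0 = max(n_nodes / 2.0, 1.0)
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both decay over time.
        decay = math.exp(-epoch / epochs)
        lr = lr0 * decay
        sigma = sigma0 * decay
        # Visit the frames in a fresh random order each epoch.
        for vector in rng.sample(frames, len(frames)):
            bmu = best_matching_unit(weights, vector)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D node grid.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    # Pull the node towards the input frame.
                    w[d] += lr * h * (vector[d] - w[d])
    return weights


if __name__ == "__main__":
    # Toy stand-in for MFCC frames: two well-separated 2-D clusters.
    toy = random.Random(1)
    frames = [[toy.gauss(0, 0.1), toy.gauss(0, 0.1)] for _ in range(20)]
    frames += [[toy.gauss(5, 0.1), toy.gauss(5, 0.1)] for _ in range(20)]
    prototypes = train_sofm(frames, n_nodes=2)
    print(prototypes)
```

In a pipeline like the one described, the prototypes (or the best-matching node index of each frame) would be passed to the classifier in place of the full frame set. Whether the chapter feeds the DLCNN in this way is not stated in the abstract.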
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Preethi, K., Prasad, C.V.P.R. (2024). Deep Learning-Based Automatic Speaker Recognition Using Self-Organized Feature Mapping. In: Malhotra, R., Sumalatha, L., Yassin, S.M.W., Patgiri, R., Muppalaneni, N.B. (eds) High Performance Computing, Smart Devices and Networks. CHSN 2022. Lecture Notes in Electrical Engineering, vol 1087. Springer, Singapore. https://doi.org/10.1007/978-981-99-6690-5_10
DOI: https://doi.org/10.1007/978-981-99-6690-5_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6689-9
Online ISBN: 978-981-99-6690-5
eBook Packages: Computer Science, Computer Science (R0)