Deep Learning-Based Automatic Speaker Recognition Using Self-Organized Feature Mapping

Preethi, K.; Prasad, C. V. P. R.

doi:10.1007/978-981-99-6690-5_10


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1087)

Included in the following conference series:

International Conference on Computer Vision, High-Performance Computing, Smart Devices, and Networks


Abstract

Automatic speaker recognition (ASR) plays a major role in many applications, including forensics, dictionary learning, voice verification, and biometric systems. The performance of these applications depends on the efficiency of the ASR system. However, conventional ASR systems were developed using standard machine learning algorithms, which resulted in low recognition performance. This work therefore focuses on the development of a deep learning-based ASR system. First, voice features are extracted using Mel-frequency cepstral coefficients (MFCC), which capture the spectral properties of the voice samples. Next, a self-organized feature map (SOFM) is applied to reduce the number of features by selecting the best ones according to the Euclidean similarity between features. A deep learning convolutional neural network (DLCNN) model is then trained on the selected features to form the feature database. Finally, a test voice sample is applied to the trained DLCNN model, which identifies the speaker. Simulations carried out in Anaconda (TensorFlow) showed that the proposed ASR-Net system achieved superior recognition performance compared with conventional systems.
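
The abstract does not say how the SOFM stage reduces the feature set. One plausible reading is that the many per-frame MFCC vectors are summarized by a small set of map prototypes, with each frame assigned to its nearest prototype by Euclidean distance. The sketch below illustrates that reading only. It is not the authors' implementation, and the function names, the 1-D map layout, the Gaussian neighbourhood, and the linear decay schedule are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, vector):
    """Index of the map node whose weight vector is closest to `vector`."""
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], vector))


def train_sofm(frames, n_nodes=4, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map on a list of feature vectors.

    `frames` stands in for per-frame MFCC vectors (assumed input).
    Returns `n_nodes` prototype vectors that summarize the frames.
    """
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen input frame.
    weights = [list(rng.choice(frames)) for _ in range(n_nodes)]
    radius0 = n_nodes / 2.0
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink linearly over time.
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        # Visit the frames in a fresh random order each epoch.
        for frame in rng.sample(frames, len(frames)):
            bmu = best_matching_unit(weights, frame)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood measured on the 1-D node index.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (frame[d] - w[d])
    return weights
```

Under these assumptions, `train_sofm(frames, n_nodes=k)` replaces an arbitrary number of frame vectors with `k` prototypes, and `best_matching_unit` maps any new frame to one of them. How the reduced features are then arranged as input to the DLCNN is not described in the abstract and is left open here.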



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Malla Reddy Engineering College for Women (UGC Autonomous), Hyderabad, Telangana State, India
K. Preethi & C. V. P. R. Prasad


Corresponding author

Correspondence to C. V. P. R. Prasad.

Editor information

Editors and Affiliations

Delhi Technological University, Delhi, India
Ruchika Malhotra
Jawaharlal Nehru Technological University Kakinada, Kakinada, Andhra Pradesh, India
L. Sumalatha
Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
S. M. Warusia Yassin
National Institute of Technology Silchar, Silchar, Assam, India
Ripon Patgiri
National Institute of Technology Silchar, Silchar, Assam, India
Naresh Babu Muppalaneni


Cite this paper

Preethi, K., Prasad, C.V.P.R. (2024). Deep Learning-Based Automatic Speaker Recognition Using Self-Organized Feature Mapping. In: Malhotra, R., Sumalatha, L., Yassin, S.M.W., Patgiri, R., Muppalaneni, N.B. (eds) High Performance Computing, Smart Devices and Networks. CHSN 2022. Lecture Notes in Electrical Engineering, vol 1087. Springer, Singapore. https://doi.org/10.1007/978-981-99-6690-5_10


DOI: https://doi.org/10.1007/978-981-99-6690-5_10
Published: 02 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6689-9
Online ISBN: 978-981-99-6690-5
eBook Packages: Computer Science, Computer Science (R0)
