Abstract
Speaker indexing, or diarization, is the process of automatically partitioning a conversation involving multiple speakers into homogeneous segments and grouping together all segments that correspond to the same speaker. Although several approaches have been proposed, accurate partitioning remains difficult under certain conditions. This paper therefore introduces a new speaker diarization model for the Telugu language. The model first extracts Mel Frequency Cepstral Coefficient (MFCC) features and then clusters them with a new optimized Artificial Neural Network (ANN). The novelty lies in the training: the ANN weights are updated by an optimization scheme that hybridizes the Artificial Bee Colony (ABC) algorithm and the Lion Algorithm (LA). The proposed model is accordingly named ANN-ABC-LA. Finally, its performance is compared with state-of-the-art models on several performance measures.
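To make the idea of training ANN weights with a population-based optimizer concrete, the following is a minimal sketch under stated assumptions, not the authors' ANN-ABC-LA method. It uses only a simplified employed-bee and scout phase of the standard ABC algorithm; the onlooker phase and the Lion Algorithm component are omitted, because the abstract does not specify how the two are hybridized. The network size, the toy 2-D points standing in for MFCC feature vectors, and all names (`forward`, `train_abc`, `TOY_DATA`) are hypothetical.

```python
import math
import random

N_WEIGHTS = 13   # 3 hidden units * (2 inputs + bias) + 3 output weights + output bias
BOUND = 5.0      # assumed search range for every weight

# Hypothetical toy data: 2-D points standing in for MFCC vectors, labelled by speaker (0 or 1).
TOY_DATA = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
            ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1)]

def forward(w, x):
    """Tiny network: 2 inputs -> 3 tanh hidden units -> 1 sigmoid output."""
    out = w[12]  # output bias
    for h in range(3):
        a = w[3 * h] * x[0] + w[3 * h + 1] * x[1] + w[3 * h + 2]
        out += w[9 + h] * math.tanh(a)
    return 1.0 / (1.0 + math.exp(-out))

def loss(w, data):
    """Mean squared error of the network over the labelled data."""
    return sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)

def train_abc(data, n_sources=20, n_iters=200, limit=20, seed=0):
    """Simplified ABC search over weight vectors (employed-bee and scout phases only)."""
    rng = random.Random(seed)

    def new_source():
        return [rng.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)]

    sources = [new_source() for _ in range(n_sources)]
    costs = [loss(s, data) for s in sources]
    trials = [0] * n_sources
    best_i = min(range(n_sources), key=costs.__getitem__)
    best_w, best_cost = list(sources[best_i]), costs[best_i]
    initial_cost = best_cost

    for _ in range(n_iters):
        for i in range(n_sources):
            # Employed bee: move one coordinate relative to a random partner source.
            k = rng.choice([j for j in range(n_sources) if j != i])
            d = rng.randrange(N_WEIGHTS)
            cand = list(sources[i])
            cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
            cand[d] = max(-BOUND, min(BOUND, cand[d]))
            c = loss(cand, data)
            if c < costs[i]:
                # Greedy selection: keep the candidate only if it improves the source.
                sources[i], costs[i], trials[i] = cand, c, 0
            else:
                trials[i] += 1
                if trials[i] > limit:
                    # Scout: abandon a stagnant source and restart it at random.
                    sources[i] = new_source()
                    costs[i] = loss(sources[i], data)
                    trials[i] = 0
            if costs[i] < best_cost:
                best_w, best_cost = list(sources[i]), costs[i]
    return best_w, best_cost, initial_cost

if __name__ == "__main__":
    weights, final_cost, start_cost = train_abc(TOY_DATA)
    print(f"MSE before search: {start_cost:.4f}, after search: {final_cost:.4f}")
```

Because the best weight vector is tracked separately from the population, the reported error never increases across iterations, even when a scout discards a source. A gradient-free search of this kind treats the network purely as a cost function of its weight vector, which is what makes it straightforward to swap in a different or hybrid update rule.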
Cite this article
Sethuram, V., Prasad, A. & Rao, R.R. Optimal trained artificial neural network for Telugu speaker diarization. Evol. Intel. 13, 631–648 (2020). https://doi.org/10.1007/s12065-020-00378-9
DOI: https://doi.org/10.1007/s12065-020-00378-9