Optimal trained artificial neural network for Telugu speaker diarization

Sethuram, V.; Prasad, Ande; Rao, R. Rajeshwara

doi:10.1007/s12065-020-00378-9

Research Paper
Published: 17 March 2020

Volume 13, pages 631–648, (2020)
Evolutionary Intelligence

V. Sethuram¹,
Ande Prasad¹ &
R. Rajeshwara Rao²

132 Accesses
4 Citations

Abstract

Speaker indexing, or diarization, is the process of automatically partitioning a conversation involving multiple speakers into homogeneous segments and grouping together all segments that correspond to the same speaker. Although several approaches have been proposed, accurate partitioning remains difficult under certain conditions. With this in mind, this paper introduces a new speaker diarization model for the Telugu language. The model first extracts features based on Mel-frequency cepstral coefficients (MFCCs). An optimized Artificial Neural Network (ANN) then performs the clustering. The novelty lies in how the ANN is trained: its weights are updated by an optimization scheme that hybridizes the Artificial Bee Colony (ABC) algorithm and the Lion Algorithm (LA). The proposed model is therefore named ANN-ABC-LA. Finally, the performance of ANN-ABC-LA is compared with that of state-of-the-art models on several performance measures.
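The abstract does not give the update rules of the hybrid ABC-LA optimizer, so the following is only a minimal sketch of the general idea of training ANN weights by population-based search instead of gradient descent. It uses a plain, textbook-style Artificial Bee Colony loop (employed, onlooker, and scout phases); the Lion Algorithm component of the paper's hybrid is not reproduced. The toy 2-D data (standing in for MFCC-derived feature vectors of two speakers), the network size, the weight bounds, and every name in the code (`DATA`, `forward`, `abc_train`, and so on) are assumptions made for illustration and do not come from the paper.

```python
import math
import random

# Toy stand-ins for feature vectors from two hypothetical speakers.
# These values are invented for illustration; they are NOT data from the paper.
DATA = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.0, 0.3), 0),
        ((0.9, 0.8), 1), ((0.8, 1.0), 1), ((1.0, 0.7), 1)]

N_IN, N_HID = 2, 3
# Flat weight vector: hidden weights, hidden biases, output weights, output bias.
DIM = N_IN * N_HID + N_HID + N_HID + 1


def forward(w, x):
    """One-hidden-layer network: tanh hidden units, sigmoid output."""
    hidden = []
    for h in range(N_HID):
        s = w[N_IN * N_HID + h]  # hidden bias
        for i in range(N_IN):
            s += w[h * N_IN + i] * x[i]
        hidden.append(math.tanh(s))
    base = N_IN * N_HID + N_HID
    out = w[base + N_HID]  # output bias
    for h in range(N_HID):
        out += w[base + h] * hidden[h]
    return 1.0 / (1.0 + math.exp(-out))


def loss(w):
    """Mean squared error of the network over the toy data."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)


def abc_train(n_sources=10, n_iter=60, limit=15, seed=0):
    """ABC-style search over ANN weights (assumed setup, not the paper's hybrid)."""
    rng = random.Random(seed)

    def new_source():
        return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

    sources = [new_source() for _ in range(n_sources)]
    losses = [loss(s) for s in sources]
    trials = [0] * n_sources
    initial_best = min(losses)
    best_loss = initial_best
    best = list(sources[losses.index(best_loss)])

    def try_neighbor(i):
        """Perturb one weight of source i relative to a random partner; keep if better."""
        nonlocal best, best_loss
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(DIM)
        cand = list(sources[i])
        step = rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        cand[d] = max(-5.0, min(5.0, cand[d] + step))  # keep weights bounded
        cand_loss = loss(cand)
        if cand_loss < losses[i]:
            sources[i], losses[i], trials[i] = cand, cand_loss, 0
            if cand_loss < best_loss:
                best, best_loss = list(cand), cand_loss
        else:
            trials[i] += 1

    for _ in range(n_iter):
        for i in range(n_sources):  # employed-bee phase
            try_neighbor(i)
        fitness = [1.0 / (1.0 + l) for l in losses]
        # Onlooker phase: better sources are revisited more often.
        for i in rng.choices(range(n_sources), weights=fitness, k=n_sources):
            try_neighbor(i)
        for i in range(n_sources):  # scout phase: abandon stagnant sources
            if trials[i] > limit:
                sources[i] = new_source()
                losses[i] = loss(sources[i])
                trials[i] = 0
                if losses[i] < best_loss:
                    best, best_loss = list(sources[i]), losses[i]
    return best, best_loss, initial_best
```

Calling `abc_train()` returns the best weight vector found, its loss, and the best loss in the initial population, so the improvement from the search can be inspected directly. Because the best solution is tracked across iterations, the returned loss never exceeds the initial one.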

Author information

Authors and Affiliations

Vikrama Simhapuri University, Nellore, Andhra Pradesh, India
V. Sethuram & Ande Prasad
JNTU, Vizianagaram, Andhra Pradesh, India
R. Rajeshwara Rao

Corresponding author

Correspondence to V. Sethuram.

About this article

Cite this article

Sethuram, V., Prasad, A. & Rao, R.R. Optimal trained artificial neural network for Telugu speaker diarization. Evol. Intel. 13, 631–648 (2020). https://doi.org/10.1007/s12065-020-00378-9

Received: 25 September 2019
Revised: 05 February 2020
Accepted: 23 February 2020
Published: 17 March 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s12065-020-00378-9
