A novel approach for speaker diarization system using TMFCC parameterization and Lion optimization

Subba Ramaiah, V.; Rajeswara Rao, R.

doi:10.1007/s11771-017-3678-3

Published: 16 December 2017

Volume 24, pages 2649–2663 (2017)

Journal of Central South University

Abstract

In an audio stream containing multiple speakers, speaker diarization determines “who spoke when”. The task is unsupervised, since no prior information about the speakers is available. It labels the speech signal according to speaker identity; that is, the input audio stream is partitioned into homogeneous segments. In this work, we present a novel speaker diarization system that uses the Tangent weighted Mel frequency cepstral coefficient (TMFCC) as the feature parameter and the Lion algorithm to cluster the voice-activity-detected audio streams into speaker groups. In this way, both main tasks of speaker indexing (speaker segmentation and speaker clustering) are improved. TMFCC exploits low-energy frames as well as high-energy frames more effectively, which improves the performance of the proposed system. The proposed system is evaluated on audio signals from the ELSDSR corpus containing three, four and five speakers. Using tracking distance and tracking time as evaluation metrics, the experimental results show that the speaker diarization system with TMFCC parameterization and Lion-based clustering outperforms existing diarization systems, achieving 95% tracking accuracy.
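The abstract does not give the TMFCC formula. As a rough illustration of where a tangent-based frame weighting could sit in a standard MFCC pipeline, the sketch below computes conventional MFCCs with NumPy and then scales each frame by a hypothetical weight 1 + (2/π)|arctan z|, where z is the standardized log frame energy, so that frames well below or well above the average energy both gain influence. The weighting function, the function name `tangent_weighted_mfcc`, and all frame and filter parameters are assumptions for illustration only, not the authors' definition of TMFCC.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale (standard MFCC step).
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_coeffs):
    # Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms.
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def tangent_weighted_mfcc(signal, sr=16000, frame_len=400, hop=160,
                          n_fft=512, n_filters=26, n_coeffs=13):
    """Hypothetical sketch: MFCCs scaled by an arctan-based frame-energy weight."""
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Conventional MFCC: power spectrum -> mel energies -> log -> DCT.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    mfcc = dct_ii(np.log(mel_energy + 1e-10), n_coeffs)

    # ASSUMED weighting (not from the paper): frames whose log energy is far
    # from the mean, whether low or high, get a weight approaching 2.
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    z = (log_e - log_e.mean()) / (log_e.std() + 1e-10)
    weights = 1.0 + (2.0 / np.pi) * np.abs(np.arctan(z))
    return mfcc * weights[:, None], weights
```

With the defaults above, one second of 16 kHz audio yields 98 frames of 13 coefficients each. In a diarization pipeline these frame-level features would then be segmented and clustered into speaker groups; the Lion-based clustering step is not sketched here.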

Author information

Authors and Affiliations

Mahatma Gandhi Institute of Technology, Kokapet, Hyderabad, Telangana, 500075, India
V. Subba Ramaiah
Jawaharlal Nehru Technological University Kakinada, Kakinada, Andhra Pradesh, 535002, India
R. Rajeswara Rao

Corresponding author

Correspondence to V. Subba Ramaiah.

About this article

Cite this article

Subba Ramaiah, V., Rajeswara Rao, R. A novel approach for speaker diarization system using TMFCC parameterization and Lion optimization. J. Cent. South Univ. 24, 2649–2663 (2017). https://doi.org/10.1007/s11771-017-3678-3

Received: 05 May 2016
Accepted: 23 September 2016
Issue Date: November 2017
