A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system

Therese, S. Shanthi; Lingam, Chelpa

doi:10.1007/s12652-017-0653-7

Original Research
Published: 18 December 2017

Journal of Ambient Intelligence and Humanized Computing

S. Shanthi Therese¹,² & Chelpa Lingam³


Abstract

Extracting features from speech signals in the presence of background excitation is a challenging task. In this research, we propose a phoneme subspace integrated with the linear visual assessment tendency (LVAT) algorithm to retrieve audio features based on spectral depth analysis. The LVAT algorithm clusters different spectral features to define the intensity of the signal weight. The fast Fourier transform (FFT) projects a selection of weight-estimated samples from the signal onto the phoneme subspace. The combined FFT and phoneme subspace enhances the features by analyzing the low-, middle- and high-frequency signals based on the phoneme subspace weight update. Traditional feature extraction techniques such as mel frequency cepstral coefficients (MFCC), linear predictor cepstral coefficients (LPCC) and power normalized cepstral coefficients (PNCC) are analyzed under different noise conditions and compared with the results of clustering with power normalized cepstral coefficients. The experimental results demonstrate improved performance on objective measures such as sensitivity, specificity, accuracy and recognition rate.
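
Three of the objective measures named in the abstract (sensitivity, specificity and accuracy) have standard confusion-matrix definitions. The following is a minimal Python sketch of those definitions, assuming a binary target versus non-target decision per test item. The function name and the example counts are illustrative assumptions and are not taken from the paper; recognition rate is left out because the abstract does not define it.

```python
import math


def objective_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity and accuracy from confusion counts.

    tp/fn: target items recognized / missed.
    tn/fp: non-target items rejected / wrongly accepted.
    (Hypothetical helper for illustration; not the authors' code.)
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one decision is required")

    positives = tp + fn  # all items that truly belong to the target class
    negatives = tn + fp  # all items that truly do not

    return {
        # Fraction of true targets that were recognized.
        "sensitivity": tp / positives if positives else math.nan,
        # Fraction of true non-targets that were rejected.
        "specificity": tn / negatives if negatives else math.nan,
        # Fraction of all decisions that were correct.
        "accuracy": (tp + tn) / total,
    }


# Illustrative counts only (not results from the paper).
example = objective_measures(tp=45, fp=10, tn=40, fn=5)
print(example)
```

With these illustrative counts, sensitivity is 45/50, specificity is 40/50 and accuracy is 85/100. Reporting all three together is useful because accuracy alone can hide a poor rejection rate when the classes are imbalanced.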

Acknowledgements

The authors sincerely thank Mr. Bin Gao for providing access to the Standard English Language Speech Database for Speaker Recognition (ELSDS). This data set is used in the experimental analysis of this research.

Author information

Authors and Affiliations

Department of Computer Engineering, Ramrao Adik Institute of Technology, Affiliated to the University of Mumbai, Nerul, Navi Mumbai, India
S. Shanthi Therese
Department of Information Technology, Thadomal Shahani Engineering College, University of Mumbai, Bandra, Mumbai, India
S. Shanthi Therese
Pillai HOC College of Engineering & Technology, Affiliated to the University of Mumbai, Rasayani, Raigad, India
Chelpa Lingam

Corresponding author

Correspondence to S. Shanthi Therese.

About this article

Cite this article

Therese, S.S., Lingam, C. A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system. J Ambient Intell Human Comput (2017). https://doi.org/10.1007/s12652-017-0653-7

Received: 27 July 2017
Accepted: 06 December 2017
Published: 18 December 2017
DOI: https://doi.org/10.1007/s12652-017-0653-7
