An Improved Endpoint Detection Algorithm Based on MFCC Cosine Value

Cao, Danyang; Gao, Xue; Gao, Lei

doi:10.1007/s11277-017-3958-0

Published: 20 January 2017

Wireless Personal Communications, Volume 95, pages 2073–2090 (2017)

Danyang Cao¹,
Xue Gao¹ &
Lei Gao¹

Abstract

Endpoint detection is one of the most important steps in speech recognition. In a high-SNR environment, an algorithm based on short-time energy and zero-crossing rate can be used, but when the SNR is low this method may not be accurate. Some researchers have proposed an algorithm based on the MFCC Euclidean distance, which performs better in noisy environments. However, that algorithm needs two thresholds to find the start and end points, and when the two threshold values are not suitable, the detection result can be extremely poor. In this paper, we propose an improved algorithm based on the MFCC cosine value. This method reduces errors because it needs only a single threshold. The benefit of the improved algorithm is that the detected segment reliably contains the real voice component. According to the experimental data, the improved algorithm can raise the speech recognition rate by 10% even in a noisy environment (SNR = 0). This shows that the improved method is more robust.
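The abstract does not give the details of the algorithm, so the following is only a minimal sketch of what single-threshold detection on an MFCC cosine value could look like, not the authors' method. It assumes that the MFCC vectors have already been computed for each frame, that the first few frames contain only background noise, and that a frame counts as speech when its cosine similarity to the averaged noise vector falls below the threshold. The names `detect_endpoints`, `noise_frames`, and `threshold`, and the default values, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def detect_endpoints(frames, noise_frames=3, threshold=0.9):
    """Return (start, end) frame indices of the detected speech, or None.

    frames: list of per-frame MFCC vectors (assumed precomputed).
    noise_frames: number of leading frames assumed to be noise only.
    threshold: single cosine threshold (hypothetical default).
    """
    if len(frames) <= noise_frames:
        return None
    dim = len(frames[0])
    # Assumption: the noise reference is the mean of the leading frames.
    ref = [sum(f[i] for f in frames[:noise_frames]) / noise_frames
           for i in range(dim)]
    # A frame is treated as speech when it is dissimilar to the noise reference.
    speech = [i for i, f in enumerate(frames) if cosine(f, ref) < threshold]
    if not speech:
        return None
    # One threshold yields both endpoints: first and last dissimilar frame.
    return speech[0], speech[-1]
```

With toy two-dimensional vectors in which frames 3 to 5 point in a different direction from the leading noise frames, the function returns the pair `(3, 5)`. Because both endpoints come from one comparison, there is no second threshold to tune, which is the property the abstract attributes to the cosine-based approach.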

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (41471303), training project for outstanding young teachers of North China University of Technology, Special Research Foundation of North China University of Technology (PXM2017_014212_000014), Beijing Natural Science Foundation (4162022), and advantage disciplinary projects of North China University of Technology.

Author information

Authors and Affiliations

College of Computer Science and Technology, North China University of Technology, Beijing, 100144, China
Danyang Cao, Xue Gao & Lei Gao

Corresponding author

Correspondence to Danyang Cao.

Ethics declarations

Conflict of interest

The authors confirm that the content of this article involves no conflicts of interest.

About this article

Cite this article

Cao, D., Gao, X. & Gao, L. An Improved Endpoint Detection Algorithm Based on MFCC Cosine Value. Wireless Pers Commun 95, 2073–2090 (2017). https://doi.org/10.1007/s11277-017-3958-0

Issue Date: August 2017
DOI: https://doi.org/10.1007/s11277-017-3958-0
