An Improved Endpoint Detection Algorithm Based on MFCC Cosine Value

Wireless Personal Communications

Abstract

Endpoint detection is one of the most important steps in speech recognition. In a high-SNR environment, an algorithm based on short-time energy and zero-crossing rate can be used, but when the SNR is low this method may not be accurate. Some researchers have proposed an algorithm based on the MFCC Euclidean distance, which performs better in noisy environments. That algorithm, however, needs two thresholds to find the start and end points, and when the two threshold values are poorly chosen the detection result can be extremely bad. In this paper, we propose an improved algorithm based on the MFCC cosine value. Because it needs only a single threshold, this method reduces errors. A further benefit is that the detected segment reliably contains the real voice component. According to the experimental data, the improved algorithm raises the speech recognition rate by 10% even in a noisy environment (SNR = 0). This indicates that the improved method is more robust.
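
The abstract does not state the formulation, so the following is only a minimal sketch of the general idea of single-threshold detection on a cosine measure, not the authors' algorithm. It assumes that one feature vector per frame (for example an MFCC vector computed elsewhere) is already available, that the first few frames contain only background noise, and that speech frames point in a different direction from the noise reference. The function name `cosine_endpoints`, the parameter `noise_frames`, and the default `threshold` are all hypothetical.

```python
import numpy as np


def cosine_endpoints(features, noise_frames=5, threshold=0.9):
    """Return (start, end) frame indices of the detected region, or None.

    features:     array of shape (n_frames, n_coeffs), one vector per frame
                  (e.g. MFCCs computed by any front end; assumption).
    noise_frames: number of leading frames assumed to be noise only (assumption).
    threshold:    single decision threshold on the cosine value (hypothetical
                  default, not taken from the paper).
    """
    features = np.asarray(features, dtype=float)

    # Reference vector: mean of the leading frames, assumed to be noise only.
    reference = features[:noise_frames].mean(axis=0)

    # Cosine of the angle between every frame vector and the reference.
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(reference)
    cosines = features @ reference / np.maximum(norms, 1e-12)

    # One threshold splits noise-like frames (cosine near 1) from the rest.
    speech = np.flatnonzero(cosines < threshold)
    if speech.size == 0:
        return None

    # First and last frame below the threshold mark the start and end points.
    return int(speech[0]), int(speech[-1])
```

Because both endpoints come from the same comparison, there is only one value to tune in this sketch, which is the property the abstract attributes to the cosine-based method; how the paper actually chooses its reference and threshold is not recoverable from the abstract.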

Acknowledgements

This work was supported by the National Natural Science Foundation of China (41471303), the training project for outstanding young teachers of North China University of Technology, the Special Research Foundation of North China University of Technology (PXM2017_014212_000014), the Beijing Natural Science Foundation (4162022), and the advantage disciplinary projects of North China University of Technology.

Author information

Corresponding author

Correspondence to Danyang Cao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest regarding the content of this article.

About this article

Cite this article

Cao, D., Gao, X. & Gao, L. An Improved Endpoint Detection Algorithm Based on MFCC Cosine Value. Wireless Pers Commun 95, 2073–2090 (2017). https://doi.org/10.1007/s11277-017-3958-0

  • DOI: https://doi.org/10.1007/s11277-017-3958-0
