
Spectral difference for statistical model-based speech enhancement in speech recognition

Abstract

In this paper, we propose a statistical model-based speech enhancement technique that uses a spectral difference scheme for speech recognition in virtual reality. In the analysis step, two principal parameters, the weighting parameter of the decision-directed (DD) method and the long-term smoothing parameter used in noise estimation, are each assigned an optimal operating point according to the spectral difference under various noise conditions. These operating points, which differ from one spectral difference to another, are selected using the composite measure, a criterion relevant to speech quality. An efficient mapping function is also presented that converts the spectral difference into an index of the metric table, so that the operating points for a given noise condition can be looked up in the on-line step. In the on-line speech enhancement step, the parameters are then chosen frame by frame from the metric table according to the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm outperforms conventional algorithms.
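The on-line step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the spectral difference formula, the bin edges, the table values, and the Wiener gain are all assumptions made for illustration, and the names `METRIC_TABLE`, `SD_EDGES_DB`, `spectral_difference_db`, `table_index`, and `enhance_frame` are hypothetical. Only the decision-directed recursion and the first-order recursive noise smoothing follow their standard textbook forms.

```python
import math

# Hypothetical metric table: each row is (alpha_dd, zeta_noise).
# These values are placeholders, not the operating points from the paper.
METRIC_TABLE = [
    (0.98, 0.98),  # small spectral difference
    (0.95, 0.95),  # medium spectral difference
    (0.92, 0.90),  # large spectral difference
]
SD_EDGES_DB = [3.0, 9.0]  # hypothetical bin edges used by the mapping function


def spectral_difference_db(noisy_power, noise_power, eps=1e-12):
    """Assumed definition: mean absolute log-power gap (in dB) between
    the noisy frame and the current noise estimate."""
    gaps = [abs(10.0 * math.log10((y + eps) / (d + eps)))
            for y, d in zip(noisy_power, noise_power)]
    return sum(gaps) / len(gaps)


def table_index(sd_db):
    """Mapping function: count how many bin edges the value reaches."""
    return sum(1 for edge in SD_EDGES_DB if sd_db >= edge)


def enhance_frame(noisy_power, noise_power, prev_clean_power,
                  xi_min=1e-3, eps=1e-12):
    """Process one frame of per-bin power spectra."""
    sd = spectral_difference_db(noisy_power, noise_power)
    alpha, zeta = METRIC_TABLE[table_index(sd)]
    gains, clean_power, new_noise = [], [], []
    for y, d, s_prev in zip(noisy_power, noise_power, prev_clean_power):
        d = max(d, eps)                      # guard against division by zero
        gamma = y / d                        # a posteriori SNR
        # Decision-directed a priori SNR estimate with weight alpha.
        xi = alpha * s_prev / d + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        g = xi / (1.0 + xi)                  # Wiener gain as a simple stand-in
        gains.append(g)
        clean_power.append(g * g * y)        # fed back as prev_clean_power
        # Long-term recursive smoothing of the noise power with weight zeta.
        # Simplification: a practical estimator would gate or weight this
        # update by speech absence so speech does not leak into the estimate.
        new_noise.append(zeta * d + (1.0 - zeta) * y)
    return gains, clean_power, new_noise, (alpha, zeta)
```

Calling `enhance_frame` once per frame, and feeding the returned `clean_power` and `new_noise` back in as `prev_clean_power` and `noise_power`, gives the frame-by-frame parameter selection the abstract refers to. In the paper the table entries come from an off-line search scored by the composite measure; here they are fixed placeholders.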

Acknowledgments

This work was supported by a National Research Foundation (NRF) of Korea grant (2014R1A2A1A10049735).

Author information

Corresponding author

Correspondence to Joon-Hyuk Chang.

About this article

Cite this article

Lee, S., Chang, JH. Spectral difference for statistical model-based speech enhancement in speech recognition. Multimed Tools Appl 76, 24917–24929 (2017). https://doi.org/10.1007/s11042-016-4122-7

  • DOI: https://doi.org/10.1007/s11042-016-4122-7

Keywords

  • Speech enhancement
  • Noise reduction
  • Speech recognition
  • Spectral difference