
Circuits, Systems, and Signal Processing, Volume 38, Issue 2, pp 643–663

Speech Enhancement Using Source Information for Phoneme Recognition of Speech with Background Music

  • Banriskhem K. Khonglah (Email author)
  • Abhishek Dey
  • S. R. Mahadeva Prasanna
Article

Abstract

This work explores the significance of source information for speech enhancement, with the aim of improving phoneme recognition on speech segments that contain background music. The standard procedure for enhancing speech in noisy conditions applies temporal, spectral and perceptual methods in sequence. This work follows the same sequence but additionally studies the effect of source information, particularly in the temporal and perceptual enhancement stages, when the degradation is background music. The source information is represented by the epoch locations and epoch strengths. These are obtained by computing the component envelopes at each frequency with the single frequency filter (SFF), summing the mean and the standard deviation of those envelopes across frequency, and passing the result through a zero frequency filter (ZFF). This method of obtaining epoch locations and epoch strengths is termed SFF-ZFF in this work. The enhanced segments are passed through phoneme recognizers built using Gaussian mixture model-hidden Markov model (GMM-HMM), subspace Gaussian mixture model-hidden Markov model (SGMM-HMM) and deep neural network-hidden Markov model (DNN-HMM) systems, with all models trained on clean speech. The enhanced audio files yield a lower phone error rate than the degraded audio files, which indicates that enhancement based on source information is beneficial for speech with background music.
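
The SFF-ZFF procedure outlined in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: it follows the commonly described forms of single frequency filtering (frequency shift to fs/2 followed by a single-pole filter) and zero frequency filtering (two cascaded zero-frequency resonators followed by local-mean trend removal). The function names, the pole radius r = 0.995, the choice of frequencies, the 10 ms average pitch period, the three trend-removal passes and the zero-crossing polarity are all assumptions made for illustration.

```python
import cmath
import math
from statistics import fmean, pstdev


def sff_envelopes(signal, fs, freqs, r=0.995):
    """Return one amplitude envelope per frequency (assumed SFF form)."""
    # First-order difference of the input signal.
    x = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]
    envelopes = []
    for f in freqs:
        # Shift frequency f to fs/2, then filter with a single pole at z = -r.
        w = math.pi - 2.0 * math.pi * f / fs
        y_prev = 0j
        env = []
        for n, xn in enumerate(x):
            y = -r * y_prev + xn * cmath.exp(1j * w * n)
            env.append(abs(y))
            y_prev = y
        envelopes.append(env)
    return envelopes


def mean_plus_std(envelopes):
    """At each sample, sum the mean and standard deviation across frequency."""
    return [fmean(col) + pstdev(col) for col in zip(*envelopes)]


def running_sum(x):
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out


def remove_trend(x, half_win):
    """Subtract the local mean over a window of up to 2*half_win + 1 samples."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_win), min(n, i + half_win + 1)
        out.append(x[i] - sum(x[lo:hi]) / (hi - lo))
    return out


def zff_epochs(v, fs, avg_pitch_ms=10.0, passes=3):
    """Return (epoch locations, epoch strengths) from a ZFF-style output."""
    d = [0.0] + [v[n] - v[n - 1] for n in range(1, len(v))]
    y = d
    # Two resonators 1/(1 - z^-1)^2 in cascade equal four running sums.
    for _ in range(4):
        y = running_sum(y)
    # Assumed window: roughly one average pitch period in total.
    half_win = max(1, int(round(avg_pitch_ms * 1e-3 * fs / 2)))
    for _ in range(passes):
        y = remove_trend(y, half_win)
    locs, strengths = [], []
    for n in range(1, len(y)):
        # Negative-to-positive zero crossing; the slope is taken as strength.
        if y[n - 1] < 0.0 <= y[n]:
            locs.append(n)
            strengths.append(y[n] - y[n - 1])
    return locs, strengths
```

A call such as `zff_epochs(mean_plus_std(sff_envelopes(sig, 8000, [500, 1000, 1500, 2000])), 8000)` returns sample indices and slope values. Samples near the start and end of the signal are affected by the truncated trend-removal window and would normally be discarded.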

Keywords

Source information; Single frequency filter; Zero frequency filter; Temporal enhancement; Spectral enhancement

Notes

Acknowledgements

This work is part of the project titled "Multi-modal Broadcast Analytics: Structured Evidence Visualization for Events of Security Concern", funded by the e-Security division of the Department of Electronics and Information Technology (DeitY), Govt. of India.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Banriskhem K. Khonglah (1), Email author
  • Abhishek Dey (2)
  • S. R. Mahadeva Prasanna (3)
  1. Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, India
  2. Department of Electronics and Communication Engineering, Gauhati University Institute of Science and Technology (GUIST), Guwahati, India
  3. Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, India
