National Academy Science Letters, Volume 41, Issue 1, pp 15–22

Robust Recognition of English Speech in Noisy Environments Using Frequency Warped Signal Processing

  • Navneet Upadhyay
  • Hamurabi Gamboa Rosales
Short Communication

Abstract

The performance of a speech recognizer drops significantly when there is an acoustic mismatch between the training and operational environments. A speech recognizer is called robust if it preserves good recognition accuracy even under such mismatch conditions. The present study addresses the recognition of English speech in noisy environments and compares, on the basis of average recognition rate, the frequency scales used in parameterization. For robust automatic speech recognition, a front-end signal enhancement component, the spectral subtraction algorithm, is used to prefilter the noisy input speech before it is fed to the recognizer. Several frequency-warped scales, namely the perceptual Mel, Bark, and equivalent rectangular bandwidth (ERB) rate scales and a non-perceptual uniform scale, are used in the parameterization for feature extraction from the enhanced speech. A suite of experiments is carried out to evaluate the performance of the speech recognizer, with and without the front-end signal enhancement component, in a variety of noisy environments. Recognition accuracy is tested at the word level over a wide range of signal-to-noise ratios for both stationary and non-stationary noises.
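The two building blocks named in the abstract, frequency warping and spectral over-subtraction, can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas or parameter values used in the study, so everything below is an assumption: the Mel, Bark, and ERB-rate mappings are commonly used closed-form approximations, the function names are hypothetical, and the over-subtraction factor `alpha` and spectral floor `beta` are illustrative defaults rather than the authors' settings.

```python
import math
from typing import List, Sequence


def hz_to_mel(f_hz: float) -> float:
    """Mel scale, using the common 2595*log10(1 + f/700) approximation (assumed)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale, using the 6*asinh(f/600) approximation (assumed)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def hz_to_erb_rate(f_hz: float) -> float:
    """ERB-rate scale, using 21.4*log10(1 + 0.00437*f) (assumed)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def over_subtract(
    noisy_power: Sequence[float],
    noise_power: Sequence[float],
    alpha: float = 4.0,   # over-subtraction factor (illustrative value)
    beta: float = 0.01,   # spectral floor factor (illustrative value)
) -> List[float]:
    """Power spectral over-subtraction for one frame.

    Each bin becomes noisy - alpha*noise, but never less than beta*noise,
    which keeps the result positive and limits musical-noise artifacts.
    """
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        diff = y - alpha * n
        floor = beta * n
        cleaned.append(diff if diff > floor else floor)
    return cleaned


if __name__ == "__main__":
    # Compare how each scale compresses the same set of frequencies.
    for f in (250.0, 1000.0, 4000.0):
        print(f, hz_to_mel(f), hz_to_bark(f), hz_to_erb_rate(f))
    # Second bin is dominated by noise, so it is clamped to the floor.
    print(over_subtract([10.0, 1.0], [1.0, 1.0]))
```

In a front end of this kind, the enhanced power spectrum would typically be passed through a filterbank whose centre frequencies are equally spaced on the chosen warped scale; a uniform scale corresponds to equal spacing directly in hertz.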

Keywords

Speech enhancement, Spectral over subtraction, Parameterization, Frequency warped scales, Speech recognition

Copyright information

© The National Academy of Sciences, India 2018

Authors and Affiliations

  1. Department of Signal Processing and Acoustics, Faculty of Electrical Engineering, Autonomous University of Zacatecas, Zacatecas, Mexico
